Special Edition Using HTML 4



- 39 -
Verifying and Testing HTML Documents

by Jerry Honeycutt and Mark R. Brown

Validating Your HTML Documents

In the world of software development, programs are built in essentially three phases: design, programming, and testing. The purpose of each phase is self-explanatory.

When building HTML documents, you probably work with the same sort of phases. You design your HTML document, even if you just make a mental note of the document's general layout, before you begin working on it. Then you implement the HTML document by writing individual lines of HTML code. Finally, you test your Web pages to make sure they work as you planned and that they're syntactically correct.

This section covers tools that make that last phase, testing and verification, more productive. You can find a variety of tools on the Internet to help you test your HTML files. Some of the tools you learn about in this section verify the syntax (form) of your HTML files; others verify only the links. Regardless of their function, these tools help you confirm that Web browsers can understand your HTML files and that your pages give users a positive experience.

Doctor HTML

Doctor HTML is a verification service that analyzes the contents of a Web page. For example, you can use it to spell-check a Web page, verify the syntax of the HTML in a Web page, or even check a Web page for broken links. You can also use Doctor HTML to verify your entire Web site, but site verification is a commercial service to which you must subscribe.

Doctor HTML isn't a program that you download onto your computer before using. It's a service (implemented as a CGI script) that you access on the Web at http://www2.imagiware.com/RxHTML/. Click Single Page Analysis in the left frame to see the Web page shown in Figure 39.1. Table 39.1 describes each test shown in the figure.

Table 39.1  Doctor HTML Tests

Test: Description
Spelling: Removes the tags and accented text from the HTML file and scans it for spelling errors.
Image Analysis: Loads all the images to which the HTML file is linked, determines the bandwidth required by each image, and reports any images that require excessive download time. Doctor HTML also reports the size of each image and the number of colors it uses.
Document Structure: Tests the structure of the HTML file, including unclosed HTML tags.
Image Syntax: Makes sure you're using the HEIGHT, WIDTH, and ALT attributes within the IMG tag. HEIGHT and WIDTH let the browser lay out the page before the images finish downloading, so the document appears faster; ALT supplies text for users whose browsers don't display images.
Table Structure: Checks the structure of each table in the HTML document and looks for any unclosed TR, TH, and TD tags.
Verify Hyperlinks: Reports each invalid link that your HTML file contains. A link that Doctor HTML reports as "dead" isn't necessarily invalid; the server may simply be running slowly.
Form Structure: Verifies the structure of each form in the HTML file. It looks only at INPUT tags.
Show Commands: Displays an indented list of HTML commands that shows the structure of your HTML document.
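Doctor HTML's own code isn't shown in this chapter, but the idea behind the Document Structure and Table Structure tests is easy to sketch. The following Python 3 sketch, built on the standard html.parser module, keeps a stack of open elements and reports every start tag that is never closed and every end tag that has no matching start tag. The class and function names are hypothetical. Keep in mind that HTML 4 lets you omit some end tags, such as </P> and </TD>, so this check is stricter than the standard, much like the table test described in Table 39.1.

```python
from html.parser import HTMLParser

# HTML 4 elements that are EMPTY: they never take an end tag.
EMPTY = {"area", "base", "basefont", "br", "col", "frame", "hr", "img",
         "input", "isindex", "link", "meta", "param"}


class StructureChecker(HTMLParser):
    """Tracks open elements and records unclosed or unmatched tags."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line) for each element still open
        self.problems = []   # human-readable warnings

    def handle_starttag(self, tag, attrs):
        if tag not in EMPTY:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in EMPTY:
            return  # e.g. the implied end of <BR/>
        # Search from the innermost open element outward for a match.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                # Anything opened after the match was never closed.
                for name, line in self.stack[i + 1:]:
                    self.problems.append(f"line {line}: <{name.upper()}> not closed")
                del self.stack[i:]
                return
        self.problems.append(
            f"line {self.getpos()[0]}: </{tag.upper()}> has no matching start tag")


def check_structure(html):
    """Return a list of structure warnings for an HTML string."""
    checker = StructureChecker()
    checker.feed(html)
    checker.close()
    # Elements still open at the end of the file were never closed.
    for name, line in checker.stack:
        checker.problems.append(f"line {line}: <{name.upper()}> not closed")
    return checker.problems


page = "<TABLE>\n<TR><TD>One\n<TD>Two</TD>\n</TABLE>\n"
for warning in check_structure(page):
    print(warning)
```

In this example, the TR and the first TD on line 2 are never closed, so the sketch prints one warning for each.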

FIG. 39.1
You can run all tests by selecting Do All Tests, or run individual tests by choosing Select From List Below.

Using Doctor HTML is straightforward. Type the URL of the Web page you want to verify in the URL field. To specify the individual tests you want to run, choose Select From List Below, select each test, and then click Go. Figure 39.2 shows the results of the tests I performed on my home page.


NOTE: Doctor HTML looks only at Web pages. You provide the URL of a Web page or site that you want Doctor HTML to check, and it analyzes the files it finds. You can't use Doctor HTML to verify small bits of HTML that you type in a form or HTML files contained in your local file system.

FIG. 39.2
Doctor HTML's output is easy to read.

Verifying Links Within a Web Page  One of the best uses for Doctor HTML is to verify the links that your Web page contains. Here's how to perform the test:

1. Type the URL of your Web page in the URL text box.

2. Choose Long.

3. Choose Select From List Below.

4. Select Verify Hyperlinks from the list of tests to run, and then click Go.

Figure 39.3 shows sample output from Doctor HTML. The table lists each link found in the Web page. For each link it identifies the link's URL; the type, size, and change date of the file to which the link points; the line numbers on which the link is used; and any additional comments regarding the link.
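To make the link test concrete, here is a minimal Python 3 sketch of the two halves of a link checker: collecting each A HREF along with the line numbers where it appears, and then testing each URL. It is not Doctor HTML's implementation, and the names are hypothetical. It assumes that a HEAD request is a fair test of a link (some servers reject HEAD, so a real checker might retry with GET), and it reports a timeout separately from a hard error because, as Table 39.1 notes, a slow server isn't the same as a dead link.

```python
import socket
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Maps each absolute link URL to the line numbers where it appears."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = {}

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            url = urljoin(self.base_url, href)   # resolve relative links
            self.links.setdefault(url, []).append(self.getpos()[0])


def collect_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def check_link(url, timeout=10):
    """Classify one link; a timeout is reported separately from a hard error."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return f"ok ({resp.status})"
    except HTTPError as err:              # server answered with an error code
        return f"HTTP error {err.code}"
    except URLError as err:               # could not connect or resolve
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timed out (the server may just be slow)"
        return f"unreachable ({err.reason})"
    except (socket.timeout, TimeoutError):
        return "timed out (the server may just be slow)"


page = ('<A HREF="a.htm">A</A>\n'
        '<A HREF="http://www.example.com/">X</A>\n'
        '<A HREF="a.htm">again</A>')
for url, lines in collect_links(page, "http://www.example.com/~jerry/index.htm").items():
    print(url, lines)
```

Passing each URL that collect_links() returns to check_link() performs the actual test; that step needs network access.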


NOTE: Doctor HTML doesn't verify the links contained within image maps.

Checking the Performance of Your Images  One of the biggest user complaints is the time required to download Web pages that contain many images. You can get a realistic view of how long a typical user will spend downloading your Web page by using Doctor HTML to check the download time for each image on the page. Here's how:

1. Type the URL of your Web page in the URL text box.

2. Choose Long.

3. Choose Select From List Below.

4. Select Image Analysis from the list of tests to run, and then click Go.

FIG. 39.3
Doctor HTML lists only those links for which it finds warnings or errors.

You'll see results similar to Figure 39.4. The most important column to note is the Time column, which indicates how long the image takes to download using a 14.4K modem.
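This chapter doesn't say exactly how Doctor HTML computes its Time column, but you can make a rough, best-case estimate yourself. A 14.4K modem moves at most 14,400 bits per second, so an 18,000-byte image takes at least 18,000 bytes times 8 bits, divided by 14,400 bits per second, or 10 seconds. The Python sketch here ignores protocol overhead and latency, so real downloads take longer; the file names and sizes are made up for illustration.

```python
MODEM_BPS = 14_400  # a 14.4K modem moves at most 14,400 bits per second


def download_seconds(size_bytes, bits_per_second=MODEM_BPS):
    """Best-case transfer time: 8 bits per byte, no overhead or latency."""
    return size_bytes * 8 / bits_per_second


def image_report(images):
    """Print one row per image and return the total download time.

    images maps an image file name to its size in bytes.
    """
    total = 0.0
    for name, size in images.items():
        seconds = download_seconds(size)
        total += seconds
        print(f"{name:<12}{size:>8,} bytes{seconds:>7.1f} s")
    print(f"{'Total':<12}{sum(images.values()):>8,} bytes{total:>7.1f} s")
    return total


image_report({"logo.gif": 18_000, "bullet.gif": 3_600})
```

Here logo.gif takes 10.0 seconds and bullet.gif 2.0 seconds, for a total of 12.0 seconds, which is the same kind of per-page summary Figure 39.4 shows.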

FIG. 39.4
The summary indicates the total download time for all of the images on the Web page.
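Slow images feel even slower when the browser can't lay out the page until they arrive, which is why the Image Syntax test in Table 39.1 checks for HEIGHT and WIDTH (and for ALT). The following Python 3 sketch performs the same kind of check; it is an illustration with hypothetical names, not Doctor HTML's code. Note that html.parser reports attribute names in lowercase, whatever case the page uses.

```python
from html.parser import HTMLParser

REQUIRED = ("height", "width", "alt")  # attributes the Image Syntax test looks for


class ImageSyntaxChecker(HTMLParser):
    """Records each IMG tag that lacks HEIGHT, WIDTH, or ALT."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        present = {name for name, _ in attrs}   # names arrive lowercased
        missing = [name.upper() for name in REQUIRED if name not in present]
        if missing:
            src = dict(attrs).get("src") or "(no SRC)"
            self.warnings.append(
                f"line {self.getpos()[0]}: {src} lacks {', '.join(missing)}")


def check_images(html):
    checker = ImageSyntaxChecker()
    checker.feed(html)
    checker.close()
    return checker.warnings


page = '<IMG SRC="logo.gif" WIDTH=100 HEIGHT=50 ALT="Logo">\n<IMG SRC="dot.gif">'
print(check_images(page))
```

An empty ALT="" counts as present, which is appropriate for purely decorative images.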

Weblint

Like Doctor HTML, Weblint is a Web-based HTML verification service. It checks both the syntax and the style of the Web page to which you point it.

You can access Weblint from various gateways (Web pages that provide access to Weblint). Each gateway might provide a form you use to point Weblint to your Web page and set the options you want to use with Weblint. Note that the gateways do not necessarily provide identical forms; some forms are quite complex, whereas others ask only for the Web page's URL. Here's a list of Weblint gateways:

http://www.fal.de/cgi-bin/WeblintGateway

http://online.anu.edu.au/CNIP/weblint/weblint.html

http://www.cen.uiuc.edu/cgi-bin/weblint

http://www.ts.umu.se/~grape/weblint.html

http://www.netspot.unisa.edu.au/weblint/

http://www.unipress.com/cgi-bin/WWWeblint


TIP: The most comprehensive Weblint gateway is at http://www.fal.de/cgi-bin/WeblintGateway. This site enables you to configure Weblint exactly as you want.

Using the Fal Weblint Gateway  Figure 39.5 shows the gateway at http://www.fal.de/cgi-bin/WeblintGateway, known as the Fal Weblint gateway. Type the URL of the Web page in the URL field, select the options you want, and click Check HTML.


NOTE: If you open a URL that names only a directory, such as the root of a home page, the Web server automatically returns a default file, typically one named INDEX.HTM. The validation services, however, require you to specify the filename explicitly; they won't look for INDEX.HTM on their own.
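If you keep a list of pages to validate, a small helper can make sure every URL names a file. This Python 3 sketch is hypothetical; in particular, the default name index.htm is an assumption, because the real default document depends on how your Web server is configured.

```python
from urllib.parse import urlsplit, urlunsplit


def with_explicit_filename(url, default_name="index.htm"):
    """Append a default document name to a URL whose path ends in a slash.

    default_name is an assumption: the real default document depends on how
    the Web server is configured. A path without a trailing slash (such as
    /~jerry) is left alone, because it may name either a file or a directory.
    """
    parts = urlsplit(url)
    path = parts.path or "/"          # "http://host" means the root directory
    if path.endswith("/"):
        path += default_name
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(with_explicit_filename("http://www.example.com/~jerry/"))
```

For the URL shown, the helper returns http://www.example.com/~jerry/index.htm, which you can then paste into a validation service.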

Using the UniPress WWWeblint Gateway  Figure 39.6 shows a much simpler gateway. The UniPress WWWeblint gateway (UniPress is the author of Weblint) at http://www.unipress.com/cgi-bin/WWWeblint is a very simple form that collects only the URL of the Web page. After providing the URL, click Check It to verify the Web page.

FIG. 39.5
Click Simple if you want to use a version of the Fal Weblint gateway that provides fewer options.

FIG. 39.6
In contrast to the Fal Weblint gateway, the UniPress WWWeblint gateway doesn't support Internet Explorer extensions.

Figure 39.7 shows sample output from Weblint. The top portion of the output lists any warnings and errors Weblint found in the Web page. The bottom portion is a formatted listing of the HTML. The listing is formatted so that the structure of the HTML and the URLs in the HTML are easy to identify.
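The formatted listing works much like Doctor HTML's Show Commands test: each element is indented one level deeper than the element that contains it. The Python 3 sketch here produces a similar outline of tag names. It is a simplified illustration with hypothetical names, not Weblint's code, and it assumes that every non-empty element is properly closed; an unclosed tag will throw off the indentation from that point on.

```python
from html.parser import HTMLParser

# HTML 4 elements that are EMPTY: they never contain anything.
EMPTY = {"area", "base", "basefont", "br", "col", "frame", "hr", "img",
         "input", "isindex", "link", "meta", "param"}


class Outline(HTMLParser):
    """Builds an indented list of tag names, one level per open element."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag.upper())
        if tag not in EMPTY:
            self.depth += 1        # children of this element indent further

    def handle_endtag(self, tag):
        if tag not in EMPTY and self.depth > 0:
            self.depth -= 1


def outline(html):
    parser = Outline()
    parser.feed(html)
    parser.close()
    return parser.lines


print("\n".join(outline(
    "<HTML><HEAD><TITLE>Home</TITLE></HEAD>"
    "<BODY><P>Hello<BR>there</P></BODY></HTML>")))
```

The output lists HTML at the left margin, HEAD and BODY one level in, TITLE and P two levels in, and BR three levels in.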

FIG. 39.7
In addition to listing warnings and errors separately, Weblint embeds them within the formatted listing.


TIP: You can use Netscape's source viewer to get a better view of the format of an HTML document. Choose View, Source from Netscape's main menu. The viewer will highlight the tags and attributes, as well as each URL, contained in the HTML file.

WebTechs

The WebTechs Validation Service checks the conformance of one or more Web pages to the HTML standards you choose. You can also give WebTechs a fragment of HTML to validate by typing it directly into the form. You'll find WebTechs at http://www.webtechs.com/html-val-svc/ (see Figure 39.8).

Manually Submitting a Web Page to WebTechs  To submit your Web pages to WebTechs for validation, select the level of conformance at the top of the form, type a list of URLs in the space provided, and click Submit URLs for Validation. Figure 39.9 shows some sample output.

Automatically Submitting a Web Page to WebTechs  You don't have to visit the WebTechs Web site to submit a Web page for validation. You can add a button to the bottom of a Web page that automatically submits that Web page for validation. This approach is particularly handy if you're working on a Web site and frequently submitting it for validation. Add the form shown in Listing 39.1 to the end of your Web page. Then, whenever you want to validate the page, click Validate this URL. Note that you must change the VALUE of the URLs input element to the URL of the Web page that contains the form. You might also change the value of the level field to match the browser you're targeting (see Table 39.2).

FIG. 39.8
While visiting this site, check out Web Apps Magazine, an online magazine for Web professionals.

FIG. 39.9
For a better understanding of the WebTechs output, see the FAQ at http://www.cs.duke.edu/~dsb/wt-faq.html.

Listing 39.1  Form to automatically submit a URL to WebTechs.

<FORM METHOD=POST ACTION="http://www.webtechs.com/cgi-bin/html-check.pl">
<!-- 0 = standard conformance, 1 = strict -->
<INPUT TYPE=HIDDEN NAME="recommended" VALUE=0>
<!-- HTML level or browser dialect to check against (see Table 39.2) -->
<INPUT TYPE=HIDDEN NAME="level" VALUE="IE3.0">
<INPUT TYPE=HIDDEN NAME="input" VALUE=1>
<INPUT TYPE=HIDDEN NAME="esis" VALUE=0>
<INPUT TYPE=HIDDEN NAME="render" VALUE=0>
<!-- Change this VALUE to the URL of the page that contains this form -->
<INPUT TYPE=HIDDEN NAME="URLs" VALUE="http://rampages.onramp.net/~jerry/index.htm">
<INPUT TYPE=SUBMIT VALUE="Validate this URL">
</FORM>

Here's what each input element's value means:
recommended: 0 = standard; 1 = strict
level: See Table 39.2
input: 1 = show input; 0 = don't show input
esis: 1 = show parser output; 0 = don't show parser output
render: 1 = render HTML; 0 = don't render HTML
URLs: The URL to submit for validation
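When you click the button, the browser sends the hidden fields from Listing 39.1 to the ACTION URL as a URL-encoded POST body. The submit button itself isn't sent because it has no NAME attribute. The following Python 3 sketch, with hypothetical names, reads the form and builds that body so you can see exactly what WebTechs receives. It only constructs the request; it doesn't send anything.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# The form from Listing 39.1.
LISTING_39_1 = """\
<FORM METHOD=POST ACTION="http://www.webtechs.com/cgi-bin/html-check.pl">
<INPUT TYPE=HIDDEN NAME="recommended" VALUE=0>
<INPUT TYPE=HIDDEN NAME="level" VALUE="IE3.0">
<INPUT TYPE=HIDDEN NAME="input" VALUE=1>
<INPUT TYPE=HIDDEN NAME="esis" VALUE=0>
<INPUT TYPE=HIDDEN NAME="render" VALUE=0>
<INPUT TYPE=HIDDEN NAME="URLs"  VALUE="http://rampages.onramp.net/~jerry/index.htm">
<INPUT TYPE=SUBMIT VALUE="Validate this URL">
</FORM>
"""


class HiddenFields(HTMLParser):
    """Collects the form's ACTION and its hidden NAME/VALUE pairs, in order."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            self.fields.append((attrs.get("name"), attrs.get("value") or ""))


def form_request(form_html):
    """Return (ACTION URL, URL-encoded body) for a form of hidden fields."""
    parser = HiddenFields()
    parser.feed(form_html)
    parser.close()
    return parser.action, urlencode(parser.fields)


action, body = form_request(LISTING_39_1)
print(action)
print(body)
```

The body begins recommended=0&level=IE3.0&input=1&esis=0&render=0& and ends with the URLs field, whose value is percent-encoded (http%3A%2F%2F...).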

Table 39.2  Values for the Level Input Element

Value: Description
2: Level 2
3: Level 3
Wilbur: Level 3.2 Wilbur
Cougar: Level 3.2 Cougar
Mozilla: Mozilla (Netscape)
SQ: SoftQuad
AdvaSoft: AdvaSoft
IE: Microsoft IE
IE3.0: Microsoft IE 3.0 Beta

Other Verification Services

You'll also find a handful of other useful verification services on the Web. None of the services described in this section is as comprehensive as the services you learned about earlier; nevertheless, each offers something unique or useful.

For example, you can use URL-Minder to catch changes to URLs that your Web page references. In addition, the Slovenian HTMLchek is a decent alternative to the other validation services if you're having trouble connecting to them.

Slovenian HTMLchek  HTMLchek is an HTML checker created at the University of Texas at Austin. An online version, hosted in Slovenia, is available on the Web at http://www.ijs.si/cgi-bin/htmlchek (see Figure 39.10). HTMLchek does much the same thing as Weblint, but its output is considerably harder to read and understand.

FIG. 39.10
HTMLchek hasn't been updated in a while; it doesn't provide support for HTML 3.2 or other browser extensions.

U.S.M.A. (West Point)  Figure 39.11 shows the U.S. Military Academy's verification service, called HTMLverify (http://cgi.usma.edu/cgi-bin/HTMLverify). You can specify a URL for HTMLverify to test, or you can type some HTML in the HTML Source field. Click Verify to start the test.

FIG. 39.11
HTMLverify is a modified version of Weblint.

Harbinger  Harbinger is a Web site that contains the WebTechs verification service. You can find it at http://www.harbinger.net/html-val-svc. The interface is very similar to the WebTechs interface described earlier in the chapter. If you can't access WebTechs, try this site instead.

URL-Minder  URL-Minder is a Web-based service that notifies you when the Web page at a URL changes. You give it your e-mail address and a list of URLs, and it sends you e-mail whenever the content of one of those pages changes. The address for URL-Minder is http://www.netmind.com/html/url-minder.html.
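This chapter doesn't describe how URL-Minder detects changes. One simple approach, sketched here in Python 3 as an assumption rather than URL-Minder's actual method, is to store a digest of each page and compare it with a digest of the page's current content. Fetching the pages and sending the e-mail are left out, and the function names are hypothetical. Because any changed byte produces a different digest, a page with a hit counter or a date stamp would be reported as changed on every check.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Condense a page's bytes into a digest that changes if any byte does."""
    return hashlib.sha256(content).hexdigest()


def changed_urls(pages, previous):
    """Return the URLs whose content differs from the last recorded check.

    pages maps each URL to the bytes just fetched; previous maps each URL to
    the fingerprint saved last time and is updated in place. A URL seen for
    the first time is recorded but not reported as changed.
    """
    changed = []
    for url, content in pages.items():
        digest = fingerprint(content)
        if url in previous and previous[url] != digest:
            changed.append(url)
        previous[url] = digest
    return changed


saved = {}
changed_urls({"http://www.example.com/": b"<P>Version 1</P>"}, saved)  # first check
print(changed_urls({"http://www.example.com/": b"<P>Version 2</P>"}, saved))
```

On the second check the content differs, so the sketch reports http://www.example.com/ as changed; a real service would then send its notification.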


NOTE: You can also embed a form in your Web page that enables your users to receive e-mail notification when your Web site changes. See the URL-Minder Web site for an example.




Macmillan Computer Publishing USA

© Copyright, Macmillan Computer Publishing. All rights reserved.