Copyright ©1996, Que Corporation. All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews. Making copies of any part of this book for any purpose other than your own personal use is a violation of United States copyright laws. For information, address Que Corporation, 201 West 103rd Street, Indianapolis, IN 46290 or at support@mcp.com.

Notice: This material is excerpted from Special Edition Using CGI, ISBN: 0-7897-0740-3. The electronic version of this material has not been through the final proof reading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.

CHAPTER 15 - Generating HTML Documents in Real Time

HyperText Markup Language (HTML) lets you publish text and graphics in a platform-independent way. Using embedded links, you can easily weave a world full of sites together.

In this chapter, you examine static and dynamic HTML, concentrating on the latter. Dynamic, or real-time, HTML extends the viability of the Web far beyond its original conception.

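You learn what makes real-time HTML tick and how to produce it in a variety of ways. Specifically, this chapter covers static HTML, real-time HTML and its benefits, the four main methods of generating real-time HTML, near real-time HTML, and server performance considerations.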

Static HTML

Need to review the complete works of Mark Twain? Want to find the address of a manufacturer in Taiwan? Need the phone number for the White House? Ever wondered how to spell floccinaucinihilipilification? Or what it means? (Yes, that's a real word. You won't find it in any dictionary except the Oxford English Dictionary, though, so put away your Webster's Collegiate.)

The answers are only as far away as your favorite search engine. These types of references are perfectly suited to the Web. They seldom, if ever, need revision; after they're written and thrown on a page, other sites can establish links to them, search engines can catalog them, and you can find them today, tomorrow, next week, or next year. Because the markup language used to create these pages is HTML and the content of the pages is static (relatively unchanging), such pages are called static HTML.

But what if you want to know the stock prices, not 10 hours ago or 10 days ago, but right now? What if you want to know the arrival time of American Airlines Flight 101? What if you need to know the ambient temperature in Brisbane as of 30 seconds ago?

In these cases, static documents just won't do. Not even if a diligent, never-sleeping Webmaster does his level best to keep the documents updated. For these sorts of applications, you need real-time, or dynamic, HTML.

Real-Time HTML

All CGI-generated HTML is technically "real-time" in that it's generated on the fly, right when it's needed. In data processing circles, however, the term refers more to the data itself than the production thereof.

Therefore, a CGI program that talks to a hardware port and retrieves the current temperature and then generates HTML to report it would be considered real-time. A CGI program that looks up your birthday in a database wouldn't.

In this chapter, I don't worry too much about the technical definitions. I call all CGI programs that produce time-sensitive or user-sensitive output "real-time." This includes uses such as the following:

Benefits of Real-Time HTML

The prime, and most immediately apparent, benefit of real-time HTML is that the information is fresh. Getting the stock market report from yesterday's closing is one thing; finding the value of a specific stock right this minute is something else altogether. The information has different value to the consumer. People pay for up-to-the-minute information.

Another, somewhat less obvious, benefit is that real-time HTML can make your pages seem livelier. For example, in the next chapter you examine a page counter and a random-quote generator. You can put them together on a page to produce output like this:

And so on. Granted, this particular example is rather trivial. Many readers may not even notice that the wording changes each time, and those who do won't have their lives, careers, or religion changed by it. But this example should give you an idea of the sorts of pages you can make by using real-time document generation.

Methods of Generating Real-Time HTML

The four main methods of generating dynamic pages are scheduled jobs, regular CGI or SSI, Client Pull, and Server Push.

In the following sections, you tackle them in order.

Scheduled Jobs

A scheduled job is a batch file, shell script, or other program that runs at a regular interval. These jobs usually run in the background (that is, invisibly and independently of the foreground task) and may run once a month, once a day, or once a minute. The interval is up to you. A special case is the program that runs continuously (called a daemon in the UNIX world and a service in the Windows NT world), spending most of its time asleep and waking up only periodically to accomplish some task. Usually, though, background jobs are scheduled: they run at the appointed time, do their jobs, and quit, only to repeat at the next scheduled time.

The method of scheduling varies from operating system to operating system. On UNIX, the cron utility is the most appropriate choice; under Windows NT, the AT command makes the most sense.

Scheduled jobs are useful for information that changes infrequently but regularly. A quote-of-the-day program is probably the best example. You don't need to invoke a CGI program to retrieve or regenerate a page that changes only once a day. It's far better to write a program that updates your HTML at midnight and then let the page be retrieved normally.

Regular CGI or SSI

For page counters and similar programs, either CGI or SSI (see the next chapter for examples of SSI) makes the most sense. The kind of information being generated is what drives your choice. Because a page count changes only when a page is retrieved, updating it at that moment makes sense. A scheduled job is clearly inadequate for up-to-the-moment data, and the remaining methods, Client Pull and Server Push, are inappropriate because you don't want a continuous update.

A trivial, but nonetheless useful, example of using CGI to provide dynamic HTML is a CGI program that redirects the browser to a static page appropriate for that browser. For this example, assume that you want to provide different pages for each of the following browser types: Netscape, Microsoft Internet Explorer, and Lynx. Any browser that can't be identified as one of these three gets redirected to a generic page.

ByAgent is a complete working sample of using CGI to provide a dynamic response. You should be able to compile it for any platform. You can find the source, plus sample HTML files and a compiled executable for the 32-bit Windows NT/Windows 95 environment.

Compile the code (as shown in listing 15.6) and name it byagent.exe. Put the compiled executable in your CGI-BIN directory. If you're using a 32-bit Windows environment, you can skip the compile step.

To test this program, you need to create a number of static HTML files. The first, which you should call default.htm, is the starting point for testing the others.

Listing 15.1 default.htm: HTML to Demonstrate ByAgent
<h1>ByAgent Test Page</h1>
This page demonstrates the ByAgent CGI program.  Click <a href="/cgi-bin/byagent.exe?">here</a> to test.

As you can see, this code is fairly straightforward. If your CGI-BIN directory is called something else, correct the link in the preceding code.

Now you can create four individual pages: one for Netscape, called netscape.html (see listing 15.2); one for Lynx, called lynx.html (see listing 15.3); one for Microsoft Internet Explorer, called msie.html (see listing 15.4); and one for everyone else, called generic.html (see listing 15.5).

Listing 15.2 netscape.html: Target Page for Netscape Browsers
Congratulations! You got to this page because your browser identified itself as a Netscape (or compatible) browser.
Listing 15.3 lynx.html: Target Page for Lynx Browsers
Congratulations! You got to this page because your browser identified itself as a Lynx (or compatible) browser.
Listing 15.4 msie.html: Target Page for MSIE Browsers
Congratulations! You got to this page because your browser identified itself as a Microsoft Internet Explorer (or compatible) browser.
Listing 15.5 generic.html: Target Page for Generic Browsers
Congratulations! You got to this page because your browser identified itself as something other than Netscape, Lynx, or Microsoft Internet Explorer.

Put these files together in a directory, and load default.htm into your browser. Click the test link. You should see the page corresponding to your browser. Listing 15.6 shows the actual code to accomplish the redirection.

Listing 15.6 byagent.c: Source Code for ByAgent CGI Program
// This program demonstrates how to redirect
// a browser to a page that matches the browser.
// It depends on the browser's self-identification,
// so a browser that lies can get the wrong page.
// In general, most programs that claim to be
// "Mozilla" are either Netscape, fully compatible
// with Netscape, or Microsoft Internet Explorer.
// The special case of MSIE can be identified
// because although it says "Mozilla," it also
// says "MSIE."

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {

     // First declare our variables.
     // We'll use three pointers and two character
     // arrays.  The pointers are UserAgent, a
     // pointer to the CGI environment variable
     // HTTP_USER_AGENT; Referer, a pointer to
     // the CGI environment variable
     // HTTP_REFERER; and p, a generic pointer
     // used for string manipulation.  szNewPage
     // is where we build the URL of the page to
     // which the browser gets redirected, and
     // szAgent holds a lowercase copy of the
     // user agent.

     char     *UserAgent;
     char     *Referer;
     char     *p;
     char     szNewPage[128];
     char     szAgent[256];

     // Turn buffering off for stdout
     setvbuf(stdout, NULL, _IONBF, 0);

     // Get the HTTP_REFERER, so we know our directory
     Referer = getenv("HTTP_REFERER");

     // Get the user-agent, so we know which pagename to
     // supply
     UserAgent = getenv("HTTP_USER_AGENT");

     // If either user agent or http referer not available,
     // offer a manual list of pages and stop here
     if ((Referer == NULL) || (UserAgent == NULL)) {
          printf("Content-type: text/html\n\n"
                 "<h1>Pick your browser</h1>\n"
                 "ByAgent could not find either the "
                 "HTTP_REFERER or the HTTP_USER_AGENT "
                 "environment variable.  "
                 "Please pick your browser from this list:\n"
                 "<ul>\n"
                 "<li><a href=\"generic.html\">Generic</a>\n"
                 "<li><a href=\"lynx.html\">Lynx</a>\n"
                 "<li><a href=\"msie.html\">Microsoft</a>\n"
                 "<li><a href=\"netscape.html\">Netscape</a>\n"
                 "</ul>\n");
          return 0;
     }

     // This program assumes that the browser-specific pages
     // are in the same directory as the page calling this
     // program.  Therefore, we'll use the HTTP_REFERER to
     // get our URL, then strip the HTTP_REFERER's page
     // name, and add the proper browser-specific page name
     // to the end.

     // First, copy the HTTP_REFERER value to szNewPage, so
     // we have something to work on.  strncpy doesn't
     // terminate an over-long string, so we do it ourselves.
     strncpy(szNewPage, Referer, sizeof(szNewPage) - 1);
     szNewPage[sizeof(szNewPage) - 1] = '\0';

     // Find the last forward slash in the URL.  This is
     // the separator between the directory and the page
     // name.
     p = strrchr(szNewPage, '/');

     // If we found no forward slash, assume some sort of
     // weird server and hope a relative path will work by
     // chopping off the entire URL.
     if (p == NULL) p = szNewPage;

     // Mark the end of the string, so we can concatenate
     // to it from that point on.
     *p = '\0';

     // Convert a copy of the user agent to lowercase so a
     // single search finds "MSIE", "msie", or "Msie".  (The
     // string returned by getenv must not be modified.)
     strncpy(szAgent, UserAgent, sizeof(szAgent) - 1);
     szAgent[sizeof(szAgent) - 1] = '\0';
     for (p = szAgent; *p != '\0'; p++)
          *p = (char)tolower((unsigned char)*p);

     // We are now ready to output a redirection header.
     // This header tells the browser to go elsewhere
     // for its next page.  A redirection header is
     // nothing more than a standard content type
     // followed by "Location: " and an URL.  The
     // content type is separated from the redirection
     // by a single newline; the entire header is
     // terminated by a blank line (two newlines).

     // If user agent is Microsoft Internet Explorer,
     // redirect to msie.html.  This test must come
     // before the "mozilla" test, because MSIE also
     // calls itself Mozilla.
     if (strstr(szAgent, "msie")) {
          printf("Content-type: text/html\n"
                 "Location: %s/msie.html\n\n", szNewPage);
          return 0;
     }

     // If user agent is Lynx,
     // redirect to lynx.html
     if (strstr(szAgent, "lynx")) {
          printf("Content-type: text/html\n"
                 "Location: %s/lynx.html\n\n", szNewPage);
          return 0;
     }

     // If user agent is Netscape,
     // redirect to netscape.html
     if (strstr(szAgent, "mozilla")) {
          printf("Content-type: text/html\n"
                 "Location: %s/netscape.html\n\n", szNewPage);
          return 0;
     }

     // If none of the above,
     // use generic.html
     printf("Content-type: text/html\n"
            "Location: %s/generic.html\n\n", szNewPage);
     return 0;
}

As you can see, the preceding code is fairly simple; the comments far outweigh the lines of code. The only tricky bits to this program are (a) remembering to format the redirection header correctly and (b) remembering that Microsoft Internet Explorer also claims to be "Mozilla" (Netscape), so you must test for "MSIE" before you test for "Mozilla."

In your own program, you may want to incorporate some mechanism to allow the secondary pages to live in a different directory, or even on a different server, just by changing the Location information. You may also consider generating the correct HTML on the fly rather than redirecting the browser to an existing static page. Now that you know how to identify the browser and do redirection, your imagination is the only limit.

Client Pull

Client Pull is a Netscape enhancement. Several other browsers now support Client Pull, but you should be careful when writing your HTML to include options for browsers that can't deal with it.

In typical browsing, a user clicks a link and retrieves a document. With Client Pull, that document comes back with extra instructions: directives to reload the page or to go to another URL altogether.

Client Pull works via the META HTTP-EQUIV tag, which must be part of the HTML head (that is, it must appear before any text or graphics). When the browser sees the META tag, it interprets the contents as an HTTP header. Because HTTP headers already support automatic refresh and redirection, not much magic is involved at all. Normally, the server or CGI program is responsible for sending the HTTP headers. Netscape's clever idea was to allow additional HTTP headers inside a document.

Say you have a Web page that reports election returns. A background process of some sort reads the precinct numbers from a Reuters connection (why not?) and once every 10 seconds rewrites your Web page with the current data. The client can hit the reload button every ten seconds to see the new data, but you want to make that process automatic. Listing 15.7 shows how to do it.

Listing 15.7 default.htm: Demonstration of Client Pull
<head><title>Election Returns</title><meta http-equiv="refresh" content="10"></head>
<h1>Election Returns</h1>
This document refreshes itself once every ten seconds.  Sit back and watch!

Note the META HTTP-EQUIV line. This line causes the browser to refresh the page once every 10 seconds. Of course, for this example to be useful, you need some other process updating the page in the background, but the example works as is: it reloads the page once every 10 seconds.

Why once every 10 seconds? Because each time it fetches the document, the browser sees the instruction to load it again ten seconds later. The instruction is a "one-shot" instruction. It doesn't tell the browser to load the page every ten seconds from now until doomsday; it just says to load the page again ten seconds from now.

You also can use Client Pull to redirect the browser to another page. In listing 15.8, the browser goes to Microsoft after 5 seconds.

Listing 15.8 takeride.htm: Take a Ride to Microsoft with Client Pull
<title>Take a Ride</title>
<h1>Take a Ride to Microsoft</h1>
This page takes you to Microsoft's Web server in five seconds.
If your browser doesn't support META commands, click <a href="">here</a> to go there manually.

This example uses the URL= syntax to tell the browser to go to the specified URL. The delay is set to five seconds. Note also that text is included to explain what's going on, and a manual link is included for people who have browsers that don't support Client Pull.

You can set the refresh delay to zero. This tells the browser to go to the designated URL (or, if no URL is specified, to reload the current page) as soon as it possibly can. You can create crude animations this way.

You can set up a chain of redirection, too. In the simplest configuration, this chain would be two files that refer to each other, as listing 15.9 shows.

Listing 15.9 page1.html and page2.html: Two pages that refer to each other


<title>Page One</title>
<h1>Page One</h1>
This page takes you to Page Two.


<title>Page Two</title>
<h1>Page Two</h1>
This page takes you to Page One.

When the user first loads page1.html, he or she gets to see page1.html for one second. Then the browser fetches page2.html. Page2.html sticks around for one second and then switches back to page1.html. This process continues until the user goes elsewhere or shuts down his or her browser.

The META tag requires a fully qualified URL for redirection; that is, you must include the site part of the URL. Relative URLs don't work, because your browser, just like the server, is stateless at this level. The browser doesn't remember where it got the redirection instruction from, so a relative URL is meaningless here.

Also, you're not limited to redirecting to a page of static HTML text. Your URL can point to an audio clip or a video file.

Server Push

Server Push works with more browsers than does Client Pull, but it's still limited. If you use this technique, be aware that some users can't see your splendid achievements.

Server Push relies on a variant of the MIME type multipart/mixed called multipart/x-mixed-replace. Like the standard multipart/mixed, this MIME type can contain an arbitrary number of segments, each of which can be almost any type of information. You accomplish Server Push by outputting continuous data using this MIME type, thus keeping the connection to the browser open and continuously refreshing the browser's display.

Server Push isn't a browser trick; you need to write a CGI program that outputs the correct HTTP headers, MIME headers, and data. Server Push isn't for the faint-hearted. To pull it off, you need to understand and use just about every CGI trick in the book.

A Server Push continues until the client clicks the STOP button or until the CGI program outputs a termination sequence. Because the connection is left open all the time, a Server Push is more efficient than a Client Pull. On the other hand, your CGI program is running continuously, consuming bandwidth on the network pipe and resources on the server.

In a standard multipart/mixed document, the headers and data would look something like listing 15.10.

Listing 15.10 Example of Multipart/Mixed Headers
Content-type: multipart/mixed;boundary=BoundaryString

--BoundaryString
Content-type: text/plain

Some text for part one.
--BoundaryString
Content-type: text/plain

Some text for part two.
--BoundaryString--

The boundary is an arbitrary string of characters used to demarcate the sections of the multipart document. You use whatever you specify on the first header for the remainder of the document. In this example, BoundaryString is the boundary marker.

The blank lines in listing 15.10 aren't there to make the text more readable; they're part of the headers. Your program will fail if you don't follow this syntax exactly!

Each section of the document begins with two dashes and the boundary marker on a line by itself. Immediately thereafter, you must specify the content type for that section. Like a normal header, the content type is followed by one blank line. You then output the content for that section. The last section is terminated by a standard boundary marker with two dashes at the beginning and end of the line.

Server Push uses the same general format but takes advantage of the MIME type multipart/x-mixed-replace. The x means that the MIME type is still experimental; the replace means that each section should replace the previous one rather than be appended to it. Here's how the preceding example looks using multipart/x-mixed-replace:

Content-type: multipart/x-mixed-replace;boundary=BoundaryString

--BoundaryString
Content-type: text/plain

Original text.
--BoundaryString
Content-type: text/plain

This text replaces the original text.
--BoundaryString--

In a typical Server Push scenario, the CGI program sends the first header and first data block, and then leaves the connection open. Because the browser hasn't seen a terminating sequence yet, it knows to wait around for the next block. When the CGI program is ready, it sends the next block, which the browser dutifully uses to replace the first block. The browser then waits again for more information.

This process can continue indefinitely, which is how the Server Push animations you've seen are accomplished. The individual sections can be any MIME format. Although the example in this chapter uses text/plain for clarity, you may well choose to use image/jpeg instead in your program. The data in the block would then be binary image data. Each block you send would be a frame of the animation.

ServPush is a complete working sample of a Server Push program. You should be able to compile it for any platform. You can find the source, plus a compiled executable for the 32-bit Windows NT/Windows 95 environment.

Compile the code for ServPush (see listing 15.11) and name it servpush.exe. Put the compiled executable in your CGI-BIN directory, and test it with <a href="/cgi-bin/servpush.exe?">Test Server Push</a>.

Listing 15.11 servpush.c: Demonstration of Server Push
// This program demonstrates SERVER PUSH of text
// strings.  It outputs a header, followed by 10
// strings.  Each output is an x-mixed-replace
// section.  Each section replaces the previous
// one on the user's browser.
// Long printf lines in this listing have been broken
// for clarity.

#include <stdio.h>

// Sleep() is a Windows call; POSIX systems use sleep().
#ifdef _WIN32
#include <windows.h>
#define PAUSE_ONE_SECOND() Sleep(1000)
#else
#include <unistd.h>
#define PAUSE_ONE_SECOND() sleep(1)
#endif

int main(void) {
     // First declare our variables.  We'll use "x"
     // as a loop counter.  We'll use an array of
     // pointers, called *pushes[], to hold 10 strings.
     // These strings will get pushed down the pipe,
     // one at a time, during the operation of our
     // program.
     int          x;
     const char   *pushes[10] = {
               "Did you know this was possible?",
               "Did you know this was <i>possible</i>?",
               "Did you know this was <b>possible?</b>",
               "<font size=+1>Did you know this was "
                    "possible?</font>",
               "<font size=+2>Did you know this was "
                    "possible?</font>",
               "<font size=+3>Did you know this was "
                    "possible?</font>",
               "<font size=+4>Did you know this was "
                    "possible?</font>",
               "<font size=+5><i>DID YOU KNOW THIS WAS "
                    "POSSIBLE?</i></font>",
               "<font size=+6><b>DID YOU KNOW THIS WAS "
                    "POSSIBLE?</b></font>",
               "<b><i>Now you do!</i></b>"
     };

     // Turn buffering off for stdout
     setvbuf(stdout, NULL, _IONBF, 0);

     // Output the main HTTP header
     // Our boundary string will be "BoundaryString"
     // Note that like all headers, it must be
     // terminated with a blank line (the \n\n at
     // the end).
     printf("Content-type: "
            "multipart/x-mixed-replace;boundary=BoundaryString"
            "\n\n");

     // Output the first section header
     // Each section header must start with two dashes,
     // the arbitrary boundary string, a newline character,
     // the content type for this section, and TWO newlines.
     printf("--BoundaryString\n"
            "Content-type: text/html\n\n");

     // Output a line to describe what we're doing
     printf("<h1>Server Push Demonstration</h1>\n");

     // Loop through the 10 strings
     for (x = 0; x < 10; x++) {
          // Output the section header first
          printf("\n--BoundaryString\n"
                 "Content-type: text/html\n\n");
          // Flush output, just to be safe
          fflush(stdout);
          // Wait to let the browser display last section
          PAUSE_ONE_SECOND();
          // Output data for this section
          printf("Special Edition: Using CGI<br>"
                 "Server Push demonstration.  "
                 "Push %i:<br>%s\n"
                 ,x+1, pushes[x]);
          // Flush again
          fflush(stdout);
     }

     // All done, so output the terminator.
     // The trailing two dashes let the browser know that
     // there will be no more parts in this multipart
     // document.
     printf("\n--BoundaryString--\n");
     return 0;
}

Now that you see how it's done, you should be able to make your own programs. If you want to push graphics instead of text, change the MIME header for the individual sections, and output binary data. (See Chapter 3, "Designing CGI Applications," for details about raw versus cooked mode; you need to tell the operating system to switch the STDOUT output mode to binary if you're going to send binary data.)

Interestingly, you can use Server Push to create animated inline graphics in an otherwise static document. To do so, first create your static document. Include an <img> tag, with the source pointing to a CGI Server Push program instead of a graphics file. For example, say that you've written a Server Push program called photos.exe, which outputs a slide show of your family album. Here's how you can incorporate a dynamic slide show into your HTML:

<head><title>In-Line Push</title></head>
<h1>In-Line Push</h1>
This page of otherwise ordinary HTML includes a link to a server push program. Sit back and watch the show:
<img src="/cgi-bin/photos.exe?">

Near Real-Time HTML

As you saw earlier in this chapter in the section "Methods of Generating Real-Time HTML," not everything needs to be generated on the fly. Documents that are updated regularly and served as static documents are often called near real-time, because the information is fresh but the document itself is static. Often, CGI is used to update the document (rather than create the document in real time). This allows the document to reflect changes immediately, but avoids the overhead of running a CGI program every time a browser fetches the document.

MHonArc (pronounced monarch) is a good example of providing near real-time content. This freeware Perl 5 program converts e-mail to HTML archives, with full indexing, thread linking, and support for embedded MIME types. Although the HTML pages themselves are already composed and retrieved normally, they can be updated in the background. You can schedule the MHonArc program to run at regular times, or it can be triggered by the arrival of new mail. The code is highly UNIX-centric and therefore not particularly useful on other platforms, but you can still examine the source for ideas and techniques.

List maintenance also benefits from near real-time HTML. Lists of favorite links or FTP directory listings don't change very often, but you want them up-to-date at all times. A database with a real-time CGI program to retrieve and format information may be overkill here. A more efficient method is to have a CGI program that updates the list as new information is added, or a scheduled job that updates the list from a central database at regular intervals.

SFF-NET uses a combination of CGI, SSI, and static documents to provide up-to-the-moment lists without running a CGI program every time. When visitors want to propose a new link for one of the lists on SFF-NET, they fill out an online form that invokes a standard CGI program. The CGI program validates the information, adds the words "not validated yet," and appends the entry, in proper HTML format, to a text file. Users never see this file directly; instead, when they browse a list of links, they see a static HTML page that pulls in the text file with an SSI include. The new links (in the text file) show up in the list right next to the existing links. This provides real-time updating of the overall list without touching the main HTML page. The site administrator then looks at the text file of new links at his leisure and moves new links from the text file to the HTML file.

See "Fun Stuff: Examples of Things Done Right" and "CGI Interactive Games" for more examples of real-time and near real-time HTML.

Server Performance Considerations

Dynamic HTML can be a lot of fun and can be extraordinarily useful at times. However, it doesn't come without cost.

The first consideration is caching by proxy servers. If your page includes a page count, a random quotation, or a Server Push animation, it can't be cached. Defeating caching isn't necessarily evil (you wouldn't want your up-to-the-second stock market quotes to be cached, for instance), but it can create unnecessary network traffic.

If you visit the Usenet groups regularly, you see a recurring theme of experienced old hackers venting their spleens at newbies who chew up bandwidth for no reasonable purpose. The range of opinion you find goes from calm, rational argumentation, to wild, impassioned screeds. Some go so far as to say that any CGI program is evil and that page hit-counters are the devil's own spawn.

In a book with the sole purpose of teaching you how to write your own CGI scripts, you won't find much support for the extremists. The network is there to be used. Like any limited resource, it should be used wisely rather than wastefully. The problem is in determining what's wise. If you keep your high-traffic pages static, you'll make everyone except the true Internet curmudgeons happy. Of course, if you're a Java developer, all bets are off. The new ways of using the Web are completely incompatible with caching from the start.

The second thing to consider is that CGI programs tax the Web server. For each retrieval that calls a CGI program, the server must prepare environment variables, validate security, launch the script, and pipe the results back to the caller. If a hundred scripts are executing simultaneously, the server may become overburdened. Even if the server has sufficient resources to cope, the overall server throughput will suffer.

Server Push puts more of a strain on the system than almost any other type of dynamic HTML, because the script continues executing (consuming processor cycles and memory) theoretically forever. Just a few of these scripts running at the same time can bring an otherwise capable server to its knees. They have a high level of traffic and resource consumption for relatively little gain.

There are no hard and fast rules. As with any system, you must balance performance against cost, and capacity against throughput.
