©1996, Que Corporation. All
rights reserved. No part of this book may be used or reproduced in any
form or by any means, or stored in a database or retrieval system without
prior written permission of the publisher except in the case of brief quotations
embodied in critical articles and reviews. Making copies of any part of
this book for any purpose other than your own personal use is a violation
of United States copyright laws. For information, address Que Corporation,
201 West 103rd Street, Indianapolis, IN 46290 or at support@mcp
Notice: This material is excerpted from Special
Edition Using CGI, ISBN: 0-7897-0740-3. The electronic version of this
material has not been through the final proofreading stage that the book
goes through before being published in printed form. Some errors may exist
here that are corrected before the book is published. This material is
provided "as is" without any warranty of any kind.
CHAPTER 28-Learning from the Pros
In a world where the objective of writing CGI is to publish it immediately,
it makes sense to give you a list of public pages that illustrate some
of what we've talked about so far in this book. After all, knowing that
something can be accomplished is sometimes all you need to inspire you
to do it for yourself.
I picked sites that are outstanding for one or more of several reasons:
Either the site demonstrates a superb and elegant use of CGI, the site
has good CGI reference materials, or the site has CGI tools you can download
or buy. In the search for excellence, I'm not terribly concerned about
whether the software you'll find is freeware, shareware, or a commercial
application. My only objective is to show you how to do it right. There
may be cheaper ways of doing it right; there will certainly always be more
expensive ways. But the sites I offer are doing it right, and doing it
right now. There's no reason you can't be doing it, too.
Undoubtedly, every reader will be able to tell me that I missed a great
site, or that XYZ Corp. gives away as freeware what I featured for $1,800
from ZYX Corp. Save your postage stamps, please. I intentionally skipped
some well-known and excellent sites-in most cases, because the sites are
too busy to be useful. This is the worst Catch-22 of the Web: If you do
something very clever or very popular, your server is likely to be overwhelmed
by visitors. WWW should stand for World Wide Web, not World Wide Wait.
So I tried to choose sites that offer a reasonable response time, and are
solid rather than fashionable. I also probably overlooked some great sites
just because the Web changes so fast.
That said, you'll look at the following areas:
- Programming tutorials and sample code
- CGI and SSI freeware and shareware
- Fun stuff: examples of things done right
- Connecting SQL databases
- Spiders, worms, crawlers, and robots
- CGI interactive games
- A brief case study: Internet Concepts, LLC
This Ever-Changing URL in Which We Live
URLs change. Sites come and go. Some last for years, others for days
or hours. Sometimes a popular site becomes temporarily unavailable due to
excess traffic. Sometimes a router fails between you and the site. Sometimes
the site's server goes down. Sometimes a site just...disappears.
Any book that provides current information runs the risk of becoming
outdated. A book like this one, though, is almost certain to have expired
links-pointers to sites that have either moved or gone on to the Great
Bit Bucket in the Sky.
We made every effort to ensure that, as of the date of manuscript preparation,
the URLs provided throughout this book were correct and working. By correct,
I mean that the site's URL is given accurately, and that the content available
at that site roughly correlates with what we said it did. There's no guarantee,
especially when we refer to subpages on a site, that the Webmaster hasn't
shuffled things around-or even decided to give up CGI information in favor
of spotlighting the latest interactive smut fiction. By working,
I mean that we tested the link and it seemed both reliable and reasonably
Programming Tutorials and Sample Code
I'll start off by examining a variety of online tutorials, many of which
include sample code. Some of them are meant to be tutorials; others are
just such good examples of programming, or such simple code, that they
become lead-by-example instruction sheets. I won't bother to list too many,
since the book you're holding is one long CGI tutorial in itself. However,
even a book as comprehensive as this one can't cover everything, so here
are pointers to some fundamental or esoteric tutorials you might find useful.
- The Common Gateway Interface.
If your high-school teachers did their jobs, you'll know that you must
go back to the original sources when doing research. You'll make your teachers
and yourself happy by reading this tutorial from NCSA. Starting at ground
zero and working up to a library of examples, this hypertext document gives
the proper foundation for further exploration.
- University of Utah's Introduction to CGI
Programming. This document contains an introductory tutorial on CGI
programming, including some example CGI programs. If you're already an
accomplished programmer, this tutorial will provide the basic information
you need to develop your own CGI programs. The example programs are in
UNIX Bourne shell language (sh) and Perl. They're relatively simple
programs and should be understandable to anyone familiar with the UNIX
environment and the C programming language.
- W4 Consultancy (http://sparkie.riv.net/w4/software/counter/). Digital
counter script for UNIX. From this page, you can download a gzip of the
counter and also access a FAQ about the counter. If you haven't done any
CGI work before, this might make a good first project. I'm including this
one here, rather than in the programs section, because the code and documentation
make an excellent primer.
- Gates-o-Wisdom Software. NCSA-based SSI page counter tips and tricks for UNIX. Also
contains a great tutorial on general SSI and CGI techniques for NCSA servers.
- Teleport CGI Scripts. This page is a compendium of Perl and shell scripts for
users of Teleport Internet Services. However, you'll find when you look
at the individual script documentation sections that the source code is
usually included. The scripts you find here are short and sweet, and give
you a good idea of how to accomplish many common tasks.
- WebSite CGI. If
you use WebSite, you won't find a better reference than Bob Denny's own
documentation (after all, he wrote the server). WebSite is one of the most
popular and successful NT servers. It attacks the GUI problem directly,
by providing built-in support to link the Web server with VB, Delphi, or
other Windows development environments. This particular page provides jumping-off
points for technical papers, server self-tests, CGI programming, and related
Corner, explains the peculiarities and strengths of WebSite CGI, and
gives you step-by-step instructions for using WebSite's support for VB,
Delphi, and Perl.
- You can also visit Bob Denny directly at Bob
Denny's, where he'll entertain and enlighten you further.
- CGI Scripting with MacHTTP and AppleScript.
You may find it hard to reach this site, but once you get there, you'll
discover a wealth of information pertaining to scripting for the Macintosh.
- CGI and
AppleScript. Here's a Dr. Dobb's Journal article by Cal Simone, founder
of MainEvent software. This is a wonderful tutorial by a gifted author
and programmer. In it, you'll learn the essentials of how AppleScript interfaces
with your Macintosh system, and how you can use it to do CGI magic.
- Writing CGI Scripts for WebStar.
This site offers support for the Macintosh's WebStar server via UserTalk
in the Frontier environment. You'll find a good explanation of how to use
Frontier to create dynamic HTML on your WebStar server. If that's your
platform, this is your tutorial.
- The Amiga HTTP Common
Gateway Interface. Mike Meyer takes time out to explain the tips
you need to run CGI scripts on an Amiga Web server. He includes a number
of useful examples in a link at the bottom of the page.
CGI and SSI Freeware and Shareware
As you wander through the online world looking for samples of scripts
or tips on technique, you may run across some ready-to-run scripts that
do exactly what you want. In this section, I present some sites that offer
freeware or shareware CGI, SSI, and Java scripts. You won't find anything
wild or strange here: these are workaday programs you can take home and
put right to work doing useful tasks.
Don't forget about the list of publicly available software libraries
at the end of Chapter 3, "Designing CGI Applications."
You'll find pointers to routines there that can save you hundreds of development
You've probably encountered many of these software offerings without
knowing it. If you've visited an NT server with a graphical counter, for
instance, chances are good that the site is using Kevin Athey's creation,
or at least the GD library component of it. For that matter, you'll find
that the GD library is used in most CGI scripts that produce on-the-fly
GIFs, regardless of platform. Likewise, most of the programs in this section
are proven products in widespread use. Fill your coffee cup, clear some
space on your hard disk, and get ready to download.
Here are some of the best freeware and shareware tools available to
spice up your Web pages and make your site more powerful:
- Behold! Software.
Kevin Athey's collection of CGI programs for Windows NT and Windows 95
includes a list of sites using his software. Behold! Software is a place
where you can get free software for Windows 95 and Windows NT. The emphasis
of the site is CGI and Web utilities. The two utilities available at the
time of this writing were a hit counter and a real-time clock.
- Examples of Perl CGI Scripts, with Source
Code. This no-nonsense page demonstrates six useful utilities,
all built with Perl-a clickable image map, a way to maintain state information,
how to generate a random number, how to hunt up names in a phone book,
a way to design a self-scoring questionnaire, and a client pull demonstration.
- Each utility includes the source code, which usually has a nice header
explaining the script's function, but absolutely no documentation thereafter.
Fortunately, these scripts are short and simple enough that you can probably
figure out what's going on.
- Mooncrow's CGI/Perl
Source Page. "Mooncrow" is Carl M. Evans, a long-time
computer professional with a BSEE, an MSEE, several commercial applications,
and a textbook to his credit.
- Also to his credit is Mooncrow's Aeyrie, which includes Mooncrow's
CGI/Perl Source Page. "When I decided to create and run my own Web
pages, I had trouble locating adequate resources on the Internet concerning
CGI/Perl programming," says Evans, "so I created my own. While
scripts can be written in a number of languages, I prefer to use Perl 4
or Perl 5. It doesn't matter what platform the server is being run on as
long as the server supports Perl 4 and/or Perl 5 compliant scripts."
- Evans ended up with probably the most complete reference set of Perl
programs available on the Internet. With over 50 links to tutorials, sample
programs, reference materials, and source code, Evans provides a wonderful
resource for anyone thinking of using Perl for CGI scripting.
- gd 1.2 Library. If
you're planning to create on-the-fly GIFs, don't miss Thomas Boutell's
wonderful C library. You can incorporate this code directly into your own
programs to give them spontaneous GIF creation powers.
- Greyware Automation
Products. Greyware provides a good selection of freeware and shareware
CGI and SSI programs for NT and Win95. The SSI programs here are the ones
discussed in detail in Chapter 16, "Using Server-Side Includes,"
and included on the CD-ROM.
- Greyware's CGIShell program is of particular interest to anyone wanting
to do CGI with Visual Basic, Delphi, or another 16-bit GUI development
language on EMWAC. CGIShell comes with a handful of fully functional demonstration
programs with source code, including a guest book written in VB4 that you
can put to work immediately. The online documentation often provides a
good explanation of what goes on behind the scenes.
- NT Web Server Tools. Jim Buyens has put together a great resource
covering programs that provide Server Extensions, Connectivity, DNS, Finger,
Firewall, FTP, Gopher, HTTP, Log Analysis, Mail, News, NFS, Perl, Publishing,
Search Engines, Software Suites, Telnet, TFTP, WAIS, and X-Windows Clients.
Oh, yes, there's also a category called Other Resources for the things
he couldn't fit into the existing groups.
- If you're running Windows NT, put this one in your bookmark file. You'll
find yourself coming back again and again.
- NT Web Server Tools. It's a long URL, but worth typing. This site
is probably the most comprehensive repository of NT software on the Internet.
It has a little bit of everything, and a lot of things you won't find elsewhere.
You'll see this site featured again in the "Examples of Things Done
Fun Stuff: Examples of Things Done Right
Here's a collection of sites that demonstrate stylish, informative,
creative, or intriguing uses of CGI on the Web. You'll find plain old CGI
and SSI mixed in with Java, real-time audio, real-time video, and others.
I'll start small, with a simple page counter, and work my way toward
the bizarre and fanciful. I picked sites that illustrate technique and
taste. If you don't find any ideas for programs in this section, check
your pulse-you may already be dead.
- Voyager, Publisher of Interactive Media (http://www.voyagerco.com/).
Tasteful and elegant presentation all-around. Pay particular attention
to the current date and quote-of-the-day, which are carefully blended into
the page's overall theme.
- The Amazing
Fishcam! No list of sites would be complete without including
the one, the original, the amazing Fishcam! This site is nothing more than
two cameras focused on a tank of fish. Nothing more? Well, as the site
explains in gleeful detail, there's a lot more. You can look at the fish
in low resolution or high, and if you're running Netscape, you can visit
the Continuously Refreshing Fish Cam-a wonderful example of server push
technology. Although the idea of watching fish in near real-time isn't
particularly exciting, this site was one of the first to demonstrate the
power of the Web to provide electronic photos. Just in case you care about
the fish as well as the technology, this page happily refers you to 12
other aquatic sites.
- The Amazing Parrot-Cam!
If fish aren't enough, here's Webster, the parrot, on a live camera feed
for your viewing pleasure. In addition to good camera-work, this page has
a nice explanation of how their camera is set up and connected to the computer.
This site takes you on a whirlwind tour of the Internet. With over 8,000
sites in its list of URLs to choose from, you often find interesting and
surprising places you never would have chosen to visit otherwise. Autopilot
relies on Netscape's client pull function to whisk you from site to site
every 12 seconds. This is also a good demonstration of random URL generation.
Generator). This handy site lets you build an image file to use
as a background. It starts with some stock images, then takes you through
a customization phase where you can edit the colors until you get exactly
what you want. This UNIX magic comes to you via a program written by email@example.com.
List of Internet-Accessible Machines ). This is the got-everything
page for Internet gadgets. Want to find a Coke machine that responds to
a ping? Want to change the track on a CD player at Georgia Tech? Do you
care about Paul Haas's current refrigerator contents? Want to play with
a remote-control model railway over the Internet? Are you craving some
real-time Internet Talk Radio from NRL? Or did you ever wonder how to find
the infamous Ghostwatcher home page? This site points you to all the cool
places for gadgets, machines, and goofy things on the Internet. Great for
helping you think of new ways to use the Web!
- Dr. Fellowbug's Laboratory of Fun & Horror.
Great examples of games and general interactivity...with a macabre twist
that's as much fun as the games themselves. No software to download here,
but hours of entertainment, and perhaps an idea or two for the terminally
twisted mind. The animated Hangman game is particularly well done.
- The Electric
Postcard. This site uses CGI and e-mail in a clever way. It's
one of those "Gee, duh!" ideas that other people always seem
to get first. The Electric Postcard lets you choose from a variety of amusing
(or just plain strange) postcard stock, then lets you personalize your
Windows NT Application Center). You saw this site earlier in the
section on freeware and shareware programs. I'm listing it again here because
it's the cleanest example on the Web of interfacing a back-end database
with a software library. The site is well-indexed, carefully categorized,
and easy to use. Kudos to Beverly Hills Software for providing such a well-designed
and useful site.
- The Vertex Award (Nanimation
of the Week). While often almost too slow to be tolerated, this
site is nevertheless important enough that I included it for you...I think
it's worth the wait. A Nanimation is a Netscape animation. This page lists
the Vertex Award winners for best Nanimation on the Internet. Even the
introduction to the award lets you know you're in for something special.
The pages that win awards are spectacular.
- The Netscape
Engineering Sign. This CGI-machine interface lets you type a message
to be displayed in huge green letters on a sign in the Netscape office's
engineering pit. Let's hope they never put it out on a runway. "Land
here!" "No, over there!" "Yo mamma!"
Web in Pig Latin ). This one could easily win the award for most
bizarre idea ever to grace the Internet. In fact, it's won several awards:
Business Weekly's "As A Time Out" Site of the Week; The Stick's
Misc Surf Site; a Hot Site in Internet World; and "a site that 'does
stuff' by the Center for the Easily Amused." Basically, you enter
an URL on a form provided. The CGI program goes out, fetches the page,
and presents it to you in Pig Latin. Arly-nay, ooday! I present it here
because the CGI does more than create HTML on the fly for you; it actually
goes out and fetches a page, playing a browser role, to generate the HTML.
- Talk to
My Cat. Well, why not? This site, says author Michael Witbrock,
has a speech synthesizer connected to the computer. You type in a sentence,
and the speech synthesizer says it aloud to Michael's cat. If the cat happens
to be around, that is. And awake. And listening. Who knows? Who cares?
Is this any different from talking to a cat in person?
- WebChat Broadcasting
System. WBS, or WebChat Broadcasting System, is one of the cleanest
examples of real-time chatting using the Web. With hundreds of "channels"
(separate discussion areas) to choose from, WebChat offers something for
everyone. And it seems everyone has been there once or more. WebChat boasts
over 35 million hits per month. They'll also sell you their software to
run on your own server, or lease you space on their server. There's also
a freeware version of WebChat available with limited features. You'll need
a UNIX machine to run it, although a port to NTPerl is under way.
the Web-Controlled Robot ). Xavier isn't a toy. Xavier has three
on-board 486 computers, a Sony videocam, and enough engineering guts to
rebuild the atom bomb from scratch. Well, maybe not, but he can tell Knock-Knock
jokes! Users can issue commands to Xavier and, by tapping into his video
eye, watch him carry those commands out. Xavier communicates to the rest
of the world with wireless Ethernet. What I want to know is why Xavier
gets to go wireless before I do?
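The Autopilot entry above combines two simple techniques: picking a URL at random from a list, and asking the browser to load it after a delay (client pull). Here is a minimal sketch of that idea in Python. It is not Autopilot's own code; the function name `autopilot_page`, the sample URL list, and the choice of a `Refresh` meta tag to trigger the pull are all assumptions made for illustration.

```python
import html
import random

# Hypothetical stand-in for the site's list of URLs.
SAMPLE_URLS = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c",
]


def autopilot_page(urls, delay_seconds=12, rng=random):
    """Return a complete CGI response that sends the browser to a random URL."""
    target = rng.choice(urls)
    safe = html.escape(target, quote=True)
    body = (
        "<html><head>\n"
        f'<meta http-equiv="Refresh" content="{delay_seconds}; URL={safe}">\n'
        "<title>Next stop</title></head>\n"
        f'<body><p>Jumping to <a href="{safe}">{safe}</a> '
        f"in {delay_seconds} seconds.</p></body></html>\n"
    )
    # A CGI program writes its headers, a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    print(autopilot_page(SAMPLE_URLS), end="")
```

Each time the browser follows the refresh back to the script, a new random target is chosen, so the tour continues with no further action from the visitor.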
In this section, I'll point out sites that do indexing well. For the
sake of contrast and instruction, I'll include one that actually makes
the content harder to find than if it were buried at sea in a locked cabinet.
This kind of egregious irresponsibility is rare, though, and I'm happy
to provide you with several of the best and brightest searchable sites
on the Internet. I'll start with examples of small sites, and work my way
up to the behemoths at Infoseek and Alta Vista.
- The UBC Faculty of Medicine
Home Page. A good example of a site (really a collection of pointers
to sites) done up with a static index. For this type of project, where
full-text indexing is either impossible or impractical, UBC demonstrates
how to do it manually. If you haven't visited this site before, be sure
to make a bookmark for it. The information presented here is invaluable.
). Perl code for preparing your site to participate in the ALIWEB master
index and search engine. Useful even if you don't plan to participate,
since you can examine the Perl code to see what kinds of information are
used to create a site index.
- Technical Discussion of the Harvest
System. A thoughtful and complete overview of the problems inherent
in current indexing systems, along with the rationale behind the new Harvest
System's approach. For information on getting the Harvest software, or
to sample sites already using it, see Harvest's main page at this
Newsgroup-related Indexes ). This site contains a list of pointers
to several other WAIS engines maintaining full-text indices for a number
of popular UseNet newsgroups. If nothing else, you can visit these sites
to see how efficient WAIS can be. WAIS is often overlooked these days in
favor of large relational database back ends, but there's no reason not
to use WAIS for appropriate tasks. If you need a full-text search engine
to handle a reasonable amount of data, WAIS can do the job quickly and
- Greyware Site Index.
Here's an example of using WAIS to catalogue all the HTML on a site. The
WAIS catalogues are rebuilt daily and stored in one directory. Static HTML
documents in that directory let you select the database, then execute the
actual search using Boolean operators and keywords. This site proves that
WAIS is alive and well on the NT platform. You can search over 11M of index
in less than a quarter-second, on average. The cataloguing itself takes
about 15 minutes a day to run.
- Social Security Handbook 1995, from the United States Social Security
Administration. I'm including this URL for a very specific reason:
This is the best example I've found of exactly the wrong way to
index a site. The material could easily be organized with a database engine-even
FreeWAIS could handle it without breathing hard. Instead, the "index"
is nothing more than a list of links: "Index letter A," "Index
letter B," and so on. When you choose an index letter-roughly corresponding
to the first word of the subject, rather than the key idea of the subject-
you'll find a bunch of static links to documents by number. Yes, that's
right, by inscrutable SSA document number. Good luck ever finding anything
here. They'd have done much better by throwing everything in one directory
and using keyword retrieval. Study this page carefully so you know how
not to do it. If you're ever tempted to organize your site this
way, be prepared to deal with angry e-mail from your bewildered and abused
- Infoseek Guide.
Here's an example to balance the Social Security Administration's abomination.
This search engine shows how it should be done. It's clean, fast,
easy to use, and remarkably useful. Infoseek's award-winning engine not
only brings you speedy results, but a great deal of flexibility for advanced
users. If you're writing your own search engine from scratch, take a close
look at Infoseek's specifications and capabilities first. When you realize
the size of the task and the sanity Infoseek brings to it, you'll be even
- Alta Vista.
Another example of how to do things right. Using some frighteningly powerful
DEC workstations and servers, Alta Vista brings you an incredibly fast,
incredibly large index of Internet sites and newsgroup contents. The proprietary
64-bit search software was developed in-house by Digital's research laboratory
personnel. These guys aren't fooling around. The indexer software can crunch
a gigabyte of text per hour. Scooter, their Web spider, which collects information,
can visit up to 2.5 million sites each day. Although the presentation isn't
as slick as Infoseek's, the search engine's breadth of knowledge simply
staggers the mind. This is a technology to watch.
- Indexes and Search Engines for
Internet sources. A useful list of search engines and indexes
maintained by Jan Wright. Jan's list will help you find the proper search
engine for your site.
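The keyword retrieval recommended above for the Social Security Handbook can be sketched with a small inverted index: a table that maps each word to the documents containing it. This is only an illustration of the idea, not how WAIS or any of the engines listed here work internally. The function names and the sample document names are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search_all(index, query):
    """Return sorted names of documents containing every query word (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


if __name__ == "__main__":
    # Made-up documents standing in for numbered handbook pages.
    docs = {
        "doc-101.html": "Retirement benefits and how to apply",
        "doc-102.html": "Disability benefits for workers",
        "doc-103.html": "How to replace a lost card",
    }
    index = build_index(docs)
    print(search_all(index, "benefits apply"))
```

With an index like this, a visitor can find a page by the key idea of the subject without ever seeing the document number.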
Connecting SQL Databases
Many Web servers, especially recent entries into the field and those
designed for the NT platform, have database connectivity built in to the
server. Even those servers that don't talk to databases directly (through
ODBC, or Open Database Connectivity) usually include a CGI module of some
sort that does. While this allows the advertisers to claim that the server
comes packaged with database functionality, often the level of database
support is only good enough to demonstrate connectivity, not build a real
application. In any case, older servers, especially in the UNIX world,
usually have no database support at all.
This section looks at a few third-party products designed from the ground
up to help you connect your Web server to a back-end database. While many
products are available, the ones I chose are clear leaders in the field-either
because of outstanding performance, or general availability and widespread
- Cold Fusion. Cold Fusion
is a full set of connectivity tools to make your Web server work seamlessly
with your SQL back-end database server. Works with O'Reilly WebSite, Netscape
HTTPD, or Process Software's Purveyor. Support for other platforms is coming
- Users don't need to program in C, Perl, or any other programming language.
Cold Fusion provides the power automagically through HTML, using high-level
database commands and a general-purpose CGI scripting language.
- Cold Fusion's heart is DBML.EXE, a CGI script tailored for ODBC access
to the back-end database of your choice. Cold Fusion dynamically generates
HTML pages containing the results of queries or submissions, and lets you
freely mix if-then-else conditional processing and multiple SQL statements
in with your regular HTML.
- W3-mSQL.
W3-mSQL is an interface package that lets you use mSQL (a freeware, light-weight
UNIX SQL engine) with your Web server. W3-mSQL is a CGI script that works
by interpreting enhanced HTML on the fly. Using a variation on HTML comments
to embed W3-mSQL commands, you connect to, query, update, and close a back-end
database entirely within your HTML.
- If you're planning to use mSQL on your UNIX machine and don't want
to write the interface code yourself, check out W3-mSQL.
See "MiniSQL (mSQL) and W3-mSQL," for more information on these two packages.
- mSQLJava
Home Page. This site offers a library of HotJava classes suitable
for use with an mSQL back-end database. The package is copyrighted by Darryl
Collins, but may be used, copied, and redistributed under the terms described
in the GNU General Public License. At this site, you'll find links to FTP
sites where you can download the class library, links to pages with documentation,
and links to pages with sample programs and source code.
- mSQL (MiniSQL).
If all this talk about mSQL tools has you wondering about the back-end
database itself, here's the official source of information and code. While
the site is occasionally very slow to respond, it's best to get the information
straight from the horse's mouth.
- mSQL is a lightweight freeware SQL engine for UNIX machines. It's fully
ANSI-compatible, but implements only a subset of SQL commands. For Web
developers, this is ideal, since the subset includes just about everything
you'll need and discards the bits you'll never use.
Tango Solutions, from Everyware, is a complete CGI package for the Macintosh
to connect HTML to their own back-end database, ButlerSQL. Development
is underway to allow Tango to talk to other SQL engines, but at the moment
only ButlerSQL is supported. There's no charge for the ButlerSQL version
of Tango; versions that connect to other databases may have a fee eventually.
- On the Tango home page, you'll find links to demonstration programs-some
of them rather slick-for online shopping, conferencing, and other useful
ways to take advantage of Tango on your Macintosh server. You'll also find
a non-searchable FAQ page with links to individual questions-and-answers
(we have to wonder why Everyware didn't store this information in a ButlerSQL
database and let the user search for keywords using Tango), and generic
product information. You may download the Tango software directly from
- Oracle World Wide Web
Interface Kit Archive. If you're using Oracle as your back-end
database, look no further than this page for your interface software. Oracle
meticulously provides information for interfacing most common Web servers
with their database product. They even examine cross-platform connectivity
issues and third-party products, and have complete working examples of
useful programs-including one that lets you do a keyword search of NCSA's
World Wide Web Connection, Version 1 ). With typical IBM verbiage
and charts, this page shows you how to go about connecting your OS/2 or
AIX Web server to a DB2 back-end database. You'll find demos showing how
DB2 WWW Connection V1 (that's the short name) can generate Netscape tables
to hold query results, and you can download the software directly.
- If your platform is OS/2 or AIX, and you're trying to talk to a Big
Blue database, this package is probably your best bet.
Gateway List ). Here's a handy site maintained by KangChan Lee.
Lee has gathered in one place links to dozens of Web-to-database gateway
programs, methods, and tutorials.
- If you're using a back-end database other than the ones I've already
mentioned in this chapter, take a glance at Lee's page. You'll probably
find your database there, along with a helpful link to available software
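Cold Fusion and W3-mSQL are both described above as interpreting database commands embedded in ordinary HTML. The sketch below illustrates that general idea only, and it does not use either product's real syntax. The `<!--sql ... -->` comment form, the `render` function, and the use of Python's built-in `sqlite3` module in place of mSQL or an ODBC source are all assumptions.

```python
import html
import re
import sqlite3

# Hypothetical marker: an HTML comment that starts with "sql".
SQL_COMMENT = re.compile(r"<!--sql\s+(.*?)-->", re.DOTALL)


def render(template, connection):
    """Replace each <!--sql ...--> comment with an HTML table of the query's rows."""

    def run_query(match):
        cursor = connection.execute(match.group(1).strip())
        rows = []
        for row in cursor:
            cells = "".join(f"<td>{html.escape(str(value))}</td>" for value in row)
            rows.append(f"<tr>{cells}</tr>")
        return "<table>" + "".join(rows) + "</table>"

    return SQL_COMMENT.sub(run_query, template)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE guests (name TEXT, city TEXT)")
    db.execute("INSERT INTO guests VALUES ('Ann', 'Indianapolis')")
    page = "<h1>Guest book</h1>\n<!--sql SELECT name, city FROM guests -->\n"
    print(render(page, db))
```

In this sketch the queries come from the page author, not from the visitor. Anything a visitor types into a form should be passed to the database as a bound parameter rather than pasted into the SQL text.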
Spiders, Worms, Crawlers, and Robots
If you're just looking for information from the Internet, use one of
the publicly available search engines. It's unlikely you'll ever have the
resources to duplicate the mighty Alta Vista, for example, and even if
you do, you'll need more help than this section could possibly give. Besides,
all the really good robot code has commercial value, and hence isn't freeware.
On the other hand, if you want to build a small special-purpose spider,
worm, crawler, or robot, some code is available to help you get started.
More important than how to do it, however, is how not to do it. That's
why the first link I'll present is to an article you must read if
you're going to build a Web automaton. Also be sure to check out Chapter
14, "Robots and Web Crawlers," for more information about
Web Agents). This white paper by David Eichmann discusses the ethics
of using automata on the World Wide Web. If you don't want to be inundated
with angry letters from systems administrators, read this paper carefully
before you write the first line of code for your nifty new robot.
- This article is highly informative, with hot links to references and
other papers pertinent to the subject. By reading this paper, you'll arm
yourself with all the knowledge necessary to build a Web-safe robot.
MOMSpider is a UNIX-based Perl 4 program. You may use or modify this code,
subject to the generous licensing restrictions from the University of California,
Irvine. If nothing else, you can use the code as a jumping-off place when
building your own automaton.
- Checkbot is a link-verification tool written in Perl, using
libwww. Written by Dimitri Tischenko and Hans de Graaff, this robot collects
links (starting from a given URL), and then validates them. This is a handy
tool to have around, although you'll probably want to modify it for your
- Victor Parada's WebCopy program takes a URL as a command-line argument,
then goes out and fetches the document. It can run recursively,
fetching all links referenced by that document. You can download the code
right from the site, and start using it right away (you'll also need Perl,
if you don't already have it). By design, this program won't follow links
across multiple servers; this is to protect you from (a) endless recursion,
and (b) retrieving more than you bargained for.
- (WebWatch). WebWatch is
a commercial program for Windows 95, but you can download an evaluation copy.
(The evaluation copy has an expiry feature built in, and you don't get
to see the source code.) The documentation says the program doesn't work
on Windows NT now, but will soon.
- WebWatch is a personal-use spider that monitors your bookmarks, updates
lists of sites, checks for changed information, and so on. You'll find
step-by-step installation instructions and a short FAQ. You may or may
not find this product useful, but it certainly demonstrates some smart
thinking and slick marketing. You could do worse than to build a robot
with this kind of user interface and intelligence.
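Two habits this section points toward, staying on one server as WebCopy
does and leaving alone what a site has asked robots to avoid, can be sketched
in Python using only the standard library. This is a rough illustration,
not code taken from Checkbot or WebCopy (both are Perl programs). The function
names, the "example-robot" agent name, and the reliance on a robots.txt
file are assumptions of the sketch; the text above doesn't mention robots.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_host_links(base_url, html):
    """Return absolute http(s) links in html that stay on base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https") or parts.netloc != host:
            continue  # skip mailto:, other servers, and so on
        if absolute not in links:
            links.append(absolute)
    return links


def allowed_links(links, robots_txt, agent="example-robot"):
    """Drop links that the site's robots.txt text disallows for this agent."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in links if rules.can_fetch(agent, url)]
```

A robot built this way would fetch a page, pass its text to same_host_links,
filter the result through allowed_links, and only then queue the survivors.
Pausing between requests is another courtesy worth adding.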
CGI Interactive Games
There are thousands of online games to choose from, if you're of a mind
to play games on the Internet. In this section, I've selected a few that
illustrate CGI techniques particularly well. Some are incredibly complex,
others very simple, yet all deal with maintaining state information to
provide the illusion of interactivity.
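Each CGI request is independent, so a script has to hand its state to
the browser and get it back on the next request. One common way is to carry
earlier answers in hidden form fields. The following Python sketch shows
the idea; the function names, the field names, and the /cgi-bin/setup path
are made up for illustration and don't come from any site described here.

```python
from html import escape
from urllib.parse import parse_qs


def render_step(action, state, question, field):
    """Return an HTML form that asks one question and carries state along.

    Every earlier answer rides in a hidden field, so the next request
    delivers the whole history back to the script.
    """
    lines = ['<form method="post" action="{}">'.format(escape(action))]
    for name, value in state.items():
        lines.append(
            '  <input type="hidden" name="{}" value="{}">'.format(
                escape(name), escape(value)
            )
        )
    lines.append(
        '  {} <input type="text" name="{}">'.format(escape(question), escape(field))
    )
    lines.append('  <input type="submit" value="Next">')
    lines.append("</form>")
    return "\n".join(lines)


def read_state(form_body):
    """Decode a URL-encoded form body into a flat dict (last value wins)."""
    parsed = parse_qs(form_body, keep_blank_values=True)
    return {name: values[-1] for name, values in parsed.items()}


if __name__ == "__main__":
    state = read_state("race=elf&class=wizard")
    print(render_step("/cgi-bin/setup", state, "Name your character:", "name"))
```

Hidden fields are visible in the page source and can be edited by the
user, so a script that relies on them should validate every value it gets
back.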
- (Real Virtual, Incorporated).
Real Virtual does far more than Dungeons & Dragons on the Web, but
it does that bit exceptionally well. For the CGI student, there's a lot
to study here (plus, if you like fantasy role-playing games, you can have
a great time). Pay particular attention to the way Real Virtual maintains
state information as you move through the setup screens. View the document
source and notice all the hidden fields containing your selections, plus
information to let the CGI program know what to do next.
- Real Virtual spent a lot of time and care developing this project.
From the user's point of view, the Fantasy Worlds adventure looks a lot
like a PC-based game, but with all the advantages of being real-time and
- Netropolis lets you become the CEO of a corporation located in or
around England. The goal is to win lots of cash and taunt the other players.
- Of special interest here is the slick use of image maps to provide
a sense of location, plus the integration of e-mail into the game.
- If you like stomping on the business competition, you may also enjoy
- (S.P.Q.R.).
The first thing you should notice when you stop by S.P.Q.R. is that the
URL at the top of your browser changes to something like Site.
This vile concoction isn't something you'd want to type manually, but is
there for a purpose. If you go to S.P.Q.R. with that URL, you'll resume
the game wherever you left off. S.P.Q.R. (from Time Warner Electronic Publishing)
generates a fake URL for you on the fly when you walk through the front
door. From then on, throughout the game, that URL marks you as you, so
the game can preserve state information.
- The game itself is visually simple, but intriguing. You wander through
Rome collecting scrolls (which you can then read) and keys (which you can
use to unlock things). Your mission is to save Rome from disaster. The
game doesn't miss a beat when it comes to maintaining state information,
or matching graphical output to what you've done.
- (QIN: Tomb of
the Middle Kingdom). This cool game from Pathfinder also uses an
artificially munged URL to keep track of who's who. The game is a visual
version of a text-based adventure game, with low-key but nevertheless impressive
graphics. This game lets you wander around a virtual 3D world by clicking
the view presented. It's all done with image maps, so one of the inherent
failings is that you can click anywhere, not just on areas that do something.
- This isn't the fault of the game design. It's a problem with using
image maps on pictures without clear boundaries. A toolbar or row of icons
clearly shows where to click, so a null click happens only in the rare
case where the user hits a boundary or the background by mistake. In
a game where you're clicking areas of a 3D picture to govern motion, however,
most clicks are null, and the rare case becomes the few areas of the image
that actually do something. This can lead to lots and lots of clicking,
just to find out which areas of the image map are hotspots. Keep this in
mind when you're designing your own game.
Barney Fun Page). If you really hate Barney (the big purple dinosaur),
you'll love this page! Gerald Oskoboiny lets you get out all your angst
against the Purple One with a knife, a gun, an axe, an Uzi, a shotgun,
a motorcycle, or a cannon. You select your weapon and fire away, changing
weapons as needed. Each time you shoot, the picture of Barney changes to
show the wounds, and you get a caption like, "Barney has been grazed.
You can do better than that," or "Barney has been slightly wounded,"
until, at last, Barney dies.
- Gerald thoughtfully keeps on file morgue photos from the last ten Barney-killings,
so you can view the corpses and celebrate. Now, if only the Purple One
would stay dead...
- This site is sometimes slow (probably due to all the crazed Barney
hunters), but instructive for the would-be CGI programmer. Although the
subject matter is just plain silly, the site demonstrates well how to make
static drawings become interactive.
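The URL-based state tracking described above for S.P.Q.R. and QIN can
be sketched in a few lines of Python. This is only a guess at the general
technique, not how either game is actually implemented: the token format,
the function names, and the in-memory table are assumptions made for the
example. The token arrives in the extra path after the script name, which
a CGI script reads from the PATH_INFO environment variable.

```python
import re
import secrets

# token -> game state. A CGI script runs as a fresh process per request,
# so a real game would keep this table on disk; a dict keeps the sketch short.
SESSIONS = {}

# Matches "/<16 hex characters>" optionally followed by "/<page>".
TOKEN_PATH = re.compile(r"^/([0-9a-f]{16})(?:/(.*))?$")


def new_session():
    """Create a player record and return its hard-to-guess token."""
    token = secrets.token_hex(8)  # 8 random bytes, 16 hex characters
    SESSIONS[token] = {"room": "start", "items": []}
    return token


def link(script, token, page):
    """Build a URL that carries the player's token in the path."""
    return "{}/{}/{}".format(script, token, page)


def resolve(path_info):
    """Map PATH_INFO to (token, page, state), starting a session if needed."""
    match = TOKEN_PATH.match(path_info or "")
    if match and match.group(1) in SESSIONS:
        token, page = match.group(1), match.group(2) or "start"
    else:
        # No token, or one we don't recognize: the player is a newcomer.
        token, page = new_session(), "start"
    return token, page, SESSIONS[token]
```

Every link the script writes into a page goes through link(), so the
token follows the player from screen to screen, and a bookmarked URL brings
the player back to the same saved state.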
A Brief Case Study: Internet Concepts, LLC
Internet Concepts, LLC, knows that content and presentation are the
two things that make one site stand out from another. They have created
several award-winning sites you may have already encountered, as follows:
These sites are not only well designed and visually appealing, but they
also take an unusual approach to the development of site content: They rely
on the user to provide it.
Using a framework of hand-crafted CGI scripts written in Perl 5 and
running on a Sun SPARCstation, Internet Concepts lets users submit an entry
on a fill-out form. A CGI script then processes that entry, adding it to
the database and making it immediately available on the Web.
Stephan Spencer of Internet Concepts says, "Some consider this
real-time updating risky, but since December 1994 when we first implemented
this practice we have had no notable problems. Nonetheless, we'll probably
change this in the near future to a policy of holding submissions in a
'pending' area until we have reviewed them."
The database is based on Perl dbm. The script that processes
new entries requires the user to assign a password, too. The user can then
make changes to that entry later on. A "root" or master password
allows site supervisors to change individual passwords, edit entries, or
delete entries. Another script allows browsing. It displays the database
sorted by name, organization name (if applicable), category/genre, and
location. Most of these sites are also keyword-searchable.
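The arrangement just described (a dbm file of entries, a password per
entry, a master password, and sorted browsing) can be approximated with
Python's standard dbm module. This is a sketch under stated assumptions,
not Internet Concepts' Perl code: the record layout, the function names,
and the salted password hashing are choices made for the example, and the
source doesn't say how the real scripts store passwords.

```python
import dbm
import hashlib
import hmac
import json
import os


def _digest(password, salt):
    """Salted PBKDF2 hash, so the file never holds a plain password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def _dump(record):
    return json.dumps(record).encode("utf-8")


def add_entry(path, name, fields, password):
    """Store a new entry under name; return False if the name is taken."""
    key = name.encode("utf-8")
    salt = os.urandom(16)
    record = {
        "fields": fields,
        "salt": salt.hex(),
        "hash": _digest(password, salt).hex(),
    }
    with dbm.open(path, "c") as db:
        if key in db:
            return False
        db[key] = _dump(record)
    return True


def update_entry(path, name, fields, password, master_password):
    """Replace an entry's fields if the entry or master password matches."""
    key = name.encode("utf-8")
    with dbm.open(path, "c") as db:
        if key not in db:
            return False
        record = json.loads(db[key].decode("utf-8"))
        supplied = _digest(password, bytes.fromhex(record["salt"]))
        is_owner = hmac.compare_digest(supplied, bytes.fromhex(record["hash"]))
        # Assumption: the master password is handed in by the caller.
        is_master = hmac.compare_digest(
            password.encode("utf-8"), master_password.encode("utf-8")
        )
        if not (is_owner or is_master):
            return False
        record["fields"] = fields
        db[key] = _dump(record)
    return True


def browse(path, sort_field):
    """Return every entry's fields, sorted by one field (missing sorts first)."""
    with dbm.open(path, "c") as db:
        rows = [json.loads(db[k].decode("utf-8"))["fields"] for k in db.keys()]
    return sorted(rows, key=lambda row: row.get(sort_field, ""))
```

Calling browse(path, "location") or browse(path, "name") gives the different
sorted views from the same file. Holding new submissions for review, as
the quotation above suggests, could be done by writing them to a second
dbm file first.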
InnSite even offers a geographical search interface that responds to
user clicks by zooming in indefinitely on a region, returning images in real time
from the Xerox PARC Map Viewer.
Internet Concepts provides many of these sites as a public service to
the Internet community. They also, however, design and implement many commercial
sites. One of the most interesting is the Online Catalogue at (Seton Identification
Products). This site offers the "Workplace Safety Home Page,"
and a searchable online catalog of thousands of signs, labels, tags, pipe
markers, and other identification products. The site supports online ordering
of over 6,000 items.
If you're interested in finding out more about Internet Concepts, their
home page is at this site, or you can send them e-mail at .
The wizards at Internet Concepts have used CGI to create their enchantments.
With what you've learned in this book, you can invoke the magic of CGI,