Copyright ©1996, Que Corporation. All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews. Making copies of any part of this book for any purpose other than your own personal use is a violation of United States copyright laws. For information, address Que Corporation, 201 West 103rd Street, Indianapolis, IN 46290 or at support@mcp.com.

Notice: This material is excerpted from Special Edition Using CGI, ISBN: 0-7897-0740-3. The electronic version of this material has not been through the final proofreading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.

CHAPTER 10: Using MIME with CGI

You can use MIME (Multipurpose Internet Mail Extensions) to set up your server to deliver multimedia features such as audio and video via CGI scripts. MIME is an important part of the conversation between a Web browser and a server, during which the capabilities of the browser to handle different media types are revealed.

In this chapter, you learn

What MIME is
How MIME is used on the Web
MIME content types
How to map MIME types to browser helper applications and configure your server to recognize MIME types
Security issues you need to consider with MIME
How to negotiate content based on MIME

For your reference, I list the MIME types approved by the Internet Assigned Numbers Authority (IANA), as well as unapproved types supported by popular servers and browsers, in Appendix B, "Commonly Used MIME Media Types." I also provide several CGI script fragments that you can use to code your own MIME headers.

MIME is a technical specification, originally developed for Internet mail, that is used to define the type of content a Web server sends a browser, and what the browser can accept. MIME isn't rocket science. It's more like high school science: If you can understand a few basic concepts and follow directions, you can pass the course. However, if you're truly into it, you can proceed to graduate-level usage, involving content-negotiation between server and browser.

In fact, you're already an expert in using MIME. You've used MIME without knowing it since the first time you browsed the Web. It's the underlying mechanism that enables a browser to know what a Web server is sending it, and how to handle it.

What is MIME?

MIME is often erroneously defined as an acronym for Multimedia Internet Mail Extensions. This is an understandable mistake, since MIME on the World Wide Web is often used for multimedia applications. However, MIME really stands for Multipurpose Internet Mail Extensions, which is significantly different. Not all MIME types deal with non-text media types, as you'll see later in this chapter.

What, you ask, does a mail standard have to do with the Web? Certainly HTML documents bear little resemblance to e-mail. How is it that an e-mail standard was incorporated into the HTTP specification?

To get the answer to these questions, first look at the origin of the MIME standard in Request For Comments (RFC) 1341, which was written back in the dark ages: 1992.

RFCs are working notes of the Internet research and development community that often have the force of standards on the Internet. They may concern any aspect of computer communications and may represent anything from meeting notes to Frequently Asked Questions (FAQs) to a proposed specification for a standard. Unless specified otherwise, RFCs are considered to be public domain. While most Internet standards are RFCs, not all RFCs specify standards.

Because the Internet runs on rough consensus and working code, whether an RFC is adopted widely depends a lot on how well it works in practice.

The Internet standard for e-mail was established by RFC 822 (included on the CD), released in 1982. This standard proposed a way for machines to exchange mail documents that were almost exclusively text. Because so many different e-mail systems needed to interoperate, each with different addressing schemes and transmission methods, the RFC established a lowest common denominator for mail interchange.

In June 1992, Network Working Group members Nathaniel S. Borenstein and Ned Freed published RFC 1341 (included on the CD). In it they defined a method for e-mail to contain not only non-textual components such as audio, video, or graphics files, but alternate text formats such as PostScript or Standard Generalized Markup Language (SGML).

Because Internet mail gateways are notoriously finicky about non-text mail, MIME represents a significant extension to the capability of e-mail to transfer information. Before MIME, the only way to include audio, video, or other binary files in e-mail was to first run the file through a program to UUEncode it, or convert the binary file to an ASCII (text) representation. You could then include the ASCII code in your e-mail and send it. Your recipient had to reverse the process (UUDecode) on receipt.

RFC 1341 proposed a way to automatically encode with mail software material that wasn't compliant with the original RFC 822 standard. It also proposed a way to include more than one attachment (known as multipart attachments), possibly with more than one type of encoding. Because the sending and receiving mail software had to know what kind of file it was dealing with, the RFC established several standard MIME types and subtypes.

The seven MIME types defined by the RFC are as follows:

Text
Multipart
Message
Application
Image
Audio
Video

(Each type is defined further in the section "What Are the Standard MIME Types?")

The authors of RFC 1341 intended these types to be exhaustive, and discouraged adding to them. They also intended that subtypes be used to provide support for various implementations of a type (for example, different types of text). To illustrate this usage and to provide a common starting point, they defined several subtypes for each type. The type and subtype are combined with a slash and serve to describe the object. The combination of a type and subtype is commonly referred to as a media type. For example, text/plain, video/quicktime, audio/basic, and application/sgml are all valid MIME media type/subtype combinations.
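To make the type/subtype convention concrete, here is a minimal Python sketch that splits a media type string at the slash. The function name split_media_type is my own illustration, not part of any standard library, and the sketch assumes a bare type/subtype string with no parameters attached.

```python
def split_media_type(media_type):
    """Split 'type/subtype' into a (type, subtype) pair, lowercased."""
    main_type, sep, subtype = media_type.strip().partition("/")
    if not sep or not main_type or not subtype:
        raise ValueError("expected type/subtype, got %r" % media_type)
    # Media type names are case-insensitive, so normalize to lowercase.
    return main_type.lower(), subtype.lower()


if __name__ == "__main__":
    for name in ("text/plain", "video/quicktime", "audio/basic", "application/sgml"):
        print(split_media_type(name))
```

Normalizing to lowercase is a design choice that makes later comparisons simpler; a stricter version could also validate the characters allowed in each half.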

How MIME Became Part of the HTTP Specification

The inclusion of the SGML subtype is important historically because, as you may know, HTML is an implementation within the SGML standard. (Many would argue this point, but it's clear that was the intention of the authors of the HTML standard.) Because the MIME standard supported SGML as well as many other media types people wanted to deliver over the Web, MIME was a natural inclusion in the HTML and HTTP standards developed by Tim Berners-Lee, Dan Connolly, and the group at CERN.

The effort that led to the World Wide Web began in 1990, two years before the MIME standard was issued. By October of 1992, four months after the proposed MIME specification in RFC 1341, Connolly's thoughts were turning toward a convergence of MIME and SGML that not only could form the basis of the World Wide Web, but create a platform for other services such as Gopher and WAIS. In a posting to an e-mail discussion list, he proposed just such a thing, and Borenstein enthusiastically agreed. Connolly asserted that MIME was a good standard for identifying content types in general, not just for e-mail.

The rest, as they say, is history. MIME was adopted into the HTML 2.0 standard and formed an important mechanism by which the Web client and the Web server understand each other's capabilities.

How MIME Is Used on the World Wide Web

MIME is the primary way a Web server tells a Web client about the document or file it's sending. The Web browser also communicates information about its capabilities to the server using MIME types. There's more on this topic later in the sections "Mapping MIME Types to Browser Helper Applications" and "Configuring a Server to Recognize MIME Types."

Any file received without a MIME header is generally assumed by the Web browser to be an HTML text document (MIME type text/html). This can lead to distressing results: generally a string of sentences undisturbed by tabs or paragraph marks. If, on the other hand, a text document is preceded by a MIME header declaring it as text/plain, it's much easier to read. Such documents are displayed without formatting controls such as fonts and heading styles but with paragraph breaks and tabs. Many such documents are on the Web, ranging from converted e-mail messages to RFCs and other Internet standards documents.

All that's necessary for you to do to make a plain text document display nicely, albeit plainly, in a Web browser is to append a MIME header to the top that declares it as the MIME type text/plain. There's more on this in the next section, "Understanding MIME Headers."

Although any document received without a MIME header is assumed to be text/html by most browsers, to be certain your Web browser displays a Web page as an HTML document, the Web server must identify the document as type text/html by first sending the client a MIME header. Interestingly, text/html isn't one of the official media types registered with the IANA. Go figure.

By the same token, the only way to make sure that any other media type will have a chance of being displayed or handled properly is to inform the browser of its type using a MIME header. Thus, MIME allows browsers to distinguish audio clips from video clips from VRML worlds from HTML pages. Being able to understand and manipulate this key metadata will equip you to fully exploit the multimedia potential of the World Wide Web.

Understanding MIME Headers

So what's this thing called a MIME header, and how does one use it? The answer to this question depends on whether you're using e-mail or the Web. First, look at the full MIME specification, which applies to e-mail. The next section describes the elements of the MIME spec that are used on the Web.

As defined in RFC 1341 and subsequently revised in RFC 1521 (included on the CD), a MIME header consists of the following parts:

A version notice of the form Mime-Version: 1.0.
A content type declaration of the form Content-Type: type/subtype. Content-type has the following two optional parameters:

A boundary parameter of the form boundary="arbitrary boundary delimiter". You only need to declare a boundary in HTML applications for multipart MIME types (more on this later). Most often, HTML documents are all of a single type.
A character set parameter of the form charset="character set type". Character set can be used in sophisticated server applications that supply a different document, in a different language, depending on content negotiation. Currently, however, most servers don't recognize or deal with charsets.

An encoding type of the form Content-Transfer-Encoding: encoding type. This specifies the form in which the server will transfer the document. Several encoding types are defined: "7bit", "quoted-printable", "base64", "8bit", "binary", and "x-token".
An optional content description of the form Content-Description: description. This is often used to specify a file name for the document to be stored under.

Keep in mind that these are the requirements for e-mail headers. As you'll see in the next section, using MIME headers for HTTP transfers can be much simpler.

The following example demonstrates a typical MIME header for e-mail:

Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="IMA.Boundary.750407228"
--IMA.Boundary.750407228
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Description: cc:Mail note part

In this header, the MIME version is declared as 1.0, and the content type is multipart/mixed, meaning the document contains more than one type separated by a boundary, the boundary being an arbitrary text string. Each content type is then declared after each boundary. In this case, the content after the first boundary is text/plain. The content transfer encoding for the first part of the message is the standard Internet mail 7bit. The content description can be used by the client to determine either a file name to use to store the section or other information.
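If you'd rather not assemble such headers by hand, a mail library can generate them. The following is a minimal sketch using Python's standard email package, which is my choice for illustration and not something the MIME RFCs prescribe. The function name build_note is hypothetical, and the boundary string and description are simply borrowed from the example above.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_note(body_text, boundary="IMA.Boundary.750407228"):
    """Build a multipart/mixed message holding a single text/plain part."""
    message = MIMEMultipart("mixed", boundary=boundary)
    # A us-ascii text part is sent with the 7bit transfer encoding.
    part = MIMEText(body_text, "plain", "us-ascii")
    part["Content-Description"] = "cc:Mail note part"
    message.attach(part)
    return message


if __name__ == "__main__":
    print(build_note("Hello from a MIME example.").as_string())
```

Printing the message shows the version notice, the multipart content type with its boundary, and the per-part headers in one place.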

Using MIME Headers on the Web

All this may be more than you need if all you want to do is use MIME on the Web. Most Web clients don't require all the detailed information expected by e-mail clients. Web browsers have the advantage of being able to communicate their capability to handle MIME types to the server ahead of time. Let's look at the minimum MIME header requirements for Web usage.

Every time you access a Web page, a dialog occurs between your browser and the Web server. As part of the request for the page, your browser sends a description of the MIME types it understands. It may, for example, tell the server that in addition to the standard text/html, it can understand image/gif and audio/basic. Most servers do nothing with this information, as you'll see later in the section "Content Negotiation Based on MIME."

However, it's possible for a server to give you back a different document based on what your browser says it can handle. Say that the server has two versions of the page you request: a standard HTML version and a PostScript file. By doing some CGI programming on the server, you can make the server send the PostScript version to those browsers that can accept PostScript, and the HTML version to PostScript-challenged browsers. Of course, this assumes that the document in question is served by a Web server that does something with the MIME information sent to it by the client. For a look at reality, see the section "Content Negotiation Based on MIME" later in this chapter.
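The PostScript-or-HTML decision just described might be sketched in Python as follows. This is a simplified illustration, not a full negotiation scheme: the function name choose_version is hypothetical, and the sketch ignores wildcard entries such as */* and any quality values. It relies on the CGI convention that the server passes the browser's Accept header to the script in the HTTP_ACCEPT environment variable.

```python
import os


def choose_version(accept_header):
    """Return 'application/postscript' if the browser lists it, else 'text/html'."""
    accepted = set()
    for item in accept_header.split(","):
        # Drop any parameters such as ';q=0.5' and normalize case.
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type:
            accepted.add(media_type)
    if "application/postscript" in accepted:
        return "application/postscript"
    return "text/html"


if __name__ == "__main__":
    # CGI servers conventionally pass the Accept request header as HTTP_ACCEPT.
    print(choose_version(os.environ.get("HTTP_ACCEPT", "")))
```

Falling back to text/html when nothing matches keeps the script safe for browsers that send no Accept information at all.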

But how does your browser know which document is coming down the communications pipe? By reading the MIME header, that's how. Every file a Web server sends to a browser begins with a line announcing the content type. For a Web page, the line looks like this:

Content-type: text/html

Don't forget to include a blank line following any MIME header information you send from a server to a browser. If you fail to do this, your header is ignored, and the header text appears at the beginning of the Web document.

If you try to make your CGI script send text of any kind to the browser, and you forget to first send a MIME header, the server will report a 500 error. You'll tear your hair out trying to find a problem with the script, which will run just fine in a Telnet debugging session, but fail when you use a browser. Have your script send a MIME header, and the problem disappears.
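As a minimal sketch of the rule above, the following Python fragment builds a response that starts with the content type line and the blank line that must follow it. The function name cgi_response is my own, and the page body is a placeholder.

```python
import sys


def cgi_response(body, content_type="text/html"):
    """Return the header line, the mandatory blank line, and then the body."""
    return "Content-type: %s\n\n%s" % (content_type, body)


if __name__ == "__main__":
    sys.stdout.write(cgi_response("<html><body><p>Hello, Web!</p></body></html>\n"))
```

Keeping the header and blank line in one helper makes it hard to forget either of them when a script grows.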

When a Web server sends a GIF image, it sends a header such as the following:

Content-type: image/gif
Content-transfer-encoding: BINARY

The content transfer encoding indicates to the browser that binary data follows. The header, as always, must be followed by a blank line.
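For binary data, the same pattern applies, but the script should write bytes rather than text. The sketch below mirrors the two header lines shown above and writes them to any binary stream. The function name write_gif_response is hypothetical, and the payload in the demonstration is only the six-byte GIF signature, not a complete image.

```python
import io


def write_gif_response(stream, gif_bytes):
    """Write the GIF header lines, a blank line, and the raw image bytes."""
    stream.write(b"Content-type: image/gif\n")
    stream.write(b"Content-transfer-encoding: BINARY\n")
    stream.write(b"\n")  # blank line that ends the header block
    stream.write(gif_bytes)


if __name__ == "__main__":
    # In a real CGI script the stream would be sys.stdout.buffer.
    buffer = io.BytesIO()
    write_gif_response(buffer, b"GIF89a")
    print(buffer.getvalue())
```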

See "Use Your Header" for more information on sending headers to the client.

Understanding MIME Content Types

As long as you stay with the traditional MIME content type/subtypes of text/html, image/gif, and image/jpeg, life is good. Virtually all browsers understand these types and can render them without the assistance of helper applications. But what if you have some PostScript files or Microsoft Word files that you'd like to make available to your users through your Web server? If you just make up new content types to fit your whim, text/postscript, for example, your users will at best be presented with a browser dialog box asking if they want to save the file, and at worst be staring at garbage binary data rendered in their browser.

Use of other content types requires coordination between what you say you're sending, and what the browser knows it can handle. In the following section, I discuss the standard MIME types. Refer to Appendix B, "Commonly Used MIME Media Types," for a pretty exhaustive list of the officially sanctioned content types as well as commonly accepted non-standard content types. Later in the section "Mapping MIME Types to Browser Helper Applications," I discuss mapping content types to browser helper applications.

What Are the Standard MIME Types?

As a result of RFC 1521, which updated the original MIME specification, the IANA was established as the certifying authority for new MIME media types. It was the expressed intention of the original RFC that the number of MIME types be limited to the seven proposed in the RFC, as follows:

Text. Used to describe text of various types, including plain text (e-mail), PostScript, and HTML, although HTML isn't an approved subtype.
Multipart. Indicates that a message contains multiple sections with potentially more than one MIME type.
Message. Used for various types of messages, including messages that refer to other messages for elements of their bodies.
Application. A catch-all description allowing for miscellaneous file types.
Image. Used for graphics files, such as GIF or JPEG.
Audio. Used for audio such as Sun u-law (au) or Microsoft Windows .WAV files.
Video. Used for video files such as QuickTime or Microsoft Video files.

Each MIME type has a variety of subtypes, and in practice, you almost never use a type without a corresponding subtype. One exception is NCSA Mosaic's support for the "telnet" type with no subtype.

In addition to the 12 content type/subtype pairs proposed in RFC 1521, the IANA has recognized an additional 45 pairs. With the Internet being a functioning anarchy, other types have become popular, if not sanctioned, by being incorporated into popular servers and browsers.

The RFCs state that local type definitions are allowed, but that they should be prefixed with x- to distinguish experimental from recognized standards. It's possible for such standards to gain support from major browser and server developers and thus become de facto standards. Examples of widely supported x- media types include video/x-msvideo and application/x-rtf (which makes little sense, given the existence of application/rtf).
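A script that wants to flag unregistered types can test for the x- prefix. The following Python sketch shows one way to do it; the function name is_experimental is my own invention for illustration.

```python
def is_experimental(media_type):
    """Return True if the subtype carries the x- prefix for unregistered types."""
    _, _, subtype = media_type.partition("/")
    return subtype.lower().startswith("x-")


if __name__ == "__main__":
    for name in ("video/x-msvideo", "application/x-rtf", "application/rtf"):
        print(name, is_experimental(name))
```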

Refer to Appendix B, "Commonly Used MIME Media Types," for a list of recognized and generally accepted MIME media types.

Using a New MIME Content Type

The RFCs defined several MIME media types and subtypes, and the IANA has registered many more. But what if the type you want to use (Macromedia's Shockwave, for example) is neither registered nor in common use? Before you consider adding a new type to your server, you should first see whether an existing type can serve your purpose. You can do this by checking the tables in Appendix B, "Commonly Used MIME Media Types," which list the MIME media types accepted by the IANA as of early 1996, or by visiting the anonymous FTP site that lists all official types.

Say that no existing types fit your application. You have a couple of choices: defining and using your own type without registration, and registering the new type with IANA.

Creating and Using a MIME Type

If you're sure that you can control both sides of the equation (what your server says and what your users' browsers support), you can create your own MIME type. I did this for a report delivery service my company developed. All the users were using browsers preconfigured by my company's staff, and so I could make up new types with impunity. However, to my chagrin, I found out that not only were some of these types already defined under slightly different names (for example, application/msword), but I was breaking Internet tradition by not naming my types with the prefix x-. Now I'm stuck supporting my hastily conceived MIME types.

Creating your own MIME type involves configuring your server to recognize the type, and making sure that your users' browsers are configured with a helper application (or plug-in, for newer browsers) that can deal with the type. There's more information on this process later in the sections "Mapping MIME Types to Browser Helper Applications" and "Configuring a Server to Recognize MIME Types."

If you operate a Web site that's visited by the general public, to prevent problems, you shouldn't arbitrarily create your own MIME types, no matter how tempting it is. Either adapt an existing type, or submit your new type to IANA for approval.

Registering Your MIME Type

The best thing to do if you want a broad range of browsers and servers to support your new MIME type is to register the type with IANA. While the process looks quite easy, in practice you must be prepared to argue in favor of the new type with a variety of often opinionated people on the Internet Engineering Task Force's ietf-types mailing list, which was established for discussion of new types.

Submitting your new type for registration is the right thing to do for a couple of reasons. First, it's the only way you have a chance for the major browsers to support your type. If the type is unregistered, it's less likely that browser and server developers will support it. Unless you want upset users complaining to you that your server sent them garbage, this is an important consideration.

Second, chances are good that you're not the only person in the world who wants to use the proposed type. And I'm sure you want the world to experience the brilliance of your new MIME type!

The RFCs specify the process for adding content types. The following is adapted from RFC 1590 (available on the CD):

Present the request for registration to the community.
Send a proposed media type (content-type/subtype) to the ietf-types@uninett.no mailing list. Send e-mail to ietf-types-request@uninett.no to join this mailing list (a real person handles the signup, so be gentle). Proposed content-types aren't formally registered and must use the x- notation for the subtype name.
The intent of the public posting is to solicit comments and feedback on the choice of content-type/subtype name, and other considerations such as security.
Submit the content type to the IANA for registration.
After two weeks, submit the proposed Media Type to the IANA for registration. The request and supporting documentation should be sent to iana@isi.edu. Provided a reasonable review period has elapsed, the IANA will register the Media Type, assign an Object Identifier (OID) under the IANA branch, and make the media type registration available to the community.

The media type registrations are posted in the anonymous FTP directory ftp.isi.edu/in-notes/iana/assignments/media-types. Media types are listed in the periodically issued "Assigned Numbers" RFC.

Be prepared to offer the following two pieces of information about your proposed MIME type:

The data format or a reference to a published specification of the data. This means that if the file format or other aspects of the media type are proprietary, you'll get complaints from the community. The recent discussion over registering Microsoft's PowerPoint as a media type is a case in point. Members of the ietf-types discussion list believe strongly that file formats that are registered should be readily available to the public. However, RFC 1590 specifically addresses this point and states that the requirement is met by identifying the software package and version involved.
Identification of security considerations. You aren't required to state that the media type has no security implications, nor are you required to provide an exhaustive analysis of all the ways its use could create security problems. However, any known risks must be identified. If any security concerns surface after registration, you should publish them in an RFC. See the discussion of security in the section, "Security Considerations with MIME," later in this chapter.

When registering a new type, keep in mind the following:

Trying to register a type name that's a trademark of another company is frowned on due to the risk of lawsuit. Many people on the discussion list will dis you if you try.
If you propose a new type rather than a new subtype for an existing type, chances are good that you'll get roundly flamed. The types set forth in RFC 1341 were intended to be exhaustive, and are treated as gospel. Such a change is really possible only through issuing an RFC.
Don't propose new transfer encodings; these are an article of faith as well and require changing the MIME spec, not simply registration.

If this looks like a daunting process, well, it can be. The ietf-types community is aware of and concerned by the proliferation of MIME types. They're perfectly willing to allow dozens of x- types to exist, and are equally willing to limit the number of officially blessed media types.

Nonetheless, registration is really the only way you can hope to have wide adoption of your new media type. And without browser support, it will be hard, if not impossible, to have the type accepted.

Mapping MIME Types to Browser Helper Applications

It's probably happened to you before: you click a cool link, and Netscape presents you with a dialog box similar to the one shown in figure 10.1.

Fig. 10.1

You see the Unknown File Type dialog box when Netscape doesn't know what to do with a file the server is sending.

The problem: Your browser doesn't recognize the MIME type the server says it wants to send you. The server has sent a MIME header, in this case, Content-type: audio/x-wav, that isn't supported by the built-in capabilities of your browser. The typical way such MIME types are supported is by obtaining an application to play or display the type and configuring your browser to start this helper application when it sees that particular MIME type.

In this example, if you have a Windows PC with a sound card, you can configure the MPLAYER.EXE application to play the .WAV sound for you. To do this with Netscape 2.0, use the following procedure (see your browser's manual for information on other Web browsers):

1. If you got the Unknown File Type dialog box, note the MIME type displayed. This is the MIME type for which you need to set up a helper application. If you want, you can click the Pick App button and then skip to step 3. Otherwise, open the Options menu and choose General Preferences.
2. Click the tab marked Helpers (see fig. 10.2).

Fig. 10.2

Use the Helpers page in the Preferences dialog box to configure a viewer in Netscape.

3. Choose the Create New Type button. You'll see a dialog box similar to figure 10.3.

Fig. 10.3

Use the Configure New Mime Type dialog box in Netscape to create a new MIME type.

4. Type the MIME type in the Mime Type text box. Be certain to type it exactly as it appeared in the Unknown File Type dialog box, including upper- and lowercase. Don't include the slash. For this example, you type audio.
5. Move the cursor to the Mime SubType text box. Type the MIME subtype, which appears after the slash in the Unknown File Type dialog box. For this example, you type x-wav. Choose OK.
6. You see the Helpers page in the Preferences dialog box. In the File Extensions text box, type common extensions for files of this type, separating multiple file extensions with commas. For this example type of audio/x-wav, you type .wav, .WAV. This setting is used only to identify files when a MIME type isn't sent to the browser, as when a local HTML file is loaded.
7. In the Action section, select the Launch the Application radio button. This indicates that you want to start a helper application when this MIME type is encountered. Type the full path to the application's program file in the text box. If you don't know where to find the program, select the Browse button to look on your disk for the program file. In this example, type MPLAYER.EXE, the program file for the Media Player application under Microsoft Windows. No path information is necessary because this program lives in the Windows path.
8. Choose OK to return to Netscape. If you selected Pick App from the Unknown File Type dialog box, the file will be downloaded and the helper application will start. In this example, the .WAV file is downloaded and loaded into the Media Player (see fig. 10.4). Use the controls in the Media Player application to play the sound.

Fig. 10.4

After configuring a helper for the new type, you're ready to go.

If you have a UNIX-based browser, chances are that it supports the mailcap standard proposed in RFC 1524. The process of configuring a MIME type is similar to that described above. Refer to your browser's documentation for instructions on configuring MIME types, or refer to the text of the RFC. You can also find sample mailcap files on the Web.
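To give a feel for how a mailcap-style mapping works, here is a rough Python sketch that reads entries of the form "type/subtype; command" and builds the command line for a given file. It is a simplification under stated assumptions: it ignores the optional flag fields, wildcard types, and escaped semicolons that RFC 1524 allows, and the helper program names playwav and viewgif are made up for illustration.

```python
def parse_mailcap(text):
    """Map each media type to its view command from mailcap-style lines."""
    helpers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue  # not a usable entry
        media_type, command = fields[0].lower(), fields[1]
        helpers.setdefault(media_type, command)  # first match wins
    return helpers


def helper_command(helpers, media_type, filename):
    """Return the command line for a file, or None if no helper is configured."""
    command = helpers.get(media_type.lower())
    if command is None:
        return None
    return command.replace("%s", filename)


# Hypothetical entries for illustration only.
SAMPLE = """\
# media type; command to run (%s stands for the file name)
audio/x-wav; playwav %s
image/gif; viewgif %s
"""

if __name__ == "__main__":
    table = parse_mailcap(SAMPLE)
    print(helper_command(table, "audio/x-wav", "hello.wav"))
```

When no entry matches, returning None leaves it to the caller to decide what to do, much as a browser falls back to asking the user.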

If you're using Mac Mosaic or another browser, refer to your browser's documentation to configure a MIME media type. The process is similar to that described above.

Configuring a Server to Recognize MIME Types

So you've added the appropriate MIME type to your browser and it works fine on someone else's site. Now you've added a cool .WAV file to a page on your site. But when you click it, Netscape complains about a completely different MIME type, something like application/x-httpd-cgi, or perhaps application/octet-stream. What gives? What happened to audio/x-wav?

The problem here is that you can't arbitrarily add files with new MIME types to your site and expect your server software to recognize them. Your Web server must be configured to recognize all the MIME types you want to use. Unless you specifically tell it otherwise, by means of a CGI script, a Web server generally assigns MIME types based on the file name extension (the characters following the dot) or last few characters of the name of the file in question. If the server doesn't recognize a file, it assigns a default MIME type, generally based on whether the file appears to be ASCII (text/plain) or binary (application/octet-stream), or whether it's the result of a CGI script (application/x-httpd-cgi).
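The extension-based lookup can be sketched in a few lines of Python. The table below is a hypothetical stand-in for a server's real configuration, and the function name guess_type is my own. The sketch covers only the lookup and the fallback to a default, not any inspection of the file's contents.

```python
# Hypothetical mapping; a real server reads its own configuration files.
EXTENSION_TYPES = {
    "html": "text/html",
    "gif": "image/gif",
    "jpeg": "image/jpeg",
    "wav": "audio/x-wav",
}


def guess_type(filename, default="application/octet-stream"):
    """Return the media type for a file name based on its extension."""
    _, dot, extension = filename.rpartition(".")
    if not dot:
        return default  # no extension at all
    # The lookup is case-sensitive, so "WAV" does not match "wav".
    return EXTENSION_TYPES.get(extension, default)


if __name__ == "__main__":
    for name in ("hello.wav", "HELLO.WAV", "README"):
        print(name, guess_type(name))
```

The case-sensitive match is deliberate here, because it reproduces the surprise of a file that is served with the default type only because its extension is in capitals.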

It's easy to add new MIME type support to your Web server if you have access to your server's configuration files, and if you can take the server down and start it back up. If you don't have such access, it can be an exercise in social engineering: You must convince your site manager to add the type for you. Depending on his or her mood, you may be forced to grovel a little first. Be prepared to give reasons why you can't live without this support.

If you have access to the Web server, adding your new MIME type is usually a matter of editing one or more configuration files. I provide instructions for two popular UNIX servers, the NCSA and the W3/CERN, in the next sections. See your server documentation for instructions on adding a MIME type to other Web servers.

Adding a MIME Type to the NCSA Server

The NCSA server (and derivative servers such as Apache) references several configuration files that control its operation. These files are usually located in a directory called conf under the directory that contains the httpd, or server daemon, file. You can add a MIME type to this server in one of two ways: by editing the server resources map, srm.conf, or by editing the mime.types file. To avoid problems if you ever reinstall the server (and thus accidentally overwrite mime.types), the preferred method is to edit srm.conf.

Use the following procedure to set up a new MIME type in the server resources map file for the NCSA server:

Locate the server resource map file (srm.conf). It's usually found in the conf directory beneath the directory from which you start your server. In UNIX, edit it by using vi or another text editor. If you're among the vi-challenged and download srm.conf to a PC for editing, see the tip at the end of this section.
Find the section in the file that concerns the AddType directive. If you haven't modified the file, the section begins with the following lines:

# AddType allows you to tweak mime.types without actually editing it, or to
# make certain files to be certain types.
# Format: AddType type/subtype ext1

Add a line using the form AddType type/subtype ext1. (The ext1 in this form stands for the extension, or last few characters, of the file name, and it's case-sensitive.) For example, to add the type audio/x-wav, type AddType audio/x-wav wav. This tells the server that when it sees a file ending in wav, it will inform the browser that the MIME type is audio/x-wav. However, files ending in WAV will not be affected by this directive, because the case doesn't match.
If you want multiple file extensions to be recognized, include them on the same line, separated by spaces. For example, AddType application/msword doc DOC tells the server that files ending in both doc and DOC are the application/msword type.

If you're using the NCSA server and want to have HTML files on your site that don't end in the traditional .html extension, edit srm.conf and add the line AddType text/html newextension, where newextension is the file ending you want the server to recognize as HTML. For example, to make your server recognize files ending in .htm as HTML documents, add the following line to srm.conf:

AddType text/html htm

Save the file and restart the server. The server will reread the configuration files and now recognize the new MIME type.

Make sure that the server resources map file is saved as plain ASCII text. This is especially important if you download the file to a PC for editing and then transfer it back up. The NCSA server is very finicky about the file format, and if you transfer a PC-edited file as binary, your server won't work correctly.

Suppose that you want to add the media type audio/x-wav to your NCSA server's configuration. Your existing configuration file looks, in part, like the following:

AddType text/html htm
AddType application/x-msexcel xls

To add the .WAV type, insert the following line:

AddType audio/x-wav wav WAV

After you restart the server, any files the server delivers that have the names ending in wav or WAV are identified as audio/x-wav. Also, any CGI scripts you create that reference this type will be understood by the server.
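To see how the server might read these directives, here is a minimal Perl sketch that turns NCSA-style AddType lines into an extension table. The sub name parse_ncsa_addtype is made up for this example, and the real server's parser certainly differs in detail; the point is only that each extension on the line becomes its own case-sensitive entry.

```perl
use strict;
use warnings;

# Build an extension => type table from NCSA-style lines of the
# form: AddType type/subtype ext1 [ext2 ...]
sub parse_ncsa_addtype {
    my (@lines) = @_;
    my %type_for;
    for my $line (@lines) {
        next if $line =~ /^\s*(#|$)/;            # skip comments and blank lines
        my ($directive, $type, @exts) = split ' ', $line;
        next unless $directive eq 'AddType' && @exts;
        $type_for{$_} = $type for @exts;         # keys keep their case
    }
    return \%type_for;
}

my $map = parse_ncsa_addtype(
    '# AddType allows you to tweak mime.types without actually editing it',
    'AddType text/html htm',
    'AddType application/x-msexcel xls',
    'AddType audio/x-wav wav WAV',
);
print "$_ => $map->{$_}\n" for sort keys %$map;
```

Note that wav and WAV end up as separate keys, which is why both spellings must be listed if you want both recognized.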

Adding a MIME Type to the W3 (CERN) Server

The W3, or CERN, server and its derivatives use a single configuration file that controls their operation. This file can be located anywhere on the server. If the file isn't /etc/httpd.conf, you must start the server with a parameter indicating the file's name and location, which can make the file difficult to locate.

The best way to find the proper configuration file is to use the command ps -ax | grep httpd (or ps -ef | grep httpd for System V-based systems such as Solaris). This finds all instances of the server daemon, usually httpd or some variation, now running on the UNIX machine. (If you've renamed your server something else, grep for that name instead.) This command prints the command line used to start the server. Look for the parameter -r. On BSD-style UNIX systems, the command output looks like this:

%: ps -ax | grep httpd
128 ?  IW    0:19 httpd -r /web/program/webstart.conf
%: ps 128
PID TT STAT  TIME COMMAND
  128 ?  IW    0:19 httpd -r /web/program/webstart.conf

The argument that follows -r is the location and name of the configuration file. If you can't see the entire line, try running ps processnumber, where processnumber is the number of the process.
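If you'd rather not pick through ps output by eye, the search can be sketched in Perl. The sub name config_from_ps_line is invented for this example, and the sample line is the one shown in the ps output above; on a live system you would feed the sub each line printed by ps.

```perl
use strict;
use warnings;

# Return the argument following -r on an httpd command line.
# Returns nothing if the line isn't an httpd process or has no -r.
sub config_from_ps_line {
    my ($line) = @_;
    return unless $line =~ /\bhttpd\b/;
    my ($path) = $line =~ /\s-r\s+(\S+)/;
    return $path;
}

my $line = '128 ?  IW    0:19 httpd -r /web/program/webstart.conf';
my $conf = config_from_ps_line($line);
print defined $conf ? "$conf\n" : "no -r parameter found\n";
```

If your server daemon has been renamed, change the httpd pattern to match the new name.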

After you locate the server file, follow these steps to add the MIME type:

Edit the server configuration file using vi or another text editor.
Find the section in the file that concerns the AddType directive. If the section doesn't exist, you can create the lines anywhere in the file.
Add a line of the form: AddType .ext type/subtype encoding. In this form, .ext indicates a dot followed by one or more characters. This is the extension on the files you want to identify as the MIME type. Encoding means the type of transfer protocol you want the server to use when sending the file to the Web browser. Legal types include binary and ascii. To add the type audio/x-wav, type AddType .wav audio/x-wav binary. This tells the server that when it sees a file ending in .wav, it will inform the browser that the MIME type is audio/x-wav and transfer the file in binary mode.

If you're using the W3 (CERN) server and you want to have HTML files on your site that don't end in the traditional .html, add the line AddType .newextension text/html ascii, where newextension is the file ending you want the server to recognize as HTML. For example, to make your server recognize files ending in .htm as HTML documents, add the following line to your configuration file:

AddType .htm text/html ascii

Save the file, kill the server (or HUP it), and restart it. The server will reload the configuration files and now recognize the new MIME type.

Now add the media type audio/x-wav to the W3 server's configuration. The existing configuration file looks, in part, like the following:

AddType .xls application/x-msexcel binary
AddType .avi video/msvideo binary

To add the .WAV type, insert the following line:

AddType .wav audio/x-wav binary

After you restart or HUP the server, any files the server delivers that have the .wav extension are identified as audio/x-wav. Also, any CGI scripts you create that reference this type will be understood by the server.
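The W3 (CERN) form of the directive puts the extension first and adds an encoding, so a reader for it looks slightly different from one for the NCSA form. The following Perl sketch is illustrative only; the sub name parse_cern_addtype is an assumption for the example and is not part of the server.

```perl
use strict;
use warnings;

# Build a table from W3 (CERN) style lines of the form:
# AddType .ext type/subtype encoding
sub parse_cern_addtype {
    my (@lines) = @_;
    my %entry_for;
    for my $line (@lines) {
        my ($directive, $ext, $type, $encoding) = split ' ', $line;
        next unless defined $encoding && $directive eq 'AddType';
        next unless $ext =~ /^\./;               # extension must start with a dot
        $entry_for{$ext} = { type => $type, encoding => $encoding };
    }
    return \%entry_for;
}

my $table = parse_cern_addtype(
    'AddType .xls application/x-msexcel binary',
    'AddType .avi video/msvideo binary',
    'AddType .wav audio/x-wav binary',
);
for my $ext (sort keys %$table) {
    print "$ext => $table->{$ext}{type} ($table->{$ext}{encoding})\n";
}
```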

As you can see, adding a new MIME media type to the browser and the server is a simple process. The browser and the server can now understand each other's capabilities. However, before you use a helper application with any MIME object, you need to be aware of the security implications.

Security Considerations with MIME

The Internet is a wild and woolly place. As the old saw goes, on the Internet, nobody knows you're a dog. Or a hacker. Because you don't always (or even usually) know who the person is behind that cool Web server you've accessed, you need to be very careful when defining new MIME types for your browser to accept. Even seemingly innocuous media types such as application/postscript or application/msword can potentially wreak havoc on your system.

The danger lies in the concept of the helper application. Usually, such applications are merely tools to display or play files. One might play an audio file; another might display a video movie. It's hard to imagine these media types damaging an unsuspecting client system.

However, when you define a helper application that's a more full-featured program, perhaps one with its own scripting language, such as Microsoft Word or Excel, you're opening yourself up to a lot of potential damage.

Word, just as an example, allows a macro to delete files and directories. The recent Word Prank (also known as Concept) virus is a good demonstration of the security problems inherent in distributing Word documents. If you haven't run into this nasty little critter yet, chances are good that you will.

The Prank virus is based in a Word macro. It infects the default document template, NORMAL.DOT, and every subsequent document you open. It does so by installing several macros, one of which is an auto open, or self-running, macro that runs each time you open a Word document. Fortunately, the Prank virus is benign. All it does is display a dialog box each time you open a document. Prank was probably created simply to prove a point. And the point is clear: Defining programs with powerful scripting capabilities as helper applications is a risky business.

In the specific case of the Prank virus, you can inoculate your Word installation using files available on the Microsoft Web site.

Where possible, the best solution is to use viewers with limited features, such as the Word viewer available from Microsoft, when dealing with documents of unknown origin. Where this isn't possible, you need to practice safe computing: Don't load strange programs or display strange files on your computer.

Netscape and other browsers allow you to configure a MIME media type in order to have the browser prompt the user each time that type is downloaded. The user can then decide if he wants to start a helper application to display or play the file.

Because of security issues, major browsers such as Netscape display warnings when a known type with a potential for security issues is downloaded. Such a message is displayed in figure 10.5.

Fig. 10.5

Netscape displays a security warning whenever downloading a potentially insecure MIME object.

This warning can be disabled, however, and probably will be by most users due to the inconvenience of having to approve each download.

Security Considerations with PostScript

A more subtle threat is posed by PostScript. Although many people don't know it, PostScript is a full-featured programming language, not just a page description language. Many of its operators allow access to disks and other system resources. To quote RFC 1521,

The execution of general-purpose PostScript interpreters entails serious security risks, and implementors are discouraged from simply sending PostScript email bodies to "off-the-shelf" interpreters.

Various specific features of the PostScript language are considered security risks, such as the setsystemparams, setdevparams, deletefile, renamefile, filenameforall operators, as well as facilities for exiting the normal interpreter, or server, loop such as the exitserver and startjob operators.

One of the most popular helper applications for viewing PostScript files is Ghostscript (available in two versions: GNU and Aladdin). Since GNU version 2.6.1, Ghostscript runs in secure mode by default, meaning that it doesn't allow potentially harmful actions.

If you or your users want to view PostScript documents, be sure to run the latest version of Ghostscript and its companion viewers, Ghostview and GSview.

Future Secure MIME Standards

For a couple of basic reasons, there are security concerns regarding MIME media types. By enabling a helper application, you're no longer passively browsing the Web. You're instead taking code of various types and running it or playing it using your computer's processor. This isn't dangerous in and of itself. You run code from other sources each time you buy a commercial software package. There's a big difference between commercial code and code from the Internet, however. You have reason to trust commercial software developers. Code you run off the Internet usually comes from an unknown, essentially untrusted, source.

To trust the code, you must be able to trust the originator of the code, and the first step toward that trust is making sure that the originator is who he says he is. The concepts of a digital signature and digital certification have been proposed to help solve this problem.

Various authentication and security schemes have been proposed and implemented on the Internet. For example, a major proposed e-mail standard, Internet Privacy-Enhanced Mail protocols (PEM), is described in RFCs 1421-1424. Most security discussions have centered on securing the data stream between point A and point B, and on the ability to ensure that the sender of a message is who he says he is. Less widely discussed are the security problems inherent in MIME. One standard that's closely related to MIME has been proposed by RSA Data Security, Inc., creators of the popular RSA public key encryption scheme.

RSA's Secure MIME (S/MIME) standard is intended, like MIME before it, primarily for use in e-mail. RSA proposes a "digital envelope" technology that could be used to contain e-mail. According to RSA's documents, this methodology uses a symmetric cipher (DES, Triple-DES, or RC2) for message encryption and a public-key algorithm for key exchange and digital signatures.

As far back as 1991, RSA proposed a set of Public-Key Cryptography Standards (PKCS) dealing with various aspects of security. PKCS #7 deals with secure message bodies, and PKCS #10 is a message syntax for certification requests. They proposed two MIME types, application/x-pkcs7-mime, which specifies that a MIME body part has been cryptographically encoded, and application/x-pkcs10 for use in submitting a certification request.

If S/MIME becomes an Internet standard, and it's not at all clear that it will, it seems reasonable to assume that aspects of it will, like MIME itself, become part of the World Wide Web. However, many major software vendors have expressed support for S/MIME, including Microsoft, Lotus, Banyan, VeriSign, ConnectSoft, QUALCOMM, Frontier Technologies, Network Computing Devices, FTP Software, Wollongong, and SecureWare.

The digital certification technology in particular would be useful in resolving some of the security concerns embodied by the use of MIME. According to an RSA white paper,

Digital certification is an application in which a certification authority 'signs' a special message m containing the name of some user, say 'Alice,' and her public key in such a way that anyone can 'verify' that the message was signed by no one other than the certification authority and thereby develop trust in Alice's public key.

If you can trust that whoever is sending you a MIME object is who he says he is, you can better assess the potential danger of running or playing that object. Obviously the infrastructure necessary for massive digital certification on the Internet has yet to be built, and the trusted entities who will do the certifying have yet to be identified. So for the time being, be careful out there!

If you're interested in the S/MIME standard, you can join the S/MIME Developer's List by sending e-mail to smime-dev-request@rsa.com. Include your company, product, and contact information in the body.

Another secure MIME standard was proposed in October 1995 in RFC 1847 (available on the CD). Two MIME media types were defined: multipart/signed and multipart/encrypted.

The multipart/signed media type defines a method for normal ASCII text to be digitally signed so that the receiver can verify that it originated with the sender and wasn't altered in transmission. The message text can be read in the clear. The RFC defines the format of control information used by the receiver to verify the signature.

The multipart/encrypted media type defines a method for encrypting a message so that it can't be read in the clear. The message is coded as application/octet-stream, and control information specifies how the receiver can decode the message.

It remains to be seen whether these new MIME media types are incorporated into the Web, but it's likely that if a secure e-mail standard emerges, it will be adapted for use by the Web.

Content Negotiation Based on MIME

As discussed earlier in this chapter, each time your browser requests a document from a Web server, it sends a message to the server informing it of the MIME types it understands. This allows you to do some fancy CGI scripting and deliver alternative versions of MIME objects to your users.

How It's Supposed to Work

The designers of the HTTP standard envisioned a heterogeneous Web in which servers and browsers would interact to decide which of several alternate versions of a document best fits the needs of the user. They imagined that an information provider would produce documents in multiple versions, for example, plain text, HTML, PostScript, SGML, LaTeX, and so on. Or perhaps the document is available in alternative languages (English, French, German, and so on). By having many alternatives, and by receiving information from the browser about the accepted and desired types of documents, the server can decide which alternative is the best fit and send that. Thus, a client is much more likely to get a satisfactory result.

If you've used Lynx, the character-mode Web browser developed at the University of Kansas, you're well aware that we no longer live in a text-only environment. If you don't have a graphics-capable browser, your world is full of [IMAGE] tags, and image maps you can't use. It was supposed to be different.

As presented in the specification document available on the CD, content negotiation is enabled by the following three parameters that would be communicated by the browser to the server:

q The degradation (quality) factor between 0 and 1. If omitted, 1 is assumed. This indicates the desirability of various possible alternative versions of an object. For example, you may prefer PostScript over HTML, and indicate this by assigning a lower quality factor to HTML than to PostScript when you make your request.
mxb The maximum size of message (in bytes) that will cause the value to the reader to become zero, even if the file is readily available from the server. How many times have you clicked a link and discovered that the thing you've asked for is huge? Although downloading a huge file over a modem connection may be just what you want to do sometimes, more often there's a threshold of object size beyond which you don't often go. You can indicate this to the server using this parameter.
mxs The maximum delay (in seconds) that, even for a very small message with no length-related penalty, will cause the value to the reader to become zero. Heavily loaded servers (such as Netscape's after a beta release) can take forever to deliver even trivial files. This parameter indicates how impatient you are to get on with your life and not wait on the underpowered PC in Joe's Garage to come across with the file.

To see how these factors can interact, suppose that for a Web project you're doing you want to find a video (public domain, of course) of a nice spring day in a forest. You probably have specific parameters in mind: It should be in QuickTime, although you can accept Microsoft Video and perhaps a few other formats, and it must be under 5M, because most of your users use modems. For an MS Video file, you can accept up to 7M because you've got this dandy magic box converter that reduces the size of the file while converting it to QuickTime. Further, since you're pressed for time, you don't want to spend a lot of time viewing clips on bogged down servers.

You can translate these requirements into a GET statement using the Accept field. This field has two parts, as follows:

The MIME content type/subtype
Optional parameters for the type which are separated from the MIME type by a semicolon

Among the optional parameters are the q, mxs, and mxb keywords. So the request for the forest video might look like this:

GET /somevideo HTTP/1.0
Accept: video/quicktime; q=.9; mxb=5242880; mxs=30, video/x-msvideo; q=.1; mxb=7340032; mxs=30

Rather than simply request the object somevideo and take your chances that it fits your requirements, append an Accept field to the request to narrow the range of choices the server has in filling the request.

By specifying a q, or quality, value of .9 for QuickTime, and .1 for MS Video, you indicate that you'd like to have a QuickTime video if at all possible, but you'll accept an alternate format.

You tell the server to not even think of sending a QuickTime file that's larger than 5M or an MS Video file larger than 7M. You simply don't have time to spend viewing files that are too large.

Finally, you state that if the server can't come up with the goods within 30 seconds of the request, forget it.

That's the way content negotiation is supposed to work. The client specifies the request in a way that guides the server's decisions about what to provide. By setting the various parameters, the client indicates the appropriateness of the various responses. The server interprets the request without needing to resort to external programming, such as a CGI script.
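To make the server's side of this conversation concrete, here is a rough Perl sketch of how an Accept field could be parsed and matched against the variants a server has on hand. It assumes that parameters are separated by semicolons and media types by commas, it ignores wild cards, and it doesn't try to honor mxs (the server can't know the delay in advance). The sub names and the file names forest.mov and forest.avi are hypothetical.

```perl
use strict;
use warnings;

# Parse an Accept field into a list of hashes with the keys
# type, q, and (when present) mxb and mxs.
sub parse_accept {
    my ($field) = @_;
    my @ranges;
    for my $item (split /\s*,\s*/, $field) {
        my ($type, @params) = split /\s*;\s*/, $item;
        next unless defined $type && length $type;
        my %range = ('type' => lc($type), 'q' => 1);   # q defaults to 1
        for my $param (@params) {
            my ($name, $value) = $param =~ /^(\w+)\s*=\s*(\S+)$/ or next;
            $range{ lc($name) } = $value;
        }
        push @ranges, \%range;
    }
    return @ranges;
}

# Pick the variant with the highest q whose size is within mxb.
sub choose_variant {
    my ($field, @variants) = @_;
    my ($best, $best_q) = (undef, 0);
    for my $range (parse_accept($field)) {
        for my $variant (@variants) {
            next unless $variant->{'type'} eq $range->{'type'};
            next if defined $range->{'mxb'} && $variant->{'bytes'} > $range->{'mxb'};
            ($best, $best_q) = ($variant, $range->{'q'}) if $range->{'q'} > $best_q;
        }
    }
    return $best;
}

my $accept = 'video/quicktime; q=.9; mxb=5242880; mxs=30, '
           . 'video/x-msvideo; q=.1; mxb=7340032; mxs=30';
my @variants = (
    { 'type' => 'video/quicktime', 'file' => 'forest.mov', 'bytes' => 6_000_000 },
    { 'type' => 'video/x-msvideo', 'file' => 'forest.avi', 'bytes' => 6_500_000 },
);
my $pick = choose_variant($accept, @variants);
print $pick ? "$pick->{'file'}\n" : "no acceptable variant\n";
```

In this run the QuickTime file is preferred but exceeds its 5M cap, so the MS Video file, which fits under its 7M cap, is chosen instead.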

Sounds like a rich, highly interactive world, doesn't it? Too bad it's not our world. In the world we live in, things aren't so simple. Although the Apache and W3 (CERN) servers do support negotiation, there's hardly a browser out there that does. And without at least two participants, you can't have a conversation.

However, as you'll see in the next section, with some luck and a bit of CGI programming, you can create your own form of content negotiation.

How to Make MIME Work with CGI

As mentioned earlier, during each request for a document, the browser sends a list of MIME types it can accept. The server captures this information and makes it available to CGI programs. So you should be able to use this information to do your own content negotiation, right? Well, there's a problem here, and its name is expediency.

Because the list of MIME types a browser can accept can be quite long, many browsers abbreviate the listing by sending a list of important types followed by a wild card (*/*). The wild card means "send anything you've got." I suppose the reasoning goes, Why bore the server with all these MIME types when it's not going to do anything with them anyway? To speed up the process, suppress all but the most important types.

To make use of content negotiation using a CGI script, you must first determine what your target browser is sending the server. If it's sending a wild card, you're out of luck. But if it sends the MIME type you're interested in exploiting, you're in business. The Perl script in listing 10.1 displays the MIME types sent by your browser.

Listing 10.1 MIMETEST.CGI Reporting the MIME Types Your Browser Accepts
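#! /usr/local/bin/perl

# print out a MIME header so the server knows
# this is an HTML document
print qq|Content-type: text/html\n\n|;

# print out standard HTML beginning of document
print qq|<html><head><title>MIME Test</title></head>\n|;
print qq|<body>\n|;

# fetch the environmental variable; it may be unset if the
# browser sent no Accept field
$accepts = defined $ENV{'HTTP_ACCEPT'} ? $ENV{'HTTP_ACCEPT'} : '(nothing)';

# escape HTML special characters before echoing browser-supplied text
$accepts =~ s/&/&amp;/g;
$accepts =~ s/</&lt;/g;
$accepts =~ s/>/&gt;/g;

# print the environmental variable
print qq|<h1>Your browser accepts: $accepts</h1>\n|;

# close the document
print qq|</body></html>\n|;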

This script makes use of an environmental variable that's set by UNIX Web servers. (Windows-based servers make these variables available to scripts in a different manner. See your server's manual for information.) Many environmental variables are set, including SERVER_SOFTWARE, SERVER_NAME, PATH_INFO, QUERY_STRING, and SCRIPT_NAME. The variable that's most important for MIME processing is HTTP_ACCEPT. This variable contains all the MIME types that the browser has passed to the server, separated by commas.
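Because HTTP_ACCEPT is a comma-separated string, it's convenient to split it into a list before testing it. The following Perl sketch shows one way to do that; the sub names accepted_types and explicitly_accepts are invented for this example, and the fallback string is only sample data used when the script is run outside a Web server.

```perl
use strict;
use warnings;

# Split an HTTP_ACCEPT value into a list of lowercase type names,
# dropping any parameters that follow a semicolon.
sub accepted_types {
    my ($accept) = @_;
    return () unless defined $accept;
    my @types;
    for my $item (split /,/, $accept) {
        $item =~ s/;.*$//;              # strip parameters such as q=0.5
        $item =~ s/^\s+|\s+$//g;        # trim surrounding white space
        push @types, lc $item if length $item;
    }
    return @types;
}

# True only when the browser names the type explicitly. A wild card
# such as */* tells you nothing definite, so it doesn't count here.
sub explicitly_accepts {
    my ($accept, $wanted) = @_;
    my $want  = lc $wanted;
    my $count = grep { $_ eq $want } accepted_types($accept);
    return $count;
}

# Sample data for running from the command line (an assumption,
# not what any particular browser sends).
my $accept = defined $ENV{HTTP_ACCEPT}
           ? $ENV{HTTP_ACCEPT}
           : 'image/gif, image/x-xbitmap, image/jpeg, */*';
print join("\n", accepted_types($accept)), "\n";
```

Treating the wild card as "unknown" rather than "yes" is a deliberately cautious choice; you may prefer to treat it as permission to send anything.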

See  "Environmental Variables: Information for the Taking," p. xxx, for more information on using environmental variables in a CGI script.

Unfortunately, when you run this script, you'll discover that Netscape is among the browsers that send a few image types and a wild card. Netcom's NetCruiser doesn't even send a wild card, since you can't add new MIME types to its repertoire. Microsoft's Internet Explorer has it backward: It sends the wild card, followed by a handful of media types. SPRY Mosaic, on the other hand, reports all MIME types to the server.

If you run the MIMETEST script using Netscape, you see a display similar to figure 10.6.

Fig. 10.6

The MIMETEST script shows that Netscape uses the wild-card MIME type.

As you can see, not much information is here. Netscape uses the wild card to indicate that any MIME type is fine. While this speeds up the requesting of documents (and we know that Netscape will do almost anything for speed), it deprives the server, and you, the CGI programmer, of valuable information about the capabilities of the browser. You have no idea whether this client can accept audio files, QuickTime movies, or VRML.

Other browsers provide more information on MIME types. For example, figure 10.7 shows what happens when you run the MIMETEST script using SPRY Mosaic:

Fig. 10.7

The MIMETEST script shows that SPRY Mosaic reports all MIME types.

For purposes of argument, say that you have two formats of an image: a GIF and a JPEG. You'd much rather have your server deliver the JPEG because it's much smaller. Since you realize that not every browser out there can render a JPEG, you'd like to check first before sending the image.

In a perfect world, content negotiation such as this would be handled between the browser and the server, without the user ever knowing and without you having to do any programming. In this world, however, you have to write a CGI script that assembles the requested document from parts. In this way, you can put logic in your script to determine whether the browser can handle the MIME type you want to send.

The Perl script in listing 10.2 parses the HTTP_ACCEPT environmental variable and creates a document on the fly, tailored to the specific browser making the request.

Listing 10.2 IMGTEST.CGI Delivering a Different Image Type, Depending on Browser Capabilities
#! /usr/local/bin/perl
# print out a MIME header so the server knows
# this is an HTML document
print qq|Content-type: text/html\n\n|;

# print out standard HTML beginning of document
print qq|<html><head><title>Mike Ellsworth's Tool Time</title>\n|;
print qq|</head><BODY bgcolor="#ffffff">\n|;

# assign the environmental variable to a Perl variable
# (use an empty string if the browser sent no Accept field)
# if you want to get fancy, you can add some code to put the parts into
# an array by splitting on the comma between MIME types
$accepts = defined $ENV{'HTTP_ACCEPT'} ? $ENV{'HTTP_ACCEPT'} : '';

# now test to see if the string jpeg exists in the environment
if ($accepts =~ /jpeg/i) {
     print qq|<img border=0 src="testme.jpg" alt="Yo" align=left>|;
} else {
     print qq|<img border=0 src="testme.gif" alt="Yo" align=left>|;
}
# read in the base document and print to STDOUT
if (open(READ, "testimg.txt")) {
     while (<READ>) {
          print;
     }
     close(READ);
} else {
     # the base document couldn't be opened
     print qq|<p>Sorry, the document is unavailable.</p>\n|;
}

# close the document
print qq|</body></html>\n|;

One disadvantage of this approach to content negotiation is that every document must be served by a CGI script. You can set up a master script to serve up all the documents on your site, but then all your URLs could look funny when displayed on your users' current URL line, or when they put them in a bookmark or hotlist.

Another disadvantage is that serving all your Web documents via a CGI script really slows your server's performance. There's considerable overhead in the server recognizing the CGI script, forking a process to run it, and returning the output to the user. Also, you're adding more I/Os by starting the Perl interpreter, opening the CGI script, opening and reading the base document, and so on.

There's not much you can do about this second point, except write tight code or use fast languages. But you can clean up the URLs of documents served by a CGI script by using the POST method to call the script rather than the familiar GET method.

For example, one script I wrote for our site takes a parameter from the user and delivers a document describing services available in the country selected. Using the GET method to call this script makes the URL look like the following:

http://www.nielsen.com/home/countries/country.cgi?country=Great+Britain

It's not a pretty sight. But you could bookmark this URL and return exactly to this page later.

If you use the POST method to call your CGI program, the URL displayed to the user never changes. Say that you want to get two documents: somedoc.htm and someotherdoc.htm. With the POST method, even though each form specifies the proper document, your resulting URLs look identical to the user. Your document contains forms such as the following:

<FORM ACTION="get.cgi" METHOD="POST">
<INPUT TYPE="hidden" NAME="doc2get" VALUE="somedoc.htm">
<INPUT TYPE="submit" VALUE="A fabulous doc" NAME="item">
</form>

<FORM ACTION="get.cgi" METHOD="POST">
<INPUT TYPE="hidden" NAME="doc2get" VALUE="someotherdoc.htm">
<INPUT TYPE="submit" VALUE="An equally fabulous doc" NAME="item">
</form>

After users view each document, however, it appears to them that the URL is the same:

http://www.yoursite.com/get.cgi
http://www.yoursite.com/get.cgi

This is because when you use the POST method, information is sent to the server using STDIN rather than as part of the QUERY_STRING. Unlike QUERY_STRING, STDIN information isn't displayed as part of the URL. However, if you were to bookmark either URL and return later, the get.cgi program would be run with no parameters because they weren't saved as part of the URL. As a result, you wouldn't get the document you thought you had bookmarked.
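Inside the script, the difference between the two methods comes down to where you read the encoded form data from. The following Perl sketch shows both paths feeding the same decoder. The sub names are made up for this example, and the sample string simply mirrors a form with a doc2get field; a production script would also want to limit how much it reads from STDIN.

```perl
use strict;
use warnings;

# Decode one URL-encoded string into a hash of names and values.
sub parse_form {
    my ($data) = @_;
    $data = '' unless defined $data;
    my %field;
    for my $pair (split /[&;]/, $data) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                                   # plus signs are spaces
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;       # undo %XX escapes
        }
        $field{$name} = $value;
    }
    return %field;
}

# Fetch the raw form data from wherever the request method put it:
# STDIN for POST, the QUERY_STRING variable for GET.
sub read_form_data {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    if ($method eq 'POST') {
        my $body = '';
        read(STDIN, $body, $ENV{CONTENT_LENGTH} || 0);
        return $body;
    }
    return defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
}

# In a CGI script you would call parse_form(read_form_data());
# a sample string is used here so the sketch runs on its own.
my %form = parse_form('doc2get=somedoc.htm&item=A+fabulous+doc');
print "$_ = $form{$_}\n" for sort keys %form;
```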

If you instead use the GET method to call your CGI programs, the links can look something like this:

<a href="get.cgi?doc2get=somedoc.htm">A fabulous doc</a>
<a href="get.cgi?doc2get=someotherdoc.htm">An equally fabulous doc</a>

When the user views each resulting document, the URL appears as follows:

http://www.yoursite.com/get.cgi?doc2get=somedoc.htm
http://www.yoursite.com/get.cgi?doc2get=someotherdoc.htm

This is perhaps an improvement, but it's hard for humans to understand or to tell a friend about.

Using the PATH_INFO environmental variable instead of the default (QUERY_STRING) makes the URLs look a little better:

http://www.yoursite.com/get/somedoc.htm
http://www.yoursite.com/get/someotherdoc.htm

This assumes that your CGI program is called get, with no extension. It further assumes that you've established a script alias for the root directory so that CGI scripts can be run from there.
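A script called this way finds the document name in PATH_INFO rather than in QUERY_STRING. Here is a cautious Perl sketch of how the get script might turn that value into a file name. The sub name document_for_path_info and the directory /web/docs are assumptions for the example; the checks are there because PATH_INFO comes straight from the user and shouldn't be trusted.

```perl
use strict;
use warnings;

# Turn PATH_INFO (for example /somedoc.htm) into a file under a
# document directory, refusing anything that could climb out of it.
sub document_for_path_info {
    my ($path_info, $doc_root) = @_;
    return unless defined $path_info;
    (my $name = $path_info) =~ s{^/+}{};                 # drop leading slashes
    return unless $name =~ m{^[\w.-]+(?:/[\w.-]+)*$};    # safe characters only
    return if grep { $_ eq '..' || $_ eq '.' } split m{/}, $name;
    return "$doc_root/$name";
}

# /web/docs is a hypothetical document directory.
my $file = document_for_path_info($ENV{PATH_INFO} || '/somedoc.htm', '/web/docs');
print defined $file ? "$file\n" : "bad request\n";
```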

See  "Where CGI Scripts Live," p. xxx, for more information on script aliases. 
See  "Integrating CGI into Your HTML Pages," p. xxx, for a discussion on the differences between the GET and POST access methods. 
See  "Real-Time HTML," p. xxx, for more information on serving documents via CGI scripts.

CGI MIME Headers vs. Server-Generated Headers

The CGI scripts shown as examples in this chapter all begin their output with a MIME header, which should always be followed by a blank line. When you do this, the header is read and interpreted by the server rather than passed directly to the browser. Because a process or document can generate three types of headers, the server must parse the header to determine its type. The three header types are as follows:

Content-type. The MIME type of the document you're returning. This is the header you use in communicating the MIME type of the object. Most commonly it's text/html.
Location. Informs the server that the desired document is available at a different location, perhaps even on a different server. If the argument to this is a URL, the server will tell the browser that the document is available at a different location. If the argument to this is a virtual path, the server retrieves the document specified.
Status. Used to report a status code such as 403 - Forbidden.
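As a quick illustration of the three header types, the following Perl sketch builds each one as a string. The sub names are invented for the example; what matters is the layout, with each header on its own line and a blank line ending the header block.

```perl
use strict;
use warnings;

# Content-type: announces the MIME type of the document that follows.
sub content_type_header {
    my ($type) = @_;
    return "Content-type: $type\n\n";
}

# Location: points the server at a URL or a virtual path instead of
# returning a document body.
sub location_header {
    my ($target) = @_;
    return "Location: $target\n\n";
}

# Status: reports a status code; a Content-type header still follows
# so that the explanatory document can be interpreted.
sub status_header {
    my ($code, $reason, $type) = @_;
    return "Status: $code $reason\n" . content_type_header($type);
}

print status_header(403, 'Forbidden', 'text/html');
print "<html><body><h1>403 Forbidden</h1></body></html>\n";
```

A script that wanted to redirect instead would print only the result of location_header and no document body.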

Any header not following these forms is passed back to the client, as long as the script name begins with nph-, which stands for non-parsed header. A non-parsed header is, as its name implies, not parsed by the server, but rather sent directly to the client. The server assumes that all relevant header information is contained in the proper format when you use an nph script. However, you must be careful if you decide to use this option, since you'll be responsible for assembling the entire header. There are many possible fields for complete result headers. At a minimum, you must provide the following information:

HTTP version. This generally begins the header and is in the format HTTP/1.0.
Server status. This generally is a three-digit server code followed by text explanation. To indicate that everything is fine, send 200 OK.
Server type. This is the type of server in the form Server: name/version (for example, Server: NCSA/1.4).
MIME Content-type. As previously discussed-for example, Content-type: text/html.

The order of header lines within the HTTP header isn't important. However, as a matter of style, make the MIME fields the last ones, so that the MIME fields and the following document form a valid MIME document. (Always remember that the header is separated from the document or file being sent by a blank line.) Listing 10.3 is an example of using this technique.

Listing 10.3 NPH-TEST.CGI Causing the Server to Send the Header Directly to the Client Without Parsing It
#! /usr/local/bin/perl

# print out an entire HTTP header
print qq|HTTP/1.0 200 OK\n|;
print qq|Server: NCSA/1.4\n|;
print qq|Content-type: text/html\n\n|;

# print out standard HTML beginning of document
print qq|<html><head><title>A test page</title></head>\n|;
print qq|<BODY bgcolor="#ffffff">\n|;

# print the HTML document
print qq|<h1>This is a test, only a test.</h1>\n|;

# close the document
print qq|</body></html>\n|;

