Chapter 9

Using Files



If you've read the previous chapters and have executed some of the programs, then you already know that a file is a series of bytes stored on a disk instead of inside the computer's memory. A file is good for long-term storage of information. Information in the computer's memory is lost when the computer is turned off. Information on a disk, however, is persistent. It will be there when the computer is turned back on.

Back in Chapter 1, "Getting Your Feet Wet," you saw how to create a file using the edit program that comes with Windows 95 and Windows NT. In this chapter, you'll see how to manipulate files with Perl.

There are four basic operations that you can do with files. You can open them, read from them, write to them, and close them. Opening a file creates a connection between your program and the location on the disk where the file is stored. Closing a file shuts down that connection.

Every file has a unique fully qualified name so that it can't be confused with other files. The fully qualified name includes the name of the disk, the directory, and the file name. Files in different directories can have the same name because the operating system considers the directory name to be a part of the file name. Here are some fully qualified file names:


c:/windows/win95.txt

c:/windows/command/scandisk.ini

c:/a_long_directory_name/a_long_subdirectory_name/a_long_file_name.doc


Caution
You may be curious to know if spaces can be used inside file names. Yes, they can. But, if you use spaces, you need to surround the file name with quotes when referring to it from a DOS or UNIX command line.

Note
It is very important that you check for errors when dealing with files. To simplify the examples in this chapter, little error checking will be used in them. Error checking is discussed instead in Chapter 13, "Handling Errors and Signals."

Some Files Are Standard

In an effort to make programs more uniform, there are three connections that always exist when your program starts. These are STDIN, STDOUT, and STDERR. Actually, these names are file handles. File handles are variables used to manipulate files. Just like you need to grab the handle of a hot pot before you can pick it up, you need a file handle before you can use a file. Table 9.1 describes the three file handles.

Table 9.1  The Standard File Handles

Name    Description
STDIN Reads program input. Typically this is the computer's keyboard.
STDOUT Displays program output. This is usually the computer's monitor.
STDERR Displays program errors. Most of the time, it is equivalent to STDOUT, which means the error messages will be displayed on the computer's monitor.

You've been using the STDOUT file handle without knowing it for every print() statement in this book. The print() function uses STDOUT as the default if no other file handle is specified. Later in this chapter, in the "Example: Printing Revisited" section, you will see how to send output to a file instead of to the monitor.
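Here is a minimal sketch of the standard file handles in action. Unlike the listings in this chapter, it uses `use strict` and `use warnings`. It also calls fileno(), which is described in Table 9.3, to show the descriptor number the operating system assigns to each handle. The numbers 1 and 2 are what DOS, Windows, and UNIX normally use. Treat them as typical rather than guaranteed.

```perl
use strict;
use warnings;

print("Hello from STDOUT\n");          # no file handle given, so STDOUT is used
print STDOUT ("Hello again\n");        # the same thing, spelled out
print STDERR ("This is an error\n");   # error messages go to STDERR

# fileno() returns the operating system's descriptor number for a handle.
my $outNumber = fileno(STDOUT);
my $errNumber = fileno(STDERR);
print("STDOUT is $outNumber, STDERR is $errNumber\n");
```

Suppose you run it with standard output redirected to a file, as described in the redirection section that follows. Only the STDERR line will still appear on the screen.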

Example: Using STDIN

Reading a line of input from the standard input, STDIN, is one of the easiest things that you can do in Perl. The following three-line program reads a line from the keyboard and then displays it. This continues until you press Ctrl+Z (followed by Enter) on DOS systems or Ctrl+D on UNIX systems.


Listing 9.1  09LST01.PL-Read from Standard Input Until an End-of-File Character Is Found

while (<STDIN>) {
    print();
}


The <> characters, when used together, are called the diamond operator. The diamond operator tells Perl to read a line of input from the file handle inside it, in this case STDIN. Later, you'll use the diamond operator to read from other file handles.

In this example, the diamond operator assigned the value of the input string to $_. Then, the print() function was called with no parameters, which tells print() to use $_ as the default parameter. Using the $_ variable can save a lot of typing, but I'll let you decide which is more readable. Here is the same program without using $_.


while ($inputLine = <STDIN>) {
    print($inputLine);
}

When you pressed Ctrl+Z or Ctrl+D, you told Perl that the input was finished. This caused the diamond operator to return the undefined value, which Perl treats as false, and the while loop ended. In DOS (and therefore in all of the flavors of Windows), 26, the value of Ctrl+Z, is considered to be the end-of-file indicator. When a file is read in text mode and a value of 26 is encountered, reading stops as though the end of the file had been reached. UNIX works a little differently. Ctrl+D (the value 4) signals the end of input only when it is typed at the keyboard. A byte with the value 4 inside a file has no special meaning.

Tip
When a file is read using the diamond operator, the newline character that ends the line is kept as part of the input string. Frequently, you'll see the chop() function used to remove the newline. For instance, chop($inputLine = <INPUT_FILE>); reads a line from the input file, assigns its value to $inputLine, and then removes the last character from $inputLine. That character is almost guaranteed to be a newline. If you fear that the last character is not a newline (the last line of a file, for example, might not end with one), use the chomp() function instead. It removes the last character only if that character is a newline.
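The difference between chop() and chomp() only shows up on a line that has no trailing newline. This short sketch uses two hypothetical strings instead of a real file to make that visible.

```perl
use strict;
use warnings;

my $withNewline    = "last line\n";   # a typical line from the diamond operator
my $withoutNewline = "last line";     # a final line with no newline

my $chopped1 = $withNewline;    chop($chopped1);    # removes "\n"
my $chopped2 = $withoutNewline; chop($chopped2);    # removes the "e"!

my $chomped1 = $withNewline;    chomp($chomped1);   # removes "\n"
my $chomped2 = $withoutNewline; chomp($chomped2);   # nothing to remove

print("chop:  [$chopped1] [$chopped2]\n");
print("chomp: [$chomped1] [$chomped2]\n");
```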

Example: Using Redirection to Change STDIN and STDOUT

DOS and UNIX let you change the standard input from being the keyboard to being a file by changing the command line that you use to execute Perl programs. Until now, you probably used a command line similar to:


perl -w 09lst01.pl

In the previous example, Perl read the keyboard to get the standard input. But, if there was a way to tell Perl to use the file 09LST01.PL as the standard input, you could have the program print itself. Pretty neat, huh? Well, it turns out that you can change the standard input. It's done this way:


perl -w 09lst01.pl < 09lst01.pl

The < character redirects the standard input so that it comes from the 09LST01.PL file instead of the keyboard. You now have a program that duplicates the functionality of the DOS type command. And it only took three lines of Perl code!

You can redirect standard output to a file using the > character. So, if you wanted a copy of 09LST01.PL to be sent to OUTPUT.LOG, you could use this command line:


perl -w 09lst01.pl <09lst01.pl >output.log

Keep this use of the < and > characters in mind. You'll be using them again shortly when we talk about the open() function. The < character will signify that files should be opened for input and the > will be used to signify an output file. But first, let's continue talking about accessing files listed on the command line.

Example: Using the Diamond Operator (<>)

If no file handle is used with the diamond operator, Perl will examine the @ARGV special variable. If @ARGV has no elements, then the diamond operator will read from STDIN, either from the keyboard or from a redirected file. So, if you wanted to display the contents of more than one file, you could use the program shown in Listing 9.2.


Listing 9.2  09LST02.PL-Read from Multiple Files or from STDIN

while (<>) {
    print();
}


The command line to run the program might look like this:


perl -w 09lst02.pl 09lst01.pl 09lst02.pl

And the output would be:


while (<STDIN>) {
    print();
}
while (<>) {
    print();
}

Perl will create the @ARGV array from the command line. Each file name on the command line after the program name will be added to the @ARGV array as an element. When the program runs, the diamond operator starts reading from the file named in the first element of the array. When that entire file has been read, the next file is read, and so on, until all of the elements have been used. When the last file has been finished, the while loop will end.

Using the diamond operator to iterate over a list of file names is very handy. You can use it in the middle of your program by explicitly assigning a list of file names to the @ARGV array. Listing 9.3 shows what this might look like in a program.


Listing 9.3  09LST03.PL-Read from Multiple Files Using the @ARGV Array

@ARGV = ("09lst01.pl", "09lst02.pl");

while (<>) {
    print();
}


This program displays:


while (<STDIN>) {
    print();
}
while (<>) {
    print();
}
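You may want to try the @ARGV technique without depending on the chapter's listing files. The sketch below creates two scratch files with the standard File::Temp module, which is an addition not used elsewhere in this chapter. It then puts the files' names into @ARGV and collects every line the diamond operator returns. The file contents are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small scratch files, removed automatically when the program ends.
my @names;
foreach my $content ("first file, line 1\n",
                     "second file, line 1\nsecond file, line 2\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $content;
    close($fh);
    push(@names, $name);
}

# Fill @ARGV ourselves; <> will then open and read each file in turn.
@ARGV = @names;

my @lines;
while (<>) {
    push(@lines, $_);    # $_ holds the current line, newline included
}

print(@lines);
```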

Next, we will take a look at the ways that Perl lets you test files, and following that, the functions that can be used with files.

File Test Operators

Perl has many operators that you can use to test different aspects of a file. For example, you can use the -e operator to ensure that a file exists before deleting it. Or, you can check that a file can be written to before appending to it. By checking the feasibility of the impending file operation, you can reduce the number of errors that your program will encounter. Table 9.2 shows a complete list of the operators used to test files.

Table 9.2  Perl's File Test Operators

Operator    Description
-A OPERAND Returns the number of days since OPERAND was last accessed, measured from the time the program started.
-b OPERAND Tests if OPERAND is a block device.
-B OPERAND Tests if OPERAND is a binary file. If OPERAND is a file handle, then the current buffer is examined, instead of the file itself.
-c OPERAND Tests if OPERAND is a character device.
-C OPERAND Returns the number of days since the inode of OPERAND was last changed, measured from the time the program started.
-d OPERAND Tests if OPERAND is a directory.
-e OPERAND Tests if OPERAND exists.
-f OPERAND Tests if OPERAND is a regular file as opposed to a directory, device, or other special file.
-g OPERAND Tests if OPERAND has the setgid bit set.
-k OPERAND Tests if OPERAND has the sticky bit set.
-l OPERAND Tests if OPERAND is a symbolic link. Under DOS, this operator always will return false.
-M OPERAND Returns the number of days since OPERAND was last modified, measured from the time the program started.
-o OPERAND Tests if OPERAND is owned by the effective uid. Under DOS, it always returns true.
-O OPERAND Tests if OPERAND is owned by the real uid. Under DOS, it always returns true.
-p OPERAND Tests if OPERAND is a named pipe.
-r OPERAND Tests if OPERAND can be read from.
-R OPERAND Tests if OPERAND can be read from by the real uid/gid. Under DOS, it is identical to -r.
-s OPERAND Returns the size of OPERAND in bytes if it is non-zero, and false otherwise. Therefore, it returns true only if OPERAND has a non-zero size.
-S OPERAND Tests if OPERAND is a socket.
-t OPERAND Tests if OPERAND is opened to a tty.
-T OPERAND Tests if OPERAND is a text file. If OPERAND is a file handle, then the current buffer is examined, instead of the file itself.
-u OPERAND Tests if OPERAND has the setuid bit set.
-w OPERAND Tests if OPERAND can be written to.
-W OPERAND Tests if OPERAND can be written to by the real uid/gid. Under DOS, it is identical to -w.
-x OPERAND Tests if OPERAND can be executed.
-X OPERAND Tests if OPERAND can be executed by the real uid/gid. Under DOS, it is identical to -x.
-z OPERAND Tests if OPERAND size is zero.

Note
If the OPERAND is not specified in the file test, the $_ variable will be used instead. The one exception is -t, which tests STDIN when no OPERAND is given.

The operand used by the file tests can be either a file handle or a file name. The file tests work by internally calling the operating system to determine information about the file in question. The operators will evaluate to true if the test succeeds and false if it does not.
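Here is a small sketch of a few of the operators in Table 9.2 at work. It uses the standard File::Temp module to get a scratch directory so that none of your own files are touched. The file name sample.txt is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # a scratch directory, removed at exit
my $file = "$dir/sample.txt";       # hypothetical file name

my $existedBefore = -e $file;       # false: nothing has been created yet

open(OUT, ">$file") or die "Can't create $file: $!";
close(OUT);
my $emptyIsZero = -z $file;         # true: the new file has no bytes

open(OUT, ">>$file") or die "Can't append to $file: $!";
print OUT "12345";
close(OUT);
my $size = -s $file;                # 5, the number of bytes written

print("existed before: ", ($existedBefore ? "yes" : "no"), "\n");
print("empty at first: ", ($emptyIsZero ? "yes" : "no"), "\n");
print("size now: $size bytes\n");
print("$dir is a directory\n") if -d $dir;
```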

If you need to perform two or more tests on the same file, use the special underscore (_) file handle. It tells Perl to reuse the file information from the last stat(), lstat(), or file test instead of asking the operating system again, which saves time. However, the underscore file handle does have some caveats. It does not work with the -t operator. In addition, the lstat() function and the -l test leave the buffer filled with information about a symbolic link, not about the file the link points to.

The -T and -B file tests examine the first block or so of the file. The file is considered binary in either of two cases. One is when too many of its bytes are unusual characters, such as control codes or characters with the high bit set. The other is when a null byte is encountered. Binary files are normally data files, as opposed to text or human-readable files. If you need to work with binary files, be sure to use the binmode() file function, which is described in the section, "Example: Binary Files," later in this chapter.
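To see -T and -B disagree, this sketch builds two scratch files with File::Temp (an assumption; any two files would do). One holds ordinary text. The other holds a null byte, which is enough to make Perl call it binary. Notice that binmode() is used when writing the binary file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A small text file.
my ($textFh, $textName) = tempfile(UNLINK => 1);
print $textFh "Just some ordinary text.\n";
close($textFh);

# A small file containing a null byte, which marks it as binary.
my ($binFh, $binName) = tempfile(UNLINK => 1);
binmode($binFh);
print $binFh "AB\0CD";
close($binFh);

# With no operand, -T tests the file named in $_.
foreach ($textName, $binName) {
    print("$_: ", ((-T) ? "text" : "binary"), "\n");
}
```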

Example: Using File Tests

For our first example with file tests, let's examine a list of files from the command line and determine if each is a regular file or a special file.

Start a foreach loop that looks at the command line array. Each element in the array is assigned to the default loop variable $_.
Print the file name contained in $_.
Print a message indicating the type of file by checking the evaluation of the -f operator.

Listing 9.4  09LST04.PL-Using the -f Operator to Find Regular Files Inside a foreach Loop

foreach (@ARGV) {
    print;
    print((-f) ? " -REGULAR\n" : " -SPECIAL\n");
}


When this program is run using the following command line:


perl -w 09lst04.pl 09lst01.pl \perl5 perl.exe \windows

the following is displayed:


09lst01.pl -REGULAR
\perl5 -SPECIAL
perl.exe -REGULAR
\windows -SPECIAL

Each of the directories listed on the command line was recognized as a special file. If you want to ignore all special files in the command line, you do so like this:

Start a foreach loop that looks at the command line array.
If the current file is special, then skip it and go on to the next iteration of the foreach loop.
Print the current file name that is contained in $_.
Print a message indicating the type of file.

Listing 9.5  09LST05.PL-Using the -f Operator to Find Regular Files Inside a foreach Loop

foreach (@ARGV) {
    next unless -f;    # ignore all non-normal files.
    print;
    print((-f) ? " -REGULAR\n" : " -SPECIAL\n");
}


When this program is run using the following command line:


perl -w 09lst05.pl 09lst01.pl \perl5 perl.exe \windows

the following is displayed:


09lst01.pl -REGULAR
perl.exe -REGULAR

Notice that only the regular file names are displayed. The two directories on the command line were ignored.

As mentioned above, you can use the underscore file handle to make two tests in a row on the same file so that your program can execute faster and use fewer system resources. This could be important if your application is time critical or makes many repeated tests on a large number of files.

Start a foreach loop that looks at the command line array.
If the current file is special, then skip it and go on to the next iteration of the foreach loop.
Determine the number of bytes in the file with the -s operator, using the underscore file handle so that a second operating system call is not needed.
Print a message indicating the name and size of the file.

Listing 9.6  09LST06.PL-Finding the Size in Bytes of Regular Files Listed on the Command Line

foreach (@ARGV) {
    next unless -f;
    $fileSize = -s _;
    print("$_ is $fileSize bytes long.\n");
}


When this program is run using the following command line:


perl -w 09lst06.pl \perl5 09lst01.pl \windows perl.exe

the following is displayed:


09lst01.pl is 36 bytes long.
perl.exe is 61952 bytes long.

Tip
Don't get the underscore file handle confused with the $_ special variable. The underscore file handle tells Perl to use the file information from the last system call, and the $_ variable is used as the default parameter for a variety of functions.

File Functions

Table 9.3  Perl's File Functions

Function    Description
binmode(FILE_HANDLE) This function puts FILE_HANDLE into a binary mode. For more information, see the section, "Example: Binary Files," later in this chapter.
chdir(DIR_NAME) Causes your program to use DIR_NAME as the current directory. It will return true if the change was successful, false if not.
chmod(MODE, FILE_LIST) This UNIX-based function changes the permissions for a list of files. A count of the number of files whose permissions were changed is returned. There is no DOS equivalent for this function.
chown(UID, GID, FILE_LIST) This UNIX-based function changes the owner and group for a list of files. A count of the number of files whose ownership was changed is returned. There is no DOS equivalent for this function.
close(FILE_HANDLE) Closes the connection between your program and the file opened with FILE_HANDLE.
closedir(DIR_HANDLE) Closes the connection between your program and the directory opened with DIR_HANDLE.
eof(FILE_HANDLE) Returns true if the next read on FILE_HANDLE will result in hitting the end of the file or if the file is not open. If FILE_HANDLE is not specified the status of the last file read is returned. All input functions return the undefined value when the end of file is reached, so you'll almost never need to use eof().
fcntl(FILE_HANDLE, FUNCTION, SCALAR) Implements the fcntl() function, which lets you perform various file control operations. Its use is beyond the scope of this book.
fileno(FILE_HANDLE) Returns the file descriptor for the specified FILE_HANDLE.
flock(FILEHANDLE, OPERATION) This function will place a lock on a file so that multiple users or programs can't simultaneously use it. The flock() function is beyond the scope of this book.
getc(FILE_HANDLE) Reads the next character from FILE_HANDLE. If FILE_HANDLE is not specified, a character will be read from STDIN.
glob(EXPRESSION) Returns a list of files that match the specification of EXPRESSION, which can contain wildcards. For instance, glob("*.pl") will return a list of all Perl program files in the current directory.
ioctl(FILE_HANDLE, FUNCTION, SCALAR) Implements the ioctl() function, which lets you perform various device control operations. Its use is beyond the scope of this book. For a more in-depth discussion of this function, see Que's Special Edition Using Perl for Web Programming.
link(OLD_FILE_NAME, NEW_FILE_NAME) This UNIX-based function creates a new file name that is linked to the old file name. It returns true for success and false for failure. There is no DOS equivalent for this function.
lstat(FILE_HANDLE_OR_FILE_NAME) Returns file statistics in a 13-element array. lstat() is identical to stat() except that, for a symbolic link, it returns information about the link itself instead of the file the link points to. See the section, "Example: Getting File Statistics," for more information.
mkdir(DIR_NAME, MODE) Creates a directory named DIR_NAME. If you try to create a subdirectory, the parent must already exist. This function returns false if the directory can't be created. The special variable $! is assigned the error message.
open(FILE_HANDLE, EXPRESSION) Creates a link between FILE_HANDLE and a file specified by EXPRESSION. See the section, "Example: Opening Files," for more information.
opendir(DIR_HANDLE, DIR_NAME) Creates a link between DIR_HANDLE and the directory specified by DIR_NAME. opendir() returns true if successful, false otherwise.
pipe(READ_HANDLE, WRITE_HANDLE) Opens a pair of connected pipes, like the corresponding system call. Its use is beyond the scope of this book. For more on this function, see Que's Special Edition Using Perl for Web Programming.
print FILE_HANDLE (LIST) Sends a list of strings to FILE_HANDLE. If FILE_HANDLE is not specified, then STDOUT is used. See the section, "Example: Printing Revisited," for more information.
printf FILE_HANDLE (FORMAT, LIST) Sends a list of strings in a format specified by FORMAT to FILE_HANDLE. If FILE_HANDLE is not specified, then STDOUT is used. See the section, "Example: Printing Revisited," for more information.
read(FILE_HANDLE, BUFFER, LENGTH, OFFSET) Reads LENGTH bytes from FILE_HANDLE into the scalar variable called BUFFER, placing them starting at position OFFSET within BUFFER. It returns the number of bytes read, 0 at the end of the file, or the undefined value if an error occurs.
readdir(DIR_HANDLE) Returns the next directory entry from DIR_HANDLE when used in a scalar context. If used in an array context, all of the file entries in DIR_HANDLE will be returned in a list. If there are no more entries to return, the undefined value or a null list will be returned depending on the context.
readlink(EXPRESSION) This UNIX-based function returns the value of a symbolic link. If an error occurs, the undefined value is returned and the special variable $! is assigned the error message. The $_ special variable is used if EXPRESSION is not specified.
rename(OLD_FILE_NAME, NEW_FILE_NAME) Changes the name of a file. You can use this function to change the directory where a file resides, but not the disk drive or volume.
rewinddir(DIR_HANDLE) Resets DIR_HANDLE so that the next readdir() starts at the beginning of the directory.
rmdir(DIR_NAME) Deletes an empty directory. If the directory cannot be deleted, it returns false and $! is assigned the error message. The $_ special variable is used if DIR_NAME is not specified.
seek(FILE_HANDLE, POSITION, WHENCE) Moves to POSITION in the file connected to FILE_HANDLE. The WHENCE parameter determines if POSITION is an offset from the beginning of the file (WHENCE=0), the current position in the file (WHENCE=1), or the end of the file (WHENCE=2).
seekdir(DIR_HANDLE, POSITION) Sets the current position for readdir(). POSITION must be a value returned by the telldir() function.
select(FILE_HANDLE) Sets the default FILE_HANDLE for the write() and print() functions. It returns the currently selected file handle so that you may restore it if needed. You can see the section, "Example: Printing Revisited," to see this function in action.
sprintf(FORMAT, LIST) Returns a string whose format is specified by FORMAT.
stat(FILE_HANDLE_OR_FILE_NAME) Returns file statistics in a 13-element array. See the section, "Example: Getting File Statistics," for more information.
symlink(OLD_FILE_NAME, NEW_FILE_NAME) This UNIX-based function creates a new file name symbolically linked to the old file name. It returns false if the NEW_FILE_NAME cannot be created.
sysread(FILE_HANDLE, BUFFER, LENGTH, OFFSET) Reads LENGTH bytes from FILE_HANDLE into the scalar variable called BUFFER, placing them starting at position OFFSET within BUFFER. It returns the number of bytes read, 0 at the end of the file, or the undefined value if an error occurs.
syswrite(FILE_HANDLE, BUFFER, LENGTH, OFFSET) Writes LENGTH bytes to FILE_HANDLE from the scalar variable called BUFFER, starting at position OFFSET within BUFFER. It returns the number of bytes written or the undefined value if an error occurs.
tell(FILE_HANDLE) Returns the current file position for FILE_HANDLE. If FILE_HANDLE is not specified, the file position for the last file read is returned.
telldir(DIR_HANDLE) Returns the current position for DIR_HANDLE. The return value may be passed to seekdir() to access a particular location in a directory.
truncate(FILE_HANDLE, LENGTH) Truncates the file opened on FILE_HANDLE to be LENGTH bytes long.
unlink(FILE_LIST) Deletes a list of files. If FILE_LIST is not specified, then $_ will be used. It returns the number of files successfully deleted. Therefore, it returns 0, which is false, if no files were deleted.
utime(ATIME, MTIME, FILE_LIST) This UNIX-based function sets the access and modification times of each file in FILE_LIST to ATIME and MTIME, respectively.
write(FILE_HANDLE) Writes a formatted record to FILE_HANDLE. See Chapter 11, "Creating Reports," for more information.
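The following sketch shows several of the functions from Table 9.3 working together: mkdir(), open(), close(), rename(), opendir(), readdir(), closedir(), unlink(), and rmdir(). It does its work inside a scratch directory from the standard File::Temp module so that it can't disturb real files. The directory and file names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work inside a scratch directory that is removed when the program ends.
my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/demo";                 # hypothetical subdirectory name

mkdir($dir, 0755) or die "Can't create $dir: $!";

# Create two empty files by opening them for output and closing them.
foreach my $name ("a.txt", "b.txt") {
    open(OUT, ">$dir/$name") or die "Can't create $name: $!";
    close(OUT);
}

rename("$dir/b.txt", "$dir/c.txt") or die "Can't rename b.txt: $!";

# Read the directory, skipping the . and .. entries.
opendir(DIR, $dir) or die "Can't open $dir: $!";
my @entries = sort { $a cmp $b } grep { $_ ne "." && $_ ne ".." } readdir(DIR);
closedir(DIR);
print("@entries\n");

# unlink() returns the number of files it deleted; rmdir() needs an empty directory.
my $deleted = unlink("$dir/a.txt", "$dir/c.txt");
rmdir($dir) or die "Can't remove $dir: $!";
```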

Note
The UNIX-based functions will be discussed further in Chapter 18, "Using Internet Protocols."
UNIX-based implementations of Perl have several database functions available to them, for example, dbmopen() and dbmclose(). These functions are beyond the scope of this book.

Example: Opening Files

The open() function is used to open a file and create a connection to it called a file handle. The basic open() function call looks like this:


open(FILE_HANDLE);

The FILE_HANDLE parameter in this version of open() is the name for the new file handle. It is also the name of the scalar variable that holds the file name that you would like to open for input. For example:

Assign the file name, FIXED.DAT, to the $INPUT_FILE variable. All capital letters are used for the variable name to indicate that it is also the name of the file handle.
Open the file for reading.
Read the entire file into @array. Each line of the file becomes a single element of the array.
Close the file.
Use a foreach loop to look at each element of @array.
Print $_, the loop variable, which contains one of the elements of @array.

Listing 9.7  09LST07.PL-How to Open a File for Input

$INPUT_FILE = "fixed.dat";

open(INPUT_FILE);
@array = <INPUT_FILE>;
close(INPUT_FILE);

foreach (@array) {
    print();
}


This program displays:


1212Jan       Jaspree             Painter
3453Kelly     Horton              Jockey

It is considered good programming practice to close any connections that are made with the open() function as soon as possible. While not strictly needed, it does ensure that all temporary buffers and caches are written to the hard disk in case of a power failure or other catastrophic failure.

Note
DOS, and by extension Windows, limits the number of files that you can have open at any given time. Typically, you can have from 20 to 50 files open. Normally, this is plenty. If you need to open more files, please see your DOS documentation.

The open() function has many variations to let you access files in different ways. Table 9.4 shows all of the different methods used to open a file.

Table 9.4  The Different Ways to Open a File

Open Statement    Description
open(FILE_HANDLE); Opens the file named in $FILE_HANDLE and connects to it using FILE_HANDLE as the file handle. The file will be opened for input only.
open(FILE_HANDLE, "FILENAME.EXT"); Opens the file called FILENAME.EXT for input using FILE_HANDLE as the file handle.
open(FILE_HANDLE, "<FILENAME.EXT"); Opens FILENAME.EXT for input using FILE_HANDLE as the file handle.
open(FILE_HANDLE, ">FILENAME.EXT"); Opens FILENAME.EXT for output using FILE_HANDLE as the file handle. If the file already exists, its contents are discarded.
open(FILE_HANDLE, "-"); Opens standard input.
open(FILE_HANDLE, ">-"); Opens standard output.
open(FILE_HANDLE, ">>FILENAME.EXT"); Opens FILENAME.EXT for appending using FILE_HANDLE as the file handle. The file is created if it does not exist.
open(FILE_HANDLE, "+<FILENAME.EXT"); Opens an existing FILENAME.EXT for both input and output using FILE_HANDLE as the file handle. The current contents are kept.
open(FILE_HANDLE, "+>FILENAME.EXT"); Opens FILENAME.EXT for both input and output using FILE_HANDLE as the file handle. The file is created if needed, and any existing contents are discarded first.
open(FILE_HANDLE, "+>>FILENAME.EXT"); Opens FILENAME.EXT for both input and appending using FILE_HANDLE as the file handle. The file is created if needed, and output is added to the end of the file.
open(FILE_HANDLE, "| PROGRAM"); Sends the output printed to FILE_HANDLE to another program.
open(FILE_HANDLE, "PROGRAM |"); Reads the output from another program using FILE_HANDLE.
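To see how +< differs from >, consider this sketch. It creates a scratch file with the standard File::Temp module and writes a line to it using >. Then it reopens the file with +< to read that line and overwrite its first five bytes in place. The seek() call moves back to the start of the file before writing. It is also the usual practice whenever you switch from reading to writing on the same file handle.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $name) = tempfile(UNLINK => 1);   # scratch file, removed at exit
close($tmp);

# ">" empties the file (or creates it) and opens it for output.
open(FILE, ">$name") or die "Can't write $name: $!";
print FILE ("hello world\n");
close(FILE);

# "+<" opens the existing file for input AND output without emptying it.
open(FILE, "+<$name") or die "Can't update $name: $!";
my $line = <FILE>;        # read the first line
seek(FILE, 0, 0);         # move back to the beginning of the file
print FILE ("HELLO");     # overwrite the first five bytes in place
close(FILE);

# Read the result back with a plain input open.
open(FILE, "<$name") or die "Can't read $name: $!";
my $result = <FILE>;
close(FILE);
print($result);
```

If the second open() used > instead of +<, the file would be emptied first, and the read would find nothing.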


For information about handling failures while opening files, see Chapter 13, "Handling Errors and Signals."

By prefixing the file name with a > character, you open the file for output. If the file already exists, its old contents are discarded. This next example opens a file that will hold a log of messages.

Call the open() function to open the MESSAGE.LOG file for writing with LOGFILE as the file handle. If the open was successful, a true value will be returned and the statement block will be executed.
Send the first message to the MESSAGE.LOG file using the print() function. Notice that an alternate method is being used to call print().
Send the second message to the MESSAGE.LOG file.
Close the file.

if (open(LOGFILE, ">message.log")) {
    print LOGFILE ("This is message number 1.\n");
    print LOGFILE ("This is message number 2.\n");
    close(LOGFILE);
}

This program displays nothing. Instead, the output from the print() function is sent directly to the MESSAGE.LOG file using the connection established by the open() function.

In this example, print() is given a file handle followed by a list of things to print. Notice that there is no comma between the file handle and the list. You can find more information about printing in the section, "Example: Printing Revisited," later in this chapter.

If you need to add something to the end of the MESSAGE.LOG file, use >> as the file name prefix when opening the file. For example:

Call the open() function to open the MESSAGE.LOG file for appending with LOGFILE as the file handle. If the file does not exist, it will be created; otherwise, anything printed to LOGFILE will be added to the end of the file.
Send a message to the MESSAGE.LOG file.
Send a message to the MESSAGE.LOG file.
Close the file.

if (open(LOGFILE, ">>message.log")) {
    print LOGFILE ("This is message number 3.\n");
    print LOGFILE ("This is message number 4.\n");
    close(LOGFILE);
}

Now, when MESSAGE.LOG is viewed, it contains the following lines:


This is message number 1.
This is message number 2.
This is message number 3.
This is message number 4.
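The difference between > and >> is easy to check for yourself. The sketch below writes to a scratch copy of MESSAGE.LOG inside a File::Temp directory and reads the log back after each step. The helper subroutines logMessage() and readLog() are hypothetical names made up for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A scratch copy of MESSAGE.LOG, so the real file is left alone.
my $log = tempdir(CLEANUP => 1) . "/message.log";

# Hypothetical helper: open the log with ">" or ">>" and add one line.
sub logMessage {
    my ($prefix, $text) = @_;
    open(LOGFILE, "$prefix$log") or die "Can't open $log: $!";
    print LOGFILE ("$text\n");
    close(LOGFILE);
}

# Hypothetical helper: return every line currently in the log.
sub readLog {
    open(LOGFILE, "<$log") or die "Can't read $log: $!";
    my @lines = <LOGFILE>;
    close(LOGFILE);
    return @lines;
}

logMessage(">", "first");
logMessage(">", "second");       # ">" throws away "first"
my @afterOutput = readLog();     # just "second\n"

logMessage(">>", "third");       # ">>" keeps "second" and adds to the end
my @afterAppend = readLog();     # "second\n" then "third\n"
print(@afterAppend);
```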

Example: Binary Files

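When you need to work with data files, you will need to know what binary mode is. There are two major differences between binary mode and text mode:

In text mode, the carriage return and linefeed pair that ends each line of a DOS file is translated into a single newline character when the file is read. In binary mode, no translation is done.
In text mode, the end-of-file character (Ctrl+Z, the value 26) stops the reading of a file. In binary mode, it is treated like any other byte.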

Note
The examples in this section relate to the DOS operating system.

In order to demonstrate these differences, we'll use a data file called BINARY.DAT with the following contents:


01
02
03

First, we'll read the file in the default text mode.

Initialize a buffer variable. Both read() and sysread() need their buffer variables to be initialized before the function call is executed.
Open the BINARY.DAT file for reading.
Read the first 20 characters of the file using the read() function.
Close the file.
Create an array out of the characters in the $buffer variable and iterate over that array using a foreach loop.
Print the value of the current array element in hexadecimal format.
Print a newline character if the current array element is a newline character.

Listing 9.8  09LST08.PL-Reading a File to Show Text Mode Line Endings

$buffer = "";

open(FILE, "<binary.dat");
read(FILE, $buffer, 20, 0);
close(FILE);

foreach (split(//, $buffer)) {
    printf("%02x ", ord($_));
    print "\n" if $_ eq "\n";
}


This program displays:


30 31 0a
30 32 0a
30 33 0a

This example does a couple of things that haven't been seen yet in this book. The read() function is used as an alternative to the line-by-line input done with the diamond operator. It reads a specified number of bytes from the input file and assigns them to a buffer variable. The fourth parameter is an offset into the buffer variable, not into the file. It says where in $buffer the new bytes should be placed, and 0 stores them at the beginning. Reading itself always starts at the current position in the file, which for a newly opened file is the beginning.
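Because the fourth parameter of read() is easy to mistake for a file position, here is a sketch of what it really controls. It writes six bytes to a scratch file created with File::Temp. It then reads them three at a time into the same buffer, using a different offset for each read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $name) = tempfile(UNLINK => 1);
print $fh "ABCDEF";
close($fh);

open(FILE, "<$name") or die "Can't read $name: $!";
binmode(FILE);

my $buffer = "xy";
my $count1 = read(FILE, $buffer, 3, 2);   # put "ABC" after the 2 bytes already there
my $afterFirst = $buffer;

my $count2 = read(FILE, $buffer, 3, 0);   # the file continues at "D"; offset 0 replaces the buffer
my $afterSecond = $buffer;

my $count3 = read(FILE, $buffer, 3, 0);   # nothing left, so read() returns 0
close(FILE);

print("$count1 [$afterFirst] $count2 [$afterSecond] $count3\n");
```

The second read() picks up where the first one stopped in the file, even though its offset is 0. The offset only decides where the bytes land in $buffer.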

The split() function in the foreach loop breaks a string into pieces and returns those pieces as a list for the loop to iterate over. The empty pattern between the two slashes tells split() to make each character in the string a separate element.

For more information about the split() function, see Chapter 5, "Functions," and Chapter 10, "Regular Expressions."

Once the array of characters has been created, the foreach loop iterates over the array. The printf() statement converts the ordinal value of the character into hexadecimal before displaying it. The ordinal value of a character is the value of the ASCII representation of the character. For example, the ordinal value of '0' is 0x30 or 48.
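The split-ord-printf pipeline can be tried on its own, without any file. This sketch feeds it the first line of BINARY.DAT as a literal string. It uses sprintf() instead of printf() so that the hexadecimal result can be kept in a variable.

```perl
use strict;
use warnings;

my $line  = "01\n";                                # the first line of BINARY.DAT
my @chars = split(//, $line);                      # ("0", "1", "\n")
my @codes = map { ord($_) } @chars;                # (48, 49, 10)
my $hex   = join(" ", map { sprintf("%02x", $_) } @codes);
print("$hex\n");                                   # 30 31 0a
```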

The next line, the print statement, forces the output onto a new line if the current character is a newline character. This was done simply to make the output display look a little like the input file.

For more information about the printf() function, see the section, "Example: Printing Revisited," later in this chapter.

Now, let's read the file in binary mode and see how the output is changed.

Initialize a buffer variable.
Open the BINARY.DAT file for reading.
Change the mode to binary.
Read the first 20 characters of the file using the read() function.
Close the file.
Create an array out of the characters in the $buffer variable and iterate over that array using a foreach loop.
Print the value of the current array element in hexadecimal format.
Print a newline character if the current array element is a newline character.

Listing 9.9  09LST09.PL-Reading a File to Show Binary Mode Line Endings

$buffer = "";

open(FILE, "<binary.dat");
binmode(FILE);
read(FILE, $buffer, 20, 0);
close(FILE);

foreach (split(//, $buffer)) {
    printf("%02x ", ord($_));
    print "\n" if $_ eq "\n";
}


This program displays:


30 31 0d 0a
30 32 0d 0a
30 33 0d 0a

When the file is read in binary mode, you can see that there are really two characters at the end of every line: a carriage return (0x0d) followed by a linefeed (0x0a). In text mode, this pair is translated into the single newline character.
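The listings above only show a difference on DOS and Windows, because text mode on UNIX does no translation. To see both views side by side on any system, this sketch relies on a feature newer than this chapter. Perl 5.8 and later let you name an I/O layer when opening a file. The :crlf layer performs DOS-style line-ending translation on any operating system, and :raw reads the bytes untouched, much as binmode() does. The sketch also builds its own scratch copy of BINARY.DAT with File::Temp. The helper name hexDump() is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a scratch copy of BINARY.DAT with DOS line endings (CR LF).
my ($fh, $name) = tempfile(UNLINK => 1);
binmode($fh);
print $fh "01\r\n02\r\n03\r\n";
close($fh);

# Hypothetical helper: read up to 20 characters through the given layer
# and return them as space-separated hexadecimal values.
sub hexDump {
    my ($layer) = @_;
    open(my $in, "<$layer", $name) or die "Can't read $name: $!";
    my $buffer = "";
    read($in, $buffer, 20, 0);
    close($in);
    return join(" ", map { sprintf("%02x", ord($_)) } split(//, $buffer));
}

my $textView   = hexDump(":crlf");   # DOS-style text mode
my $binaryView = hexDump(":raw");    # binary mode
print("text:   $textView\n");
print("binary: $binaryView\n");
```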

Our next example will look at the end-of-file character in both text and binary modes. We'll use a data file called EOF.DAT with the following contents:


01
02
<end of file character>03

Because the end-of-file character is a non-printing character, it can't be shown directly. The <end of file character> placeholder above stands for a single byte with the value 26 (Ctrl+Z, or 1a in hexadecimal).
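If your editor can't easily type a byte with the value 26, one way to create EOF.DAT is to let Perl write it. This sketch assumes DOS line endings (matching the 0d 0a pairs in the listings) and writes eof.dat in the current directory.

```perl
use strict;
use warnings;

# Write the three lines with DOS line endings (CR LF) and put the
# end-of-file character, chr(26), in front of the third line.
open(my $out, ">", "eof.dat") or die "Cannot create eof.dat: $!";
binmode($out);    # write the bytes exactly as given, with no translation
print $out "01\r\n", "02\r\n", chr(26), "03\r\n";
close($out) or die "Cannot close eof.dat: $!";

printf("eof.dat is %d bytes long\n", -s "eof.dat");    # eof.dat is 13 bytes long
```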

Here is the program that you saw reading BINARY.DAT earlier, this time without the binmode() call so that the file is read in text mode, and reading EOF.DAT instead.

Initialize a buffer variable.
Open the EOF.DAT file for reading.
Read the first 20 characters of the file using the read() function.
Close the file.
Create an array out of the characters in the $buffer variable and iterate over that array using a foreach loop.
Print the value of the current array element in hexadecimal format.
Print a newline character if the current array element is a newline character.

Listing 9.10  09LST10.PL-Reading a File to Show the Text Mode End-of-File Character

$buffer = "";

open(FILE, "<eof.dat");
read(FILE, $buffer, 20, 0);
close(FILE);

foreach (split(//, $buffer)) {
    printf("%02x ", ord($_));
    print "\n" if $_ eq "\n";
}


This program displays:


30 31 0a
30 32 0a

The end-of-file character prevents the read() function from reading the third line. If the file is placed into binary mode, the whole file can be read.

Initialize a buffer variable.
Open the EOF.DAT file for reading.
Change the mode to binary.
Read the first 20 characters of the file using the read() function.
Close the file.
Create an array out of the characters in the $buffer variable and iterate over that array using a foreach loop.
Print the value of the current array element in hexadecimal format.
Print a newline character if the current array element is a newline character.

Listing 9.11  09LST11.PL-Reading a File to Show that Binary Mode Does Not Recognize the End-of-File Character

$buffer = "";

open(FILE, "<eof.dat");
binmode(FILE);
read(FILE, $buffer, 20, 0);
close(FILE);

foreach (split(//, $buffer)) {
    printf("%02x ", ord($_));
    print "\n" if $_ eq "\n";
}


This program displays:


30 31 0d 0a
30 32 0d 0a
1a 30 33 0d 0a

With binary mode on, bytes with a value of 26 have no special meaning, so the third line can be read. You can see that the value 26 (1a in hexadecimal) was printed along with the rest of the characters.

Example: Reading into a Hash

You've already seen that you can read a file directly into a regular array using this syntax:


@array = <FILE_HANDLE>;

Unfortunately, there is no similar way to read an entire file into a hash. But, it's still pretty easy to do. The following example will use the line number as the hash key for each line of a file.

Open the FIXED.DAT file for reading.
For each line of FIXED.DAT, create a hash element using the record number special variable ($.) as the key and the line of input ($_) as the value.
Close the file.
Iterate over the keys of the hash in numeric order.
Print each key, value pair.

Listing 9.12  09LST12.PL-Reading a Fixed Length Record with Fixed Length Fields into a Hash

open(FILE, "<fixed.dat");
while (<FILE>) {
    $hash{$.} = $_;
}
close(FILE);

# Hash keys come back in no particular order, so sort them numerically.
foreach (sort { $a <=> $b } keys %hash) {
    print("$_: $hash{$_}");
}


This program displays:


1: 1212Jan       Jaspree             Painter
2: 3453Kelly     Horton              Jockey
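Because keys() returns hash keys in no particular order, sorting them is what makes the line numbers come out in sequence. Here is a self-contained sketch of the same idea; it reads made-up text through an in-memory file handle instead of FIXED.DAT.

```perl
use strict;
use warnings;

# Made-up file contents held in a string, read through an in-memory file handle.
my $contents = "first line\nsecond line\nthird line\n";
open(my $fh, "<", \$contents) or die "Cannot open in-memory file: $!";

my %lines;
while (<$fh>) {
    chomp;                  # drop the newline; it's added back when printing
    $lines{$.} = $_;        # $. is the number of the line just read
}
close($fh);

# keys() returns the keys in no particular order, so sort them numerically.
my @order = sort { $a <=> $b } keys %lines;
print("$_: $lines{$_}\n") foreach @order;
```

A numeric sort ({ $a <=> $b }) matters once there are ten or more lines; the default string sort would put 10 before 2.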

Example: Getting File Statistics

The file test operators can tell you a lot about a file, but sometimes you need more. In those cases, you use the stat() or lstat() function. The stat() function returns file information as a 13-element list; lstat() does the same, except that when the name refers to a symbolic link, it describes the link itself rather than the file the link points to. You can pass either a file handle or a file name as the parameter. If the file can't be found or another error occurs, an empty list is returned. Listing 9.13 shows how to use the stat() function to find out information about the EOF.DAT file used earlier in the chapter.

Assign the return list from the stat() function to 13 scalar variables.
Print the scalar values.

Listing 9.13  09LST13.PL-Using the stat() Function

($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
    $atime, $mtime, $ctime, $blksize, $blocks) = stat("eof.dat");

print("dev     = $dev\n");
print("ino     = $ino\n");
print("mode    = $mode\n");
print("nlink   = $nlink\n");
print("uid     = $uid\n");
print("gid     = $gid\n");
print("rdev    = $rdev\n");
print("size    = $size\n");
print("atime   = $atime\n");
print("mtime   = $mtime\n");
print("ctime   = $ctime\n");
print("blksize = $blksize\n");
print("blocks  = $blocks\n");


In the DOS environment, this program displays:


dev     = 2
ino     = 0
mode    = 33206
nlink   = 1
uid     = 0
gid     = 0
rdev    = 2
size    = 13
atime   = 833137200
mtime   = 833195316
ctime   = 833194411
blksize =
blocks  =

Some of this information is specific to the UNIX environment and is beyond the scope of this book. For more information on this topic, see Que's 1994 edition of Using Unix. One interesting piece of information is the $mtime value: the date and time of the last modification made to the file, expressed as the number of seconds since January 1, 1970. You can interpret this value by using the following line of code:


($sec, $min, $hr, $day, $month, $year, $day_Of_Week,
    $julianDate, $dst) = localtime($mtime);

If you are only interested in the modification date, you can use list slice notation to grab just that value from the 13-element list returned by stat(). For example:


$mtime = (stat("eof.dat"))[9];

Notice that the stat() function call is surrounded by parentheses so that its return value is evaluated in a list context. Then the tenth element (index 9) is assigned to $mtime. You can use this technique whenever a function returns a list.
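A slice can pick several elements at once, and localtime() turns the modification time into readable pieces. The sketch below creates a small scratch file so it has something to stat. Keep in mind that localtime() counts months from 0 and years from 1900, so both need adjusting before they are displayed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file with known contents so there is something to stat.
my ($fh, $file) = tempfile();
binmode($fh);            # no line-ending translation, so the size is exactly 6
print $fh "hello\n";     # 6 bytes
close($fh);

# Index 7 is the size in bytes; index 9 is the modification time.
my ($size, $mtime) = (stat($file))[7, 9];
die "stat failed for $file: $!" unless defined $size;

# localtime() counts months from 0 and years from 1900.
my ($sec, $min, $hr, $day, $month, $year) = localtime($mtime);
printf("%d bytes, last modified %04d-%02d-%02d %02d:%02d:%02d\n",
       $size, $year + 1900, $month + 1, $day, $hr, $min, $sec);
unlink($file);
```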

Example: Using the Directory Functions

Perl has several functions that let you work with directories. You can make a directory with the mkdir() function. You can delete a directory with the rmdir() function. Switching from the current directory to another is done using the chdir() function.

Finding out which files are in a directory is done with the opendir(), readdir(), and closedir() functions. The next example will show you how to create a list of all Perl programs in the current directory, or at least of those files that end with the .pl extension.

Open the current directory using DIR as the directory handle.
Read a list of file names using the readdir() function, keep only those that end in .pl, and sort the list. The sorted list is assigned to the @files array variable.
Close the directory.
Print the file names from the @files array, skipping any that are directories.

Listing 9.14  09LST14.PL-Print All Files in the Current Directory Whose Name Ends in PL

opendir(DIR, ".");
# \. matches a literal period; the i lets .PL match as well as .pl.
@files = sort(grep(/\.pl$/i, readdir(DIR)));
closedir(DIR);

foreach (@files) {
    print("$_\n") unless -d;
}


For more information about the grep() function, see Chapter 10, "Regular Expressions."

This program will display each file name that ends in .pl on a separate line. If you need to know the number of Perl programs, evaluate the @files array in a scalar context. For example:


$num_Perl_Programs = @files;
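One detail worth knowing: readdir() returns bare file names, not paths. The -d test in Listing 9.14 works only because the directory being read is the current one. The sketch below reads a different directory, so it puts the directory name in front of each entry before testing it. The scratch directory and its file names are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a scratch directory with made-up contents so the example is self-contained.
my $dir = tempdir(CLEANUP => 1);
mkdir("$dir/lib.pl") or die "mkdir failed: $!";    # a directory whose name ends in .pl
foreach my $name (qw(alpha.pl beta.pl notes.txt)) {
    open(my $fh, ">", "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

opendir(my $dh, $dir) or die "Cannot open $dir: $!";
# readdir() returns bare names, so prefix the directory before testing with -f.
my @matches = grep { /\.pl$/i && -f "$dir/$_" } readdir($dh);
closedir($dh);
my @files = sort @matches;

my $num_Perl_Programs = @files;    # an array in scalar context gives its length
print("$num_Perl_Programs Perl programs: @files\n");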

Tip
For this example, I modified the naming convention used for the variables. I feel that $num_Perl_Programs is easier to read than $numPerlPrograms. No naming convention should be inflexible. Use it as a guideline and break the rules when it seems wise.

Example: Printing Revisited

We've been using the print() function throughout this book without really looking at how it works. Let's remedy that now.

The print() function is used to send output to a file handle. Most of the time, we've been using STDOUT as the file handle. Because STDOUT is the default, we did not need to specify it. The syntax for the print() function is:


print FILE_HANDLE (LIST)

You can see from the syntax that print() is a list operator because it's looking for a list of values to print. If you don't specify a list, then $_ will be used. You can change the default file handle by using the select() function. Let's take a look at this:

Open TESTFILE.DAT for output.
Change the default file handle for write and print statements. Notice that the old default handle is returned and saved in the $oldHandle variable.
This line prints to the default handle, which is now the TESTFILE.DAT file.
Change the default file handle back to STDOUT.
This line prints to STDOUT.
Close TESTFILE.DAT.

open(OUTPUT_FILE, ">testfile.dat");
$oldHandle = select(OUTPUT_FILE);
print("This is line 1.\n");
select($oldHandle);
print("This is line 2.\n");
close(OUTPUT_FILE);

This program displays:


This is line 2.

and creates the TESTFILE.DAT file with a single line in it:


This is line 1.
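Here is the same select() technique in a form you can check without creating a file. It prints into an in-memory file handle (Perl 5.8 or later), which was chosen only for the demonstration. It also shows that print() with no arguments prints $_ to whatever handle is currently selected.

```perl
use strict;
use warnings;

my $captured = "";
open(my $out, ">", \$captured) or die "Cannot open in-memory handle: $!";

my $oldHandle = select($out);    # print() now writes to $out by default
print("This is line 1.\n");
$_ = "Printed from \$_.\n";
print;                           # with no list, print() uses $_
select($oldHandle);              # restore the previous default (normally STDOUT)
close($out);

print("This is line 2.\n");      # goes to STDOUT again
print("Captured: $captured");
```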

Perl also has the printf() function which lets you be more precise in how things are printed out. The syntax for printf() looks like this:


printf FILE_HANDLE (FORMAT_STRING, LIST)

Like print(), printf() uses STDOUT as its default file handle. The FORMAT_STRING parameter controls what is printed and how it looks. In the simplest case, the format string is the only argument passed to printf(), and it works just like the list passed to print(). For example:

Create two variables to hold costs for January and February.
Print the cost variables using variable interpolation. Notice that the dollar sign needs to be preceded by the backslash to avoid interpolation that you don't want.

$januaryCost = 123.34;
$februaryCost = 23345.45;

printf("January  = \$$januaryCost\n");
printf("February = \$$februaryCost\n");

This program displays:


January  = $123.34
February = $23345.45

In this example, only one parameter is passed to the printf() function: the formatting string. Because the formatting string is enclosed in double quotes, variable interpolation will take place just like for the print() function.
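Interpolating data into the format string works only while the data contains no percent signs, because printf() reads every % as the start of a format specifier. A safer habit, sketched here with a made-up $discount value, is to keep the format string fixed and pass the data as arguments for %s. The sketch uses sprintf(), which takes the same format string as printf() but returns the result instead of printing it.

```perl
use strict;
use warnings;

my $januaryCost = 123.34;
my $discount    = "5%";    # made-up value containing a percent sign

# The data goes in the argument list, not in the format string, so its
# % character is printed literally instead of being read as a specifier.
my $line = sprintf("January  = \$%s (discount %s)", $januaryCost, $discount);
print("$line\n");
```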

This display is not good enough for a report because the decimal points of the numbers do not line up. You can use the formatting specifiers shown in Table 9.5 together with the modifiers shown in Table 9.6 to solve this problem.

Table 9.5   Format Specifiers for the printf() Function

Specifier   Description
c           Indicates that a single character should be printed; the argument is the character's numeric code.
s           Indicates that a string should be printed.
d           Indicates that a signed decimal integer should be printed.
u           Indicates that an unsigned decimal integer should be printed.
x           Indicates that an integer should be printed in hexadecimal.
o           Indicates that an integer should be printed in octal.
e           Indicates that a floating point number should be printed in scientific notation.
f           Indicates that a floating point number should be printed in fixed-point notation.
g           Indicates that a floating point number should be printed in either e or f format, depending on the size of the value.

Table 9.6  Format Modifiers for the printf() Function

Modifier    Description
-           Indicates that the value should be printed left-justified.
#           Forces octal numbers to be printed with a leading zero and hexadecimal numbers with a leading 0x.
+           Forces signed numbers to be printed with a leading + or - sign.
0           Pads the displayed number with zeros instead of spaces.
.           Separates the field width (to the left of the period) from the precision
            (to the right). For example, %10.3f means that the value will be at least
            10 positions wide, and because f is used for floating point, exactly 3
            digits will be displayed to the right of the decimal point. %.10s will
            print a string at most 10 characters long.
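A quick way to experiment with the specifiers and modifiers in Tables 9.5 and 9.6 is to run sample values through sprintf(), which accepts the same format strings as printf() but returns the result. The values below are arbitrary samples, and the brackets mark where each formatted field begins and ends so that any padding is visible.

```perl
use strict;
use warnings;

# Each pair is a format and an arbitrary sample value to run through sprintf().
my @examples = (
    ["%c",     65],
    ["%s",     "text"],
    ["%d",     -42],
    ["%u",     42],
    ["%x",     255],
    ["%o",     8],
    ["%e",     1234.5],
    ["%f",     3.5],
    ["%g",     0.5],
    ["%-6s",   "ab"],
    ["%#x",    255],
    ["%#o",    8],
    ["%+d",    42],
    ["%05d",   42],
    ["%10.3f", 3.14159],
    ["%.3s",   "abcdef"],
);

foreach my $pair (@examples) {
    my ($format, $value) = @$pair;
    printf("%-7s => [%s]\n", $format, sprintf($format, $value));
}
```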

Create two variables to hold costs for January and February.
Print the cost variables using format specifiers.

$januaryCost = 123.34;
$februaryCost = 23345.45;

printf("January  = \$%8.2f\n", $januaryCost);
printf("February = \$%8.2f\n", $februaryCost);

This program displays:


January  = $  123.34
February = $23345.45

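This example uses the f format specifier to print each floating point number in a field 8 positions wide with 2 digits after the decimal point. The February value is printed right next to the dollar sign because 23345.45 is exactly 8 positions wide and fills the field. The January value is padded on the left with two spaces, so the decimal points line up.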

If you did not know the width of the numbers that you need to print in advance, you could use the following technique.

Create two variables to hold costs for January and February.
Find the length of the largest number.
Print the cost variables, using variable interpolation to set the width of the numbers to print.
Define the max() function. You can look in the "Example: Foreach Loops" section of Chapter 7, "Control Statements," for more information about the max() function.

Listing 9.15  09LST15.PL-Using Variable Interpolation to Align Numbers When Printing

$januaryCost = 123.34;
$februaryCost = 23345.45;

$maxLength = length(max($januaryCost, $februaryCost));

printf("January  = \$%$maxLength.2f\n", $januaryCost);
printf("February = \$%$maxLength.2f\n", $februaryCost);

sub max {
    my($max) = shift(@_);

    foreach my $temp (@_) {
        $max = $temp if $temp > $max;
    }
    return($max);
}


This program displays:


January  = $  123.34
February = $23345.45

While taking the time to find the longest number is more work, I think you'll agree that the result is worth it.
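Perl's printf() also accepts * in place of the width. When it does, the width is taken from the argument list instead of from an interpolated variable. The sketch below uses that alternative instead of the listing's interpolation technique. It also measures each number as it will actually be printed, which avoids underestimating the width when a value has fewer than two decimal places. The cost values are made-up samples.

```perl
use strict;
use warnings;

my @costs = (123.34, 23345.45, 7.5);    # made-up sample values

# Measure each value as it will be printed (two decimal places), so that a
# value such as 7.5, which prints as 7.50, is measured correctly.
my $width = 0;
foreach my $cost (@costs) {
    my $len = length(sprintf("%.2f", $cost));
    $width = $len if $len > $width;
}

# A * in the format takes the field width from the argument list.
foreach my $cost (@costs) {
    printf("Cost = \$%*.2f\n", $width, $cost);
}
```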

Tip
In the next chapter, "Regular Expressions," you'll see how to add commas to numbers for even more readability.

So far, we've only looked at printing numbers. You also can use printf() to control printing strings. Like the printing of numbers above, printf() is best used for controlling the alignment and length of strings. Here is an example:

Assign "John O'Mally" to $name.
Print using format specifiers to make the value 10 characters wide but only print the first 5 characters from the string.

$name = "John O'Mally";
printf("The name is %10.5s.\n", $name);

This program displays:


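The name is      John .

The number to the left of the period sets the width of the printed value, also called the print field. The number to the right sets the maximum number of characters taken from the string. Here the first five characters are "John " (including the space after John), which is why a space appears before the final period. If the length of the string to be printed is less than the width of the print field, then the string is right-justified and padded with spaces.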

You can left-justify the string by using the dash modifier. For example:

Assign "John O'Mally" to $name.
Print using format specifiers to left-justify the value.

$name = "John O'Mally";
printf("The name is %-10.5s.\n", $name);

This program displays:


The name is John      .

The period way off to the right shows that the string was left-justified and padded with spaces until it was 10 positions wide.

Globbing

Perl supports a feature called globbing which lets you use wildcard characters to find file names. A wildcard character is like the wild card in poker. It can have more than one meaning. Let's look at some of the simpler examples.

Example: Assigning a Glob to an Array

One common chore for computer administrators is the removal of backup files. You can use the globbing technique with the unlink() function to perform this chore.


unlink(<*.bak>);

The file specification, *.bak, is placed inside the angle brackets of the diamond operator. When evaluated, it returns a list of files that match the specification. An asterisk matches zero or more of any character, so this unlink() call will delete all files with a BAK extension in the current directory.

To get a list of all files whose names start with the letter f, you can use the following:


@array = <f*.*>;

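The next chapter, "Regular Expressions," will show you more ways to match names. Keep in mind, though, that globs are not regular expressions. They understand only a few wildcards, such as * (any run of characters), ? (exactly one character), and sets of characters in square brackets like [abc].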
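The glob() function does the same job as the angle-bracket form, and it is a little easier to read when combined with other calls. The sketch below builds a scratch directory with made-up file names, uses the * and ? wildcards, and then removes the backups with unlink(), which returns how many files it deleted.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Build a scratch directory holding a few made-up file names.
my $dir = tempdir(CLEANUP => 1);
foreach my $name (qw(report.txt report.bak notes.bak fig1.dat fig2.dat fig10.dat)) {
    open(my $fh, ">", "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}

my $home = getcwd();
chdir($dir) or die "Cannot chdir to $dir: $!";

my @backups = sort { $a cmp $b } glob("*.bak");       # same as <*.bak>
my @figures = sort { $a cmp $b } glob("fig?.dat");    # ? matches exactly one character
print("Backups: @backups\n");
print("Figures: @figures\n");

my $deleted = unlink(@backups);    # unlink() returns how many files it removed
print("Deleted $deleted backup files\n");

chdir($home) or die "Cannot return to $home: $!";
```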

Using Data Structures with Files

In the last chapter, you saw how to create complex data structures. Creating a program to read and write those structures is beyond the scope of this book. However, the following examples will show you how to use simpler data structures. The same techniques can be applied to the more complicated data structures as well.

Example: Splitting a Record into Fields

This example will show you how to read a file line-by-line and break the input records into fields based on a separator string. The file, FIELDS.DAT, will be used with the following contents:


1212:Jan:Jaspree:Painter

3453:Kelly:Horton:Jockey

The individual fields or values are separated from each other by the colon (:) character. The split() function will be used to create an array of fields. Then a foreach loop will print the fields. Listing 9.16 shows how to input lines from a file and split them into fields.

Use the qw() notation to create an array of field names.
Open the FIELDS.DAT file for input.
Loop while there are lines to read in the file.
Remove the trailing newline from the line.
Use the split() function to create a list of fields, using the colon as the field separator. The number of elements in @fieldList is passed to split() as the maximum number of fields to create. Each element of the new list is then stored in the %data hash with a key of the field name.
Loop through the @fieldList array.
Print each field name and its value from the %data hash.
Print a blank line between records.
Close the file.

Listing 9.16  09LST16.PL-Reading Records from a File

@fieldList = qw(id fName lName job);

open(FILE, "<fields.dat");

while (<FILE>) {
    chomp;    # remove the newline so it doesn't end up in the last field
    @data{@fieldList} = split(/:/, $_, scalar @fieldList);

    foreach (@fieldList) {
        printf("%10.10s = %s\n", $_, $data{$_});
    }
    print("\n");    # blank line between records
}

close(FILE);

This program will display:

        id = 1212
     fName = Jan
     lName = Jaspree
       job = Painter

        id = 3453
     fName = Kelly
     lName = Horton
       job = Jockey

The first line of this program uses the qw() notation to create an array of words. It is identical to @fieldList = ("id", "fName", "lName", "job"); but without the distracting quotes and commas.

The split statement might require a little explanation. It is duplicated here so that you can focus on it.

@data{@fieldList} = split(/:/, $_, scalar @fieldList);

Let's use the first line of the input file as an example. Once chomp() has removed its newline, the first line looks like this:

1212:Jan:Jaspree:Painter

The first thing that happens is that split() breaks the string at each colon, creating a list that looks like this:

("1212", "Jan", "Jaspree", "Painter")

You can substitute this list in place of the split() function in the statement.

@data{@fieldList} = ("1212", "Jan", "Jaspree", "Painter");

And you already know that @fieldList is a list of field names. So, the statement can be further simplified to:

@data{"id", "fName", "lName", "job"} =
    ("1212", "Jan", "Jaspree", "Painter");

This assignment statement shows that each array element on the right is paired with a key value on the left so that four separate hash assignments are taking place in this statement.
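The third argument to split() caps the number of fields, and that matters when a value contains the separator character. In the sketch below, a made-up record has a colon inside its last field. With a limit equal to the number of field names, everything after the third separator stays together in the last field. The field names are illustrative.

```perl
use strict;
use warnings;

my @fieldList = qw(id fName lName job);
my %data;

# A made-up record whose last field happens to contain a colon.
my $record = "7788:Ann:Lee:Clerk: night shift";

# With a limit of 4, split() stops after three separators, so the rest of
# the line, colon and all, stays in the last field.
@data{@fieldList} = split(/:/, $record, scalar @fieldList);
printf("%10.10s = %s\n", $_, $data{$_}) foreach @fieldList;

# Without a limit, the extra piece is split off but has no key to go to.
my @allPieces = split(/:/, $record);
print(scalar(@allPieces), " pieces without a limit\n");
```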

Summary

This was a rather long chapter, and we've really only talked about the basics of using files. You have enough information now to explore the rest of the file functions. You also could create functions to read more complicated data structures with what you've learned so far.

Let's review what you know about files. You read that files are a series of bytes stored somewhere outside the computer's memory. Most of the time, a file will be on a hard disk in a directory. But, the file also could be on a floppy disk or on a networked computer. The physical location is not important as long as you know the fully qualified file name. This name will include any computer name, drive name, and directory name that is needed to uniquely identify the file.

There are three files (actually file handles) that are always opened before your program starts. These are STDIN, STDOUT, and STDERR. The STDIN file handle is used to connect to the standard input, usually the keyboard. You can use the < character to override the standard input on the command line so that input comes from a file instead of the keyboard. The STDOUT file handle is used to connect to the standard output, usually the monitor. The > character is used to override the standard output. And finally, the STDERR file handle is used when you want to output error messages. STDERR usually points to the computer's monitor.

The diamond operator (<>) is used to read an entire line of text from a file. It stops reading when the end-of-line character, the newline, is read. The returned string includes the newline character, except possibly for the last line of a file that doesn't end with one. If no file handle is used with the diamond operator, it will attempt to read from files listed in the @ARGV array. If that array is empty, it will read from STDIN.

Next, you read about Perl's file test operators. There are way too many to recap here, but some of the more useful ones are the -d used to test for a directory name, -e used to see if a file exists, and -w to see if a file can be written to. The special file handle, _, can be used to prevent Perl from making a second system call if you need to make two tests on the same file one right after another.

Table 9.3 listed many functions that deal with opening files, reading and writing information, and closing files. A few of them are specific to UNIX.

You learned how to open a file and that files can be opened for input, for output, or for appending. When you read a file, you can use text mode (the default) or binary mode. In binary mode on DOS systems, line endings are read as two characters: the carriage return and the line feed. Binary mode on DOS also lets you read the end-of-file character as a regular character with no special meaning; UNIX systems don't treat that character specially in either mode.

Reading file information directly from the directory was shown to be very easy by using the opendir(), readdir(), and closedir() functions. An example was given that showed how to find all files with an extension of PL by using the grep() function in conjunction with readdir().

Then, we looked closely at the print() and printf() functions. Both can be used to send output to a file handle. The select() function was used to change the default handle from STDOUT to another file. In addition, some examples were given of the formatting options available with the printf() function.

The topic of globbing was briefly touched on. Globs let you specify a file name using wildcards. A list of file names is returned that can be processed like any other array.

And finally, you read about how to split a record into fields based on a separator character.

This chapter covered a lot of ground. And some of the examples did not relate to each other. Instead, I tried to give you a feel for the many ways that files can be used. An entire book can be written on the different ways to use files. But, you now know enough to create any kind of file that you might need.

Chapter 10, "Regular Expressions," will cover this difficult topic. In fact, Perl's regular expressions are one of the main reasons to learn the language. Few other languages will give you equivalent functionality.

Review Questions

Answers to Review Questions are in Appendix A.

  1. What is a file handle?
  2. What is binary mode?
  3. What is a fully qualified file name?
  4. Are variables in the computer's memory considered persistent storage?
  5. What is the <> operator used for?
  6. What is the default file handle for the printf() function?
  7. What is the difference between the following two open statements?

    open(FILE_ONE, ">FILE_ONE.DAT");
    open(FILE_TWO, ">>FILE_TWO.DAT");
  8. What value will the following expression return?

     (stat("09lst01.pl"))[7];
  9. What is globbing?
  10. What will the following statement display?

    printf("%x", 16);

Review Exercises

  1. Write a program to open a file and display each line along with its line number.
  2. Write a program that prints to four files at once.
  3. Write a program that gets the file statistics for PERL.EXE and displays its size in bytes.
  4. Write a program that uses the sysread() function. The program should first test the file for existence and determine the file size. Then the file size should be passed to the sysread() function as one of its parameters.
  5. Write a program that reads from the file handle in the following line of code. Read all of the input into an array and then sort and print the array.

    open(FILE, "dir *.pl |");
  6. Using binary mode, write a program that reads PERL.EXE and prints any characters that are greater than or equal to "A" and less than or equal to "Z."
  7. Write a program that reads a file with two fields. The first field is a customer ID and the second field is the customer name. Use the ! character as a separator between the fields. Store the information into a hash with the customer id as the key and the customer name as the value.
  8. Write a program that reads a file into an array, then displays 20 lines at a time.