UNIX Unleashed, System Administrator's Edition
- 18 -

File System and Disk Administration

By Steve Shah

This chapter discusses the trials and tribulations of creating, maintaining, and repairing file systems. While these tasks may appear simple from a user's standpoint, they are, in fact, intricate and contain more than a handful of nuances. In the course of this chapter, we'll step through many of these nuances and, hopefully, come to a strong understanding of the hows and whys of file systems.

Before we really jump into the topic, you should have a good understanding of UNIX directories, files, permissions, and paths. These are the key building blocks in understanding how to administer your file systems, and I assume you already have a mastery of them. If the statement, "Be sure to have /usr/bin before /usr/local/bin in your $PATH" confuses you in any way, you should be reading something more fundamental first. Refer to Part I, "Introduction to UNIX," for some basic instructions in UNIX.

This chapter goes about the explanation of file systems a bit differently than other books. We first discuss the maintenance and repair of file systems, then discuss their creation. This was done because it is more likely that you already have existing file systems you need to maintain and fix. Understanding how to maintain them also helps you better understand why file systems are created the way they are.

The techniques we cover here are applicable to most UNIX systems currently in use. The only exceptions are when we actually create the file systems. This is where the most deviation from any standard (if there ever was one) occurs. We cover the creation of file systems under the SunOS, Solaris, Linux, and IRIX implementations of UNIX. If you are not using one of these operating systems, you should check the documentation that came with your operating system for details on the creation of file systems.

What Is a File System

The file system is the primary means of file storage in UNIX. Each file system houses directories, which, as a group, can be placed almost anywhere in the UNIX directory tree. The topmost level of the directory tree, the root directory, begins at /. Subdirectories nested below the root directory may traverse as deep as you like so long as the longest absolute path is less than 1,024 characters.

With the proliferation of vendor-enhanced versions of UNIX, you will find a number of "enhanced" file systems. From the standpoint of the administrator, you shouldn't have to worry about the differences too much. The two instances where you will need to worry about vendor-specific details are in the creation of file systems and when performing backups. We will cover the specifics of:
Note that the ufs and 4.2 file systems are actually the same.

A file system, however, is only a part of the grand scheme of how UNIX keeps its data on disk. At the top level, you'll find the disks themselves. These disks are then broken into partitions, each varying in size depending on the needs of the administrator. It is on each partition that the actual file system is laid out. Within the file system, you'll find directories, subdirectories, and, finally, the individual files.

Although you will rarely have to deal with the file system at a level lower than the individual files stored on it, it is critical that you understand two key concepts: inodes and the superblock. Once you understand these, you will find that the behavior and characteristics of files make more sense.

inodes

An inode maintains information about each file. Depending on the type of file system, the inode can contain upward of 40 pieces of information. Most of it, however, is only useful to the kernel and doesn't concern us. The fields that do concern us are
The mode, link count, user ID, group ID, size, and access time are used when generating file listings. Note that the inode does not contain the file's name. That information is held in the directory file (see below for details).

Superblocks

The superblock is the most vital information stored on the disk. It contains information on the disk's geometry (number of heads, cylinders, and so on), the head of the inode list, and the free block list. Because of its importance, the system automatically keeps mirrors of this data scattered around the disk for redundancy. You only have to deal with superblocks if your file system becomes heavily corrupted.

Types of Files

Files come in eight flavors: normal files, directories, hard links, symbolic links, sockets, named pipes, character devices, and block devices.

Normal Files

These are the files you use the most. They can be either text or binary files; however, their internal structure is irrelevant from a System Administrator standpoint. A file's characteristics are specified by the inode in the file system that describes it. An ls -l on a normal file will look something like this:

    -rw-------   1 sshah    admin          42 May 12 13:09 hello

Directories

These are a special kind of file that contains a list of other files. Although there is a one-to-one mapping of inode to disk blocks, there can be a many-to-one mapping from directory entry to inode. When viewing a directory listing using the ls -l command, you can identify directories by their permissions starting with the d character. An ls -l on a directory looks something like this:

    drwx------   2 sshah    admin         512 May 12 13:08 public_html

Hard Links

A hard link is actually a normal directory entry except instead of pointing to a unique file, it points to an already existing file. This gives the illusion that there are two identical files when you do a directory listing. Because the system sees this as just another file, it treats it as such. This is most apparent during backups because hard-linked files get backed up as many times as there are hard links to them. Because a hard link shares an inode, it cannot exist across file systems.

Hard links are created with the ln command. For example, given this directory listing using ls -l, we see:

    -rw-------   1 sshah    admin          42 May 12 13:04 hello

When you type ln hello goodbye and then perform another directory listing using ls -l, you see:

    -rw-------   2 sshah    admin          42 May 12 13:04 goodbye
    -rw-------   2 sshah    admin          42 May 12 13:04 hello

Notice how this appears to be two separate files that just happen to have the same file lengths. Also note that the link count (second column) has increased from one to two. How can you tell they actually are the same file? Use ls -il. Observe:

    13180 -rw-------   2 sshah    admin          42 May 12 13:04 goodbye
    13180 -rw-------   2 sshah    admin          42 May 12 13:04 hello

You can see that both point to the same inode, 13180.
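
The hard-link behavior described above can be reproduced safely in a scratch directory. The following is a minimal sketch, not part of the original text: it assumes a POSIX shell and that mktemp is available, and the scratch directory and variable names are purely illustrative. The file names hello and goodbye mirror the listing above.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "hello, world" > hello
ln hello goodbye                      # second directory entry, same inode

# ls -i prints the inode number in the first column.
inode_hello=$(ls -i hello | awk '{print $1}')
inode_goodbye=$(ls -i goodbye | awk '{print $1}')

# The second column of ls -l is the link count.
links=$(ls -l goodbye | awk '{print $2}')
echo "inodes: $inode_hello $inode_goodbye, link count: $links"

# Removing one name leaves the data reachable through the other.
rm hello
remaining=$(cat goodbye)
echo "after rm hello, goodbye contains: $remaining"

cd / && rm -rf "$workdir"             # clean up the scratch directory
```

Both inode numbers should match, and the data survives until the last name is removed, which is why a hard link is indistinguishable from the "original" file.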

Symbolic Links

A symbolic link (sometimes referred to as a symlink) differs from a hard link because it doesn't point to another inode but to another filename. This allows symbolic links to exist across file systems as well as be recognized as a special file to the operating system. You will find symbolic links to be crucial to the administration of your file systems, especially when trying to give the appearance of a seamless system when there isn't one.

Symbolic links are created using the ln -s command. A common thing people do is create a symbolic link to a directory that has moved. For example, if you are accustomed to accessing the directory for your home page in the subdirectory www, but at the new site you work at home pages are kept in the public_html directory, you can create a symbolic link from www to public_html using the command ln -s public_html www. Performing an ls -l on the result shows the link:

    drwx------   2 sshah    admin         512 May 12 13:08 public_html
    lrwx------   1 sshah    admin          11 May 12 13:08 www -> public_html

Sockets

Sockets are the means for UNIX to network with other machines. Typically, this is done using network ports; however, the file system has a provision to allow for interprocess communication through socket files. (A popular program that uses this technique is the X Window System.) You rarely have to deal with this kind of file and should never have to create it yourself (unless you're writing the program). If you need to remove a socket file, use the rm command. Socket files are identified by their permission settings beginning with an s character. An ls -l on a socket file looks something like this:

    srwxrwxrwx   1 root     admin           0 May 10 14:38 X0

Named Pipes

Similar to sockets, named pipes enable programs to communicate with one another through the file system. You can use the mknod command to create a named pipe. Named pipes are recognizable by their permissions settings beginning with the p character. An ls -l on a named pipe looks something like this:

    prw-------   1 sshah    admin           0 May 12 22:02 mypipe

Character Devices

These special files are typically found in the /dev directory and provide a mechanism for communicating with system device drivers through the file system one character at a time. They are easily noticed by their permission bits starting with the c character. Each character file contains two special numbers, the major and minor. These two numbers identify which device driver that file communicates with. An ls -l on a character device looks something like this:

    crw-rw-rw-   1 root     wheel     21,   4 May 12 13:40 ptyp4

Block Devices

Block devices share many characteristics with character devices in that they exist in the /dev directory, are used to communicate with device drivers, and have major and minor numbers. The key difference is that block devices typically transfer large blocks of data at a time versus one character at a time. (A hard disk is a block device, whereas a terminal is a character device.) Block devices are identified by their permission bits starting with the b character. An ls -l on a block device looks something like this:

    brw-------   2 root     staff     16,   2 Jul 29  1992 fd0c

Managing File Systems

Managing file systems is relatively easy. That is, once you can commit to memory the location of all the key files in the directory tree on each major variation of UNIX as well as your own layout of file systems across the network... In other words, it can be a royal pain.

From a technical standpoint there isn't much to deal with. Once the file systems have been put in their correct places and the boot time configuration files have been edited so that your file systems automatically come online at every start up, there isn't much to do besides watch your disk space.

From a management standpoint, it's much more involved. Often you'll need to deal with existing configurations, which may not have been done "the right way," or you're dealing with site politics such as, "I won't let that department share my disks." Then you'll need to deal with users who don't understand why they need to periodically clean up their home directories. Don't forget the ever exciting vendor-specific nuisances and their idea of how the system "should be" organized.

This section covers the tools you need to manage the technical issues. Unfortunately, managerial issues are something that can't be covered in a book. Each site has different needs as well as different resources, resulting in different policies. If your site lacks any written policy, take the initiative to write one yourself.

Mounting and Unmounting File Systems

As I mentioned earlier in this chapter, part of the power in UNIX stems from its flexibility in placing file systems anywhere in the directory tree. This feat is accomplished by mounting file systems.

Before you can mount a file system, you need to select a mount point. A mount point is a directory in one file system that is overlaid by the root directory of a different file system. UNIX keeps track of mount points and accesses the correct file system, depending on which directory the user is currently in. A mount point may exist anywhere in the directory tree.

Mounting and Unmounting File Systems Manually

To mount a file system, use the mount command:

    mount /dev/device /directory/to/mount

where /dev/device is the device name you want to mount and /directory/to/mount is the directory you want to overlay in your local file system. For example, if you wanted to mount /dev/hda4 to the /usr directory, you would type:

    mount /dev/hda4 /usr

Remember that the directory must exist in your local file system before anything can be mounted there.

There are options that can be passed to the mount command. The most important characteristics are specified in the -o option. These characteristics are:

    rw      read/write
    ro      read only
    bg      background mount (if the mount fails, place the process into the background and keep trying until success)
    intr    interruptible mount (if a process is pending I/O on a mounted partition, it will allow the process to be interrupted and the I/O call dropped)

The bg and intr options apply to NFS mounts. An example of these parameters being used is:

    mount -o rw,bg,intr server1:/export/home /home

See the man page on your system for vendor-specific additions.

To unmount a file system, use the umount command. For example:

    umount /usr

This unmounts the /usr file system from the current directory tree, unveiling the original directory underneath it.

There is, of course, a caveat. If users are using files on a mounted file system, you cannot unmount it. All files must be closed before this can happen, which on a large system can be tricky to say the least. There are three ways to handle this:

Mounting File Systems Automatically

At boot time, the system automatically mounts the root file system with read-only privileges. This enables it to load the kernel and read critical startup files. However, once it has bootstrapped itself, it needs guidance. Although it is possible for you to mount all the file systems by hand, it isn't realistic because you would then have to finish bootstrapping the machine yourself, and worse, the system could not come back online by itself. (Unless, of course, you enjoy coming into work at 2 a.m. to bring a system back up.)

To get around this, UNIX uses a special file called /etc/fstab (/etc/vfstab under Solaris). This file lists all the partitions that need to be mounted at boot time and the directory where they need to be mounted. Along with that information you can pass parameters to the mount command.

Each file system to be mounted is listed in the fstab file in the following format:

    /dev/device /dir/to/mount ftype parameters fs_freq fs_passno

where:
Any lines in the fstab file that start with the pound symbol (#) are considered comments.

If you need to mount a new file system while the machine is live, you must perform the mount by hand. If you wish to have this mount automatically active the next time the system is rebooted, you should be sure to add the appropriate entry to your fstab file.

There are two notable partitions that don't follow the same set of rules as normal partitions. They are the swap partition and /proc. (Note that SunOS does not use the /proc file system.)

Mounting the swap partition is not done using the mount command. It is instead managed by the swap command under Solaris and IRIX, and by the swapon command under SunOS and Linux. In order for a swap partition to be mounted, it must be listed in the appropriate fstab file. Once it's there, use the appropriate command with the -a parameter. With swap, -a is followed by the partition on which you've allocated swap space; with swapon, -a enables every swap partition listed in the fstab file.

The /proc file system is even stranger because it really isn't a file system. It is an interface to the kernel abstracted into a file system style format. This should be listed in your fstab file with file system type proc.
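
A quick way to review what an fstab file will mount is to skip the comment lines and print selected fields. The sketch below is illustrative only and not from the original text: it writes a small sample file to a temporary location instead of reading the real /etc/fstab, the sample entries and variable names are hypothetical, and it assumes the six-field layout described above along with the availability of mktemp and awk.

```shell
#!/bin/sh
# Build a small sample file; a real system would use /etc/fstab
# (or /etc/vfstab under Solaris, which has a different layout).
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Local mounts
/dev/sd0a  /      4.2  rw          1 1
/dev/sd0g  /usr   4.2  rw          1 2
# Remote mounts
server1:/export/home  /home  nfs  rw,bg,intr  0 0
EOF

# Skip comment and short lines; print device, mount point, and type.
entries=$(awk '!/^#/ && NF >= 4 { print $1, "on", $2, "type", $3 }' "$sample")
echo "$entries"

rm -f "$sample"
```

Pointing the same awk command at the real file gives a compact summary of what will be mounted at the next boot.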

Here is a complete fstab file from a SunOS system:

    #
    # Sample /etc/fstab file for a SunOS machine
    #
    # Local mounts
    /dev/sd0a                        /                 4.2   rw           1 1
    /dev/sd0g                        /usr              4.2   rw           1 2
    /dev/sd0b                        swap              swap  rw           0 0
    /dev/sd0d                        /var              4.2   rw           0 0
    # Remote mounts
    server1:/export/home             /home             nfs   rw,bg,intr   0 0
    server1:/export/usr/local        /usr/local        nfs   rw,bg,intr   0 0
    server2:/export/var/spool/mail   /var/spool/mail   nfs   rw,bg,intr   0 0

Common Commands for File System Management

In taking care of your system, you'll quickly find that you can use these commands and many of their parameters without having to look them up. This is because you're going to be using them all the time. I highly suggest you learn to love them.

df

The df command summarizes the free disk space by file system. Running it without any parameters displays all the information about normally mounted and NFS mounted file systems. The output varies from vendor to vendor (under Solaris, use df -t) but should closely resemble this:

    Filesystem               1024-blocks    Used  Available  Capacity  Mounted on
    /dev/hda3                     247871  212909      22161       91%  /
    /dev/hda6                      50717   15507      32591       32%  /var
    /dev/hda7                     481998      15     457087        0%  /local
    server1:/var/spool/mail       489702  222422     218310       50%  /var/spool/mail

The columns reported show:
Common parameters to this command are:
The GNU df program, which is part of the fileutils distribution, has some additional print formatting features you may find useful. You can download the latest fileutils package at ftp://ftp.cdrom.com/pub/gnu.

du

The du command summarizes disk usage by directory. It recurses through all subdirectories and shows disk usage by each subdirectory with a final total at the end. Running it without any parameters shows the usage like so:

    409     ./doc
    945     ./lib
    68      ./man
    60      ./m4
    391     ./src
    141     ./intl
    873     ./po
    3402    .

The first column shows the blocks of disk used by the subdirectory, and the second column shows the subdirectory being evaluated. To see how many kilobytes each subdirectory consumes, use the -k option. Some common parameters to this command are
Like the df program, this program is available as part of the GNU fileutils distribution. The GNU version has expanded on many of the parameters, which you may find useful. The fileutils package can be downloaded from ftp://ftp.cdrom.com/pub/gnu.

ln

The ln program is used to generate links between files. This is very useful for creating the illusion of a perfect file system in which everything is in the "right" place when, in reality, it isn't. This is done by making a link from the desired location to the actual location. The usage of this program is

    ln file_being_linked_to link_name

where file_being_linked_to is the file that already exists, and link_name is the name of the new entry that points to it. The command above generates a hard link, meaning that the file link_name will be indistinguishable from the original file. Both files must exist on the same file system.

A popular parameter to ln is the -s option, which generates symbolic links instead of hard links. The format of the command remains the same:

    ln -s file_being_linked_to link_name

the difference being that the link_name file is marked as a symbolic link in the file system. Symbolic links may span file systems and are given a special tag in the directory entry.
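
Because a symbolic link stores a name rather than an inode, it keeps existing even when its target disappears. The sketch below illustrates this in a scratch directory. It is not from the original text; it assumes a POSIX shell with mktemp available, and the directory names www and public_html are borrowed from the earlier example while the variable names are made up for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir public_html
echo "<html></html>" > public_html/index.html
ln -s public_html www                 # www is a new name pointing at public_html

ls -l                                 # the link is listed as: www -> public_html

# Files are reachable through the link while the target exists.
link_before=missing
[ -e www/index.html ] && link_before=reachable

rm -r public_html                     # remove the target; the link itself remains

link_after=reachable
[ -e www ] || link_after=dangling     # -e follows the link, so it now fails
is_link=no
[ -L www ] && is_link=yes             # -L tests the link itself
echo "before: $link_before, after: $link_after, still a link: $is_link"

cd / && rm -rf "$workdir"
```

A dangling link like this is a common leftover after directories are moved, so it is worth checking for them when reorganizing a file system.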

tar

The tar program is an immensely useful archiving utility. It can combine an entire directory tree into one large file suitable for transferring or compression. The command line format of this program is:

    tar parameters filelist

Common parameters are:
Unlike most other UNIX commands, the parameters do not need to have a dash before them. To create the tarfile myCode.tar, I could use tar in the following manners:

    tar cf myCode.tar myCode

where myCode is a subdirectory relative to the current directory where the files I wish to archive are located.

    tar cvf myCode.tar myCode

Same as the previous tar invocation, although this time it lists all the files added to the archive on the screen.

    tar cf myCode.tar myCode/*.c

This archives all the files in the myCode directory that are suffixed by .c.

    tar cf myCode.tar myCode/*.c myCode/*.h

This archives all the files in the myCode directory that are suffixed by .c or .h.

To view the contents of the myCode.tar file, use:

    tar tf myCode.tar

To extract the files in the myCode.tar file, use:

    tar xf myCode.tar

If the myCode directory doesn't exist, tar creates it. If the myCode directory does exist, any files in that directory are overwritten by the ones being untarred.

    tar xvf myCode.tar

Same as the previous invocation of tar, but this lists the files as they are being extracted.

    tar xpf myCode.tar

Same as the previous invocation of tar, but this attempts to set the permissions of the unarchived files to the values they had before archiving (very useful if you're untarring files as root).
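
The create, list, and extract steps above can be tried end to end without touching real data. The following is a minimal sketch, not from the original text: it assumes a POSIX shell, mktemp, and a tar that accepts the dashless option style shown above. The file contents, the restore directory, and the variable names are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small source tree to archive.
mkdir myCode
echo 'int main(void) { return 0; }' > myCode/main.c
echo '#define VERSION 1' > myCode/version.h
echo 'notes' > myCode/README

tar cf myCode.tar myCode/*.c myCode/*.h   # archive only the .c and .h files
listing=$(tar tf myCode.tar)              # list what went into the archive
echo "$listing"

# Extract somewhere else; tar recreates the myCode directory.
mkdir restore
cd restore || exit 1
tar xf ../myCode.tar
restored=$(ls myCode | tr '\n' ' ')
echo "restored: $restored"

cd / && rm -rf "$workdir"
```

Note that README is absent from both the listing and the restored directory, because only the files matched by the shell patterns were handed to tar.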
While the stock tar that comes with your system works fine for most uses, you may find that the GNU version of tar has some nicer features. You can find the latest version of GNU tar at ftp://ftp.cdrom.com/pub/gnu.

find

Of the commands that I've mentioned so far, you're likely to use find the most. Its purpose is to find files or patterns of files. The parameters for this tool are

    find dir parameters

where dir is the directory where the search begins, and parameters define what is being searched for. The most common parameters you will use are:
Some examples of the find command are

    find / -name core -mtime +7 -print -exec /bin/rm {} \;

This starts its search from the root directory, finds all files named core that have not been modified in more than seven days, prints their names, and removes them.

    find / -xdev -atime +60 -a -mtime +60 -print

This searches all files, from the root directory down, on the local file system, which have not been accessed for at least 60 days and have not been modified for at least 60 days, and prints the list. This is useful for finding those files that people claim they need but, in reality, never use.

    find /home -size +500k -print

This searches all files from /home down and lists them if they are greater than 500 KB in size. This is a handy way of finding large files in the system.

The GNU version of find, which comes with the findutils package, offers many additional features you will find useful. You can download the latest version from ftp://ftp.cdrom.com/pub/gnu.

Repairing File Systems with fsck

Sooner or later, it happens: Someone turns off the power switch. The power outage lasts longer than your UPS's batteries and you didn't shut down the system. Someone presses the reset button. Someone overwrites part of your disk. A critical sector on the disk develops a flaw. If you run UNIX long enough, eventually a halt occurs where the system did not write (sync) the remaining cached information to the disks.

When this happens, you need to verify the integrity of each of the file systems. This is necessary because if the structure is not correct, using the file systems could quickly damage them beyond repair. Over the years, UNIX has developed a very sophisticated file system integrity check that can usually recover from the problem. It's called fsck.

The fsck Utility

The fsck utility takes its understanding of the internals of the various UNIX file systems and attempts to verify that all the links and blocks are correctly tied together. It runs in five passes, each of which checks a different part of the linkage and each of which builds on the verifications and corrections of the prior passes.

fsck walks the file system, starting with the superblock. It then deals with the allocated disk blocks, pathnames, directory connectivity, link reference counts, and the free list of blocks and inodes.

The Superblock

Every change to the file system affects the superblock, which is why it is cached in RAM. Periodically, at the sync interval, it is written to disk. If it is corrupted, fsck checks and corrects it. If it is so badly corrupted that fsck cannot do its work, find the paper you saved when you built the file system and use the -b option with fsck to give it an alternate superblock to use. The superblock is the head of each of the lists that make up the file system, and it maintains counts of free blocks and inodes.

Inodes

fsck validates each of the inodes. It makes sure that each block in the block allocation list is not on the block allocation list in any other inode, that the size is correct, and that the link count is correct. If the inodes are correct, then the data is accessible. All that's left is to verify the pathnames.

What Is a Clean (Stable) File System?

Sometimes fsck responds

    /opt: stable                (ufs file systems)

This means that the superblock is marked clean and that no changes have been made to the file system since it was marked clean. First, the system marks the superblock as dirty; then it starts modifying the rest of the file system. When the buffer cache is empty and all pending writes are complete, it goes back and marks the superblock as clean. If the superblock is marked clean, there is normally no reason to run fsck, so unless fsck is told to ignore the clean flag, it just prints this notice and skips over this file system.

Where Is fsck?

When you run fsck, you are running an executable in either the /usr/sbin or /bin directory called fsck, but this is not the real fsck. It is just a dispatcher that invokes a file system type-specific fsck utility.

When Should I Run fsck?

Normally, you do not have to run fsck. The system runs it automatically when you try to mount a file system at boot time that is dirty. However, problems can creep up on you. Software and hardware glitches do occur from time to time. It wouldn't hurt to run fsck just after performing the monthly backups.

How Do I Run fsck?

Because the system normally runs it for you, running fsck is not an everyday occurrence for you to remember. However, it is quite simple and mostly automatic.

First, to run fsck, the file system you intend to check must not be mounted. This is a bit hard to do if you are in multiuser mode most of the time, so to run a full system fsck you should bring the system down to single user mode.

In single user mode you need to invoke fsck, giving it the options to force a check of all file systems, even if they are already stable.

    fsck -f       (SunOS)
    fsck -o f     (Solaris)
    fsck          (Linux and IRIX)

If you wish to check a single specific file system, type its character device name. (If you aren't sure what the device name is, see the section on adding a disk to the system for details on how to determine this information.) For example:

    fsck /dev/hda1

Stepping Through an Actual fsck

fsck occurs in five to seven steps, depending on your operating system and what errors are found, if any. fsck can automatically correct most of these errors and does so if invoked at boot time to automatically check a dirty file system.

The fsck we are about to step through was done on a ufs file system. While there are some differences between the numbering of the phases for different file systems, the errors are mostly the same, requiring the same solutions. Apply common sense liberally to any invocation of fsck and you should be okay.

Checking ufs File Systems

For ufs file systems, fsck is a five-phase process. When you run fsck manually, you are asked to answer the questions that the system would otherwise answer automatically.

Phase 1: Check Blocks and Sizes

This phase checks the inode list, looking for invalid inode entries. Errors requiring answers include

    UNKNOWN FILE TYPE I=inode number (CLEAR)

The file type bits are invalid in the inode. Options are to leave the problem and attempt to recover the data by hand later or to erase the entry and its data by clearing the inode.

    PARTIALLY TRUNCATED INODE I=inode number (SALVAGE)

The inode appears to point to less data than the file does. This is safely salvaged, because it indicates a crash while truncating the file to shorten it.

    block BAD I=inode number
    block DUP I=inode number

The disk block pointed to by the inode is either out of range for this inode or already in use by another file. This is an informational message. If a duplicate block is found, phase 1b is run to report the inode number of the file that originally used this block.

Phase 2: Check Pathnames

This phase removes directory entries from bad inodes found in phase 1 and 1b and checks for directories with inode pointers that are out of range or pointing to bad inodes. You may have to handle

    ROOT INODE NOT DIRECTORY (FIX?)

You can convert inode 2, the root directory, back into a directory, but this usually means there is major damage to the inode table.

    I=OUT OF RANGE I=inode number NAME=file name (REMOVE?)
    UNALLOCATED I=inode number OWNER=O MODE=M SIZE=S MTIME=T TYPE=F (REMOVE?)
    BAD/DUP I=inode number OWNER=O MODE=M SIZE=S MTIME=T TYPE=F (REMOVE?)

A bad inode number was found, an unallocated inode was used in a directory, or an inode that had a bad or duplicate block number in it is referenced. You are given the choice to remove the file, losing the data, or to leave the error. If you leave the error, the file system is still damaged, but you have the chance to try to dump the file first and salvage part of the data before rerunning fsck to remove the entry.

fsck may return one of a variety of errors indicating an invalid directory length. You will be given the chance to have fsck fix or remove the directory as appropriate. These errors are all correctable with little chance of subsequent damage.

Phase 3: Check Connectivity

This phase detects errors in unreferenced directories. It creates or expands the lost+found directory if needed and connects the misplaced directory entries into the lost+found directory. fsck prints status messages for all directories placed in lost+found.

Phase 4: Check Reference Counts

This phase uses the information from phases 2 and 3 to check for unreferenced files and incorrect link counts on files, directories, or special files.

    UNREF FILE I=inode number OWNER=O MODE=M SIZE=S MTIME=T (RECONNECT?)

The filename is not known (it is an unreferenced file), so it is reconnected into the lost+found directory with the inode number as its name. If you clear the file, its contents are lost. Unreferenced files that are empty are cleared automatically.

    LINK COUNT FILE I=inode number OWNER=O MODE=M SIZE=S MTIME=T COUNT=X (ADJUST?)
    LINK COUNT DIR I=inode number OWNER=O MODE=M SIZE=S MTIME=T COUNT=X (ADJUST?)

In both cases, an entry was found with a different number of references than what was listed in the inode. You should let fsck adjust the count.

    BAD/DUP FILE I=inode number OWNER=O MODE=M SIZE=S MTIME=T (CLEAR)

A file or directory has a bad or duplicate block in it. If you clear it now, the data is lost. You can leave the error and attempt to recover the data, and rerun fsck later to clear the file.

Phase 5: Check Cylinder Groups

This phase checks the free block and unused inode maps. It automatically corrects the free lists if necessary, although in manual mode it asks permission first.

What Do I Do After fsck Finishes?

First, relax, because fsck rarely finds anything seriously wrong, except in cases of hardware failure where the disk drive is failing or where you copied something on top of the file system. UNIX file systems are very robust.

However, if fsck finds major problems or makes a large number of corrections, rerun it to be sure the disk isn't undergoing hardware failure. It shouldn't find more errors in a second run. Then, recover any files that it may have deleted. If you keep a log of the inodes it clears, you can go to a backup tape and dump the list of inodes on the tape. Recover just those inodes to restore the files.

Back up the system again, because there is no reason to have to do this all over again.

Dealing with What Is in lost+found

If fsck reconnects unreferenced entries, it places them in the lost+found directory. They are safe there, and the system should be backed up in case you lose them while trying to move them back to where they belong. Items in lost+found can be of any type: files, directories, special files (devices), and so on. If it is a named pipe or socket, you may as well delete it. The process that opened it is long since gone and will open a new one when it is run again.

For files, use the owner name to contact the user and have him look at the contents and see if the file is worth keeping. Often, it is a file that was deleted and is no longer needed, but the system crashed before it could be fully removed.

For directories, the files in the directory should help you and the owner determine where they belong. You can look on the backup tape lists for a directory with those contents if necessary. Then just remake the directory and move the files back. Then remove the directory entry in lost+found. This re-creation and move has the added benefit of cleaning up the directory.

Creating File Systems

Now that you understand the nuances of maintaining a file system, it's time to understand how they are created. This section walks you through the three steps of:
Disk Types

Although there are many different kinds of disks, UNIX systems have come to standardize on SCSI for workstations. Many PCs also sport SCSI interfaces, but because IDE drives are cheaper and more abundant, you'll find a lot of them on UNIX PCs as well.

SCSI itself now comes in a few different flavors: regular SCSI, SCSI-2, SCSI-Wide, SCSI-Fast and Wide, and now SCSI-3. Although it is possible to mix and match these devices with converter cables, you may find it easier on both your sanity and your performance if you stick to one format. As of this writing, SCSI-2 is the most common interface. When attaching your SCSI drive, there are many important points to remember.
Although SCSI is king of the workstation, PCs have another choice: IDE. IDE devices tend to be cheaper and more widely available than SCSI devices, and many motherboards offer direct IDE support. They are also simpler and require less configuration on your part.

The downside of IDE is that its simplicity comes at the cost of configurability and expandability. An IDE chain can hold only two devices, and not all motherboards come with more than one IDE chain. If your CD-ROM is IDE, you only have space for one disk. This is probably okay for a single-person workstation, but as you can imagine, it's not going to fly well in a server environment.

Another consideration is speed. SCSI was designed with the ability to perform I/O without the aid of the main CPU, which is one of the reasons it costs more. IDE, on the other hand, was designed with cost in mind. This resulted in a simplified controller; hence, the CPU takes on the burden of working the drive.

While IDE did manage to simplify the PC arena, it came with the limitation of being unable to handle disks greater than 540M. Various tricks were devised to circumvent this; however, a clean solution is now widely available. Known as EIDE (Enhanced IDE), it is capable of supporting disks up to 8G and can support up to 4 devices on one chain.

In weighing the pros and cons of EIDE versus SCSI in the PC environment, don't forget to think about the cost-to-benefit ratio. A high-speed SCSI controller in a single person's workstation may not be as necessary as the user is convinced it is. Plus, with disks being released in 2+ gigabyte configurations, there is ample room on the typical IDE disk.

Once you have decided on the disk subsystem to install, read the documentation that came with the machine for instructions on physically attaching the disk to the system.
What Are Partitions and Why Do I Need Them?

Partitions are UNIX's way of dividing the disk into usable pieces. UNIX requires that there be at least one partition; however, you'll find that creating multiple partitions, each with a specific function, is often necessary.

The most visible reason for creating separate partitions is to protect the system from the users. The one required partition mentioned earlier is called the root partition. It is here that critical system software and configuration files (the kernel and mount tables) must reside. This partition must be carefully watched so that it never fills up. If it fills up, your system may not be able to come back up in the event of a system crash. Because the root partition is not meant to hold the users' data, you must create separate partitions for the users' home directories, temporary files, and so forth. This enables their files to grow without the worry of crowding out the key system files.

Dual-boot configurations are becoming another common reason to partition, especially with the ever-growing popularity of Linux. You may find your users wanting to be able to boot to either Windows or Linux; therefore, you need to keep at least two partitions to enable them to do this.

The last, but certainly not least, reason to partition your disks is the issue of backups. Backup software often works by dumping entire partitions onto tape. By keeping the different types of data on separate partitions, you can be explicit about what gets backed up and what doesn't. For example, daily backup of the system software isn't necessary, but backups of home directories are. Keeping the two on separate partitions lets you back up one without the other.

Another example relates more to company politics. It may be possible that one group does not want their data to be backed up to the same tape as another group's.
(Note: common sense doesn't always apply to inter-group politics...) By keeping the two groups on separate partitions, you can exclude one from your normal backups and exclude the others during your special backups.

Which Partitions To Create

As I mentioned earlier, the purpose of creating partitions is to separate the users from the system areas. So how many different partitions need to be created? While there is no right answer for every installation, here are some guidelines to take into account. You always need a root partition. In this partition, you'll have your /bin, /etc, and /sbin directories at the very least. Depending on your version of UNIX, this could require anywhere from 30 to 100 megabytes.
The Device Entry

Most implementations of UNIX automatically create the correct device entry when you boot with the new drive attached. Once this entry has been created, you should check its permissions. Only root should be given read/write access to it. If your backups run as a nonroot user, you may need to give group read access to the backup group. Be sure that no one else is in the backup group. Allowing world read/write access to the disk is the easiest way to have your system hacked, destroyed, or both.

Device entries under Linux

IDE disks under Linux use the following scheme to name the hard disks:

    /dev/hd[drive][partition]

Each IDE drive is lettered starting from a. So the primary disk on the first chain is a; the slave on the first chain is b; the primary on the secondary chain is c; and so on. Each disk's partition is referenced by number. For example, the third partition of the slave drive on the first chain is /dev/hdb3.

SCSI disks use the same scheme except that instead of /dev/hd as the prefix, /dev/sd is used. So to refer to the second partition of the first disk on the SCSI chain, you would use /dev/sda2.

To refer to the entire disk, specify all the information except the partition. For example, to refer to the entire primary disk on the first IDE chain, you would use /dev/hda.

Device entries under IRIX

SCSI disks under IRIX are referenced in either the /dev/dsk or /dev/rdsk directories. The following is the format:

    /dev/[r]dsk/dksCdSP

where C is the controller number, S is the SCSI address, and P is the partition: s0, s1, s2, and so on. The partition name can also be vh for the volume header or vol to refer to the entire disk.

Device entries under Solaris

The SCSI disks under Solaris are referenced in either the /dev/dsk or /dev/rdsk directories. The following is the format:

    /dev/[r]dsk/cCtSd0sP

where C is the controller number, S is the SCSI address, and P is the partition number.
Partition 2 always refers to the entire disk and label information. Partition 1 is typically used for swap.

Device entries under SunOS

Disks under SunOS are referenced in the /dev directory. The following is the format:

    /dev/sdTP

where T is the target number and P is the partition. Typically, the root partition is a, the swap partition is b, and the entire disk is referred to as partition c. You can have partitions from a through h. An important aspect to note is an oddity with the SCSI target and unit numbering: devices that are target three need to be called target zero, and devices that are target zero need to be called target three.

A Note About Formatting Disks

"Back in the old days," disks needed to be formatted and checked for bad blocks. The procedure of formatting entailed writing the head, track, and sector numbers in a sector preamble and a checksum in the postamble to every sector on the disk. At the same time, any sectors that were unusable due to flaws in the disk surface were marked and, depending on the type of disk, an alternate sector mapped into its place.

Thankfully, we have moved on. Both SCSI and IDE disks now come pre-formatted from the factory. Even better, they transparently handle bad blocks on the disk and remap them without any assistance from the operating system.
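To recap the naming schemes from the device entry sections, here is a small Python sketch that assembles device names from their parts. The function names and argument conventions (a zero-based drive index for Linux, the partition passed as a string for IRIX and SunOS) are hypothetical choices made for illustration and are not part of any UNIX tool. The swap of targets 0 and 3 under SunOS follows the oddity described in the SunOS device entry text.

```python
def linux_device(bus, drive_index, partition=None):
    """Build a Linux disk name; bus is 'ide' or 'scsi', drive_index is 0-based."""
    prefix = {"ide": "/dev/hd", "scsi": "/dev/sd"}[bus]
    name = prefix + chr(ord("a") + drive_index)   # 0 -> a, 1 -> b, ...
    # Leaving the partition off refers to the entire disk.
    return name if partition is None else f"{name}{partition}"


def irix_device(controller, scsi_id, partition, raw=False):
    """Build an IRIX name; partition is a string such as 's0', 'vh', or 'vol'."""
    directory = "/dev/rdsk" if raw else "/dev/dsk"
    return f"{directory}/dks{controller}d{scsi_id}{partition}"


def solaris_device(controller, target, partition, raw=False):
    """Build a Solaris name; partition is a number (2 is the entire disk)."""
    directory = "/dev/rdsk" if raw else "/dev/dsk"
    return f"{directory}/c{controller}t{target}d0s{partition}"


def sunos_device(target, partition):
    """Build a SunOS name; targets 0 and 3 trade places, per the text."""
    unit = {0: 3, 3: 0}.get(target, target)
    return f"/dev/sd{unit}{partition}"


print(linux_device("ide", 1, 3))            # slave on the first IDE chain
print(linux_device("scsi", 0, 2))           # first SCSI disk, partition 2
print(irix_device(0, 1, "s0"))
print(solaris_device(0, 1, 2, raw=True))
print(sunos_device(1, "a"))
```

Keeping one function per operating system makes the differences easy to see: Linux letters the drives and numbers the partitions, while SunOS numbers the drives and letters the partitions.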
Partitioning Disks and Creating File Systems

In this section, we will cover the step-by-step procedure for partitioning disks under Linux, IRIX, SunOS, and Solaris. Since the principles are similar across all platforms, each platform's example also illustrates a different way of deciding how a disk should be partitioned depending on its intended usage.

Linux

To demonstrate how partitions are created under Linux, we will set up a disk with a single-user workstation in mind. It will need space not only for system software, but also for application software and the user's home directories.

Creating Partitions

For this example, we'll create the partitions on a 1.6 GB IDE disk located on /dev/hda. This disk will become the boot device for a single-user workstation. We will create the root, /usr, /var, /tmp, /home, and swap partitions.

During the actual partitioning, we don't name the partitions. Where the partitions are mounted is specified in the /etc/fstab file. Should we choose to mount them in different locations later on, we could very well do that. However, by keeping the function of each partition in mind, we have a better idea of how to size them.

A key thing to remember with the Linux fdisk command is that it does not commit any changes made to the partition table to disk until you explicitly do so with the w command.

With the drive installed, we begin by running the fdisk command:

    # fdisk /dev/hda

This brings us to the fdisk command prompt. We start by using the p command to print what partitions are currently on the disk.

    Command (m for help): p

    Disk /dev/hda: 64 heads, 63 sectors, 786 cylinders
    Units = cylinders of 4032 * 512 bytes

       Device Boot  Begin   Start     End  Blocks   Id  System

    Command (m for help):

We see that there are no partitions on the disk. With 1.6 GB of space, we can be very liberal with allocating space to each partition.
Keeping this policy in mind, we begin creating our partitions with the n command: Command (m for help): n e extended p primary partition (1-4) p Partition number (1-4): 1 First cylinder (1-786): 1 Last cylinder or +size or +sizeM or +sizeK ([1]-786): +50M Command (m for help): The 50 MB partition we just created becomes our root partition. Because it is the first partition, it is referred to as /dev/hda1. Using the p command, we see our new partition: Command (m for help): p Disk /dev/hda: 64 heads, 63 sectors, 786 cylinders Units = cylinders of 4032 * 512 bytes Device Boot Begin Start End Blocks Id System /dev/hda1 1 1 26 52384+ 83 Linux native Command (m for help): With the root partition out of the way, we will create the swap partition. Our sample machine has 32 MB of RAM and will be running X-Windows along with a host of development tools. It is unlikely that the machine will get a memory upgrade for a while, so we'll allocate 64 MB to swap. Command (m for help): n Command action e extended p primary partition (1-4) p Partition number (1-4): 2 First cylinder (27-786): 27 Last cylinder or +size or +sizeM or +sizeK ([27]-786): +64M Command (m for help): Because this partition is going to be tagged as swap, we need to change its file system type to swap using the t command. Command (m for help): t Partition number (1-4): 2 Hex code (type L to list codes): 82 Changed system type of partition 2 to 82 (Linux swap) Command (m for help): Because of the nature of the user, we know that there will be a lot of local software installed on this machine. With that in mind, we'll create /usr with 500 MB of space. 
    Command (m for help): n
    Command action
       e   extended
       p   primary partition (1-4)
    p
    Partition number (1-4): 3
    First cylinder (60-786): 60
    Last cylinder or +size or +sizeM or +sizeK ([60]-786): +500M

If you've been keeping your eyes open, you've noticed that we have only one more primary partition to use, but we want /home, /var, and /tmp to be in separate partitions. How do we do this? Extended partitions.

The remainder of the disk is created as an extended partition. Within this partition, we can create more partitions for use. Let's create this extended partition:

    Command (m for help): n
    Command action
       e   extended
       p   primary partition (1-4)
    e
    Partition number (1-4): 4
    First cylinder (314-786): 314
    Last cylinder or +size or +sizeM or +sizeK ([314]-786): 786

    Command (m for help):

We can now create /home inside the extended partition. Our user is going to need a lot of space, so we'll create a 500 MB partition. Notice that we are no longer asked whether we want a primary or extended partition.

    Command (m for help): n
    First cylinder (314-786): 314
    Last cylinder or +size or +sizeM or +sizeK ([314]-786): +500M

    Command (m for help):

Using the same pattern, we create a 250 MB /tmp and a 180 MB /var partition.

    Command (m for help): n
    First cylinder (568-786): 568
    Last cylinder or +size or +sizeM or +sizeK ([568]-786): +250M

    Command (m for help): n
    First cylinder (695-786): 695
    Last cylinder or +size or +sizeM or +sizeK ([695]-786): 786

    Command (m for help):

Notice that for the last partition we created, I did not specify a size, but instead specified the last cylinder. This is to ensure that all of the disk is used.
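The first-cylinder prompts in the session above follow from simple arithmetic on the disk geometry. The Python sketch below is a rough model of it, assuming that fdisk rounds each +sizeM request up to a whole number of cylinders. That rounding rule is an assumption, although it agrees with the /dev/hda1 listing shown earlier (cylinders 1 through 26 for +50M) and with every first-cylinder prompt in the session. The names last_cylinder and plan are hypothetical helpers, not fdisk features.

```python
# Geometry reported by the fdisk p command for /dev/hda.
HEADS, SECTORS, CYLINDERS = 64, 63, 786
SECTOR_BYTES = 512

CYLINDER_BYTES = HEADS * SECTORS * SECTOR_BYTES   # 4032 * 512 bytes
DISK_BYTES = CYLINDER_BYTES * CYLINDERS           # roughly 1.6 GB


def last_cylinder(first, size_mb):
    """Last cylinder of a +sizeM request, rounded up to whole cylinders (assumed rule)."""
    wanted = size_mb * 1024 * 1024
    count = -(-wanted // CYLINDER_BYTES)          # integer ceiling division
    return first + count - 1


def plan(requests, first=1):
    """Lay partitions end to end; a size of None means 'use the rest of the disk'."""
    layout = []
    for name, size_mb in requests:
        last = CYLINDERS if size_mb is None else last_cylinder(first, size_mb)
        layout.append((name, first, last))
        first = last + 1
    return layout


primary = plan([("root", 50), ("swap", 64), ("/usr", 500)])
# The logical partitions start where the extended partition starts.
logical = plan([("/home", 500), ("/tmp", 250), ("/var", None)],
               first=primary[-1][2] + 1)

for name, first, last in primary + logical:
    print(f"{name:6} cylinders {first:3}-{last:3}")
print(f"disk capacity: {DISK_BYTES} bytes")
```

Because partitions must end on cylinder boundaries, each one comes out slightly larger than requested, which is why /var ends up with whatever cylinders are left rather than an exact 180 MB.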
Using the p command, we look at our final work: Command (m for help): p Disk /dev/hda: 64 heads, 63 sectors, 786 cylinders Units = cylinders of 4032 * 512 bytes Device Boot Begin Start End Blocks Id System /dev/hda1 1 1 26 52384+ 83 Linux native /dev/hda2 27 27 59 66528 82 Linux swap /dev/hda3 60 60 313 512064 83 Linux native /dev/hda4 314 314 786 953568 5 Extended /dev/hda5 314 314 567 512032+ 83 Linux native /dev/hda6 568 568 694 256000+ 83 Linux native /dev/hda7 695 695 786 185440+ 83 Linux native Command (m for help): Everything looks good. To commit this configuration to disk, we use the w command: Command (m for help): w The partition table has been altered! Calling ioctl() to re-read partition table. (Reboot to ensure the partition table has been updated.) Syncing disks. Reboot the machine to ensure that the partition has been updated and you're done creating the partitions. Creating File Systems in Linux Creating a partition alone isn't very useful. In order to make it useful, we need to make a file system on top of it. Under Linux, this is done using the mke2fs command and the mkswap command. 
To create the file system on the root partition, we use the following command:

    mke2fs /dev/hda1

The program only takes a few seconds to run and generates output similar to the following. (This sample reports 512032 blocks, which corresponds to one of the 500 MB partitions rather than the 50 MB root partition, so expect smaller numbers for /dev/hda1.)

    mke2fs 0.5b, 14-Feb-95 for EXT2 FS 0.5a, 95/03/19
    128016 inodes, 512032 blocks
    25601 blocks (5.00%) reserved for the super user
    First data block=1
    Block size=1024 (log=0)
    Fragment size=1024 (log=0)
    63 block groups
    8192 blocks per group, 8192 fragments per group
    2032 inodes per group
    Superblock backups stored on blocks:
        8193,16385,24577,32769,40961,49153,57345,65537,73729,
        81921,90113,98305,106497,114689,122881,131073,139265,147457,
        155649,163841,172033,180225,188417,196609,204801,212993,221185,
        229377,237569,245761,253953,262145,270337,278529,286721,294913,
        303105,311297,319489,327681,335873,344065,352257,360449,368641,
        376833,385025,393217,401409,409601,417793,425985,434177,442369,
        450561,458753,466945,475137,483329,491521,499713,507905
    Writing inode tables: done
    Writing superblocks and file system accounting information: done

You should make a note of these superblock backups and keep them in a safe place. Should the day arise that you need to use fsck to fix a superblock gone bad, you will want to know where the backups are.

Repeat this for each of the remaining Linux native partitions; skip the swap partition and the extended partition container (/dev/hda4). To create the swap file system, you need to use the mkswap command like this:

    mkswap /dev/hda2

Replace /dev/hda2 with the partition you chose to make your swap space. The result of the command will be similar to:

    Setting up swapspace, size = 35090432 bytes

And the swap space is ready.

To make the root file system bootable, you need to install the lilo boot manager. This is part of all the standard Linux distributions, so you shouldn't need to hunt for it on the Internet.
Simply modify the /etc/lilo.conf file so that /dev/hda1 is set to be the boot disk and run:

    lilo

The resulting output should look something like:

    Added linux *

where linux is the name of the kernel to boot, as specified by the label= field in /etc/lilo.conf.

SunOS

In this example, we will be preparing a Seagate ST32550N as an auxiliary disk to an existing system. The disk will be divided into three partitions: one for use as a mail spool, one for use as /usr/local, and the third as an additional swap partition.

Creating the partitions
Once a disk has been attached to the machine, you should verify its connection and SCSI address by running the probe-scsi command from the PROM monitor if the disk is attached to the internal chain, or the probe-scsi-all command to see all the SCSI devices on the system. When you are sure the drive is properly attached and functioning, you're ready to start accessing the drive from the OS.

After the machine has booted, run the dmesg command to collect the system diagnostic messages. You may want to pipe the output to grep so that you can easily find the information on disks. For example:

    dmesg | grep sd

On our system this generated the following output:

    sd0: <SUN0207 cyl 1254 alt 2 hd 9 sec 36>
    sd1 at esp0 target 1 lun 0
    sd1: corrupt label - wrong magic number
    sd1: Vendor 'SEAGATE', product 'ST32550N', 4194058 512 byte blocks
    root on sd0a fstype 4.2
    swap on sd0b fstype spec size 32724K
    dump on sd0b fstype spec size 32712K

This result tells us that we have an installed disk on sd0 that the system is aware of and using. The information from the sd1 device is telling us that it found a disk, but it isn't usable because of a corrupt label. Don't worry about the error. Until we partition the disk and create file systems on it, the system doesn't know what to do with it, hence the error. If you are using SCSI address 0 or 3, remember the oddity we mentioned earlier where device 0 needs to be referenced as 3 and device 3 needs to be referenced as 0.

Even though we do not have to actually format the disk, we do need to use the format program that comes with SunOS because it also creates the partitions and writes the label to the disk. To invoke the format program, simply run:

    format sd1

where sd1 is the name of the disk we are going to partition.
The format program displays the following menu: FORMAT MENU: disk - select a disk type - select (define) a disk type partition - select (define) a partition table current - describe the current disk format - format and analyze the disk repair - repair a defective sector show - translate a disk address label - write label to the disk analyze - surface analysis defect - defect list management backup - search for backup labels quit format> We need to enter type at the format> prompt so that we can tell SunOS the kind of disk we have. The resulting menu looks something like: AVAILABLE DRIVE TYPES: 0. Quantum ProDrive 80S 1. Quantum ProDrive 105S 2. CDC Wren IV 94171-344 3. SUN0104 ... 13. other Specify disk type (enter its number): Because we are adding a disk this machine has not seen before, we need to select option 13, other. This begins a series of prompts requesting the disk's geometry. Be sure to have this information from the manufacturer before starting this procedure. The first question, Enter number of data cylinders: is actually a three-part question. After you enter the number of data cylinders, the program asks for the number of alternative cylinders and then the number of physical cylinders. The number of physical cylinders is the number your manufacturer provided you. Subtract two from there to get the number of data cylinders, and then just use the default value of 2 for the number of alternate cylinders. For our Seagate disk, we answered the questions as follows: Enter number of data cylinders: 3508 Enter number of alternate cylinders [2]: 2 Enter number of physical cylinders [3510]: 3510 Enter number of heads: 11 Enter number of data sectors/track: 108 Enter rpm of drive [3600]: Enter disk type name (remember quotes): "SEAGATE ST32550N" selecting sd1: <SEAGATE ST32550N> [disk formatted, no defect list found] No defined partition tables. 
Note that even though our sample drive actually rotates at 7200 rpm, we stick with the default of 3600 rpm because the software will not accept a higher speed. Thankfully, this doesn't matter because the operating system doesn't use the information.

Even though format reported that the disk was formatted, it really wasn't. It only acquired the information needed to write the label later.

Now we are ready to begin preparations to partition the disk. These preparations entail computing the amount each cylinder holds and then approximating the number of cylinders we want in each partition. With our sample disk, we know that each track holds 108 sectors and that 11 tracks (one per head) make up a cylinder. From the information we saw in dmesg, we know that each block is 512 bytes long. Hence, if we want our mail partition to be 1 GB in size, we perform the following math to compute the necessary blocks:

    1 gigabyte = 1048576 kilobytes
    One cylinder = 108 sectors * 11 heads = 1188 blocks
    1188 blocks = 594 kilobytes
    1048576 / 594 = 1765 cylinders
    1765 * 1188 = 2096820 blocks

Obviously, there is some rounding error, since the exact 1 GB mark falls in the middle of a cylinder and we need to keep each partition on a cylinder boundary. 1,765 cylinders is more than close enough. The 1,765 cylinders translate to 2,096,820 blocks.

The new swap partition we want to make needs to be 64 MB in size. Using the same math as before, we find that our swap needs to be 110 cylinders, or 130,680 blocks, long.

The last partition on the disk needs to fill the remainder of the disk. Knowing that we have a 2 GB disk, a 1 GB mail spool, and a 64 MB swap partition, this should leave us with about 960 MB for /usr/local.

Armed with this information, we are ready to tackle the partitioning. From the format> prompt, type partition to start the partitioning menu.
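The cylinder arithmetic above is easy to get wrong by hand, so here is a minimal Python sketch of it. It uses the geometry entered for the Seagate disk (11 heads, 108 sectors per track, 3508 data cylinders) and rounds each request down to a whole cylinder, as the text does. The variable and function names are illustrative only.

```python
# Geometry entered at the format "type" prompts for the Seagate ST32550N.
HEADS = 11
SECTORS_PER_TRACK = 108
DATA_CYLINDERS = 3508
BLOCK_BYTES = 512

BLOCKS_PER_CYL = HEADS * SECTORS_PER_TRACK            # 1188 blocks
KB_PER_CYL = BLOCKS_PER_CYL * BLOCK_BYTES // 1024     # 594 kilobytes


def whole_cylinders(size_kb):
    """Number of complete cylinders that fit in size_kb (rounds down)."""
    return size_kb // KB_PER_CYL


mail_cyl = whole_cylinders(1024 * 1024)   # 1 GB mail spool
swap_cyl = whole_cylinders(64 * 1024)     # 64 MB swap
rest_cyl = DATA_CYLINDERS - mail_cyl - swap_cyl   # remainder for /usr/local

start = 0
for name, cyls in [("mail", mail_cyl), ("swap", swap_cyl), ("/usr/local", rest_cyl)]:
    blocks = cyls * BLOCKS_PER_CYL
    print(f"{name:10} start cyl {start:4}  cylinders {cyls:4}  blocks {blocks}")
    start += cyls   # the next partition begins where this one ends

rest_mb = rest_cyl * BLOCKS_PER_CYL * BLOCK_BYTES / (1024 * 1024)
print(f"/usr/local is about {rest_mb:.0f} MB")
```

Working in whole cylinders from the start avoids the off-by-a-few-blocks mistakes that creep in when sizes are converted back and forth between megabytes and blocks.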
The resulting screen looks something like this: format> partition PARTITION MENU: a - change 'a' partition b - change 'b' partition c - change 'c' partition d - change 'd' partition e - change 'e' partition f - change 'f' partition g - change 'g' partition h - change 'h' partition select - select a predefined table name - name the current table print - display the current table label - write partition map and label to the disk quit partition> To create our mail partition, we begin by changing partition a. At the partition> prompt, type a. partition> a This brings up a prompt for entering the starting cylinder and the number of blocks to allocate. Because this is going to be the first partition on the disk, we start at cylinder 0. Based on the math we did earlier, we know that we need 2,096,820 blocks. partition a - starting cyl 0, # blocks 0 (0/0/0) Enter new starting cyl [0]: 0 Enter new # blocks [0, 0/0/0]: 2096820 partition> Now we want to create the b partition, which is traditionally used for swap space. We know how many blocks to use based on our calculations, but we don't know which cylinder to start from. To solve this, we simply display the current partition information for the entire disk using the p command: partition> p Current partition table (unnamed): partition a - starting cyl 0, # blocks 2096820 (1765/0/0) partition b - starting cyl 0, # blocks 0 (0/0/0) partition c - starting cyl 0, # blocks 0 (0/0/0) partition d - starting cyl 0, # blocks 0 (0/0/0) partition e - starting cyl 0, # blocks 0 (0/0/0) partition f - starting cyl 0, # blocks 0 (0/0/0) partition g - starting cyl 0, # blocks 0 (0/0/0) partition h - starting cyl 0, # blocks 0 (0/0/0) partition> We can see that partition a is allocated with 2,096,820 blocks and is 1,765 cylinders long. Because we don't want to waste space on the disk, we start the swap partition on cylinder 1765. (Remember to count from zero!) 
    partition> b
    partition b - starting cyl 0, # blocks 0 (0/0/0)
    Enter new starting cyl [0]: 1765
    Enter new # blocks [0, 0/0/0]: 130680
    partition>

Before we create our last partition, we need to take care of some tradition first, namely partition c. This is usually the partition that spans the entire disk. Before creating this partition, we need to do a little math.

    108 sectors x 11 heads x 3508 data cylinders = 4167504 blocks

Notice that the number of blocks we compute here does not match the number actually on the disk. This number was computed based on the information we entered when giving the disk type information. It is important that we remain consistent. Since the c partition spans the entire disk, we specify the starting cylinder as 0. Creating this partition should look something like this:

    partition> c
    partition c - starting cyl 0, # blocks 0 (0/0/0)
    Enter new starting cyl [0]: 0
    Enter new # blocks [0, 0/0/0]: 4167504
    partition>

We have only one partition left to create: /usr/local. Because we want to fill the remainder of the disk, we need to do one last bit of math to compute how many blocks are still free. This is done by taking the size of partition c (the total disk) and subtracting the sizes of the existing partitions. For our example, this works out to be:

    4167504 - 2096820 - 130680 = 1940004 remaining blocks

Now we need to find out which cylinder to start from.
To do so, we run the p command again:

    partition> p
    Current partition table (unnamed):
    partition a - starting cyl 0, # blocks 2096820 (1765/0/0)
    partition b - starting cyl 1765, # blocks 130680 (110/0/0)
    partition c - starting cyl 0, # blocks 4167504 (3508/0/0)
    partition d - starting cyl 0, # blocks 0 (0/0/0)
    partition e - starting cyl 0, # blocks 0 (0/0/0)
    partition f - starting cyl 0, # blocks 0 (0/0/0)
    partition g - starting cyl 0, # blocks 0 (0/0/0)
    partition h - starting cyl 0, # blocks 0 (0/0/0)
    partition>

To figure out which cylinder to start from, we add the number of cylinders used so far. Remember not to add the cylinders from partition c since it encompasses the entire disk.

    1765 + 110 = 1875

Now that we know which cylinder to start from and how many blocks to make it, we create our last partition.

    partition> d
    partition d - starting cyl 0, # blocks 0 (0/0/0)
    Enter new starting cyl [0]: 1875
    Enter new # blocks [0, 0/0/0]: 1940004
    partition>

Congratulations! You've made it through the ugly part. Before we can truly claim victory, we need to commit these changes to disk using the label command. When given the prompt, Ready to label disk, continue? simply answer y.

    partition> label
    Ready to label disk, continue? y
    partition>

To leave the format program, type quit at the partition> prompt, and then quit again at the format> prompt.

Creating File Systems

Now comes the easy part. Simply run the newfs command on all the partitions we created except for the swap partition and the entire disk partition.
Your output should look similar to this: # newfs sd1a /dev/rsd1a: 2096820 sectors in 1765 cylinders of 11 tracks, 108 sectors 1073.6MB in 111 cyl groups (16 c/g, 9.73MB/g, 4480 i/g) superblock backups (for fsck -b #) at: 32, 19152, 38272, 57392, 76512, 95632, 114752, 133872, 152992, 172112, 191232, 210352, 229472, 248592, 267712, 286832, 304160, 323280, 342400, 361520, 380640, 399760, 418880, 438000, 457120, 476240, 495360, 514480, 533600, 552720, 571840, 590960, 608288, 627408, 646528, 665648, 684768, 703888, 723008, 742128, 761248, 780368, 799488, 818608, 837728, 856848, 875968, 895088, 912416, 931536, 950656, 969776, 988896, 1008016, 1027136, 1046256, 1065376, 1084496, 1103616, 1122736, 1141856, 1160976, 1180096, 1199216, 1216544, 1235664, 1254784, 1273904, 1293024, 1312144, 1331264, 1350384, 1369504, 1388624, 1407744, 1426864, 1445984, 1465104, 1484224, 1503344, 1520672, 1539792, 1558912, 1578032, 1597152, 1616272, 1635392, 1654512, 1673632, 1692752, 1711872, 1730992, 1750112, 1769232, 1788352, 1807472, 1824800, 1843920, 1863040, 1882160, 1901280, 1920400, 1939520, 1958640, 1977760, 1996880, 2016000, 2035120, 2054240, 2073360, 2092480, Be sure to note the superblock backups. This is critical information when fsck discovers heavy corruption in your file system. Remember to add your new entries into /etc/fstab if you want them to automatically mount on boot. If you created the first partition with the intention of making it bootable, you have a few more steps to go. First, mount the new file system to /mnt. # mount /dev/sd1a /mnt Once the file system is mounted, you need to clone your existing boot partition using the dump command like this: # cd /mnt # dump 0f - / | restore -rf - With the root partition cloned, use the installboot command to make it bootable: # /usr/kvm/mdec/installboot /mnt/boot /usr/kvm/mdec/bootsd /dev/rsd1a Be sure to test your work by rebooting and making sure everything mounts correctly. 
If you created a bootable partition, be sure you can boot from it now. Don't wait for a disaster to find out whether or not you did it right. Solaris For this example, we are partitioning a disk that is destined to be a web server for an intranet. We need a minimal root partition, adequate swap, tmp, var, and usr space, and a really large partition, which we'll call /web. Because the web logs will remain on the /web partition, and there will be little or no user activity on the machine, /var and /tmp will be set to smaller values. /usr will be a little larger because it may be destined to house web development tools. Creating partitions
Once a disk has been attached to the machine, you should verify its connection and SCSI address by running the probe-scsi command from the PROM monitor if the disk is attached to the internal SCSI chain, or probe-scsi-all to list all the SCSI devices on the system. Once this shows that the drive is properly attached and functioning, you're ready to start accessing the drive from the OS.

Boot the machine and log in as root. To find the device name we are going to use, we again use the dmesg command.

    # dmesg | grep sd
    ...
    sd1 at esp0: target 1 lun 0
    sd1 is /sbus@1,f8000000/esp@0,800000/sd@1,0
    WARNING: /sbus@1,f8000000/esp@0,800000/sd@1,0 (sd1): corrupt label - wrong magic number
    Vendor 'SEAGATE', product 'ST32550N', 4194058 512 byte blocks
    ...

From this message, we see that our new disk is device /dev/[r]dsk/c0t1d0s2. The disk hasn't been set up for use on a Solaris machine before, which is why we received the corrupt label error.

If you recall the layout of Solaris device names, you'll remember that the last digit of the device name is the partition number. Noting that, we see that Solaris refers to the entire disk as partition 2, much the same way SunOS refers to the entire disk as partition c.

Before we can actually label and partition the disk, we need to create the device files. This is done with the drvconfig and disks commands. They should be invoked with no parameters:

    # drvconfig ; disks

Now that the kernel is aware of the disk, we are ready to run the format command to partition the disk.
# format /dev/rdsk/c0t1d0s2 This brings up the format menu as follows: FORMAT MENU: disk - select a disk type - select (define) a disk type partition - select (define) a partition table current - describe the current disk format - format and analyze the disk repair - repair a defective sector label - write label to the disk analyze - surface analysis defect - defect list management backup - search for backup labels verify - read and display labels save - save new disk/partition definitions inquiry - show vendor, product and revision volname - set 8-character volume name quit format> To help the format command with partitioning, we need to tell it the disk's geometry by invoking the type command at the format> prompt. We will then be asked to select what kind of disk we have. Because this is the first time this system is seeing this disk, we need to select other. This should look something like this: format> type AVAILABLE DRIVE TYPES: 0. Auto configure 1. Quantum ProDrive 80S 2. Quantum ProDrive 105S 3. CDC Wren IV 94171-344 . . . 16. other Specify disk type (enter its number): 16 The system now prompts for the number of data cylinders. This is two less than the number of cylinders the vendor specifies because Solaris needs two cylinders for bad block mapping. Enter number of data cylinders: 3508 Enter number of alternate cylinders[2]: 2 Enter number of physical cylinders[3510]: 3510 The next question can be answered from the vendor specs as well. Enter number of heads: 14 The followup question about drive heads can be left as default. Enter physical number of heads[default]: The last question you must answer can be pulled from the vendor specs as well. Enter number of data sectors/track: 72 The remaining questions should be left as default. 
    Enter number of physical sectors/track[default]:
    Enter rpm of drive[3600]:
    Enter format time[default]:
    Enter cylinder skew[default]:
    Enter track skew[default]:
    Enter tracks per zone[default]:
    Enter alternate tracks[default]:
    Enter alternate sectors[default]:
    Enter cache control[default]:
    Enter prefetch threshold[default]:
    Enter minimum prefetch[default]:
    Enter maximum prefetch[default]:

The last question you must answer about the disk is its label information. Enter the vendor name and model number in double quotes for this question. For our sample disk, this would be:

    Enter disk type name (remember quotes): "SEAGATE ST32550N"

With this information, Solaris makes creating partitions easy. Dare I say, fun? After the last question from the type command, you will be placed at the format> prompt. Enter partition to start the partition menu.

    format> partition
    PARTITION MENU:
        0      - change '0' partition
        1      - change '1' partition
        2      - change '2' partition
        3      - change '3' partition
        4      - change '4' partition
        5      - change '5' partition
        6      - change '6' partition
        7      - change '7' partition
        select - select a predefined table
        modify - modify a predefined partition table
        name   - name the current table
        print  - display the current table
        label  - write partition map and label to the disk
        quit
    partition>

At the partition> prompt, enter modify to begin creating the new partitions. This brings up a question about what template to use for partitioning. We want the All Free Hog method.

    partition> modify
    Select partitioning base:
        0. Current partition table (unnamed)
        1. All Free Hog
    Choose base (enter number)[0]? 1

The All Free Hog method enables you to select one partition to receive the remainder of the disk once you have allocated a specific amount of space for the other partitions. For our example, the disk hog would be the /web partition because you want it to be as large as possible.
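The free hog idea is plain subtraction, and it can help to check the numbers before typing them into format. The following is only a sketch: free_hog is a hypothetical helper, not part of Solaris, and the cylinder counts are illustrative assumptions for a disk with 3508 data cylinders.

```shell
#!/bin/sh
# Sketch of the "free hog" arithmetic: the hog partition receives whatever
# cylinders remain after the fixed-size partitions have been allocated.
# free_hog is a hypothetical helper; the numbers below are assumptions.

# free_hog TOTAL SIZE...  -> prints the cylinders left for the hog
free_hog() {
    total=$1
    shift
    used=0
    for size in "$@"; do
        used=$((used + size))
    done
    # Refuse to over-allocate the disk.
    if [ "$used" -gt "$total" ]; then
        echo "requested $used cylinders, only $total available" >&2
        return 1
    fi
    echo $((total - used))
}

# 3508 data cylinders, five fixed partitions; prints 1672
free_hog 3508 345 111 345 345 690
```

If the fixed partitions ask for more than the disk holds, the helper prints a message on standard error and returns a non-zero status instead of a negative size.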
As soon as you select option 1, you should see the following screen:

    Part      Tag    Flag    Cylinders     Size      Blocks
      0       root    wm     0             0         (0/0/0)
      1       swap    wu     0             0         (0/0/0)
      2     backup    wu     0 - 3507      1.99GB    (3508/0/0)
      3 unassigned    wm     0             0         (0/0/0)
      4 unassigned    wm     0             0         (0/0/0)
      5 unassigned    wm     0             0         (0/0/0)
      6        usr    wm     0             0         (0/0/0)
      7 unassigned    wm     0             0         (0/0/0)

    Do you wish to continue creating a new partition table based on above table [yes]? yes

Because the partition table appears reasonable, agree to use it as a base for your scheme. You will now be asked which partition should be the Free Hog Partition, the one that receives whatever is left of the disk when everything else has been allocated. For our scheme, we'll make that partition number 5.

    Free Hog Partition[6]? 5

Answering this question starts the list of questions asking how large to make the other partitions. For our web server, we need a root partition of about 200 MB for the system software, a swap partition of 64 MB, a /tmp partition of 200 MB, a /var partition of 200 MB, and a /usr partition of 400 MB.
Keeping in mind that partition 2 has already been tagged as the "entire disk" and that partition 5 will receive the remainder of the disk, you will be prompted as follows:

    Enter size of partition '0' [0b, 0c, 0.00mb]: 200mb
    Enter size of partition '1' [0b, 0c, 0.00mb]: 64mb
    Enter size of partition '3' [0b, 0c, 0.00mb]: 200mb
    Enter size of partition '4' [0b, 0c, 0.00mb]: 200mb
    Enter size of partition '6' [0b, 0c, 0.00mb]: 400mb
    Enter size of partition '7' [0b, 0c, 0.00mb]: 0

As soon as you finish answering these questions, the final view of all the partitions appears, looking something like this:

    Part      Tag    Flag    Cylinders      Size        Blocks
      0       root    wm        0 -  344    200.13mb    (345/0/0)
      1       swap    wu      345 -  455     64.39mb    (111/0/0)
      2     backup    wu        0 - 3507      1.99GB    (3508/0/0)
      3 unassigned    wm      456 -  800    200.13mb    (345/0/0)
      4 unassigned    wm      801 - 1145    200.13mb    (345/0/0)
      5 unassigned    wm     1146 - 2817    969.89mb    (1672/0/0)
      6 unassigned    wm     2818 - 3507    400.25mb    (690/0/0)
      7 unassigned    wm        0             0         (0/0/0)

This is followed by the question:

    Okay to make this the correct partition table [yes]? yes

Answer yes since the table appears reasonable. This brings up the question:

    Enter table name (remember quotes): "SEAGATE ST32550N"

Answer with a description of the disk you are using for this example. Remember to include the quote symbols when answering. Given all of this information, the system is ready to commit this to disk. As one last check, you will be asked:

    Ready to label disk, continue? y

As you might imagine, we answer yes to the question and let it commit the changes to disk. You have now created partitions and can quit the program by entering quit at the partition> prompt and again at the format> prompt.

Creating file systems

To create a file system, simply run:

    # newfs /dev/rdsk/c0t1d0s0

where /dev/rdsk/c0t1d0s0 is the raw device for the partition on which to create the file system. Be sure to create a file system on all the partitions except for partitions 1 and 2, the swap and entire disk, respectively.
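Because newfs has to be run once per partition, it is easy to skip one or to hit the wrong device. The sketch below only prints the commands so they can be reviewed first; it runs nothing. The helper name newfs_plan is hypothetical, and it assumes the layout used in this example, where partition 1 is swap and partition 2 is the entire disk.

```shell
#!/bin/sh
# Dry-run sketch: print one newfs command per partition that should hold a
# file system. Nothing is executed; review the output, then run it by hand.
# newfs_plan is a hypothetical helper; the disk name and partition numbers
# are assumptions taken from this chapter's example layout.

# newfs_plan DISK PARTITION...  -> one "newfs <raw device>" line each
newfs_plan() {
    disk=$1
    shift
    for slice in "$@"; do
        case $slice in
            # Partition 1 is swap and partition 2 is the entire disk here.
            1|2) echo "skipping partition $slice (swap or entire disk)" >&2 ;;
            *)   echo "newfs /dev/rdsk/${disk}s${slice}" ;;
        esac
    done
}

newfs_plan c0t1d0 0 3 4 5 6
```

For the example disk this prints five newfs lines, one each for partitions 0, 3, 4, 5, and 6. Partition 7 is left out of the call because it was given a size of 0.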
Be sure to note the backup superblocks that were created. This information is very useful when fsck is attempting to repair a heavily damaged file system. After you create the file systems, be sure to enter them into the /etc/vfstab file so that they are mounted the next time you reboot.

If you need to make the root partition bootable, you still have two more steps. The first is to clone the root partition from your existing system to the new root partition. Because ufsrestore restores into the current directory, change to the mount point before starting the copy:

    # mount /dev/dsk/c0t1d0s0 /mnt
    # cd /mnt
    # ufsdump 0uf - / | ufsrestore -rf -

Once the root partition is cloned, you can run the installboot program like this:

    # /usr/sbin/installboot /usr/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0

Be sure to test your new file systems before you need to rely on them in a disaster situation.

IRIX

For this example, we are creating a large scratch partition for a user who does modeling and simulations. Although IRIX has many GUI-based tools to perform these tasks, it is always a good idea to learn the command-line versions just in case you need to do any kind of remote administration.

Creating partitions

Once the drive is attached, run a program called hinv to take a "hardware inventory." On the sample system, we see the following output:

    ...
    Integral SCSI controller 1: Version WD33C93B, revision D
    Disk drive: unit 6 on SCSI controller 1
    Integral SCSI controller 0: Version WD33C93B, revision D
    Disk drive: unit 1 on SCSI controller 0
    ...

Our new disk is external to the system, so we know it is residing on controller 1. Unit 6 is the only disk on that chain, so we know that it is the disk we just added to the system.

To partition the disk, run the fx command without any parameters. It prompts us for the device name, controller, and drive number. Choose the default device name and enter the appropriate information for the other two questions.
On our sample system, this would look like:

    # fx
    fx version 6.2, Mar 9, 1996
    fx: "device-name" = (dksc)
    fx: ctlr# = (0) 1
    fx: drive# = (1) 6
    fx: lun# = (0)
    ...opening dksc(1,6,0)
    ...controller test...OK
    Scsi drive type == SEAGATE ST32550N 0022
    ----- please choose one (? for help, .. to quit this menu)-----
    [exi]t [d]ebug/ [l]abel/ [b]adblock/ [exe]rcise/ [r]epartition/
    fx>

We see that fx found our Seagate and is ready to work with it. From the menu we select r to repartition the disk. fx displays what it knows about the disk and then presents another menu specifically for partitioning the disk.

    fx> r
    ----- partitions-----
    part  type      cyls          blocks              Megabytes (base+size)
      7:  xfs       3 + 3521      3570 + 4189990      2 + 2046
      8:  volhdr    0 + 3            0 + 3570         0 + 2
     10:  volume    0 + 3524         0 + 4193560      0 + 2048
    capacity is 4194058 blocks
    ----- please choose one (? for help, .. to quit this menu)-----
    [ro]otdrive [u]srrootdrive [o]ptiondrive [re]size
    fx/repartition>

Looking at the result, we see that this disk has never been partitioned in IRIX before. Part 7 represents the amount of partitionable space, part 8 the volume header, and part 10 the entire disk.

Because this disk is going to be used as a large scratch partition, we want to select the optiondrive option from the menu. After you select that, you are asked what kind of file system you want to use. IRIX 6 and above defaults to xfs, while IRIX 5 defaults to efs. Use the one appropriate for your version of IRIX. Our sample system is running IRIX 6.3, so we accept the default of xfs:

    fx/repartition> o
    fx/repartition/optiondrive: type of data partition = (xfs)

Next we are asked whether we want to create a /usr log partition. Because our primary system already has a /usr partition, we don't need one here. Type no.

    fx/repartition/optiondrive: create usr log partition? = (yes) no

The system is ready to partition the drive. Before it does, it gives one last warning allowing you to stop the partitioning before it completes the job.
Because you know you are partitioning the correct disk, you can give it "the go-ahead":

    Warning: you must reinstall all software and restore user data from
    backups after changing the partition layout. Changing partitions causes
    all data on the drive to be lost. Be sure you have the drive backed up
    if it contains any user data. Continue? y

The system takes a few seconds to create the new partitions on the disk. Once it is done, it reports what the current partition list looks like.

    ----- partitions-----
    part  type      cyls          blocks              Megabytes (base+size)
      7:  xfs       3 + 3521      3570 + 4189990      2 + 2046
      8:  volhdr    0 + 3            0 + 3570         0 + 2
     10:  volume    0 + 3524         0 + 4193560      0 + 2048
    capacity is 4194058 blocks
    ----- please choose one (? for help, .. to quit this menu)-----
    [ro]otdrive [u]srrootdrive [o]ptiondrive [re]size
    fx/repartition>

Looks good. We can exit fx now by typing .. at the fx/repartition> prompt and exit at the fx> prompt. Our one large scratch partition is now called /dev/dsk/dks1d6s7.

Creating the file system

To create the file system, we use the mkfs command like this:

    # mkfs /dev/rdsk/dks1d6s7

This generates the following output:

    meta-data=/dev/dsk/dks1d6s7  isize=256    agcount=8, agsize=65469 blks
    data     =                   bsize=4096   blocks=523748, imaxpct=25
    log      =internal log       bsize=4096   blocks=1000
    realtime =none               bsize=65536  blocks=0, rtextents=0

Remember to add an entry for this file system to the /etc/fstab file so that the system mounts it automatically the next time you reboot.

Summary

As you've seen in this chapter, creating, maintaining, and repairing file systems is not a trivial task. It is, however, a task that should be well understood. An unmaintained file system can quickly lead to trouble, and without a stable file system the remainder of the system is useless. Let's make a quick rundown of the topics we covered:
In short, file system administration is not a trivial task and should not be taken lightly. Good maintenance techniques help maintain not only your uptime but your sanity as well.