Chapter 7

Setting Up Redundant Arrays of Inexpensive Disks (RAID)



In this chapter, you learn how RAID protects your data and improves disk subsystem performance, what distinguishes the various RAID levels from one another, how to choose the RAID level and product features that best fit your server, and how to implement RAID in hardware or in Windows NT Server 4.0's software.

A redundant array of inexpensive disks (RAID) uses multiple fixed-disk drives, high-speed disk controllers, and special software drivers to increase the safety of your data and to improve the performance of your fixed-disk subsystem. All commercial RAID subsystems use the Small Computer System Interface (SCSI, pronounced "scuzzy"), which now is undergoing a transition from high-speed SCSI-2 to ultra-fast SCSI-3 (also called ultra-wide SCSI). Virtually all network servers now use narrow (8-bit) or wide (16-bit) SCSI-2 drives and controllers. Ultra-wide SCSI host adapters for the PCI bus can deliver up to 40M per second (40M/s) of data to and from the PC's RAM.

RAID protects your data by spreading it over multiple disk drives and then calculating and storing parity information. This redundancy allows any one drive to fail without causing the array itself to lose any data. A failed drive can be replaced and its contents reconstructed from the information on the remaining drives in the array.

RAID increases disk subsystem performance by distributing read tasks over several drives, allowing the same data to be retrieved from different locations, depending on which location happens to be closest to the read head(s) at the instant the data is requested.

There are different levels of RAID, each of which is optimized for various types of data handling and storage requirements. RAID can be implemented in hardware or as add-on software. Modern network operating systems, such as Windows NT Server, provide native support for one or more RAID levels.

The various component parts of RAID technology were originally developed for mainframes and minicomputers, and until recently their high cost limited RAID systems to those environments. In the past few years, however, RAID has become widely available in the PC LAN environment. The cost of disk drives has plummeted, and hardware RAID controllers have become, if not mass-market items, at least reasonably priced. The cost objections to implementing RAID are disappearing. Your server deserves a RAID system; don't even consider building a server that doesn't use RAID.

Most of the chapters of this book use the term fixed-disk drive to distinguish these drives from other data-storage devices, such as removable media devices (typified by Iomega's Zip and Jaz products), CD-ROM, magneto-optic, and other storage systems that use the term drive. In this chapter, the term drive means a fixed-disk (Winchester-type) drive.

Understanding RAID Levels

Although the various component parts of RAID have been used in the mainframe and minicomputer arenas for years, the RAID model was originally defined in a white paper published in 1987 by the University of California at Berkeley. This paper set the theoretical framework on which subsequent RAID implementations have been built.

The paper defines five levels of RAID, numbered 1 through 5. RAID levels aren't indicative of the degree of data safety or increased performance-they simply define how the data is divided and stored on the disk drives comprising the array, and how and where parity information is calculated and stored. In other words, the higher number isn't necessarily better.

Disk drives do only two things: write data and read data. Depending on the application, the disk subsystem may be called on to do frequent small reads and writes; or the drive may need to do less frequent, but larger, reads and writes. An application server running a client-server database, for example, tends toward frequent small reads and writes, whereas a server providing access to stored images tends toward less frequent, but larger, reads and writes. The various RAID levels vary in their optimization for small reads, large reads, small writes, and large writes. Although most servers have a mixed disk access pattern, choosing the RAID level optimized for the predominant environment maximizes the performance of your disk subsystem.

The various RAID levels are optimized for various data storage requirements, in terms of redundancy levels and performance issues. Different RAID levels store data bit-wise, byte-wise, or sector-wise over the array of disks. Similarly, parity information may be distributed across the array or contained on a single physical drive.

RAID levels 1 and 5 are very common in PC LAN environments. All hardware and software RAID implementations provide at least these two levels. RAID level 3 is used occasionally in specialized applications, and is supported by most hardware and some software RAID implementations. RAID levels 2 and 4 are seldom, if ever, used in PC LAN environments, although some hardware RAID implementations offer these levels.

Although RAID really has only levels 1 through 5 defined, you'll commonly see references to RAID 0, RAID 0/1, RAID 6, RAID 7, and RAID 10, all of which are de facto extensions of the original RAID specification. These uses have become so common that they're now universally accepted. Because RAID is a model or theoretical framework (rather than a defined protocol or implementation), manufacturers continue to market improved RAID technology with arbitrarily assigned RAID levels.

The following sections describe the RAID Advisory Board, which sets the standards for RAID systems, and the features that distinguish the RAID levels from one another.

The RAID Advisory Board

The RAID Advisory Board (RAB) is a consortium of manufacturers of RAID equipment and other interested parties. RAB is responsible for developing and maintaining RAID standards and has formal programs covering education, standardization, and certification. Supporting these three programs are six committees: Functional Test, Performance Test, RAID-Ready Drive, Host Interface, RAID Enclosure, and Education.

RAB sells several documents, the most popular of which is The RAIDbook, first published in 1993. The RAIDbook covers the fundamentals of RAID and defines each RAID level. It's a worthwhile acquisition if you want to learn more about RAID.

The RAB Certification Program awards logos to equipment that passes its compatibility- and performance-testing suites. The RAB Conformance Logo certifies that the component so labeled complies with the named RAID level designation as published in The RAIDbook. The RAB Gold Certificate Logo certifies that a product meets the functional and performance specifications published by RAB.

For more information about the RAID Advisory Board and its programs, contact Joe Molina, RAB Chairman, at the RAID Advisory Board, affiliated with Technology Forums Ltd., 13 Marie Lane, St. Peter, Minnesota 56082-9423, (507) 931-0967, fax (507) 931-0976, e-mail 0004706032@mcimail.com. The RAID Advisory Board can also be reached via the Web at http://www.andataco.com/rab/.

RAID 0

RAID 0 is a high-performance, zero-redundancy array option. RAID 0 isn't properly RAID at all: it stripes blocks of data across multiple disk drives to increase the throughput of the disk subsystem, as shown in figure 7.1, but it offers no redundancy. If one drive in a RAID 0 array fails, the data on all drives in the array is inaccessible. RAID 0 is used primarily for applications that need the highest possible read and write data rates.


Figure 7.1  A diagram of RAID 0 (sector striping) with two drives.

Nevertheless, there's a place for RAID 0. Understanding RAID 0 is important because the same striping mechanism used in RAID 0 is used to increase performance in other RAID levels. RAID 0 is inexpensive to implement for two reasons: it devotes no disk capacity to redundancy, so every drive you buy stores user data, and it requires no parity calculations, so no specialized RAID hardware is needed.

RAID 0 offers high performance on reads and writes of short and long data elements. If your application requires large amounts of fast disk storage and you've made other provisions for backing up this data to your satisfaction, RAID 0 is worth considering.

RAID 0 uses striping to store data. Striping means that data blocks are alternately written to the different physical disk drives that make up the logical volume represented by the array. For instance, your RAID 0 array might comprise three physical disk drives that are visible to the operating system as one logical volume. Suppose that your block size is 8K and that a 32K file is to be written to disk. With RAID 0, the first 8K block may be written to physical drive 1, the second block to drive 2, the third to drive 3, and the fourth and final 8K block again to drive 1. Your single 32K file is thus stored as four separate blocks residing on three separate physical hard disk drives.

This block-wise distribution of data across multiple physical hard disks introduces two parameters used to quantify a RAID 0 array. The size of the block used-in this case, 8K-is referred to as the chunk size. The chunk size determines how much data is written to a disk drive in each operation. The number of physical hard disk drives comprising the array determines the stripe width. Both chunk size and stripe width affect the performance of a RAID 0 array.
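
To make the mapping concrete, here's a minimal Python sketch (illustrative only, not taken from any RAID driver) of how a RAID 0 array with an 8K chunk size and a stripe width of three assigns logical chunks to physical drives (numbered from 0):

    # Illustrative sketch: map logical chunks to physical drives in a RAID 0
    # array, using the 8K chunk size and three-drive stripe width from the
    # example above.
    CHUNK_SIZE = 8 * 1024      # bytes per chunk (the chunk size)
    STRIPE_WIDTH = 3           # number of physical drives (the stripe width)

    def place_chunk(logical_chunk_number):
        """Return (drive number, chunk slot on that drive) for a logical chunk."""
        drive = logical_chunk_number % STRIPE_WIDTH
        slot = logical_chunk_number // STRIPE_WIDTH
        return drive, slot

    # A 32K file occupies four 8K chunks; they land on drives 0, 1, 2, and 0 again.
    file_size = 32 * 1024
    for n in range(file_size // CHUNK_SIZE):
        print("chunk", n, "->", place_chunk(n))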

When a logical read request is made to the RAID 0 array (fulfillment of which requires that an amount of data larger than the chunk size be retrieved), this request is broken into multiple smaller physical read requests, each of which is directed to and serviced by the individual physical drives on which the multiple blocks are stored. Although these multiple read requests are generated serially, doing so takes little time. The bulk of the time needed to fulfill the read request is used to transfer the data itself. With sequential reads, which involve little drive head seeking, the bottleneck becomes the internal transfer rate of the drives themselves. Striping lets this transfer activity occur in parallel on the individual disk drives that make up the array, so the elapsed time until the read request is completely fulfilled is greatly reduced.

Striping doesn't come without cost in processing overhead, and this is where chunk size affects performance. Against the benefit of having multiple spindles at work to service a single logical read request, you must weigh the overhead processing cost required to write and then read this data from many disks rather than just one. (Spindle is a commonly used synonym for a physical drive.) Each SCSI disk access requires numerous SCSI commands to be generated and then executed, and striping the data across several physical drives multiplies the effort required accordingly.

Reducing the block size too far can cause the performance benefits of using multiple spindles to be swamped by the increased time needed to generate and execute additional SCSI commands. You can actually decrease performance by using too small a block size. The break-even point is determined by your SCSI host adapter and by the characteristics of the SCSI hard disk drives themselves, but, generally speaking, a block size smaller than 8K risks performance degradation. Using block sizes of 16K, 32K, or larger offers correspondingly greater performance benefits.

Sequential reads and writes make up a small percentage of total disk activity on a typical server disk subsystem. Most disk accesses are random; by definition, this means that you're probably going to need to move the heads to retrieve a particular block of data. Because head positioning is a physical process, relatively speaking it's very slow. The benefit of striping in allowing parallel data transfer from multiple spindles is much less significant in random access because all the system components are awaiting relatively slow head positioning to occur. Therefore, striping does little to benefit any particular random-access disk transaction. Strangely, however, it does benefit random-access disk throughput as a whole, as explained in the following paragraphs.

Imagine a scene at your local hardware store. There's only one checkout line, and the owner is considering opening more. The single existing checkout line works well when the store isn't busy, but at peak times customers have to stand in line too long. Some customers pay cash, whereas others use credit cards.

The owner opens four additional checkout lines, but decides to dedicate particular lines to particular items-one line for garden supplies, one for paint, one for tools, and so forth. He notices that, although this scheme reduces the average wait, sometimes one checkout line has people waiting in it while other lines are free. His next step is to allow any of the five lines to process any type of item. He immediately notices a big drop in average wait time and is satisfied with this arrangement until he notices that the queues haven't completely disappeared. Because some individual transactions take longer than others, any given line may move unpredictably more slowly than others, leaving customers standing in line while other checkout lines are free. His final modification is to install a serpentine queue ahead of the checkout lines to allow each customer in turn to use whichever checkout line becomes free first.

In this example of a branch of mathematics called queuing theory, the checkout lines are analogous to the physical hard drives in the array, and the customers are analogous to disk transactions. Customers who pay cash are analogous to disk reads, and those who use a credit card are analogous to disk writes. Just as a checkout clerk can ring up only so many items in a given amount of time, even a very fast hard drive is limited in the number of disk transactions per second it can execute. Just as many people can show up at the checkout line almost simultaneously, a server can generate many more disk requests in a short period of time than the disk can process. Because server requests tend to be bursty-many requests occurring nearly simultaneously followed by a period with few or no requests-the disk subsystem must buffer or queue outstanding requests at times of peak demand and then process these requests as demand slackens.

Because striping distributes the single logical volume's data across several physical drives, each of which can process disk transactions independently of the other drives, striping provides the equivalent of additional dedicated checkout lines. Requests are routed to the physical drive that contains the data needed, thereby dividing a single long queue into two or more shorter queues, depending on the number of drives in the array. The number of drives over which the data is distributed is called the stripe width. Because each drive has its own spindle and head mechanism, these requests are processed in parallel, shortening the time required on average to fulfill any particular disk request.

In most servers with a disk subsystem bottleneck, the problem is found to be an unequal workload distribution among the physical disk drives. It's not uncommon to see servers with several physical hard drives in which 90 percent or more of the total disk activity is confined to just one of these drives. RAID 0 striping addresses this problem by distributing the workload evenly and eliminating any single drive as a bottleneck. RAID 0 improves read and write performance for both random small block I/O and sequential large block I/O. What RAID 0 doesn't do is protect your data. There's no redundancy, and the loss of any single drive in a RAID 0 array renders the contents of the remaining drives useless.

One of the primary applications for RAID 0 is the capture and playback of high-quality digital video and audio data. By adding multiple wide SCSI-2 drives, such as the 4.3G Seagate Barracuda ST15150W, to a chain of devices connected to an Adaptec AHA-3940UW SCSI host adapter, you can obtain sustained data-transfer rates up to almost the 40M/s rating of the host adapter. Such data rates can support the 270Mbps (megabits per second) data rate of decompressed, component digital video that conforms to the international ITU-R BT.601 (D-1) standard used for broadcast television. This application for RAID 0 pertains to high-performance Windows NT Workstation 4.0 installations, but not to conventional network servers.

RAID 1

How do you make sure that losing something doesn't hurt? The obvious answer is to keep a copy of it. RAID 1 works this way, writing two complete copies of everything to mirrored or duplexed pairs of disk drives. This 100 percent redundancy means that if you lose a drive in a RAID 1 array, you have another drive with an exact duplicate of the failed drive's contents. RAID 1, shown in figure 7.2, offers the greatest level of redundancy, but at the highest cost for disk drives.


Figure 7.2  A diagram of RAID 1 (mirroring or duplexing) with two drives.

Mirroring means that each disk drive has a twin. Anything written to one drive is also written to the second drive simultaneously. Mirroring is 100 percent duplication of your drives. If one drive fails, its twin can replace it without loss of data.

Mirroring has two disadvantages: it requires twice as many disk drives, because every drive must have a twin, and writes are somewhat slower, because every block must be written to both drives.

Mirroring has two advantages: it provides complete redundancy, because either drive holds a full copy of your data, and it improves read performance, because a read request can be serviced by either drive.

Duplexing is similar to mirroring, but it adds a second host adapter to control the second drive or set of drives. The only disadvantage of duplexing, relative to mirroring, is the cost of the second host adapter-although duplex host adapters, such as the Adaptec AHA-3940W, are less costly than buying two AHA-2940W or AHA-2940UW single-host adapters. Duplexing eliminates the host adapter as a single point of failure.

RAID 1 is the most common level used in mainframes, where cost has always been a low priority relative to data safety. The rapidly dropping cost of disk storage has made RAID 1 a popular choice in PC LAN servers as well. Conventional wisdom says that RAID 1 is the most expensive RAID implementation, due to the requirement for buying twice as many disk drives. In reality, RAID 1 may be the most expensive way to implement RAID, or it may be the least expensive, depending on your environment.

In a large server environment, the cost of duplicating every disk drive quickly adds up, making RAID 1 very expensive. With smaller servers, however, the economics can be very different. If your server has only one SCSI hard disk drive installed, you may find that you can implement RAID 1 for only the relatively small cost of buying another similar disk drive. When this book was written, the cost of 1G of high-performance SCSI-2 storage was about $250, based on the $1,150 street price of a 4.3G Seagate ST15150W drive.

RAID 1 is provided as a standard software feature with most network operating systems, including Windows NT Server 4.0. You may also find that your SCSI host adapter offers RAID 1 support-although it may be called something else. If the host adapter manual doesn't mention RAID 1, check for references to hardware support for mirroring or duplexing. If you find that your SCSI adapter does support hardware mirroring, you have what you need to implement RAID 1 mirroring in hardware. Simply install another drive similar to the existing one (or identical, depending on the host adapter requirements), reconfigure it in accordance with the directions in the manual, and you're running RAID 1. If you have a choice between using either Windows NT Server 4.0 native software RAID 1 support or that provided by your SCSI host bus adapter, choose the hardware solution. Implementing RAID in hardware offers better performance and doesn't put any additional load on the server.

RAID 1 Read Performance.

RAID 1 reads are usually faster than those of a stand-alone drive. To return to the hardware store analogy, there are now multiple checkout lines, each of which can handle any customer. With RAID 1 reads, any given block of data can be read from either drive, thereby shortening queues, lowering drive usage, and increasing read performance. This increase occurs only with multithreaded reads. Single-threaded reads show no performance difference, just as though all but one of the checkout lines were closed.

Most RAID 1 implementations offer two alternative methods for optimizing read performance. The first is referred to as circular queue or round-robin scheduling. Using this method, read requests are simply alternated between the two physical drives, with each drive serving every second read request. This method equalizes the read workload between the drives and is particularly appropriate for random-access environments, where small amounts of data-record or block sized-are being accessed frequently. It's less appropriate for sequential-access environments, where large amounts of data are being retrieved. Most disk drives have buffers used to provide read-ahead optimization, and the drive hardware itself reads and stores the data immediately following the requested block on the assumption that this data is most likely to be requested next. Alternating small block requests between two physical drives can eliminate the benefit of this read-ahead buffering.

The second method used in RAID 1 to increase read performance is called geometric, regional, or assigned cylinder scheduling. This method depends on the fact that head positioning is by far the slowest activity that a disk drive engages in. By giving each of the two drives comprising the RAID 1 array responsibility for covering only half of the physical drive, this head positioning time can be minimized. For example, by using mirrored drives, each of which has 1,024 cylinders, the first drive might be assigned responsibility for fulfilling all requests for data that's stored on cylinders 0 through 511, with the second drive covering cylinders 512 through 1,023.
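
As a rough Python sketch (not any vendor's actual scheduler), the two read-scheduling methods might look like this; the 1,024-cylinder split follows the example above:

    # Illustrative sketches of the two RAID 1 read-scheduling methods
    # described above; neither is taken from a real controller.
    import itertools

    # Round-robin (circular queue): successive reads simply alternate drives.
    _next_drive = itertools.cycle([0, 1])
    def round_robin_drive():
        return next(_next_drive)

    # Geometric (assigned cylinder): drive 0 covers cylinders 0-511 and
    # drive 1 covers cylinders 512-1,023, as in the example above.
    TOTAL_CYLINDERS = 1024
    def geometric_drive(cylinder):
        return 0 if cylinder < TOTAL_CYLINDERS // 2 else 1

    print([round_robin_drive() for _ in range(4)])     # [0, 1, 0, 1]
    print(geometric_drive(100), geometric_drive(900))  # 0 1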

Although this method is superficially attractive, it seldom works in practice. First, few drives have their data distributed in such a way that any specific cylinder is equally likely to be accessed. Operating system files, swap files, user applications, and other frequently read files are likely to reside near the front of the disk. In this situation, your first disk may be assigned literally 90 percent or more of the read requests. Second, even if the data were to be distributed to equalize access across the portion of the disk occupied by data, few people run their drives at full capacity, so the second drive would have correspondingly less to do. This problem can be addressed by allowing a user-defined split ratio, perhaps assigning disk 1 to cover the first 10 percent or 20 percent of the physical drive area and disk 2 to cover the remainder. In practice, no known RAID 1 systems allow user tuning to this extent.

RAID 1 Write Performance.

RAID 1 writes are more problematic. Because all data has to be written to both drives, it's as though each customer had to go through one checkout line to complete a transaction, then walk to the back of the other checkout line, wait in the queue, and complete the same transaction again at the other register. RAID 1, therefore, provides a high level of data safety by replicating all data, an increase in read performance by allowing either physical drive to fulfill a read request, and a lower level of write performance due to the necessity of writing the same information to both drives.

Overall RAID 1 Performance.

It might seem that RAID 1 would have little overall impact on performance, because the increase in read performance would be balanced by the decrease in write performance. In reality, this is seldom the case.

First, in most server environments, reads greatly outnumber writes. In a database, for example, any particular record may be read 10 times or 100 times for every single time it's written. Similarly, operating system executables, user application program files, and overlays are essentially read only. Any factor that benefits read performance at the expense of write performance will greatly increase overall performance for most servers most of the time.

Second, although it may seem reasonable to assume that writing to two separate drives would halve write performance, in reality the performance hit is usually only 10 percent to 20 percent for mirrored writes. Here's why: Although both physical writes must be executed before the logical write to the array can be considered complete, and the two write requests themselves are generated serially, the actual physical writes to the two drives occur in parallel. Because it's the head positioning and subsequent writing that occupy the bulk of the time required for the entire transaction, the extra time needed to generate the second write request has just a small impact on the total time required to complete the write.

RAID 2

RAID 2 is a proprietary RAID architecture patented by Thinking Machines Inc. RAID 2 distributes the data across multiple drives at the bit level. RAID 2 uses multiple dedicated disks to store parity information and, thus, requires that an array contain a relatively large number of individual disk drives. For example, a RAID 2 array with four data drives requires three dedicated parity drives. RAID 2 has the highest redundancy of any of the parity-oriented RAID schemes.

The bit-wise orientation of RAID 2 means that every disk access occurs in parallel. RAID 2 is optimized for applications such as imaging, which requires transfer of large amounts of contiguous data.

RAID 2 isn't a good choice for random-access applications, which require frequent, small reads and writes. The amount of processing overhead needed to fragment and reassemble data makes RAID 2 slow relative to other RAID levels, and the large number of dedicated parity drives required makes it expensive. Because nearly all PC LAN environments have heavy random disk access, RAID 2 has no place in a PC LAN. However, RAID 2 does have some specific advantages for special-purpose digital video servers.

RAID 3

RAID 3, shown in figure 7.3, stripes data across drives, usually at the byte level, although bit-level implementations are possible. RAID 3 dedicates one drive in the array to storing parity information.


Figure 7.3  A diagram of RAID 3 (byte striping with dedicated parity disk) with three drives.

Like RAID 2, RAID 3 is optimized for long sequential disk accesses in applications such as imaging and digital video storage, and is inappropriate for random-access environments such as PC LANs. Any single drive in a RAID 3 array can fail without causing data loss, because the data can be reconstructed from the remaining drives. RAID 3 is sometimes offered as an option on PC-based RAID controllers, but is seldom used.

RAID 3 can be considered an extension of RAID 0, in that RAID 3 stripes small chunks of data across multiple physical drives. In a RAID 3 array that comprises four physical drives, for example, the first block is written to the first physical drive, the second block to the second drive, and the third block to the third drive. The fourth block isn't written to the fourth drive, however; it's written to the first drive to begin the round-robin again.

The fourth drive isn't used directly to store user data. Instead, the fourth drive stores the results of parity calculations performed on the data written to the first three drives. This small chunk striping provides good performance on large amounts of data, because all three data drives operate in parallel. The fourth, or parity, drive provides the redundancy to ensure that the loss of any one drive doesn't cause the array to lose data.
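
The parity itself is typically a bitwise exclusive OR of the corresponding chunks on the data drives. The following sketch (illustrative only; real controllers do this in hardware or firmware) shows how a parity chunk is computed for three data drives, and how a lost chunk can be rebuilt from the survivors:

    # Sketch of dedicated-parity striping as described above: the parity chunk
    # is the bitwise XOR of the corresponding chunks on the data drives, so any
    # one lost chunk can be recomputed from the remaining chunks plus parity.
    def xor_chunks(chunks):
        result = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, byte in enumerate(chunk):
                result[i] ^= byte
        return bytes(result)

    d1, d2, d3 = b"\x0f\x00", b"\xf0\x00", b"\x01\x01"
    parity = xor_chunks([d1, d2, d3])

    # If the drive holding d2 fails, its contents can be rebuilt from the rest.
    assert xor_chunks([d1, d3, parity]) == d2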

For sequential data transfers, RAID 3 offers high performance due to striping, and low cost due to its reliance on a single parity drive. It's this single parity drive, however, that's the downfall of RAID 3 for most PC LAN applications. By definition, no read to a RAID 3 array requires that the parity drive be accessed unless data corruption has occurred on one or more of the data drives. Reads, therefore, proceed quickly. However, every write to a RAID 3 array requires that the single parity drive be accessed and written to in order to store the parity information for the data write that just occurred. The random access typical of a PC LAN environment means that the parity drive in a RAID 3 array is overused, with long queues for pending writes, whereas the data drives are underused because they can't proceed until parity information is written to the dedicated parity drive.

To return to the hardware store analogy, RAID 3 allows multiple checkout lines, all but one of which accept only cash. The sole remaining checkout line accepts only credit cards. As long as most of your customers pay cash, this scheme works well. If, instead, many customers decide to pay by credit card, the queue for the single checkout line that accepts credit cards grows longer and longer while the checkout clerks in the cash lines have nothing to do. In the same way, RAID 3 works well in read-intensive environments, but it breaks down in the random-access read/write environments typical of PC LANs.

RAID 3 is a common option on hardware RAID implementations. In practical terms, RAID 5 is a universally available option and is usually used in preference to RAID 3, because it offers most of the advantages of RAID 3 and has none of the drawbacks. Consider using RAID 3 only in very specialized applications where large sequential reads predominate-for example, a dedicated imaging server or for distributing (but not capturing) digital video data. Otherwise, use RAID 5.

RAID 4

RAID 4 is similar to RAID 3, except RAID 4 stripes data at the block or sector level rather than at the byte level, thereby providing better read performance than RAID 3 for small random reads. The small chunk size of RAID 3 means that every read requires participation from every disk in the array. The disks in a RAID 3 array are, therefore, referred to as being synchronized, or coupled. The larger chunk size used in RAID 4 means that small, random reads can be completed by accessing only a single disk drive instead of all data drives. RAID 4 drives are, therefore, referred to as being unsynchronized, or decoupled.

Like RAID 3, RAID 4 suffers from having a single, dedicated parity disk that must be accessed for every write. RAID 4 has all the drawbacks of RAID 3 and doesn't have the performance advantage of RAID 3 on large read transactions. About the only environment for which RAID 4 would make any sense at all is one in which nearly 100 percent of disk activity is small random reads. Because this situation isn't seen in real-world server environments, don't consider using RAID 4 for your PC LAN.

RAID 5

RAID 5, shown in figure 7.4, is the most common RAID level used in PC LAN environments. RAID 5 stripes both user and parity data across all the drives in the array, consuming the equivalent of one drive for parity information.


Figure 7.4  A diagram of RAID 5 (sector striping with distributed parity) with five drives.

With RAID 5, all drives in the array are the same size, and the equivalent of one drive's capacity is unavailable to the operating system. For example, in a RAID 5 array with three 1G drives, the equivalent of one drive is used for parity, leaving 2G visible to the operating system. If you add a fourth 1G drive to the array, the equivalent of one drive is still used for parity, leaving 3G visible to the operating system.
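
The arithmetic is simple; assuming all drives are the same size, as required above, the usable capacity works out as follows (a sketch, not a vendor formula):

    # Usable capacity of a RAID 5 array: the equivalent of one drive's capacity
    # is consumed by parity, no matter how many drives are in the array.
    def raid5_usable_gb(drive_count, drive_size_gb):
        return (drive_count - 1) * drive_size_gb

    print(raid5_usable_gb(3, 1))   # 2G visible from three 1G drives
    print(raid5_usable_gb(4, 1))   # 3G visible from four 1G drives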

RAID 5 is optimized for transaction processing activity, in which users frequently read and write relatively small amounts of data. It's the best RAID level for nearly any PC LAN environment, and is particularly well-suited for database servers.

The single most important weakness of RAID levels 2 through 4 is that they dedicate a single physical disk drive to parity information. Reads don't require accessing the parity drive, so they aren't degraded. This parity drive must be accessed for each write to the array, however, so RAID levels 2 through 4 don't allow parallel writes. RAID 5 eliminates this bottleneck by striping the parity data onto all physical drives in the array, thereby allowing both parallel reads and writes.

RAID 5 Read Performance.

RAID 5 reads, like reads under RAID levels 2 through 4, don't require access to parity information unless one or more of the data stripes is unreadable. Because striping is optimized for sequential reads in which the amount of data requested is a multiple of the stripe width, RAID 5 offers sequential read performance similar to that of RAID 3. Because RAID 5 allows parallel reads (unlike RAID 3), RAID 5 offers substantially better performance on random reads.

RAID 5 matches or exceeds RAID 0 performance on sequential reads because RAID 5 stripes the data across one more physical drive than does RAID 0. RAID 5 performance on random reads at least equals RAID 0, and it's usually somewhat better.

RAID 5 Write Performance.

RAID 5 writes are more problematic. A RAID 0 single-block write involves only one access to one physical disk to complete the write. With RAID 5, the situation is considerably more complex. In the simplest case, two reads are required-one for the existing data block and the other for the existing parity block. Parity is recalculated for the stripe set based on these reads and the contents of the pending write. Two writes are then required-one for the data block itself and one for the revised parity block. Completing a single write, therefore, requires a minimum of four disk operations, compared with the single operation required by RAID 0.
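
In this simple case, the new parity can be derived from the old data block, the old parity block, and the new data with exclusive OR, which is why only two reads and two writes are needed. The sketch below is illustrative only; the read_block and write_block callables stand in for real disk I/O:

    # Sketch of a RAID 5 single-block write (read-modify-write) showing the
    # minimum of four disk operations described above. read_block and
    # write_block are placeholders for the controller's real disk I/O.
    def xor_blocks(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def raid5_write_block(read_block, write_block, stripe, new_data):
        old_data = read_block(stripe, "data")        # 1: read existing data block
        old_parity = read_block(stripe, "parity")    # 2: read existing parity block

        # New parity = old parity XOR old data XOR new data.
        new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)

        write_block(stripe, "data", new_data)        # 3: write new data block
        write_block(stripe, "parity", new_parity)    # 4: write new parity block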

The situation worsens when you consider what must be done to maintain data integrity. The modified data block is written to disk before the modified parity block. If a system failure occurs, the data block may be written successfully to disk, but the newly calculated parity block may be lost. This leaves new data with old parity and thereby corrupts the disk. Such a situation must be avoided at all costs.

Transaction Processing with RAID 5.

RAID 5 addresses the problem of keeping data blocks and parity blocks synchronized by borrowing a concept from database transaction processing. Transaction processing is so named because it treats multiple component parts of a related whole as a single transaction. Either the whole transaction completes successfully, or none of it does.

For example, when you transfer money from your checking account to your savings account, your savings account is increased by the amount of the transfer and, at the same time, your checking account is reduced by the same amount. This transaction obviously involves updates to at least two separate records, and possibly more. It wouldn't do at all to have one of these record updates succeed and the other fail. You'd be very upset if your checking account was decreased without a corresponding increase in your savings account; by the same token, the bank wouldn't like it much if your savings account was increased but your checking account stayed the same.

The way around this problem is a process called two-phase commit. Rather than simply write the altered records individually, a two-phase commit first creates a snapshot image of the entire transaction and stores it. It then updates the affected records and verifies that all components of the transaction completed successfully. After this is verified, the snapshot image is deleted. If the transaction fails, the snapshot image can be used to roll back the status of the records that had already been updated, leaving the system in an unmodified and coherent state.

RAID 5 uses a two-phase commit process to ensure data integrity, further increasing write overhead. It first does a parallel read of every data block belonging to the affected stripe set, calculating a new parity block based on this read and the contents of the new data block to be written. The changed data and newly calculated parity information are written to a log area, along with pointers to the correct locations. After the log information is written successfully, the changed data and parity information are written in parallel to the stripe set. When the RAID controller verifies that the entire transaction completed successfully, it deletes the log information.
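
A sketch of the logged write sequence described above might look like the following (illustrative only; the read, log, and write operations are placeholders, and real controllers keep the log in non-volatile memory):

    # Sketch of the two-phase (logged) RAID 5 stripe update described above.
    # read_stripe, write_log, write_stripe, and clear_log are placeholders for
    # the controller's real operations.
    def raid5_logged_write(read_stripe, write_log, write_stripe, clear_log,
                           stripe, new_data, compute_parity):
        # Phase 1: read the whole stripe in parallel, compute new parity, and
        # record the pending data, parity, and target locations in the log.
        old_blocks = read_stripe(stripe)
        new_parity = compute_parity(old_blocks, new_data)
        write_log(stripe, new_data, new_parity)

        # Phase 2: write data and parity to the stripe in parallel; only after
        # the controller verifies both writes does it delete the log record, so
        # a failure part-way through can be rolled forward or back cleanly.
        write_stripe(stripe, new_data, new_parity)
        clear_log(stripe)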

Caching RAID 5 Data.

The two-phase commit process obviously introduces considerable overhead to the write process, and in theory slows RAID 5 writes by 50 percent or more, relative to RAID 0 writes. In practice, the situation isn't as bad as might be expected. Examining the process shows that the vast majority of extra time involved in these overhead operations is consumed by physical positioning of drive heads. This brings up the obvious question of caching.

On first glance, caching might appear to be of little use for drive arrays. Drive arrays range in size from a few gigabytes to the terabyte range. (A terabyte is 1,024 gigabytes.) Most arrays service mainly small random read requests; even frequent large sequential reads can be-in this context at least-considered random relative to the overall size of the array. Providing enough RAM to realistically do read caching on this amount of disk space would be prohibitive simply on the basis of cost. Even if you were willing to buy this much RAM, the overhead involved in doing cache searches and maintaining cache coherency would swamp any benefits you might otherwise gain.

Write caching, however, is a different story. Existing RAID 5 implementations avoid most of the lost time by relocating operations, where possible, from physical disk to non-volatile or battery-backed RAM. This caching, with deferred writes to frequently updated data, reduces overhead by an order of magnitude or more and allows real-world RAID 5 write performance that approaches that of less-capable RAID versions.

To return to the hardware store analogy, RAID 5 allows multiple checkout lines, all of which accept both cash (disk reads) and credit cards (disk writes). Because each checkout line is equipped with a scanner, it doesn't take much longer to process many items (large sequential disk access) than it does to process only a few items (small random disk access). As long as most customers pay cash, this scheme works well. The queues are short, transactions are completed quickly, and nobody has to wait long. Even though some customers pay by credit card, the queues remain relatively short because most transactions in any given queue will be cash. If, instead, many customers decide to pay by credit card, the queues at each checkout line grow longer because checkout clerks take much longer to process credit-card transactions than they do to accept cash. In the same way, RAID 5 works well in environments such as typical PC LANs, which involve mostly reads with less frequent writes.

Proprietary and Non-Standard RAID Levels

RAID is the hottest topic in mass storage today. Only a year or two ago, articles on RAID were seen only in magazines intended for LAN managers. Today, you find RAID systems discussed in mass-market computer magazines such as PC Computing. Inevitably, the first discussions of using RAID in workstations, rather than only in servers, are beginning to appear.

As is usually the case with a "hot product" category, manufacturers push the envelope to develop their own proprietary extensions to the standards-based architectures. And, as usual, some of these extensions originate with the engineering folks and represent real improvements to the genre. Others come from the marketing department and represent nothing but an attempt to gain a competitive advantage with vaporware.

RAID 6.

The term RAID 6 is now being used in at least three different ways. Some manufacturers simply take a RAID 5 array, add redundant power supplies and perhaps a hot spare disk, and refer to this configuration as RAID 6. Others add an additional disk to the array to increase redundancy, allowing the array to suffer simultaneous failure of two disks without causing data loss. Still others modify the striping method used by RAID 5 and refer to the result as RAID 6.

Any of these modifications may yield worthwhile improvements. Be aware, however, that when you see the term RAID 6, you must question the vendor carefully about exactly what it means by RAID 6.

RAID 7.

RAID 7 is patented by Storage Computer Corporation. From published documents, it appears that RAID 7, architecturally, most resembles RAID 4, with the addition of caching. RAID 7 uses a dedicated microprocessor-driven controller running an embedded proprietary real-time operating system called SOS. Storage Computer equips its arrays with dual fast SCSI-2 multiple-channel adapters, allowing one array to be simultaneously connected to more than one host, including mainframes, minicomputers, and PC LAN servers.

Storage Computer claims that RAID 7 provides performance equal to or better than RAID 3 on large sequential reads, while at the same time equaling or bettering RAID 5 on small random reads and writes. Anecdotal reports claim performance increases of between three and nine times, compared with traditional RAID 3 and RAID 5 arrays.

The claimed benefits of RAID 7 have been hotly debated on the Internet since RAID 7 was introduced. Some of those posting comments have reported significant increases in performance, whereas others have questioned the benefits and even the safety of RAID 7, particularly in a UNIX environment. The jury is still out on RAID 7.

Stacked RAID

One characteristic of all RAID implementations is that the array is seen as a single logical disk drive by the host operating system. This means that it's possible to stack arrays, with the host using one RAID level to control an array of arrays, in which individual disk drives are replaced with second-level arrays operating at the same or a different RAID level. Using stacked arrays allows you to gain the individual benefits of more than one RAID level while offsetting the drawbacks of each. In essence, stacking makes the high-performance RAID element visible to the host while concealing the low-performance RAID element used to provide data redundancy.

One common stacked RAID implementation is referred to as RAID 0/1, which is also marketed as a proprietary implementation called RAID 10 (see fig. 7.5). This method combines the performance of RAID 0 striping with the redundancy of RAID 1 mirroring. RAID 0/1 simply replaces each individual disk drive used in a RAID 0 array with a RAID 1 array. The host computer sees the array as a simple RAID 0, so performance is enhanced to RAID 0 levels. Each drive component of the RAID 0 array is actually a RAID 1 mirrored set; thus, data safety is at the same level you would expect from a full mirrored set.


Figure 7.5  A diagram of RAID 0/1 (sector striping to mirrored target arrays) with four drives.

Other stacked RAID implementations are possible. For example, replacing the individual drives in a RAID 5 array with subsidiary RAID 3 arrays results in a RAID 53 configuration.

Another benefit of stacking is in building very large capacity arrays. For reasons described earlier, RAID 5 is the most popular choice for PC LAN arrays. However, for technical reasons described later, a RAID 5 array should normally be limited to five or six disk drives. The largest disk drives available when this book was written for PC LANs hold about 9G, placing the upper limit on a simple RAID 5 array at about 50G. Replacing the individual disk drives in a simple RAID 5 array with subsidiary RAID 5 arrays allows extending this maximum to 250G or more. In theory, it's possible to use three tiers of RAID-an array of arrays of arrays-to further extend capacity to the terabyte range.

Seagate has announced a 23G version of its 5 1/4-inch Elite product line, which increases the practical upper size limit of a simple RAID 5 array to about 150G.

Sizing the Array

So far this chapter has discussed redundancy, but it hasn't explained in detail what happens when a drive fails. In the case of RAID 0, the answer is obvious: the failed drive contained part of your data, and the part remaining on the surviving drives is unusable. With RAID 1, the answer is equally obvious: the failed drive was an exact duplicate of the remaining good drive, so all your data is still available, but all your redundancy is gone until you replace the failed drive. With RAID 3 and RAID 5, the issue becomes much more complex.

Because RAID 3 and RAID 5 use parity to provide data redundancy rather than physically replicate the data as does RAID 1, the implications of a drive failure aren't as obvious. In RAID 3, the failure of the parity drive has no effect on reads, because the parity drive is never accessed for reads. For RAID 3 writes, failure of the parity drive removes all redundancy until the drive is replaced, because all parity information is stored on that single drive. When a data drive fails in a RAID 3 array, the situation becomes more complicated. Reads of data formerly stored on the failed drive must be reconstructed using the contents of the other data drives and the parity drive. This results in a greatly increased number of read accesses and correspondingly lowered performance.

With RAID 5, the situation is similar to a failed RAID 3 data drive. Because every drive in a RAID 5 array contains data and parity information, the failure of any drive results in the loss of both data and parity. An attempt to read data formerly residing on the failed drive requires that every remaining drive in the array be read and parity used to recalculate the missing data. In a RAID 5 array containing 15 drives, for example, a read and reconstruction of lost data would require 14 separate read operations and a recalculation before a single block of data could be returned to the host. Writes to a RAID 5 array with one failed drive also require numerous disk accesses.
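
The following sketch (illustrative only; read_from is a placeholder for a physical read) shows why a degraded read is so expensive: if the requested block lived on the failed drive, every surviving drive must be read and the results XORed together to rebuild it.

    # Sketch of a degraded-mode RAID 5 read. If the requested block's drive is
    # alive, one read suffices; if that drive has failed, every surviving drive
    # (data and parity) must be read and XORed to reconstruct the block. In a
    # 15-drive array, that's 14 reads for a single block, as noted above.
    def degraded_read(read_from, drives, failed_drive, target_drive, stripe):
        if target_drive != failed_drive:
            return read_from(target_drive, stripe)   # normal case: one read
        block = None
        for d in drives:
            if d == failed_drive:
                continue
            chunk = read_from(d, stripe)             # one read per surviving drive
            if block is None:
                block = chunk
            else:
                block = bytes(x ^ y for x, y in zip(block, chunk))
        return block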

To make matters worse, when the failed drive is replaced, its contents must be reconstructed and stored on the replacement drive. This process, usually referred to as automatic rebuild, normally occurs in the background while the array continues to fulfill user requests. Because the automatic rebuild process requires heavy disk access to all the other drives in an already crippled array, performance of the array can degrade unacceptably. The best way to limit this degradation is to use a reasonably small stripe width, limiting the number of physical drives in the array to five or six at most.

Specifying a RAID Implementation

Up to this point, the discussion of RAID levels has been limited to theoretical issues, such as relative performance in read and write mode. The following sections offer concrete recommendations for implementing a RAID subsystem for your Windows NT Server 4.0 system.

Picking the Best RAID Level for Your Needs

In theory, there are two important considerations in selecting the best RAID implementation for your particular needs.

The first consideration is the type of data to be stored on the array. The various RAID levels are optimized for differing storage requirements. The relative importance in your environment of small random reads versus large sequential reads and of small random writes versus large sequential writes, as well as the overall percentage of reads versus writes, determines-in theory, at least-the best RAID level to use.

The second consideration is the relative importance to you of performance versus the safety of your data. If data safety is paramount, you may choose a lower performing alternative that offers greater redundancy. Conversely, if sheer performance is the primary issue, you may choose a higher performing alternative that offers little or no redundancy and instead use backups and other means to ensure the safety of your data.

Always lurking in the background, of course, is the real-world issue of cost.

These issues can be summarized for each RAID level as follows: RAID 0 offers the highest read and write performance at the lowest cost, but provides no redundancy at all; RAID 1 provides complete redundancy and good read performance, but requires twice as many drives; RAID levels 2 through 4 are optimized for large sequential transfers and are poorly suited to the random access typical of a PC LAN; and RAID 5 combines good performance on small random reads and writes with redundancy at the cost of only one drive's capacity, making it the usual choice for PC LAN servers.

Understanding RAID Product Features

You can implement RAID in various ways. The SCSI host adapter in your current server may provide simple RAID functionality. You can replace your current SCSI host adapter with a host adapter that offers full RAID support. Software RAID support is provided natively by most network operating systems, including Windows NT Server 4.0. If you're buying new server hardware, chances are that the vendor provides hardware RAID support standard or as an option. If you need to upgrade your existing server, you can choose among various external RAID arrays that provide features, functionality, and performance similar to that provided by internal server RAID arrays.

Each method has advantages and drawbacks in terms of cost, performance, features, and convenience. Only you can decide which method best suits your needs. The following sections describe the most important features of various RAID implementations and the tradeoffs involved with each implementation.

Hot Swappable Disks and Hot Spare Disks.

Most external RAID subsystems, and many servers with internal RAID subsystems, allow hard disk drives to be removed and replaced without turning off the server. This feature, known as hot swapping, allows a failed disk drive to be replaced without interrupting ongoing server operations.

A similar method, known as hot sparing, goes one step further by providing a spare drive that's installed and powered up at all times. This drive can automatically take the place of a failed drive on a moment's notice. Most systems that provide hot sparing also support hot swapping, to allow the failed drive to be replaced at your leisure.

Obviously, the drive itself and the system case must be designed to allow hot swapping and/or hot sparing. Most internal server RAID arrays and nearly all external RAID arrays are designed with front external access to hard drives for this reason. Hot swapping is a necessity for production servers; hot sparing is a very desirable option.

Automatic Rebuild.

With either hot swapping or hot sparing, the integrity of the array itself is restored by doing a rebuild to reconstruct the data formerly contained on the failed drive and to re-create it on the replacement drive. Because rebuilding is a very resource-intensive process, a well-designed RAID subsystem gives you the choice of taking the array down and doing a static rebuild, or of allowing the rebuild to occur dynamically in the background while the array continues to service user requests. Ideally, the array should also let you specify a priority level for the background rebuild, allowing you to balance users' need for performance against the time needed to re-establish redundancy.

In practice, performance on an array with a failed drive-particularly on a RAID 5 array-may already be degraded to the extent that trying any sort of rebuild while users continue to access the array isn't realistic. The best solution in this case is usually to allow the users to continue to use the array (as is) for the rest of the day and then to do the rebuild overnight. Your choice in this situation is a static rebuild, which is far faster than a dynamic background rebuild.

Disk Drive Issues.

In the scramble to choose the best RAID level, fastest RAID controller, and so forth, one issue that's frequently overlooked is that of the disk drives themselves. The RAID implementation you select determines how much flexibility you have in choosing the drives to go with it.

External RAID arrays and internal server RAID arrays typically offer the least flexibility in choice of drives. Although these subsystems use industry-standard disk drives, the standard drives are repackaged into different physical form factors to accommodate custom drive bay designs as well as the proprietary power and data connections needed to allow hot swapping. Because these proprietary designs fit only one manufacturer's servers or even just one particular model, they're made and sold in relatively small numbers. This combination of low volume and a single source makes the drives quite expensive.

Another related issue is that of continuing availability of compatible drives. Consider what might happen a year or two from now when you want to upgrade or replace drives. The best designs simply enclose industry-standard drives in a custom chassis that provides the mechanical and electrical connections needed to fit the array. These designs allow the user to upgrade or replace drives simply by installing a new supported standard drive in the custom chassis. Beware of other designs that make the chassis an integral part of the drive assembly. You'll pay a high price for replacement drives, if you can find them at all.

Third-party hardware RAID controllers offer more flexibility in choosing drives, at the expense of not providing hot swapping. These controllers simply replace your existing standard SCSI host adapter and are designed to support standard SCSI disk drives. The situation isn't quite as simple as it seems, however. You might reasonably expect these controllers to be able to use any standard SCSI drive. The reality is different. Most of these controllers support only a very limited number of disk drive models, and they often specify the exact ROM revision level required on the drive. Before you buy such a controller, make sure that the drives you intend to use appear on this compatibility list. Also, make sure that the controller's drive tables can be easily updated via flash ROM (or similar means), and that the manufacturer has a history of providing such updates.

Software-based RAID offers the most flexibility in selecting drives. Because software-based RAID subsystems are further isolated from the disk drives than are hardware-based RAID implementations, most software-based RAID implementations-both those native to NOSs and those provided by third parties-care little about the specifics of your disk drives. Software RAID depends on a standard SCSI host adapter to communicate with the disk drives. As long as your host adapter is supported by your software and your drives are in turn supported by the host adapter, you're likely to find few compatibility problems with software-based RAID. Typical software-based mirroring, for example, doesn't even require that the second drive in a mirror set be identical to the first, but simply that it's at least as large as the first drive. Windows NT Server's implementation of software-based RAID is described later in the section "Understanding Windows NT Server 4.0 Software RAID."

Power Supplies.

Most external RAID arrays and some internal server RAID arrays use dedicated redundant power supplies for the disk drives. The arrangement of these power supplies significantly affects the reliability of the array as a whole. Some systems provide a dedicated power supply for each individual disk drive. Although this superficially seems to increase redundancy, it actually adds another single point of failure to each drive: failure of a power supply means failure of the drive it powers, and whether the failure is caused by a dead drive or a dead power supply, the result is the same.

A better solution is to use dual load-sharing power supplies. In this arrangement, each power supply can power the entire array on its own. The dual power supplies are linked in a harness that allows each to provide half of the power needed by the array. If one power supply fails, the other provides all the power needed by the array until the failed unit can be replaced. Another benefit of this arrangement is that because the power supplies normally run well below their full capacity, their lives are extended and their reliability is enhanced when compared with a single power supply running at or near capacity. Power supplies also can be hot swappable (although this feature is more commonly called hot pluggable when referring to power supplies).

Stacked and Multiple-Level Independent RAID Support.

Some environments require a stacked array for performance, redundancy, or sizing reasons. Others may require multiple independent arrays, each running a different RAID level or mix of RAID levels. If you find yourself in either situation, the best solution is probably either an external RAID array or a high-end internal server RAID array.

The obvious issue is whether a given RAID implementation offers the functionality needed to provide stacks and multiple independent arrays. The not-so-obvious issue is the sheer number of drives that must be supported.

External RAID arrays support many disk drives in their base chassis, and they usually allow expansion chassis to be daisy-chained, extending the maximum number of disks supported even further. High-end servers support as many as 28 disk drives internally, and again often make provision for extending this number via external chassis. Mid-range servers are typically more limited, both in the number of drives they physically support and in their provisions for stacking and multiple independent arrays. A typical mid-range server RAID array doesn't support multiple independent arrays, but it may offer simple RAID 0/1 stacking.

Manageability.

Look for a RAID implementation that provides good management software. In addition to providing automatic static and dynamic rebuild options, a good RAID management package monitors your array for loading, error rates, read and write statistics by type, and other key performance data. The better packages even help you decide how to configure your RAID array for optimum performance.

Implementing RAID in Hardware

Hardware-based RAID implementations usually offer the best performance for a given choice of RAID level and drive performance. Another advantage of hardware RAID is that server resources aren't devoted to calculating parity and determining which drive is to receive which block of data. The following sections offer recommendations for the specification of hardware-based RAID subsystems for your server.
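
Before moving to those recommendations, it may help to see the kind of work being offloaded. The following minimal Python sketch illustrates the parity arithmetic at the heart of RAID 5: the parity block is the bytewise XOR of the data blocks in a stripe, and any single missing block can be rebuilt by XORing the surviving blocks with the parity block. This is an illustrative sketch of the general technique, not the code of any particular controller or driver.

  # Illustrative sketch of RAID 5-style parity arithmetic (not any vendor's code).
  def xor_blocks(blocks):
      """Return the bytewise XOR of equal-length data blocks."""
      result = bytearray(len(blocks[0]))
      for block in blocks:
          for i, value in enumerate(block):
              result[i] ^= value
      return bytes(result)

  # Three data blocks in one stripe; the parity block is their XOR.
  data = [b"AAAA", b"BBBB", b"CCCC"]
  parity = xor_blocks(data)

  # Simulate losing the block on the second drive and rebuilding it
  # from the surviving data blocks plus the parity block.
  surviving = [data[0], data[2], parity]
  rebuilt = xor_blocks(surviving)
  assert rebuilt == data[1]

In a software RAID implementation, this XOR work (together with the bookkeeping that decides which drive receives which block) is performed by the server CPU on every write; a dedicated RAID controller performs it with its own processor instead.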

RAID as a Server Option.

If you're buying a new server, by all means consider the RAID options offered by the server manufacturer. Any system seriously positioned for use as a server offers RAID at least as an option. Low-end servers typically offer RAID as an option; mid-range and high-end servers come standard with RAID and often offer optional external enclosures to expand your disk storage beyond that available in the server chassis alone.

Purchasing RAID as a part of your server has the following advantages, most of which are related to single-source procurement:

Upgrading an Existing Server to Hardware RAID.

If your current server is otherwise suitable, upgrading the server to hardware RAID may be a viable alternative to buying a new server. This upgrade can range from something as simple and inexpensive as adding another disk drive and enabling mirroring on your SCSI host adapter, to a process as complex, and potentially expensive, as adding an external RAID array cabinet. Somewhere in the middle (in cost and complexity) is replacing your existing SCSI host adapter with a dedicated RAID controller.

Each solution provides the basic reliability and performance benefits of hardware RAID. Each solution varies in the level of the features, convenience, and extended RAID functionality it provides.

Mirroring with Your Current SCSI Host Adapter.

The SCSI host adapter in your current server may support RAID 0, RAID 1, or both. Even if it doesn't support simple RAID, replacing the host adapter with one that offers RAID 0 or RAID 1 support is an inexpensive alternative. If your server has only one or two SCSI hard drives, this method allows you to implement mirroring at the cost of simply buying a matching drive for each existing drive.

This approach buys you 100 percent redundancy and decent performance, and does so inexpensively. What it doesn't provide are other features of more expensive hardware RAID implementations, such as hot swappable drives and redundant power supplies. Still, for smaller servers, this is a set-and-forget choice. If your server is small enough that buying the extra disk drives is feasible, and if you don't care that you'll need to take down the server to replace a failed drive, this method may well be the best choice. It gives you about 95 percent of the benefits of a full-blown RAID 5 implementation for a fraction of the cost.

Adding a Dedicated RAID 5 Controller Card.

The next step up in hardware RAID, in terms of cost and performance, is the dedicated RAID controller card. These cards replace your existing SCSI host adapter and include a dedicated microprocessor to handle RAID 5 processing. They range in price from less than $1,000 to perhaps $2,500, depending on their feature sets, the number of SCSI channels provided, the amount and type of on-board cache supplied, and other accouterments.

All of these cards support at least RAID 1 and RAID 5, and most offer a full range of RAID levels, often including various enhanced non-standard RAIDs. The Adaptec AAA-130 host adapter, as an example, is a low-cost (about $500 street price) host adapter that supports RAID 0, 1, 5, and 0/1 (10) with hot-swappable drives in the RAID 5 configuration. The AAA-130 is designed for entry-level servers, which Adaptec Inc. defines as serving 60 or fewer clients. Figure 7.6 shows the Adaptec AAC-330 host adapter designed for mid-range servers. The AAC-330 uses an Intel i960 RISC microprocessor for improved performance when serving a large number of clients.


7.6

The Adaptec AAC-330 RAIDport host adapter, which uses an Intel i960 RISC processor to provide mid-range server capabilities. (Courtesy of Adaptec Inc.)

The price of dedicated RAID controller cards has been coming down rapidly, due both to increasing sales volume and to competition from RAID software alternatives. The best examples of these cards offer RAID functionality and performance comparable to internal server RAID arrays and external RAID array enclosures. In terms of convenience features, however, these adapter cards suffer an inherent disadvantage: they don't provide hot-swap capabilities, redundant power supplies, or other high-end features.

Most RAID controller cards are sold through original equipment manufacturer (OEM) arrangements with server vendors. For instance, the Mylex DAC960-one of the better examples of this type of adapter-is used by Hewlett-Packard to provide RAID support in its NetServer line of servers. HP modifies the BIOS and makes other changes to optimize the DAC960 for use in the company's servers. The Adaptec AAC-330 is sold through OEM and VAR (value-added reseller) channels; the Adaptec AAA-130 is available from distributors and some computer retailers.

Think long and hard before you decide to buy one of these cards as an individual item, rather than as a part of a packaged solution. Although the card itself appears to be inexpensive relative to the quoted price for an external RAID enclosure, you usually find that after adding up the cost of disk drives, cabling, and possibly an external enclosure, you meet or exceed the price of the turnkey solution. It's still up to you to do the systems integration, locate and install the appropriate disks and drivers, and maintain the subsystem. If you decide to use one of these cards, budget for two of them. Few organizations accept having their LAN down for an extended period if the RAID controller fails. On-site maintenance is the exception rather than the rule for these cards. Even doing a swap via overnight courier usually means that your LAN will be down for at least a day or two.

Using an External RAID Enclosure.

External RAID enclosures are the high end of hardware RAID. They offer all the features of internal server arrays and more. Hot-pluggable, load-balancing dual power supplies are a common feature, as are hot-swappable drives, extensive management capabilities, a full range of RAID options, and provision for stacked RAID. Most of these units support multiple independent RAID arrays, and some allow connection of more than one host. Most units also allow you to add additional slave enclosures to expand your disk capacity. As you might expect, all of this functionality substantially increases the acquisition cost.

External RAID subsystems are of two types. The first is based on one of the dedicated RAID controller cards described in the preceding section. In this type of unit, called a dumb external array, all the RAID intelligence is contained on the card installed in the server cabinet, and the external enclosure simply provides mounting space and power for the disk drives. The enclosure makes provision for hot-swapping and redundant power supplies, but the actual RAID functionality remains with the server, and RAID configuration and management are done at the server. Although such subsystems physically resemble more sophisticated external arrays, in concept these units are really just simple extensions of the dedicated RAID controller card method, and accordingly are relatively inexpensive. They're usually priced in the $3,000 to $5,000 range for the enclosure and controller, before being populated with disk drives.

Dumb external arrays are often assembled from component parts by mail-order and second- and third-tier computer companies so that those companies can offer a RAID solution for their servers. These arrays suffer from most of the same drawbacks as the dedicated RAID controller cards-limited drive type support, infrequent driver updates, lack of on-site maintenance, and the possibility of vendor insolvency.

The second type of unit, called a smart external array, relocates RAID processing to the external enclosure and provides one or more SCSI connectors by which the host server (or servers) is connected to the array. The host server sees a smart external array as just another standard SCSI disk drive or drives.

With this type of smart array, RAID configuration and management are done at the array itself. Because these arrays are intended for use in diverse environments-including Novell NetWare, Microsoft Windows NT Server, and UNIX-they usually offer several methods of setup and programming. A typical unit might be programmable in a UNIX environment by connecting a dumb terminal to a serial port on the external array or by using Telnet; in a Windows NT Server 4.0 environment, you use client software provided for the network operating system. Full software support-drivers and management utilities-is available for several operating systems, although these arrays often come standard with support for only one operating system of your choice. Support for additional operating systems and for extended functionality with your chosen operating system is often an extra-cost option. Smart external RAID arrays start at around $8,000 or $10,000, without drives, and go up rapidly from there.

Smart external arrays offer everything you might want in a RAID unit, including support for stacked RAID, multiple independent arrays, and multiple host support. Because manufacturers realize that these are critical components, on-site maintenance is available, provided either by the manufacturer or by a reputable third-party service organization. The construction of these units resembles minicomputer and mainframe practices rather than typical PC peripherals.

The first major concern when you use smart external arrays is drive support. Some units allow you to add or replace drives with any SCSI drive of the appropriate type that's at least as large as the old drive. Other units require that you use only drives that exactly match the existing drives by make, model, and sometimes even by ROM revision level. Still other units can use only drives supplied by the array manufacturer, because the drives themselves have had their firmware altered. These manufacturers tell you that these firmware changes are required for performance and compatibility reasons, which may be true. However, the net effect is that you can then buy new and replacement drives only from the array manufacturer, which is usually a very expensive alternative.

The second major concern is software support. With smart external arrays, you're at the mercy of the array manufacturer for NOS support, drivers, and management utilities. Make absolutely certain before buying one of these arrays that it has software support available for Windows NT Server 4.0. It does you no good to accept a vendor's assurance that the array supports Windows NT, only to find later that the array supports only versions 3.5 and earlier.

Check the array manufacturer's history of providing support for Windows NT upgrades immediately after release of the upgrade. Although this isn't a perfect means of prediction-companies can, and do, change-a history of frequent updates for Windows NT Server is a reasonable indicator that the company is committed to providing continuing support for its users.

External RAID enclosures can be your best choice, particularly if you require large amounts of disk storage, have multiple servers, or use more than one NOS. Don't rule out external RAID enclosures simply on the basis of sticker shock. Examine the true cost involved in acquiring, maintaining, and managing one of these units versus the cost involved, including increased staff time, of providing similar functionality using other means.

Understanding Windows NT Server 4.0 Software RAID

All the RAID implementations examined so far rely on specialized hardware. It's possible, however, to use the server CPU to perform RAID processing, thereby avoiding the purchase of additional hardware. Microsoft Windows NT Server 4.0 builds RAID 0, RAID 1, and RAID 5 functionality into the NOS software as a standard feature, allowing you to assemble a RAID subsystem using only standard SCSI host adapters and drives. Because Windows NT Server 4.0 provides these RAID options, you might wonder why anyone would purchase expensive additional hardware to accomplish the same thing.

The first reason is performance. In theory, at least, using software RAID can have scalability performance advantages. Because software RAID runs as another process on the server, upgrading the server processor or increasing the number of processors simultaneously upgrades RAID processing. In practice, this potential advantage usually turns out to be illusory. Although Microsoft has done a good job of incorporating RAID functionality into Windows NT Server 4.0, a well-designed hardware RAID solution always offers better performance, particularly on larger arrays. Benchmark tests nearly always show software RAID bringing up the rear of the pack in performance relative to hardware RAID, and even Microsoft admits that its software RAID solution is outperformed by a well-designed hardware RAID. Also, although using the server CPU to perform RAID processing can be acceptable on a small or lightly loaded server, doing so on a more heavily loaded server-particularly one running as an application server-steals CPU time from user applications and, therefore, degrades overall server performance.

The second reason to avoid Windows NT Server software RAID involves flexibility, convenience, and server uptime. In terms of reliability, Windows NT Server software RAID secures your data just as well as hardware RAID does. What it doesn't do, however, is provide redundant power supplies, hot swapping of disks, background rebuild, and other hardware RAID features designed to minimize server downtime. As a result, a server running Windows NT software RAID is no more likely to lose data than a server running hardware RAID, but it's considerably more likely to be down for extended periods while failed disk drives are replaced and rebuilt. Unless you're running Windows NT Server on a system equipped with hot swappable drives and other RAID amenities-which would usually be equipped with a hardware RAID controller anyway-the inability to hot swap drives and otherwise maintain the array without taking down the server may be unacceptable.

The three primary advantages of a RAID subsystem are increased data security, increased disk performance, and decreased server downtime. Windows NT Server 4.0 native software RAID, with its full support for RAID 1 and RAID 5, does an excellent job of securing your data. It does a reasonably good job of increasing disk subsystem performance, although not to the extent that a well-designed hardware RAID does. It's only in terms of decreasing server downtime that Windows NT Server software RAID falls short, and this is characteristic of any pure software RAID solution. If yours is a small array on a server supporting a limited number of users, the drawbacks of Windows NT Server software RAID may be an acceptable tradeoff for reduced costs. For larger arrays and critical environments, buy the appropriate RAID hardware solution.

Windows NT Server RAID Options.

On the reasonable assumption that software RAID is better than no RAID, using Windows NT Server 4.0 to provide RAID functionality makes sense, particularly for small servers that wouldn't otherwise be equipped with RAID. Following are the RAID options available with Windows NT Server:

Configuration Considerations: Volume Sets, Extensibility, and Booting.

In addition to mirror sets and stripe sets, Microsoft Windows NT Server provides a similar disk-management function called a volume set. Volume sets, although often confused with RAID, provide neither the data safety of RAID 1 or RAID 5, nor the performance benefits of RAID 0. A volume set simply allows a single logical volume to span more than one physical disk drive. With a volume set, data isn't striped to multiple disk drives, but instead is written sequentially to each disk drive in the volume set as the preceding drive is filled. A volume set allows you to combine the capacity of two or more smaller disk drives to provide a single larger volume. Because volume sets are accessed as a single logical unit, the failure of any single drive in a volume set renders the data on the remaining drives inaccessible.

Because volume sets provide neither data redundancy nor performance benefits, and because by their nature they increase the chances of data loss due to drive failure, volume sets are normally a poor choice for configuring your disk storage. If your data storage requirements exceed the capacity of the largest disk drives available to you, a far better choice is to use RAID 5 to provide a single large volume with data redundancy. The chief advantage of volume sets is that, unlike stripe sets and mirror sets, they are dynamically extensible. If a volume set begins to fill up, you can increase the capacity of the volume set simply by installing another physical disk drive and adding its space to the existing volume set. If, on the other hand, a stripe set or mirror set approaches its total capacity, your only option is to tear down the existing set, add drive capacity, build a new set, and restore your data.
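
To make these capacity tradeoffs concrete, here's a minimal Python sketch that compares the usable space and fault tolerance of a volume set, a stripe set, a mirror set, and a stripe set with parity built from the same drives. The drive sizes and variable names are assumptions for illustration only; they don't correspond to any configuration described in this chapter.

  # Hypothetical drive areas, in megabytes (assumed for illustration only).
  drives_mb = [2000, 2000, 2000]

  volume_set = sum(drives_mb)                           # spans drives sequentially; no redundancy
  stripe_set = min(drives_mb) * len(drives_mb)          # RAID 0; no redundancy
  mirror_set = min(drives_mb[:2])                       # RAID 1 pairs two drives; half the space is usable
  parity_set = min(drives_mb) * (len(drives_mb) - 1)    # RAID 5; one drive's worth of space holds parity

  print("Volume set:             %dM, no fault tolerance" % volume_set)
  print("Stripe set (RAID 0):    %dM, no fault tolerance" % stripe_set)
  print("Mirror set (RAID 1):    %dM, survives one drive failure" % mirror_set)
  print("Stripe set with parity: %dM, survives one drive failure" % parity_set)

With three 2G areas, the sketch reports 6000M for the volume set and the stripe set, 2000M for the mirror set, and 4000M for the stripe set with parity, which is why RAID 5 is usually the better way to build one large, fault-tolerant volume from several smaller drives.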

One final issue to consider before you decide to implement Windows NT Server software RAID is that of system booting. Windows NT Server doesn't allow the system to boot from either a stripe set or a stripe set with parity; it does allow the system to boot from a mirror set. This means that to implement a stripe set, your server must have at least three disk drives-one boot drive and at least two drives to make up the stripe set. Similarly, implementing a stripe set with parity requires at least four drives-one from which the system boots and at least three more for the stripe set with parity itself. It's therefore common for small Windows NT Server systems to have five disk drives: the first two form a mirror set from which the system boots and on which applications reside, and the remaining three form a stripe set with parity on which most of the data is stored.
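
For example (assuming five identical 4G drives, purely for illustration), such a five-drive configuration provides a 4G mirrored system volume plus roughly 8G of usable data storage in the stripe set with parity, because the equivalent of one of the three data drives' capacity is devoted to parity.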

Creating Windows NT Server Stripe and Mirror Sets

After you install a sufficient number of drives and make sure that your SCSI host adapter recognizes each drive, you can create one or more of the three levels of RAID supported by Windows NT Server 4.0. The following sections provide the instructions for implementing Windows NT 4.0's software RAID 0, 1, and 5. RAID 0 and 1 require two physical drives; RAID 5 requires three physical drives. Each drive must have unused space in which to create the stripe set or mirror volume.

See "Setting Up Your Fixed-Disk Drives," Chapter 6.

The computer used in the following example is OAKLEAF3, a 133MHz Pentium server that dual boots Windows 95 for digital video editing and Windows NT Server 4.0 for network test purposes. OAKLEAF3 is equipped with two Seagate ST15150W 4.3G Barracuda Wide SCSI drives (Bcuda1, Disk 0, and Bcuda2, Disk 1); a 1G Micropolis narrow SCSI drive (Microp1, Disk 2); and a Toshiba 4X SCSI CD-ROM drive connected to an Adaptec AHA-2940UW host adapter. Bcuda1 has five 800M FAT partitions (Bcuda1p1 through Bcuda1p5), and Bcuda2 has two 2G FAT partitions (Bcuda2p1 and Bcuda2p2). Microp1 is divided into 800M (Microp1p1) and 200M (Microp1p2) partitions. The Bcuda1p4, Bcuda2p2, and Microp1p2 partitions were deleted (converted to free space) before performing the steps described in the following sections.

Creating RAID 0 Stripe Sets.

Creating a RAID 0 stripe set is the simplest of the three processes offered by Windows NT 4.0's Disk Administrator. To create a RAID 0 stripe set, proceed as follows:

  1. Log on as an administrator and run Disk Administrator by choosing Programs, Administrative Tools, and Disk Administrator from the Start menu.
  2. Click to select an unused area on the first disk drive.
    Physical drives commonly are called spindles to avoid confusion with logical drives created by multiple partitioning of a single physical drive.

  4. Pressing and holding the Ctrl key, click to select additional unused areas on other disk drives, up to a maximum of 32 disk drives. You may select only one area on each disk drive. Figure 7.7 shows 800M and 2G areas of free space selected on Disk 0 and Disk 1, respectively.

  5. 7.7

    Selecting free space areas on two physical drives to create a RAID 0 stripe set.

  6. From the Partition menu, choose Create Stripe Set to open the Create Stripe Set dialog, which displays in the Create Stripe Set of Total Size box the default total size of the stripe set spanning all selected drives (see fig. 7.8). This total size takes into account the smallest area selected on any disk drive, adjusting (if necessary) the sizes of the areas on the other selected drives to set all to identical size. The total size value is approximately the size of the smallest disk area multiplied by the number of drives in the stripe set.

  7. 7.8

    Setting the total size of the stripe set in the Create Stripe Set dialog.

  8. Click OK to accept the default size. Windows NT Server prepares to create the stripe set and assigns the stripe set a single default drive letter (see fig. 7.9). At this point, no changes have been made to the drives.

  9. 7.9

    The stripe set before making the changes to the selected partitions.

  10. From the Partition menu choose Commit Changes Now. A dialog appears to inform you that changes have been made to your disk configuration (see fig. 7.10). Click Yes to accept and save the changes.

  11. 7.10

    Committing changes to your drives for the RAID 0 stripe set.

  12. A second message box appears to notify you that the update was successful (see fig. 7.11) and that you should save the disk configuration information and create a new emergency repair disk. (These steps are performed at the end of this procedure.) Click OK.

  13. 7.11

    Disk Administrator's confirmation of changes to your drive configuration.

    If another message box appears, to inform you that you must restart the computer for the changes to take effect, click OK to begin system shutdown. After Windows NT Server is restarted, again log on as an administrator and run Disk Administrator. The need to shut down and restart Windows NT depends on the status of the selected disk regions before the beginning of this process.

  14. Click to select the newly created but unformatted stripe set (see fig. 7.12). From the Tools menu choose Format to open the Format Drive D: dialog.

  15. 7.12

    Selecting the unformatted stripe set.

  16. Type a name for the volume in the Label text box (see fig. 7.13). You're given the opportunity to choose an NTFS or FAT file system; select NTFS (the default). Marking the Quick Format check box bypasses drive sector checking during the formatting process. Click OK to continue.

  17. 7.13

    Adding a volume label and selecting the file system type.

    If you haven't previously formatted the free space used for the stripe set, clear the Quick Format check box. A full format tests for bad sectors during the formatting process and marks bad sectors unusable in the drive's sector map.

  18. A confirmation dialog appears, to warn you that continuing overwrites the contents of the volume (see fig. 7.14). Click Yes to continue formatting.

  19. 7.14

    Confirming the formatting step.

  20. If you marked the Quick Format check box, the Quick Format progress indicator appears only briefly. For a conventional formatting operation, the progress indicator shown in figure 7.15 appears for a minute or more, depending on the size of the volume.

  21. 7.15

    The format progress indicator for a full format (Quick Format not selected).

  22. When formatting is complete, a dialog informs you of the total available disk space on the new volume (see fig. 7.16). Click OK. The new volume is ready for use (see fig. 7.17).

  23. 7.16

    The Format Complete dialog displaying the size of the stripe set volume.


    7.17

    Disk Administrator displaying the newly created RAID 0 stripe set.

    The status bar at the bottom of Disk Administrator's window isn't updated at the completion of the formatting process. To update the status bar, select another volume, and then reselect the newly created set.

Creating Drive Configuration and Emergency Repair Diskettes.

After making permanent changes to your drive configuration, always save the configuration changes, replace the repair information on the fixed disk, and create a new emergency repair disk. Follow these steps to ensure that you can restore your existing configuration in the event of a system failure:

  1. To save the current drive configuration, choose Configuration and then Save from the Partition menu to display the Insert Disk dialog (see fig. 7.18). Insert a formatted diskette and click OK to save the configuration. In the event of a major catastrophe, you can use the diskette to restore the current drive configuration by choosing Configuration and then Restore from the Partition menu.

  2. 7.18

    Creating a drive configuration diskette.

  3. Choose Run from the Start menu, type rdisk in the Open text box, and click OK to open the Repair Disk Utility dialog (see fig. 7.19).

  4. 7.19

    The opening dialog of the Repair Disk Utility.

  5. Click the Update Repair Info button. The confirmation dialog shown in figure 7.20 appears. Click OK to continue the update process. The progress meter displays the status of the update (see fig. 7.21), which takes a minute or two.

  6. 7.20

    Confirming the update of locally stored repair information.


    7.21

    Metering the progress of updating the locally stored repair information.

  7. After the local repair information is updated, a message box appears, asking whether you want to create an emergency repair diskette (see fig. 7.22). It's imperative that you create one at this point. Insert a diskette, which doesn't need to be formatted, in the diskette drive and click Yes.

  8. 7.22

    Creating the updated emergency repair diskette.

  9. Replace your existing emergency repair diskette with the new emergency repair diskette, and store the new diskette in a safe location. (Consider making two emergency repair diskettes, storing one diskette off site.) The old diskette no longer matches the repair information now stored on the local fixed disk and can't be used for repairs.

If you don't create an emergency repair diskette after updating local repair information, it's likely that your existing repair diskette won't work when used in the event of a major system failure. In this case, your only alternative is to reinstall Windows NT Server 4.0.

Creating RAID 1 Mirror Sets.

Mirror sets differ from stripe sets; whereas stripe sets may span as many as 32 drives, mirror sets are created on a paired drive basis. You must first create a standard formatted volume, and then create the mirror drive.

Creating and Formatting a New Standard Volume.

To create and format a new separate volume from the free space available on a single drive, follow these steps:

  1. Log on as an administrator and run Disk Administrator by choosing Programs, Administrative Tools, and Disk Administrator from the Start menu.
  2. Click to select an unused area on the fixed disk drive (see fig. 7.23).

  3. 7.23

    Selecting an unused area of a drive in which to create a new standard volume.

  4. From the Partition menu, choose Create to open the Create Logical Drive dialog, which displays in the Create Logical Drive of Size box the default total size of the free space of the selected drive (see fig. 7.24). Accept the default size unless you want to create a volume of a smaller size.

  5. 7.24

    Setting the size of the volume in the Create Logical Drive dialog.

  6. Click OK. Windows NT Server prepares to create the new logical drive, assigning the volume a default drive letter. At this point, no changes have been made to the drive, and the new volume (H in fig. 7.25) is inactive.

  7. 7.25

    The proposed volume, before making the changes to the selected partition.

    You can't format the inactive volume shown in figure 7.25. The Tools menu's Format option is disabled at that point. You must commit the changes to create the new volume's partition, and then format the partition.

  8. From the Partition menu choose Commit Changes Now. A dialog informs you that changes have been made to your disk configuration (refer to fig. 7.10). Click Yes to accept and save the changes. A second message box appears, to notify you that the update was successful, and that you should save the disk configuration information and create a new emergency repair disk (refer to fig. 7.11). These two steps are performed after you complete the drive reconfiguration process. Click OK.
  9. If another message box appears, to inform you that you must restart the computer for the changes to take effect, click OK to begin system shutdown. After Windows NT Server is restarted, again log on as an administrator and run Disk Administrator. The need to shut down and restart Windows NT depends on the status of the selected disk regions before the beginning of this process.

  10. Click to select the newly created but unformatted volume (H in fig. 7.26). From the Tools menu choose Format to open the Format Drive H: dialog.

  11. 7.26

    Selecting the active but unformatted volume.

  12. Type a name for the volume in the Label text box (refer to fig. 7.13). You're given the opportunity to choose an NTFS or FAT file system; select NTFS (the default). Marking the Quick Format check box bypasses drive sector checking during the formatting process. Click OK to continue.
  13. As recommended in the preceding section on RAID 0 stripe sets, don't use Quick Format if you haven't previously formatted the free space used for the new volume. A full format tests for bad sectors during the formatting process and marks bad sectors unusable in the drive's sector map.

  14. A confirmation dialog warns you that continuing overwrites the contents of the volume (refer to fig. 7.14). Click Yes to continue formatting. If you marked the Quick Format check box, the Quick Format progress indicator appears only briefly. For a conventional formatting operation, the progress indicator shown earlier in figure 7.15 appears.
  15. When formatting is complete, a dialog informs you of the total available disk space on the new volume (refer to fig. 7.16). Click OK. The new volume is ready for use as an independent volume or as a member of a mirror set.
  16. If your new volume is intended as an independent volume (not a member of a mirror set), update the configuration and repair information as described earlier in the section "Creating Drive Configuration and Emergency Repair Diskettes."

Creating the Mirror of the Standard Volume.

Mirroring creates a formatted volume of the same size as the new standard volume, but on another physical drive. To create the mirror partition, follow these steps:

  1. From Disk Administrator, select the newly formatted standard volume.
  2. Pressing and holding the Ctrl key, select an unused area on another disk drive that's at least as large as the newly created volume (see fig. 7.27).

  3. 7.27

    Selecting the formatted volume and the free space on another physical drive to create the mirror set.

  4. From the Fault Tolerance menu choose Establish Mirror.
  5. From the Partition menu choose Commit Changes Now. Windows NT Server creates the mirror set and assigns the drive letter of the first drive of the set (H in fig. 7.28).

  6. 7.28

    Disk Administrator displaying the newly created mirror set (drive H).

    The mirror set is not created immediately when you commit the changes. Setting up the mirror partition and formatting the partition occurs as a background task. Disk Administrator's status bar displays INITIALIZING during the process. To determine when the process is complete, periodically select another volume, and then reselect the mirror set. The process is complete when you see HEALTHY on the status bar.

  7. Update the configuration and repair information as described earlier in the section "Creating Drive Configuration and Emergency Repair Diskettes."

Creating RAID 5 Stripe Sets with Parity.

The process of creating a stripe set with parity is very similar to that used to create a RAID 0 stripe set. Whereas a RAID 0 stripe set can be created on as few as two physical drives, a RAID 5 stripe set with parity requires a minimum of three drives; the equivalent of one drive's capacity holds parity information, leaving at least two drives' worth of capacity for data.
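
As a rule of thumb, the usable capacity of a stripe set with parity is about (n-1)/n of the total space selected, where n is the number of drive areas; three equal 2G areas, for example, yield roughly 4G of usable storage, with the remaining 2G equivalent holding parity. (These sizes are illustrative only.)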

To create a stripe set with parity, proceed as follows:

  1. Log on as an administrator and run Disk Administrator by choosing Programs, Administrative Tools, and then Disk Administrator from the Start menu.
  2. Click to select an unused area on the first disk drive.
  3. Pressing and holding the Ctrl key, select at least two additional unused areas on other disk drives, up to a maximum of 32 disk drives. You may choose only one area on each disk drive. Figure 7.29 shows three unused areas selected on drives 0, 1, and 2.

  4. 7.29

    Selecting a minimum of three areas of free space to create a RAID 5 stripe set with parity.

  5. From the Fault Tolerance menu choose Create Stripe Set with Parity. The Create Stripe Set with Parity dialog appears, displaying the total size of the stripe set with parity spanning all selected drives (see fig. 7.30). This total size takes into account the smallest area selected on any disk drive, adjusting if necessary the sizes of the areas on the other selected drives to set all to identical size.

  6. 7.30

    Setting the size of the RAID 5 stripe set in the Create Stripe Set with Parity dialog.

  7. Click OK to accept the default. Windows NT Server prepares to create the stripe set with parity and assigns a drive letter (see fig. 7.31). The stripe set with parity must now be prepared for use.

  8. 7.31

    The three partitions proposed for the RAID 5 stripe set.

  9. From the Partition menu choose Commit Changes Now. A dialog appears, to tell you that changes have been made to your disk configuration. Click Yes to accept and save the changes. A message box appears, to notify you that the update was successful. Click OK.
  10. If another message box appears, to inform you that you must restart the computer for the changes to take effect, click OK to begin system shutdown. After Windows NT Server is restarted, again log on as an administrator and run Disk Administrator. The need to shut down and restart Windows NT depends on the status of the selected disk regions before the beginning of this process.

  11. Select the newly created but unformatted stripe set with parity, and from the Tools menu choose Format to open the Format Drive D: dialog.
  12. Type a name for the volume in the Label text box (see fig. 7.32). You're given the opportunity to choose an NTFS or FAT file system; select NTFS (the default). The Quick Format check box is disabled when you create a RAID 5 stripe set. Click OK to continue.

  13. 7.32

    Adding a label to the RAID 5 volume and selecting the file system.

  14. When formatting is complete, a dialog appears, to inform you of the total available disk space on the new volume.
  15. The size shown in figure 7.33 (409M) differs from that shown earlier in figure 7.30 (606M) because the equivalent of one member's capacity is devoted to parity; 409M is the correct usable volume size.

  16. 7.33

    Disk Administrator confirming the correct size of the RAID 5 volume.

  17. Click OK. The new RAID 5 volume is ready for use (see fig. 7.34).

  18. 7.34

    The formatted RAID 5 volume ready for use.

    As is the case with mirror sets, RAID 5 volumes aren't ready for use until the background processing to create the additional partitions and format the partitions completes. This process may take several minutes when creating large RAID 5 volumes. The process is complete and you can use the drive when HEALTHY appears on the status bar, as shown in figure 7.34.

Summarizing RAID Recommendations

Given the wide diversity of RAID products available and the equally broad range of needs and budgets, it's difficult to make hard-and-fast recommendations for the most appropriate means of implementing RAID. However, the following observations serve as useful guidelines:

From Here...

Although RAID arrays are found on virtually every production server, many network designers and administrators don't have a firm grasp of the relative advantages and disadvantages of RAID levels and implementation strategies. Thus, this chapter provides a detailed description and comparison of these two subjects of major importance to server performance and reliability.

The remaining chapters of Part II cover the other basic elements of setting up Windows NT Server, and connecting Windows 95 and other clients to your network:

