Figure 1 – Peak Advertised Performance (PAP) vs. Real Application Performance (RAP). The graphic above shows the various hardware components involved with disk IO: the processors, memory/system bus, PCI bus, SCSI and IDE buses, and individual disks. For each component, the PAP is shown in bold while the RAP we measured is shown below in parentheses.
We first measured single-disk sequential I/O performance on NT4SP6 and Windows 2000 Advanced Server. We then compared our results to those of the original sequential I/O study of NT4SP3 by Riedel et al.
Performance of all three operating systems was similar: Windows NT4SP6 and Windows 2000 Advanced Server perform almost identically to each other and to NT4SP3, with three exceptions:
The overhead for large buffered read and write requests was substantially higher on NT4SP3.
Small (2KB and 4KB) requests no longer show the 33% to 66% decrease in throughput seen in NT4SP3.
The buffered IO throughput “dip” seen in NT4SP3 for requests above 64KB has been corrected.
Sequential I/O performance under NT4SP6 shows a few improvements over NT4SP3. The differences are incremental rather than radical: the models of NT performance remain valid. Windows 2000 likewise shows incremental I/O improvement over NT4. Basic volumes and the new dynamic volumes have similar sequential I/O performance.
With the sequential IO throughput of disks and controllers increasing, the bottleneck has shifted to the one component that has not improved much: the PCI bus. Our modern workstation could move 98.5MBps across its PCI bus. Compared to the 72MBps Riedel was able to achieve, our workstation’s PCI bus is only 37% faster, while its disks are 300% faster. This means that where two years ago it took nine disks spread over three adapters to saturate a PCI bus, today three to four disks on one adapter can saturate it. The multiple 64-bit 66MHz PCI buses found on high-end servers, and the future InfiniBand™ IO interfaces, will likely change this, but today the PCI bus is a bottleneck for sequential IO on low-end servers.
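The saturation claim above can be sanity-checked with back-of-envelope arithmetic. The per-disk streaming rates below are assumptions, not measurements from this report: roughly 8MBps for a 1998-era disk and 32MBps for a modern one, consistent with the "300% faster" figure.

```python
import math

# Measured PCI throughput, MBps (from the text).
pci_1998, pci_2000 = 72.0, 98.5
# Assumed per-disk sequential throughput, MBps (illustrative figures).
disk_1998, disk_2000 = 8.0, 32.0

def disks_to_saturate(bus_mbps, disk_mbps):
    """Minimum number of streaming disks needed to fill the bus."""
    return math.ceil(bus_mbps / disk_mbps)

print(disks_to_saturate(pci_1998, disk_1998))  # 9 disks then
print(disks_to_saturate(pci_2000, disk_2000))  # 4 disks now
```

Under these assumptions the arithmetic reproduces both figures in the text: nine disks then, three to four disks now.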
Of course, most applications do random rather than sequential IO. If an application issues random 8KB IOs against modern disks, each disk delivers about 1MBps, so a modern controller can manage many (16 or more) disks, and a single PCI bus can carry the load of 64 randomly accessed disks. Faster PCI technologies, such as PCI-X and the 64-bit/66MHz flavors of PCI, are still premium products.
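The roughly 1MBps-per-disk figure follows from a simple service-time model of one random 8KB IO. The seek time, spindle speed, and media rate below are assumed typical values for a 10,000 RPM drive of this era, not figures from the report:

```python
# Service-time model for one random 8KB IO (assumed drive parameters).
avg_seek_ms = 6.0                     # typical average seek
half_rot_ms = 0.5 * 60_000 / 10_000   # half a rotation at 10,000 RPM = 3 ms
xfer_ms     = 8 / 25.0                # 8 KB at ~25 MBps media rate ≈ 0.32 ms

ms_per_io = avg_seek_ms + half_rot_ms + xfer_ms
iops = 1000 / ms_per_io               # ≈ 107 IOs per second
mbps = iops * 8 / 1024                # ≈ 0.84 MBps per disk

print(f"{iops:.0f} IOPS ≈ {mbps:.2f} MBps per disk")
```

Seek and rotation dominate the service time, so random 8KB throughput lands near 1MBps per disk regardless of how fast the media streams.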
Along with the advances in SCSI drive technology, IDE drives are beginning to grow beyond the desktop market into the workstation and server markets, bringing with them considerably lower prices. The ANSI standard ATA (Advanced Technology Attachment) interface, more commonly called Integrated Drive Electronics (IDE), was first created to provide cheap hard drives for the PC user. As the Intel x86 architecture has become more popular, price-conscious consumers have purchased IDE drives rather than pay a premium for drives with more expensive interfaces such as SCSI. Today, IDE drives have evolved to include DMA and 66MBps (roughly 500Mbps) connections, and they still hold a wide margin over SCSI drives in terms of units shipped. With their higher volume, IDE prices benefit from economies of scale. At present, a terabyte of IDE drives costs $6,500 while a terabyte of SCSI drives costs $16,000. When optimizing for cost, IDE drives are hard to beat.
Of course, cost is only one of the factors in purchasing decisions. Another is undoubtedly performance. Since IDE was designed as an inexpensive and simple interface, it lacks many SCSI features like tagged command queuing, multiple commands per channel, power sequencing, hot-swap, and reserve-release. The common perception of IDE is that it should only be used when performance isn’t critical. Conventional wisdom says that SCSI is the choice for applications that want high performance or high integrity. As such, most desktops and portables use IDE while most workstations and servers pay the SCSI price premium. This is despite the fact that most drive manufacturers today use the same drive mechanism across both their IDE and SCSI lines – the only difference is the drive controller.
It is possible to mitigate some of IDE’s performance penalties by using a host bus adapter card that makes the IDE drives appear to be SCSI drives. Indeed, Promise Technology and 3ware are two companies that sell such cards. These cards add between $38 and $64 to the price of each disk. These controller cards are typically less expensive than corresponding SCSI controller cards, so they potentially provide a double advantage – SCSI functionality at about half the price.
Among other things, this report compares IDE and SCSI performance using micro-benchmarks. We used a 3ware 3W-5400 IDE RAID card to connect four IDE drives to our test machine. In summary, we found that individual IDE drive performance was very good. In our comparison of a SCSI Quantum Atlas 10K 10,000 RPM drive with an IDE Quantum lcs08 5400 RPM drive, the IDE drive proved to be 20% slower on sequential loads and at most 44% slower on random loads. But the SCSI drive was more than 250% more expensive than the IDE drive. Even with the lower random performance per drive, buying two IDE drives would be cheaper, and a mirrored pair would give both fault tolerance and read IO performance superior to a single SCSI drive. The IDE price/performance advantage gets even better for sequential workloads. For the same price as SCSI, IDE delivers almost double the sequential throughput of a single SCSI disk.
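The mirrored-pair argument can be sketched numerically. The dollar figures below are assumptions chosen only to match the stated ~250% price premium; the 44% random-IO penalty is the report's worst-case measurement:

```python
# Normalize performance so a single SCSI drive = 1.0.
ide_random = 1.0 - 0.44        # IDE was at most 44% slower on random loads
pair_read  = 2 * ide_random    # a mirror can serve reads from both spindles

# Assumed street prices, consistent with the stated ~250% premium.
scsi_price, ide_price = 700, 200

assert 2 * ide_price < scsi_price   # the mirrored IDE pair costs less...
assert pair_read > 1.0              # ...yet out-reads the single SCSI drive

print(f"mirrored IDE pair random-read performance: {pair_read:.2f}x SCSI")
```

Writes go to both mirror members, so the write-side advantage is smaller; the win here is specifically on random reads plus fault tolerance at lower cost.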
However, SCSI features such as tagged command queuing, more than two disks per string, long cable lengths, and hot swap don’t currently exist for native IDE – although 3ware promises hot-swap support in its next model (3W-6000).
The report also examined the performance of file IO between a client and a file server using the CIFS/SMB protocol, either as a mapped drive or via UNC names. These studies showed that clients can read at about 40MBps and write at about half that rate. However, remote file IO showed several anomalies. Unfortunately, there was not time to explore them in detail.