Figure 16 – Windows 2000 unbuffered overhead. The fixed per-request overhead on dynamic volumes is higher than that on basic volumes. At small request sizes this fixed overhead dominates, making dynamic volumes more expensive. This likely explains why basic volumes outperform dynamic volumes on small 2KB and 4KB requests.
Windows 2000 Multiple SCSI Disk Performance
In this section, we explore the effects of multiple disks and striping. Striping improves performance by creating a large logical volume out of smaller disks, interleaving fixed-size blocks among the physical disks. We first created a two-disk RAID0 stripe set using the Win2K built-in software RAID support. In Windows 2000, the default block size is 64KB. Since blocks are interleaved, sequential requests are optimal for taking advantage of the additional parallelism afforded by multiple disks. In a two-disk configuration, the first request can go to the first drive, followed immediately by a second request going to the second drive. With two drives streaming data off disk at the same time, the potential throughput is doubled. As Figure 17 shows, adding a second drive doubles throughput for requests larger than 4KB. Small requests do not benefit from additional drives without increased request depth because they cannot exploit the additional parallelism provided. This is due to the disks being interleaved in 64KB chunks: for a small request of 2KB, for example, thirty-two requests must be issued before reaching the next chunk on the next drive. This effectively serializes access to the disks since, with 2KB requests, the system spends most of its time accessing only one disk at a time.
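The chunk interleaving described above can be made concrete with a small sketch. The 64KB chunk size and round-robin placement follow the text; the function name and disk numbering are illustrative, not part of any Windows 2000 API.

```python
# Sketch of RAID0 address mapping, assuming the 64KB chunk size and
# two-disk round-robin layout described in the text.
CHUNK_SIZE = 64 * 1024  # Windows 2000 default stripe chunk
NUM_DISKS = 2

def raid0_map(offset):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    chunk = offset // CHUNK_SIZE      # which chunk of the logical volume
    disk = chunk % NUM_DISKS          # chunks are interleaved round-robin
    disk_chunk = chunk // NUM_DISKS   # chunk index within that disk
    return disk, disk_chunk * CHUNK_SIZE + offset % CHUNK_SIZE

# With sequential 2KB requests, the first 32 requests all land on disk 0:
disks_touched = {raid0_map(i * 2048)[0] for i in range(32)}
# disks_touched == {0}: access is effectively serialized to one disk
```

This makes the serialization effect visible: only after the thirty-third 2KB request does the workload cross into the next 64KB chunk and engage the second drive.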
Because Windows 2000 supports software stripe sets only on dynamic volumes, and since basic and dynamic volumes show little difference in performance for these workloads, dynamic volumes were used for all the measurements below.
Figure 17 – One- and two-disk striped unbuffered throughput. Two disks show a doubling of throughput over that of one disk with three-deep requests. The drives are still the main bottleneck. The write plateau is an artifact of the Quantum Atlas 10K disk controller and the controllers of some other drives.
As more drives are added, limitations in other parts of the system become more critical. Each disk is capable of approximately 25MBps, so if the disks were the limiting factor, three disks should deliver 75MBps and four disks 100MBps. However, the host bus adapter and the PCI bus soon become the limiting factor. Figure 18 shows three drives peaking at 73MBps and four drives at 87MBps. The controller could handle only 50MBps on writes, which led to marginal peak throughput gains of 2.5MBps and 2.8MBps for the third and fourth drives respectively. We tested PCI bandwidth by reading directly from the disk caches, eliminating media speed as a factor; 87MBps was the maximum PCI bandwidth we were able to achieve at 64KB requests. Due to this PCI limitation, adding more than three drives to our system yields little additional sequential read performance.