Windows Server® 2008 R2 performs well out of the box while consuming the least energy possible for most customer workloads. However, you might have business needs that are not met by using the default server settings. You might need the lowest possible energy consumption, or the lowest possible latency, or the maximum possible throughput on your server. This guide describes how you can further tune the server settings and obtain incremental performance or energy efficiency gains, especially when the nature of the workload varies little over time.
To have the most impact, your tuning changes should consider the hardware, the workload, the power budgets, and the performance goals of your server. This guide describes important tuning considerations and settings that can result in improved performance or energy efficiency. This guide describes each setting and its potential effect to help you make an informed decision about its relevance to your system, workload, performance, and energy usage goals.
Since the release of Windows Server 2008, customers have become increasingly concerned about energy efficiency in the datacenter. To address this need, Microsoft and its partners invested a large amount of engineering resources in developing and optimizing the features, algorithms, and settings in Windows Server 2008 R2 to maximize energy efficiency with minimal effects on performance. This paper describes energy consumption considerations for servers and provides guidelines for meeting your energy usage goals. Although “power consumption” is a more commonly used term, “energy consumption” is more accurate because power is an instantaneous measurement (Energy = Power *Time). Power companies typically charge datacenters for both the energy consumed (megawatt-hours) and the peak power draw required (megawatts).
Note: Registry settings and tuning parameters changed significantly from Windows Server 2003 and Windows Server 2008 to Windows Server 2008 R2. Be sure to use the latest tuning guidelines to avoid unexpected results.
As always, be careful when you directly manipulate the registry. If you must edit the registry, back it up before you make any changes.
In This Guide
This guide contains key performance recommendations for the following components:
This guide also contains performance tuning considerations for the following server roles:
It is important to select the proper hardware to meet your expected performance and power goals. Hardware bottlenecks limit the effectiveness of software tuning. This section provides guidelines for laying a good foundation for the role that a server will play.
It is important to note that there is a tradeoff between power and performance when choosing hardware. For example, faster processors and more disks will yield better performance but can also consume more energy. See “Choosing Server Hardware: Power Considerations” later in this guide for more details about these tradeoffs. Later sections of this paper provide tuning guidelines that are specific to a server role and include diagnostic techniques for isolating and identifying performance bottlenecks for certain server roles.
Table 1 lists important items that you should consider when you choose server hardware. Following these guidelines can help remove artificial performance bottlenecks that might impede the server’s performance.
Table 1. Server Hardware Recommendations
Choose 64-bit processors for servers. 64-bit processors have significantly more address space, and 64-bit processors are required for Windows Server 2008 R2. No 32-bit editions of the operating system will be provided, but 32-bit applications will run on the 64-bit Windows Server 2008 R2 operating system.
To increase the computing resources in a server, you can use a processor with higher-frequency cores, or you can increase the number of processor cores. If CPU is the limiting resource in the system, a core with 2x frequency typically provides a greater performance improvement than two cores with 1x frequency. Multiple cores are not expected to provide a perfect linear scaling, and the scaling factor can be even less if hyper-threading is enabled because hyper-threading relies on sharing resources of the same physical core.
It is important to match and scale the memory and I/O subsystem with the CPU performance and vice versa.
Do not compare CPU frequencies across manufacturers and generations because the comparison can be a misleading indicator of speed.
Choose large L2 or L3 processor caches. The larger caches generally provide better performance and often play a bigger role than raw CPU frequency.
When your computer runs low on memory and needs more immediately, modern operating systems use hard disk space to supplement system RAM through a procedure called paging. Too much paging degrades overall system performance.
You can optimize paging by using the following guidelines for pagefile placement:
Place the pagefile and operating system files on separate physical disk drives.
Place the pagefile on a drive that is not fault-tolerant. Note that, if the disk fails, a system crash is likely to occur. If you place the pagefile on a fault-tolerant drive, remember that fault-tolerant systems often experience slower data writes because they write data to multiple locations.
Use multiple disks or a disk array if you need additional disk bandwidth for paging. Do not place multiple pagefiles on different partitions of the same physical disk drive.
In Windows Server 2008 R2, the primary storage and network interfaces must be PCIe. Choose servers with PCIe buses. Also, to avoid bus speed limitations, use PCIe x8 and higher slots for Gigabit Ethernet adapters.
Choose disks with higher rotational speeds to reduce random request service times (2 ms on average when you compare 7,200- and 15,000-RPM drives) and to increase sequential request bandwidth. However, there are cost, power, and other considerations associated with disks that have high rotational speeds.
The latest generation of 2.5-inch enterprise-class disks can service a significantly larger number of random requests per second compared to equivalent 3.5-inch drives.
Store “hot” data – especially sequentially accessed data – near the “beginning” of a disk because this roughly corresponds to the outermost (fastest) tracks.
Be aware that consolidating small drives into fewer high-capacity drives can reduce overall storage performance. Fewer spindles mean reduced request service concurrency and therefore potentially lower throughput and longer response times (depending on the workload intensity).
Table 2 lists the recommended characteristics for network and storage adapters for high-performance servers. These settings can help prevent your networking or storage hardware from being the bottleneck when they are under heavy load.
The adapter has passed the Windows® Hardware Quality Labs (WHQL) certification test suite.
Adapters that are 64-bit-capable can perform direct memory access (DMA) operations to and from high physical memory locations (greater than 4 GB). If the driver does not support DMA greater than 4 GB, the system double-buffers the I/O to a physical address space of less than 4 GB.
Copper and fiber (glass) adapters
Copper adapters generally have the same performance as their fiber counterparts, and both copper and fiber are available on some Fibre Channel adapters. Certain environments are better suited to copper adapters, whereas other environments are better suited to fiber adapters.
Dual- or quad-port adapters
Multiport adapters are useful for servers that have a limited number of PCI slots.
To address SCSI limitations on the number of disks that can be connected to a SCSI bus, some adapters provide two or four SCSI buses on a single adapter card. Fibre Channel disks generally have no limits to the number of disks that are connected to an adapter unless they are hidden behind a SCSI interface.
Serial Attached SCSI (SAS) and Serial ATA (SATA) adapters also have a limited number of connections because of the serial nature of the protocols, but you can attach more attached disks by using switches.
Network adapters have this feature for load-balancing or failover scenarios. Using two single-port network adapters usually yields better performance than using a single dual-port network adapter for the same workload.
PCI bus limitation can be a major factor in limiting performance for multiport adapters. Therefore, it is important to consider placing them in a high-performing PCIe slot that provides enough bandwidth.
Some adapters can moderate how frequently they interrupt the host processors to indicate activity or its completion. Moderating interrupts can often result in reduced CPU load on the host but, unless interrupt moderation is performed intelligently, the CPU savings might increase latency.
Offload capability and other advanced features such as message-signaled interrupt (MSI)-X
Offload-capable adapters offer CPU savings that yield improved performance. For more information, see “Choosing a Network Adapter” later in this guide.
Dynamic interrupt and deferred procedure call (DPC) redirection
Windows Server 2008 R2 has functionality that enables PCIe storage adapters to dynamically redirect interrupts and DPCs. This capability, originally called “NUMA I/O,” can help any multiprocessor system by improving workload partitioning, cache hit rates, and on-board hardware interconnect usage for I/O-intensive workloads.