The term “interrupt affinity” refers to the binding of interrupts from a specific device to one or more specific logical processors in a multiprocessor server. The binding forces interrupt processing to run on a specified logical processor or processors, unless the device specifies otherwise during its initialization. For some scenarios, such as a file server, the network connections and file server sessions remain on the same network adapter. In those scenarios, binding interrupts from a network adapter to a logical processor allows for processing incoming packets (SMB requests and data) on a specific set of logical processors, which improves locality and scalability.
You can use the old Interrupt-Affinity Filter tool (IntFiltr) to change the CPU affinity of the interrupt service routine (ISR). The tool runs on most servers that run Windows Server 2008 R2, regardless of what logical processor or interrupt controller is used. For IntFiltr to work on some systems, you must set the MAXPROCSPERCLUSTER=0 boot parameter. However, on some systems with more than eight logical processors or for devices that use MSI or MSI-X, the tool is limited by the Advanced Programmable Interrupt Controller (APIC) protocol. The new Interrupt-Affinity Policy (IntPolicy) tool does not encounter this issue because it sets the CPU affinity through the affinity policy of a device. For more information about the Interrupt-Affinity Policy tool, see “Resources” later in this guide. You can use either tool to direct any device's ISR to a specific processor or to a set of processors (instead of sending interrupts to any of the CPUs in the system). Note that different devices can have different interrupt affinity settings. On some systems, directing the ISR to a processor on a different Non-Uniform Memory Access (NUMA) node can cause performance issues. Also, if an MSI or MSI-X device has multiple interrupt “messages,” each message can be affinitized to a different logical processor or set of processors.
We recommend that you use IntPolicy to bind interrupts only for devices whose driver models do not support affinitization functionality. For devices that support it, you should use the device-specific mechanism for binding interrupts. For example, most modern server NICs support Receive Side Scaling (RSS), which is the recommended method for controlling interrupts. Similarly, modern storage controllers implement multi-message MSI-X and take advantage of NUMA I/O optimization provided by the operating system (Windows Server 2008 and later). Regardless of device functionality, IRQ affinity specified by the operating system is only a suggestion that the device driver can choose to honor or not. IntPolicy has no effect on the synthetic devices within a VM in a Hyper-V server. You cannot use IntPolicy to distribute the synthetic interrupt load of a guest VM.
Performance Tuning for the Networking Subsystem
Figure 2 shows the network architecture, which covers many components, interfaces, and protocols. The following sections discuss tuning guidelines for some components of server workloads.
The network architecture is layered, and the layers can be broadly divided into the following sections:
The network driver and Network Driver Interface Specification (NDIS).
These are the lowest layers. NDIS exposes interfaces for the driver below it and for the layers above it such as TCP/IP.
The protocol stack.
This implements protocols such as TCP/IP and UDP/IP. These layers expose the transport layer interface for layers above them.
These are typically transport data interface extension (TDX) or Winsock Kernel (WSK) clients and expose interfaces to user-mode applications. The WSK interface was a new feature for Windows Server 2008 and Windows Vista® and is exposed by Afd.sys. The interface improves performance by eliminating the switching between user mode and kernel mode.
These are typically Microsoft solutions or custom applications.
Tuning for network-intensive workloads can involve each layer. The following sections describe some tuning recommendations.
Choosing a Network Adapter
Network-intensive applications require high-performance network adapters. This section covers some considerations for choosing network adapters.
Offloading tasks can reduce CPU usage on the server, which improves overall system performance. The Microsoft network stack can offload one or more tasks to a network adapter if you choose one that has the appropriate offload capabilities. Table 5 provides more details about each offload capability.
The network stack can offload the calculation and validation of both Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) checksums on sends and receives. It can also offload the calculation and validation of both IPv4 and IPv6 checksums on sends and receives.
IP security authentication and encryption
The TCP/IP transport can offload the calculation and validation of encrypted checksums for authentication headers and Encapsulating Security Payloads (ESPs). The TCP/IP transport can also offload the encryption and decryption of ESPs.
Segmentation of large TCP packets
The TCP/IP transport supports Large Send Offload v2 (LSOv2). With LSOv2, the TCP/IP transport can offload the segmentation of large TCP packets to the hardware.
The TCP offload engine (TOE) enables a network adapter that has the appropriate capabilities to offload the entire network stack.
Receive-Side Scaling (RSS)
Windows Server 2008 R2 supports Receive Side Scaling (RSS) out of the box, as does Windows Server 2008. RSS distributes incoming network I/O packets among processors so that packets that belong to the same TCP connection are on the same processor, which preserves ordering. This helps improve scalability and performance for receive-intensive scenarios that have fewer networking adapters than available processors. Research shows that distributing packets to logical processors that share the same physical processor (for example, hyper-threading) degrades performance. Therefore, packets are only distributed across physical processors. Windows Server 2008 R2 offers the following optimizations for improved scalability with RSS:
RSS considers NUMA node distance (latency between nodes) when selecting processors for load balancing incoming packets.
Improved initialization and processor selection algorithm.
At boot time, the Windows Server 2008 R2 networking stack considers the bandwidth and media connection state when assigning CPUs to RSS-capable adapters. Higher-bandwidth adapters get more CPUs at startup. Multiple NICs with the same bandwidth receive the same number of RSS CPUs.
More control over RSS on a per-NIC basis.
Depending on the scenario and the workload characteristics, you can use the following registry parameters to choose on a per-NIC basis how many processors can be used for RSS, the starting offset for the range of processors, and which node the NIC allocates memory from:
Note: The asterisk (*) is part of the registry parameter.
For more information about RSS, see the document about Scalable Networking in "Resources" later in this guide.
Message-Signaled Interrupts (MSI/MSI-X)
Network adapters that support MSI/MSI-X can target their interrupts to specific processors. If the adapters also support RSS, then a processor can be dedicated to servicing interrupts and DPCs for a given TCP connection. This preserves the cache locality of TCP structures and greatly improves performance.
A few network adapters actively manage their resources to achieve optimum performance. Several network adapters let the administrator manually configure resources by using the Advanced Networking tab for the adapter. For such adapters, you can set the values of a number of parameters including the number of receive buffers and send buffers.
To control interrupt moderation, some network adapters either expose different interrupt moderation levels, or buffer coalescing parameters (sometimes separately for send and receive buffers), or both. You should consider buffer coalescing or batching when the network adapter does not perform interrupt moderation. Interrupt Moderation helps reduce overall CPU utilization by minimizing per-buffer processing cost, but the moderation of interrupts and buffer batching can have a negative impact on latency-sensitive scenarios.
Suggested Network Adapter Features for Server Roles
Table 6 lists high-performance network adapter features that can improve performance in terms of throughput, latency, or scalability for some server roles.
Table 6. Benefits from Network Adapter Features for Different Server Roles
TCP offload engine (TOE)
Receive-side scaling (RSS)
Mail server (short-lived connections)
Disclaimer: The recommendations in Table 6 are intended to serve as guidance only for choosing the most suitable technology for specific server roles under a deterministic traffic pattern. User experience can be different, depending on workload characteristics and the hardware that is used.
If your hardware supports TOE, then you must enable that option in the operating system to benefit from the hardware’s capability. You can enable TOE by running the following command: