You can optimize network throughput and resource usage by tuning the network adapter, if any tuning options are exposed by the adapter. Remember that the correct tuning settings depend on the network adapter, the workload, the host computer resources, and your performance goals.
Enabling Offload Features
Turning on network adapter offload features is usually beneficial. Sometimes, however, the network adapter is not powerful enough to handle the offload capabilities at high throughput. For example, enabling segmentation offload can reduce the maximum sustainable throughput on some network adapters because of limited hardware resources. However, if the reduced throughput is not expected to be a limitation, you should enable offload capabilities even for such network adapters. Note that some network adapters require offload features to be independently enabled for send and receive paths.
For network adapters that allow for the manual configuration of resources such as receive and send buffers, you should increase the allocated resources. Some network adapters set their receive buffers low to conserve allocated memory from the host. The low value results in dropped packets and decreased performance. Therefore, for receive-intensive scenarios, we recommend that you increase the receive buffer value to the maximum. If the adapter does not expose manual resource configuration, then it either dynamically configures the resources or it is set to a fixed value that cannot be changed.
Enabling Interrupt Moderation
To control interrupt moderation, some network adapters expose different interrupt moderation levels, buffer coalescing parameters (sometimes separately for send and receive buffers), or both. You should consider interrupt moderation for CPU-bound workloads and consider the trade-off between the host CPU savings and latency versus the increased host CPU savings because of more interrupts and less latency. If the network adapter does not perform interrupt moderation but does expose buffer coalescing, then increasing the number of coalesced buffers allows for more buffers per send or receive, which improves performance.
Enabling RSS for Web Scenarios
RSS can improve Web scalability and performance when there are fewer NICs than processors on the server. When all the Web traffic is going through the RSS-capable NICs, incoming Web requests from different connections can be simultaneously processed across different CPUs. It is important to note that due to logic in RSS and HTTP for load distribution, performance can be severely degraded if a non-RSS-capable NIC accepts Web traffic on a server that has one or more RSS-capable NICs. We recommend that you either use RSS-capable-NICs or disable RSS from the Advanced Properties tab. To determine whether a NIC is RSS-capable, view the RSS information in the Advanced Properties tab for the device.
Binding Each Adapter to a CPU
The method to use for binding network adapters to a CPU depends on the number of network adapters, the number of CPUs, and the number of ports per network adapter. Important factors are the type of workload and the distribution of the interrupts across the CPUs. For a workload such as a Web server that has several networking adapters, partition the adapters on a processor basis to isolate the interrupts that the adapters generate.
Starting with Windows Server 2008, one of the most significant changes to the TCP stack is TCP receive window auto-tuning. Previously, the network stack used a fixed-size receive-side window that limited the overall potential throughput for connections. You can calculate the total throughput of a single connection when you use this fixed size default as:
Total achievable throughput in bytes = TCP window * (1 / connection latency)
For example, the total achievable throughput is only 51 Mbps on a 1-GB connection with 10-ms latency (a reasonable value for a large corporate network infrastructure). With auto-tuning, however, the receive-side window is adjustable and can grow to meet the demands of the sender. It is entirely possible for a connection to achieve a full line rate of a 1-GB connection. Network usage scenarios that might have been limited in the past by the total achievable throughput of TCP connections now can fully use the network.
Remote file copy is a common network usage scenario that is likely to increase demand on the infrastructure because of this change. Many improvements have been made to the underlying operating system support for remote file copy that now let large file copies perform at disk I/O speeds. If many concurrent remote large file copies are typical within your network environment, your network infrastructure might be taxed by the significant increase in network usage by each file copy operation.
Windows Filtering Platform
The Windows Filtering Platform (WFP) that was introduced in Windows Vista and Windows Server 2008 provides APIs to third-party independent software vendors (ISVs) to create packet processing filters. Examples include firewall and antivirus software. Note that a poorly written WFP filter significantly decreases a server’s networking performance. For more information about WFP, see "Resources" later in this guide.
This section lists the counters that are relevant to managing network performance. Most of the counters are straightforward. We provide guidelines for the counters that typically require explanation.
Datagrams received per second.
Datagrams sent per second.
Datagrams received per second.
Datagrams sent per second.
Network Interface > [adapter name]
Bytes received per second.
Bytes sent per second.
Packets received per second.
Packets sent per second.
Output queue length.
This counter is the length of the output packet queue (in packets). If this is longer than 2, delays occur. You should find the bottleneck and eliminate it if you can. Because NDIS queues the requests, this length should always be 0.
Packets received errors.
This counter is the number of incoming packets that contain errors that prevent them from being deliverable to a higher-layer protocol. A zero value does not guarantee that there are no receive errors. The value is polled from the network driver, and it can be inaccurate.
Packets outgoing errors.
Percent of processor time.
Interrupts per second.
DPCs queued per second.
This counter is an average rate at which DPCs were added to the processor's DPC queue. Each processor has its own DPC queue. This counter measures the rate at which DPCs are added to the queue, not the number of DPCs in the queue. It displays the difference between the values that were observed in the last two samples, divided by the duration of the sample interval.