The following list of performance counters is considered a base set of counters to monitor resource usage of the Application Server while you run a non-virtualized SAP enterprise resource planning Sales and Distribution workload. Log the performance counters to a local (.blg) performance counter log. It is less expensive to collect all instances by using the wildcard character (*) and then extract particular instances during post-processing by using Relog.exe as follows:
Note If applicable, add the \IPv6\* and \TCPv6\* objects.
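As a sketch of the Relog.exe post-processing step described above (the log file name and extracted counter path are placeholders, not values from this guide):

```shell
:: Extract one counter instance from a wildcard-collected log
:: (appserver.blg and the counter path below are example values).
relog.exe appserver.blg -c "\Processor(_Total)\% Processor Time" -o appserver_cpu.blg

:: Or convert the extracted counter data to CSV for spreadsheet analysis.
relog.exe appserver.blg -c "\Processor(_Total)\% Processor Time" -f CSV -o appserver_cpu.csv
```

Extracting after collection keeps the logging overhead on the server under test low while still allowing per-instance analysis later.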
For virtualized configurations of the SAP enterprise resource planning Sales and Distribution workload, you can collect the following performance counters to monitor the Hyper-V host:
\Hyper-V Hypervisor Partition(*)\*
\Hyper-V Hypervisor Root Partition(*)\*
\Hyper-V Hypervisor Logical Processor(*)\*
\Hyper-V Hypervisor Root Virtual Processor(*)\*
\Hyper-V Hypervisor Virtual Processor(*)\*
\Hyper-V Dynamic Memory Balancer(*)\*
\Hyper-V Dynamic Memory VM(*)\*
\Hyper-V VM Vid Numa Node(*)\*
\Hyper-V VM Vid Partition(*)\*
\Hyper-V Virtual Storage Device(*)\*
\Hyper-V Virtual Network Adapter(*)\*
\Hyper-V Virtual Switch(*)\*
\Hyper-V Virtual Switch Processor(*)\*
Performance Tuning for OLTP Workloads
The TPC-E benchmark is one of the database workloads used to evaluate online transaction processing (OLTP) performance in SQL Server and Windows Server. TPC-E uses a central database that runs transactions simulating the activities of a brokerage firm. The primary metric for TPC-E is Trade-Result transactions per second (tpsE).
A non-clustered TPC-E benchmark setup consists of two parts: a set of client systems and the server under test. To achieve maximum system utilization and throughput, you can tune the operating system, SQL Server, storage, memory, processors, and network.
The following are general tunings that are applicable across native and virtualized server root configurations for the database server:
Set the power scheme to High Performance.
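A minimal sketch of setting the power scheme from an elevated command prompt (the GUID is the well-known identifier of the built-in High performance scheme):

```shell
:: Activate the built-in High performance power scheme.
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c

:: Confirm the active scheme.
powercfg /getactivescheme
```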
To enable SQL Server to use large pages, enable the Lock pages in memory user right assignment for the account that runs SQL Server:
From the Group Policy MMC snap-in (Gpedit.msc), navigate to Computer Configuration > Windows Settings > Security Settings > Local Policies > User Rights Assignment. Double-click Lock pages in memory and add the accounts that have credentials to run SQL Server.
The following are settings that are recommended for the native, non-virtualized configurations only:
Configure page files for best performance:
Navigate to Performance Settings > Advanced > Virtual memory and configure one or more fixed-size page files with Initial Size equal to Maximum Size. The page file size should be equal to the total virtual memory requirement of the workload. Make sure that no system-managed page files are in the virtual memory on the application server.
Navigate to Performance Settings > Visual Effects, and then select Adjust for best performance.
Configure network devices:
For virtualized setups, consider utilizing VMQ and dynamic VMQ where applicable. The following tunings are applicable for native TPC-E only.
Determine the number of network adapters that are needed from previous runs. Network device utilization should not exceed 65 to 75 percent of total network adapter bandwidth. Use network adapters rated at 1 Gbps or faster.
From the Device Manager MMC snap-in (Devmgmt.msc), navigate to Network Adapters, and then determine the network devices to be used. Disable devices that are not being used.
Disable low priority I/O. For each logical volume in HKLM\SYSTEM\CurrentControlSet\Enum\SCSI under Device Parameters\ClassPnp, create the registry entry IdlePrioritySupported (REG_DWORD) and set the value to 0.
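A sketch of creating that registry entry from the command line; the `<device-id>\<instance-id>` segments are placeholders for the actual per-volume keys that you must enumerate under `HKLM\SYSTEM\CurrentControlSet\Enum\SCSI`:

```shell
:: <device-id> and <instance-id> are placeholders; substitute the real
:: subkeys for each logical volume before running.
reg add "HKLM\SYSTEM\CurrentControlSet\Enum\SCSI\<device-id>\<instance-id>\Device Parameters\ClassPnp" /v IdlePrioritySupported /t REG_DWORD /d 0 /f
```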
For advanced storage tuning information, see Performance Tuning for the Storage Subsystem earlier in this guide.
Configure disks for advanced performance as follows:
From the Disk Management MMC snap-in (Diskmgmt.msc), select each disk in use, right-click Properties, click Policies, and then select Advanced Performance if it is enabled for the disk.
Note This setting is for native configurations only (for direct attached storage). For virtualized setups, consider the recommendations that are provided in the Storage I/O Performance topic under Performance Tuning for Virtualization Servers earlier in this guide.
SQL Server Tunings for OLTP Workloads
The following SQL Server tunings can improve performance and scalability for workloads characterized by large memory usage, high transaction rates, and high CPU utilization.
Important: The tunings in this section are specifically for OLTP benchmarking and should not be perceived as general SQL tuning guidance.
Use the -T834 start flag to enable SQL Server to use large pages.
Start SQL Server as a process instead of a service and use the -x flag for native operating system configurations to disable SQL Server performance counters and avoid potential overhead:
From the Services MMC snap-in (Services.msc), stop and disable SQL Server services.
Run the following command from the SQL Server binn directory:
-T834: Use Microsoft Windows large-page allocations for the buffer pool
Note Other parameters that are sometimes used are:
-T652: Disable page prefetching scans
-T8744: Disallow prefetch for ranges
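Run from the SQL Server binn directory, the startup command described above might look like the following (`-c` starts SQL Server as a process, independent of the Service Control Manager):

```shell
:: Start SQL Server as a console process (-c), with performance counters
:: disabled (-x) and large-page buffer pool allocations enabled (-T834).
sqlservr.exe -c -x -T834
```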
Enable the TCP/IP protocol to allow communication with client systems:
Navigate to Start Menu > Programs > Microsoft SQL Server 2012 > Configuration Tools > SQL Server Configuration Manager. Then navigate to SQL Server Network Configuration > Protocols for MSSQLSERVER, right-click TCP/IP, and click Enable.
Configure SQL Server according to the guidance in the following list. You can configure SQL Server by using the sp_configure stored procedure. Set the Show advanced options value to 1 to display more available configuration options. For detailed information about the sp_configure stored procedure, see Server Configuration Options in the MSDN Library.
Set the SQL processor affinity mask option to isolate system resources for the SQL Server instance from other SQL Server instances or other applications running on the same system. You can also set the SQL processor affinity mask option to not use a set of logical processors that handle I/O interrupt traffic for the network and the disk.
You can set the SQL processor affinity mask option as follows, depending on processor count:
Partition the SQL Server process to run on specific cores by using the affinity mask option, which covers up to 32 logical processors.
To set affinity on more than 32 but no more than 64 logical processors, use the affinity64 mask option.
In SQL Server 2008 R2 and SQL Server 2012, you can configure CPU affinity on as many as 640 logical processors by using the ALTER SERVER CONFIGURATION Transact-SQL statement, because the sp_configure affinity mask options are deprecated.
Use the ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU statement to set affinity to the desired range or ranges of processors, with ranges separated by commas.
For more information about best practices for installations on native operating system configurations with more than 64 logical processors, see ALTER SERVER CONFIGURATION (Transact-SQL).
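As a sketch of both approaches, run with sqlcmd against the instance under test (connection details omitted; the mask value and processor range are examples, not recommendations):

```shell
:: Legacy approach: set an affinity mask covering logical processors 0-7
:: via sp_configure (requires 'show advanced options').
sqlcmd -Q "EXEC sp_configure 'show advanced options', 1; RECONFIGURE;"
sqlcmd -Q "EXEC sp_configure 'affinity mask', 255; RECONFIGURE;"

:: Preferred approach on SQL Server 2008 R2 / 2012: bind the SQL Server
:: process to logical processors 0-31.
sqlcmd -Q "ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 0 TO 31"
```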
Note Windows Server 2012 supports guests with no more than 64 virtual processors, so any SQL Server affinity masks for a virtualized SQL Server instance should be set accordingly.
Set a fixed amount of memory for the SQL Server process to use. About 3% of the total available memory is used for the system, and another 1% is used for memory management structures. SQL Server can use the remaining available memory, but not more.
Use the following equation to calculate the total memory to be used by SQL Server:

SQL Server memory = total physical memory – (3% for the system + 1% for memory management structures), or approximately 96 percent of total physical memory.
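A minimal sketch of this calculation, using the reservation percentages stated above (the 262144 MB total is an example value, not a recommendation):

```shell
# Reserve ~3% for the system and ~1% for memory management structures;
# grant SQL Server the remaining ~96% of physical memory.
total_mb=262144
sql_mb=$(( total_mb * 96 / 100 ))
echo "$sql_mb"   # → 251658
```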
Leave the lightweight pooling value set to the default of 0. This enables SQL Server to run in thread mode. Thread mode performance is comparable to fiber mode.
Set the maximum worker threads value to approximately the number of connected users if it appears that the default settings do not allow sufficient concurrent transactions based on a throughput value lower than expected for the system and benchmark configuration. Monitor the sys.dm_os_schedulers Dynamic Management Views to determine whether you need to increase the number of worker threads.
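As a sketch of that monitoring step (connection details omitted), you could sample the scheduler DMV with sqlcmd; sustained nonzero work_queue_count values indicate tasks waiting for a worker thread:

```shell
:: Check for queued work on visible online schedulers.
sqlcmd -Q "SELECT scheduler_id, current_tasks_count, runnable_tasks_count, work_queue_count FROM sys.dm_os_schedulers WHERE status = 'VISIBLE ONLINE'"
```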
Set the priority boost value to 1.
On the largest-scale systems, consider partitioning the system so that dedicated NUMA nodes handle storage and network I/O processing while the remaining NUMA nodes run the SQL Server process. (This applies to native systems only; it does not apply in a virtualized environment.)
For network I/O, choose the number of network adapters to use and the number of RSS processors for each network adapter. Find the smallest number of NUMA nodes that can satisfy the RSS requirement, and assign the RSS processors to those nodes. For more information, see Performance Tuning for the Networking Subsystem earlier in this guide.
For storage I/O, choose the number of processors that handle interrupts for each storage host bus adapter (HBA). Each storage HBA should have its own dedicated interrupt target processor. Find the smallest number of NUMA nodes that can act as interrupt targets for all storage HBAs, and set the interrupt affinity of each HBA to processors in the chosen NUMA nodes.
Ideally, for each network adapter or storage HBA, the NUMA node chosen to handle its I/O processing should be local to the device.
Assign SQL Server affinity to the remaining NUMA nodes in the system.
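A PowerShell sketch of the network I/O side of this partitioning on Windows Server 2012 (the adapter name and processor range are examples, not prescriptions):

```shell
# Steer RSS processing for one adapter onto processors 0-7,
# leaving the remaining processors for the SQL Server process.
Set-NetAdapterRss -Name "Ethernet 1" -BaseProcessorNumber 0 -MaxProcessors 8

# Verify the resulting RSS processor assignment.
Get-NetAdapterRss -Name "Ethernet 1"
```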
For disk storage redundancy, you can use RAID 1+0 if you have enough storage capacity. If you do not have enough capacity, you can use RAID 5.
If you use rotational disks, configure logical drives so that all spindles are used for database disks, if possible. Additional spindles improve overall disk subsystem performance.
Proper write caching can improve performance, and battery-backed write caching avoids data loss if the power fails. Enable 100 percent write caching for the log disk.
TPC-E Database Size and Layout
The database size and layout can also be tuned for performance as follows:
You can perform the following fine tuning on the database layout for rotational media:
If rotational disks are used, place database tables that have higher access frequency on the outer edge of the disks.
You can modify the default TPC-E kit to create new filegroups that contain the most frequently accessed tables, and then place those filegroups on the outer edge of the disk for better performance.
For solid-state drives, consider the following configuration: For each logical disk use 80% of available disk space for partitions and leave 20% unused.
Selecting a storage configuration for the virtualized TPC-E environment can present trade-offs between performance and factors such as portability or cost savings. Pass-through disks can be considered a high-performance option for virtualized workloads with high SQL I/O disk rates such as TPC-E.
When you consider VHD or VHDX disks, fixed VHDs deliver better throughput than differencing or dynamic VHDs. Based on the TPC-E database size, consider placing the entire TPC-E database in a single VHDX (for a small TPC-E database), and store the VHDX on high-performance storage such as solid-state drives.
Another option for larger TPC-E databases is to partition the database into more than one VHD or VHDX file placed on different logical or physical storage units when you observe high physical disk response times.
Client Systems Tunings
Tune the client systems as follows:
Configure client systems the same way that the server under test is configured. See Server Under Test Tunings earlier in this guide.
In addition to tuning the client systems, you should monitor client performance and eliminate any bottlenecks. Follow these client performance guidelines:
CPU utilization on clients should not be higher than 80% to accommodate activity bursts.
If any of the logical processors has high CPU utilization, consider using CPU affinity for benchmark processes to even out CPU utilization. If CPU utilization is still high, consider upgrading clients to the latest processors, or add more clients.
Verify that the time is synchronized between the master client and the server under test.
Monitoring and Data Collection
The following list of performance counters is considered a base set of counters to monitor resource usage of the database server for the TPC-E workload. Log the performance counters to a local (.blg) performance counter log. It is less expensive to collect all instances by using the wildcard character (*) and then extract particular instances during post-processing by using Relog.exe or Perfmon:
\Synchronization(*)\* for Windows Server 2012 and Windows Server 2008 R2
\SynchronizationNUMA(*)\* for Windows Server 2012
\NUMA Node Memory(*)\* for Windows Server 2012
Note If applicable, add the \IPv6\* and \TCPv6\* objects. To monitor overall performance, you can use the performance counter chart that is displayed in Figure 17 and the throughput chart displayed in Figure 18 to visualize run characteristics.
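As a sketch of collecting such counters to a local .blg log with Logman (the collection name, counter paths, and output path are placeholders; substitute the objects listed above):

```shell
:: Create a binary counter collection sampled every 5 seconds
:: (example counter paths and output location).
logman create counter TPCE_Base -c "\Processor(*)\*" "\PhysicalDisk(*)\*" "\SQLServer:Buffer Manager\*" -si 5 -f bin -o C:\PerfLogs\tpce_base

:: Start and, after the run, stop the collection.
logman start TPCE_Base
logman stop TPCE_Base
```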
The first part of the run in Figure 17 represents the warm-up stage where I/O consists of mostly reads. As the run progresses, the lazy writer starts flushing caches to the disks and as write I/O increases, read I/O decreases. The beginning of steady state for the run is when the read I/O and write I/O curves seem to be parallel to each other.
Figure 17: TPC-E Perfmon counters chart
Figure 18: TPC-E throughput chart
You can use other tools such as Xperf to perform additional analysis.