The following list of performance counters is considered a base set of counters to monitor resource usage of the Application Server while you run a non-virtualized SAP enterprise resource planning Sales and Distribution workload. Log the performance counters to a local (.blg) performance counter log. It is less expensive to collect all instances by using the wildcard character (*) and then extract particular instances during post-processing by using Relog.exe as follows:
Note If applicable, add the \IPv6\* and \TCPv6\* objects.
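As a sketch of the Relog.exe post-processing step described above (the log file name and extracted counter path are placeholders, not values from this guide):

```shell
:: Extract one counter instance from a wildcard-collected log
:: (appserver.blg and the counter path below are example values).
relog.exe appserver.blg -c "\Processor(_Total)\% Processor Time" -o appserver_cpu.blg

:: Or convert the extracted counter data to CSV for spreadsheet analysis.
relog.exe appserver.blg -c "\Processor(_Total)\% Processor Time" -f CSV -o appserver_cpu.csv
```

Extracting after collection keeps the logging overhead on the server under test low while still allowing per-instance analysis later.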
For virtualized configurations of the SAP enterprise resource planning Sales and Distribution workload, you can collect the following performance counters to monitor the Hyper-V host:
\Hyper-V Hypervisor Partition(*)\*
\Hyper-V Hypervisor Root Partition(*)\*
\Hyper-V Hypervisor Logical Processor(*)\*
\Hyper-V Hypervisor Root Virtual Processor(*)\*
\Hyper-V Hypervisor Virtual Processor(*)\*
\Hyper-V Dynamic Memory Balancer(*)\*
\Hyper-V Dynamic Memory VM(*)\*
\Hyper-V VM Vid Numa Node(*)\*
\Hyper-V VM Vid Partition(*)\*
\Hyper-V Virtual Storage Device(*)\*
\Hyper-V Virtual Network Adapter(*)\*
\Hyper-V Virtual Switch(*)\*
\Hyper-V Virtual Switch Processor(*)\*
Performance Tuning for OLTP Workloads
The TPC-E benchmark is one of the database workloads used to evaluate online transaction processing (OLTP) performance in SQL Server and Windows Server. TPC-E uses a central database that runs transactions simulating the activities of a brokerage firm. The primary metric for TPC-E is Trade-Result transactions per second (tpsE).
A non-clustered TPC-E benchmark setup consists of two parts: a set of client systems and the server under test. To achieve maximum system utilization and throughput, you can tune the operating system, SQL Server, storage, memory, processors, and network.
The following are general tunings that are applicable across native and virtualized server root configurations for the database server:
Set the power scheme to High Performance.
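A minimal sketch of setting the power scheme from an elevated command prompt (the GUID is the well-known identifier of the built-in High performance scheme):

```shell
:: Activate the built-in High performance power scheme.
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c

:: Confirm the active scheme.
powercfg /getactivescheme
```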
To enable SQL Server to use large pages, enable the Lock pages in memory user right assignment for the account that runs SQL Server:
From the Group Policy MMC snap-in (Gpedit.msc), navigate to Computer Configuration > Windows Settings > Security Settings > Local Policies > User Rights Assignment. Double-click Lock pages in memory and add the accounts that have credentials to run SQL Server.
The following are settings that are recommended for the native, non-virtualized configurations only:
Configure page files for best performance:
Navigate to Performance Settings > Advanced > Virtual memory and configure one or more fixed-size page files with Initial Size equal to Maximum Size. The page file size should be equal to the total virtual memory requirement of the workload. Make sure that no system-managed page files are in the virtual memory on the application server.
Navigate to Performance Settings > Visual Effects, and then select Adjust for best performance.
Configure network devices:
For virtualized setups, consider utilizing VMQ and dynamic VMQ where applicable. The following tunings are applicable for native TPC-E only.
Determine the number of network adapters that are needed from previous runs. Network device utilization should not exceed 65 to 75 percent of total network adapter bandwidth. Use network adapters rated at 1 Gbps or faster.
From the Device Manager MMC snap-in (Devmgmt.msc), navigate to Network Adapters, and then determine the network devices to be used. Disable devices that are not being used.
Disable low priority I/O. For each logical volume in HKLM\SYSTEM\CurrentControlSet\Enum\SCSI under Device Parameters\ClassPnp, create the registry entry IdlePrioritySupported (REG_DWORD) and set the value to 0.
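A sketch of creating that registry entry from the command line; the `<device-id>\<instance-id>` segments are placeholders for the actual per-volume keys that you must enumerate under `HKLM\SYSTEM\CurrentControlSet\Enum\SCSI`:

```shell
:: <device-id> and <instance-id> are placeholders; substitute the real
:: subkeys for each logical volume before running.
reg add "HKLM\SYSTEM\CurrentControlSet\Enum\SCSI\<device-id>\<instance-id>\Device Parameters\ClassPnp" /v IdlePrioritySupported /t REG_DWORD /d 0 /f
```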
For advanced storage tuning information, see Performance Tuning for the Storage Subsystem earlier in this guide.
Configure disks for advanced performance as follows:
From the Disk Management MMC snap-in (Diskmgmt.msc), select each disk in use, right-click Properties, click Policies, and then select Advanced Performance if it is enabled for the disk.
Note This setting is for native configurations only (for direct attached storage). For virtualized setups, consider the recommendations that are provided in the Storage I/O Performance topic under Performance Tuning for Virtualization Servers earlier in this guide.
SQL Server Tunings for OLTP Workloads
The following SQL Server tunings can improve performance and scalability for workloads characterized by large memory usage, high transaction rates, and high CPU utilization.
Important: The tunings in this section are specifically for OLTP benchmarking and should not be perceived as general SQL tuning guidance.
Use the -T834 start flag to enable SQL Server to use large pages.
Start SQL Server as a process instead of a service and use the -x flag for native operating system configurations to disable SQL Server performance counters and avoid potential overhead:
From the Services MMC snap-in (Services.msc), stop and disable SQL Server services.
Run the following command from the SQL Server binn directory:
-T834: Use Microsoft Windows large-page allocations for the buffer pool
Note Other parameters that are sometimes used are:
-T652: Disable page prefetching scans
-T8744: Disallow prefetch for ranges
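Run from the SQL Server binn directory, the startup command described above might look like the following (`-c` starts SQL Server as a process, independent of the Service Control Manager):

```shell
:: Start SQL Server as a console process (-c), with performance counters
:: disabled (-x) and large-page buffer pool allocations enabled (-T834).
sqlservr.exe -c -x -T834
```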
Enable the TCP/IP protocol to allow communication with client systems:
Navigate to Start Menu > Programs > Microsoft SQL Server 2012 > Configuration Tools > SQL Server Configuration Manager. Then navigate to SQL Server Network Configuration > Protocols for MSSQLSERVER, right-click TCP/IP, and click Enable.
Configure SQL Server according to the guidance in the following list. You can configure SQL Server by using the sp_configure stored procedure. Set the Show advanced options value to 1 to display more available configuration options. For detailed information about the sp_configure stored procedure, see Server Configuration Options in the MSDN Library.
Set the SQL processor affinity mask option to isolate system resources for the SQL Server instance from other SQL Server instances or other applications running on the same system. You can also set the SQL processor affinity mask option to not use a set of logical processors that handle I/O interrupt traffic for the network and the disk.
You can set the SQL processor affinity mask option as follows, depending on processor count:
Partition the SQL Server process to run on specific cores by using the affinity mask option, which covers up to 32 logical processors.
To set affinity on more than 32 but no more than 64 logical processors, use the affinity64 mask option.
In SQL Server 2008 R2 and SQL Server 2012, you can configure CPU affinity on as many as 640 logical processors by using the ALTER SERVER CONFIGURATION Transact-SQL statement, because the sp_configure affinity mask options are deprecated.
Use the ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU statement to set affinity to the desired range or ranges of processors, with ranges separated by commas.
For more information about best practices for installations on native operating system configurations with more than 64 logical processors, see ALTER SERVER CONFIGURATION (Transact-SQL).
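As a sketch of both approaches, run with sqlcmd against the instance under test (connection details omitted; the mask value and processor range are examples, not recommendations):

```shell
:: Legacy approach: set an affinity mask covering logical processors 0-7
:: via sp_configure (requires 'show advanced options').
sqlcmd -Q "EXEC sp_configure 'show advanced options', 1; RECONFIGURE;"
sqlcmd -Q "EXEC sp_configure 'affinity mask', 255; RECONFIGURE;"

:: Preferred approach on SQL Server 2008 R2 / 2012: bind the SQL Server
:: process to logical processors 0-31.
sqlcmd -Q "ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 0 TO 31"
```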
Note Windows Server 2012 supports guests with no more than 64 virtual processors, so any SQL Server affinity masks for a virtualized SQL Server instance should be set accordingly.
Set a fixed amount of memory for the SQL Server process to use. About 3% of the total available memory is used for the system, and another 1% is used for memory management structures. SQL Server can use the remaining available memory, but not more.
Use the following equation to calculate the total memory to be used by SQL Server:

SQL Server memory = total physical memory – (3% for the system + 1% for memory management structures), or approximately 96 percent of total physical memory.
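A minimal sketch of this calculation, using the reservation percentages stated above (the 262144 MB total is an example value, not a recommendation):

```shell
# Reserve ~3% for the system and ~1% for memory management structures;
# grant SQL Server the remaining ~96% of physical memory.
total_mb=262144
sql_mb=$(( total_mb * 96 / 100 ))
echo "$sql_mb"   # → 251658
```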
Leave the lightweight pooling value set to the default of 0. This enables SQL Server to run in thread mode. Thread mode performance is comparable to fiber mode.
Set the maximum worker threads value to approximately the number of connected users if it appears that the default settings do not allow sufficient concurrent transactions based on a throughput value lower than expected for the system and benchmark configuration. Monitor the sys.dm_os_schedulers Dynamic Management Views to determine whether you need to increase the number of worker threads.
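As a sketch of that monitoring step (connection details omitted), you could sample the scheduler DMV with sqlcmd; sustained nonzero work_queue_count values indicate tasks waiting for a worker thread:

```shell
:: Check for queued work on visible online schedulers.
sqlcmd -Q "SELECT scheduler_id, current_tasks_count, runnable_tasks_count, work_queue_count FROM sys.dm_os_schedulers WHERE status = 'VISIBLE ONLINE'"
```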
Set the priority boost value to 1.
On the largest-scale systems, consider partitioning the system so that dedicated NUMA nodes handle storage and network I/O processing while the remaining NUMA nodes run the SQL Server process. (This applies to native systems only; it does not apply in a virtualized environment.)
For network I/O, choose the number of network adapters to use and the number of RSS processors for each network adapter. Find the smallest number of NUMA nodes that can satisfy the RSS requirement, and assign the RSS processors to those nodes. For more information, see Performance Tuning for the Networking Subsystem earlier in this guide.
For storage I/O, choose the number of processors that handle interrupts for each storage host bus adapter (HBA). Each storage HBA should have its own dedicated interrupt target processor. Find the smallest number of NUMA nodes that can act as interrupt targets for all storage HBAs, and set the interrupt affinity of each HBA to processors in the chosen NUMA nodes.
Ideally, for each network adapter or storage HBA, the NUMA node chosen to handle its I/O processing should be local to the device.
Assign SQL Server affinity to the remaining NUMA nodes in the system.
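A PowerShell sketch of the network I/O side of this partitioning on Windows Server 2012 (the adapter name and processor range are examples, not prescriptions):

```shell
# Steer RSS processing for one adapter onto processors 0-7,
# leaving the remaining processors for the SQL Server process.
Set-NetAdapterRss -Name "Ethernet 1" -BaseProcessorNumber 0 -MaxProcessors 8

# Verify the resulting RSS processor assignment.
Get-NetAdapterRss -Name "Ethernet 1"
```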
For disk storage redundancy, you can use RAID 1+0 if you have enough storage capacity. If you do not have enough capacity, you can use RAID 5.
If you use rotational disks, configure logical drives so that all spindles are used for database disks, if possible. Additional spindles improve overall disk subsystem performance.
Proper write caching can improve performance, and battery-backed write caching avoids data loss if the power fails. Enable 100 percent write caching for the log disk.
TPC-E Database Size and Layout
The database size and layout can also be tuned for performance as follows:
You can perform the following fine tuning on the database layout for rotational media:
If rotational disks are used, place database tables that have higher access frequency on the outer edge of the disks.
You can modify the default TPC-E kit to create new filegroups that contain the most frequently accessed tables, and then place those filegroups on the outer edge of the disk for better performance.
For solid-state drives, consider the following configuration: For each logical disk use 80% of available disk space for partitions and leave 20% unused.
Selecting a storage configuration for the virtualized TPC-E environment can present trade-offs between performance and factors such as portability or cost savings. Pass-through disks can be considered a high-performance option for virtualized workloads with high SQL I/O disk rates such as TPC-E.
When you consider VHD or VHDX disks, fixed VHDs deliver better throughput than differencing or dynamic VHDs. Based on the TPC-E database size, consider placing the entire TPC-E database in a single VHDX (for a small TPC-E database), and store the VHDX on high-performance storage such as solid-state drives.
Another option for larger TPC-E databases is to partition the database into more than one VHD or VHDX file placed on different logical or physical storage units when you observe high physical disk response times.
Client Systems Tunings
Tune the client systems as follows:
Configure client systems the same way that the server under test is configured. See Server Under Test Tunings earlier in this guide.
In addition to tuning the client systems, you should monitor client performance and eliminate any bottlenecks. Follow these client performance guidelines:
CPU utilization on clients should not be higher than 80% to accommodate activity bursts.
If any of the logical processors has high CPU utilization, consider using CPU affinity for benchmark processes to even out CPU utilization. If CPU utilization is still high, consider upgrading clients to the latest processors, or add more clients.
Verify that the time is synchronized between the master client and the server under test.
Monitoring and Data Collection
The following list of performance counters is considered a base set of counters to monitor resource usage of the database server for the TPC-E workload. Log the performance counters to a local (.blg) performance counter log. It is less expensive to collect all instances by using the wildcard character (*) and then extract particular instances during post-processing by using Relog.exe or Perfmon:
\Synchronization(*)\* for Windows Server 2012 and Windows Server 2008 R2
\SynchronizationNUMA(*)\* for Windows Server 2012
\NUMA Node Memory(*)\* for Windows Server 2012
Note If applicable, add the \IPv6\* and \TCPv6\* objects. To monitor overall performance, you can use the performance counter chart that is displayed in Figure 17 and the throughput chart displayed in Figure 18 to visualize run characteristics.
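As a sketch of collecting such counters to a local .blg log with Logman (the collection name, counter paths, and output path are placeholders; substitute the objects listed above):

```shell
:: Create a binary counter collection sampled every 5 seconds
:: (example counter paths and output location).
logman create counter TPCE_Base -c "\Processor(*)\*" "\PhysicalDisk(*)\*" "\SQLServer:Buffer Manager\*" -si 5 -f bin -o C:\PerfLogs\tpce_base

:: Start and, after the run, stop the collection.
logman start TPCE_Base
logman stop TPCE_Base
```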
The first part of the run in Figure 17 represents the warm-up stage where I/O consists of mostly reads. As the run progresses, the lazy writer starts flushing caches to the disks and as write I/O increases, read I/O decreases. The beginning of steady state for the run is when the read I/O and write I/O curves seem to be parallel to each other.
Figure 17: TPC-E Perfmon counters chart
Figure 18: TPC-E throughput chart
You can use other tools such as Xperf to perform additional analysis.