The ratio between the number of Dialog (D) processes versus Update (U) processes in the SAP ERP installation might vary, but usually a ratio of 1D:1U or 2D:1U per logical processor is a good start for the SD workload. Ensure that in a SAP dialog instance, the number of worker processes and users does not exceed the capacity of the SAP dispatcher for that dialog instance (the current maximum is approximately 2,000 users per instance). On NUMA-class hardware, consider installing one or more SAP dialog instances per NUMA node (depending on the number of logical processors per NUMA node that you want to use with SAP worker processes). The D:U ratio, and the overall number of SAP dialog instances per NUMA node or system wide, might be improved based on the analysis of previous experiments.
To further partition within an SAP instance, use the processor affinity capabilities in the SAP instance profiles to partition each worker process to a subset of the available logical processors and achieve better CPU and memory locality. Affinity setting in the SAP instance profiles is supported for as many as 64 logical processors.
Use the FLAT memory model that SAP AG released on November 23, 2006, with the SAP Note No. 1002587 “Flat Memory Model on Windows” for SAP kernel 7.00 Patch Level 87.
Windows Server 2008 R2 supports more than 64 logical processors. On such NUMA-class systems, consider setting preferred NUMA nodes in addition to setting hard affinities by using the following steps:
Set the preferred NUMA node for the SAP Win32 service and SAP Dialog Instance services (processes instantiated by Sapstartsrv.exe). When you enter commands on the local system, you can omit the server parameter. For the commands below, use the service short name:
Use the following command to set the preferred NUMA node:
%windir%\system32\sc.exe [server] preferrednode You need administrator permissions to set the preferred node. Use %windir%\system32\sc.exe preferrednode to display help text.
Use the following command to query the setting:
%windir%\system32\sc.exe [server] qpreferrednode This command fails if the service has no preferred node settings. Use %windir%\system32\sc.exe qpreferrednode to display help text.
To allow each SAP worker process in a dialog instance to inherit the ideal NUMA node from its Win32 service, create registry key entries under the following key for each of the Sapstartsrv.exe, Msg_server.exe, Gwrd.exe, and Disp work.exe images and set the "NodeOptions"=dword:00000100 value:
If the preferred NUMA node is used without hard affinity settings for SAP worker processes, or if time measurement issues are observed as described by SAP Note No. 532350 released on November 29, 2004, apply the recommendation to let SAP processes use the Query Performance Counter (QPC) timer to stabilize the benchmark environment. Set the following system environment variable:
If applicable, use the IntPolicy tool as described in the “Interrupt Affinity” section earlier in this guide to set an optimal interrupt affinity for storage or network devices.
You can use the Coreinfo tool from Windows Sysinternals to provide topology details about logical and physical processors, processor sockets, NUMA nodes, and processor cache. For more information, see “Resources” later in this guide.
Monitoring and Data Collection
The following list of performance counters is considered a base set of counters when you monitor the resource usage of the Application Server while you are running the two-tier SAP ERP SD workload. Log the performance counters to a local, raw (blg) performance counter log. It is less expensive to collect all instances (‘*’ wide character) and then extract particular instances while post-processing by using Relog.exe:
Note: If applicable, add the \IPv6\* and \TCPv6\* objects.
Performance Tuning for TPC-E Workload
TPC-E online transaction processing (OLTP) is one of the primary database workloads used to evaluate SQL Server and Windows Server performance. TPC-E uses a central database that executes transactions related to a brokerage firm’s customer accounts. The primary metric for TPC-E is Trade-Result transactions per second (tpsE). Note that Trade-Result transactions account for 10% of the transaction mix. For more information about the TPC-E benchmark, see the TPC-E website listed in “Resources” later in this guide.
A non-clustered TPC-E benchmark setup consists of two parts: a set of client systems and the server under test (SUT). To achieve maximum system utilization and throughput, you can tune the operating system, SQL Server, storage, memory, processors, and network.
Important: The tunings in this section are specifically for OLTP benchmarking and should not be perceived as general SQL tuning guidance.
Server Under Test (SUT) Tunings
Use the following SUT tunings:
Set the power scheme to High Performance.
Configure pagefiles for best performance:
Navigate to Performance Settings > Advanced > Virtual memory and configure one or more fixed-size pagefiles with Initial Size equal to Maximum Size. The pagefile size should be equal to the total virtual memory requirement of the workload. Make sure that no system-managed pagefiles are in the virtual memory on the application server.
Navigate to Performance Settings > Visual Effects and select Adjust for best performance.
To enable SQL Server to use large pages, enable the Lock pages in memory user right assignment for the account that will run the SQL Server:
From the Group Policy MMC snap-in (Gpedit.msc), navigate to Computer Configuration > Windows Settings > Security Settings > Local Policies > User Rights Assignment. Double-click Lock pages in memory and add the accounts that have credentials to run SQL Server.
Configure network devices:
The number of network devices is determined from previous runs. Network device utilization should not be higher than 65%-75% of total NIC bandwidth. Use 1-Gbps NICs at minimum.
From the Device Manager MMC snap-in (Devmgmt.msc), navigate to Network Adapters and determine the network devices to be used. Disable devices that are not being used.
If interrupt partitioning is necessary in high interrupt rates per NIC port scenarios, and the device supports interrupt affinity configuration, set network device interrupt affinity:
Using the IntPolicy tool, set interrupt affinity in a round-robin fashion starting from processor 0. If the SUT is a multinode system, determine on which nodes the NICs reside and set the affinity to processors that belong to the node on which each NIC resides. For detailed information on the IntPolicy tool, see "Resources" later in this guide.
For advanced network tuning information, see “Performance Tuning for the Networking Subsystem” earlier in this guide.
Configure storage devices:
If the operating system is Windows Server 2008 R2, DPC redirection optimization is available on some storage drivers. If the storage device driver supports DPC redirection optimization, there is no need to set interrupt affinity on storage devices. If the storage device driver does not support DPC redirection, or if storage device driver interrupts are not distributed to processors on the same NUMA node where the device resides, set the interrupt affinity for each device by using IntPolicy as advised for networking devices.
For advanced storage tuning information, see “Performance Tuning for the Storage Subsystem” earlier in this guide.
Configure disks for advanced performance:
From the Disk Management MMC snap-in (Diskmgmt.msc), select each disk in use, right-click to Properties > Policies and select Advanced Performance if it is enabled for the disk.
SQL Server Tunings for TPC-E Workload
The following SQL Server tunings improve performance and scalability in environments such as TPC-E:
You can use the -T834 start flag to enable SQL Server to use large pages.
If you disable SQL Server performance counters to avoid potential overhead, start SQL Server as a process instead of a service and use the -x flag:
From the Services MMC snap-in (Services.msc), stop and disable SQL Services.
Execute the following command from the SQL Server Binn directory:
sqlservr.exe –c –x
Enable the TCP/IP protocol to allow communication with client systems:
Navigate to Start Menu > Programs > Microsoft SQL Server R2 > Configuration Tools > SQL Server Configuration Manager. Then navigate to SQL Server Network Configuration > Protocols for MSSQL Server, right-click TCP/IP, and click Enable.
Configure SQL Server according to the guidance in the following list. You can configure SQL Server by using the sp_configure stored procedure. Set the “show advanced options” value to 1 to display more available configuration options. Detailed information about the sp_configure stored procedure is available in “Resources” later in this guide:
You can set CPU affinity for the SQL process to isolate system resources for the SQL Server instance from other SQL Server instances or other applications running on the same system. You can also set CPU affinity for the SQL process to not use a set of logical processors that handle I/O interrupt traffic (network and disk).
You can set CPU affinity for the SQL process in different ways, depending on processor count: Set affinity mask to partition the SQL process on specific cores up to 32 logical processors. To set affinity on more than 32 logical processors but fewer than 64 processors, use affinity64mask. Starting with SQL Server 2008 R2, youcan apply equivalent settings for configuring CPU affinity on as many as 256 logical processors using the ALTER SERVER CONFIGURATION SET PROCESS AFFINITY Data Definition Language (DDL) TSQL statement as the sp_configure affinity mask options are announced for deprecation. Use the ‘alter server configuration set process affinity cpu =’ command to set affinity to the desired range or ranges of processors, separated by commas. For more information on best practices for installations with more than 64 logical processors, and for more information on DDL, see “Resources” later in this guide.
You can set a fixed amount of memory for the SQL Server process to use. About 3% of the total available memory is used for the system, and another 1% is used for memory management structures. SQL Server can use the rest of available memory, but not more.
The following equation is available to calculate total memory to be used by SQL Server:
Leave the lightweight pooling value set to the default of 0. This enables SQL Server to run in threads mode. Threads mode performance is comparable to fibers mode.
If it appears that the default settings do not allow sufficient concurrent transactions based on a throughput value lower than expected for the system and benchmark configuration, set the maximum worker threads value to approximately the number of connected users. Monitor the sys.dm_os_schedulers DMV to determine whether you need to increase the number of worker threads.
Set the default trace enabled value to 0.
Set the priority boost value to 1.
Disk Storage Tunings
Tune the disk storage:
The TPC-E benchmark rules require disk storage redundancy. You can use RAID 1 0 if you have enough storage capacity. If you do not have enough capacity, you can use RAID 5.
If you use rotational disks, configure logical drives so that all spindles are used for database disks, if possible. Additional spindles improve overall disk subsystem performance.
The TPC-E workload consists of two disk I/O workloads: random reads/writes in a 9:1 ratio on database tables, and sequential writes on the log. You can improve performance with proper write caching on the log disk only in the case of battery backed up disk configurations that are able to avoid data loss in case of power failure:
Enable 100% write caching for the log disk.
TPC-E Database Size and Layout
Tune the database size and layout:
The TPC-E database consists of several file groups, and it can vary between different benchmark kits. Size is measured in number of customers, and for the database to be auditable, the ratio of database size (customers) to throughput (tpsE) should be approximately 500.
You can perform more fine tuning on the database layout :
Database tables that have higher access frequency should be placed on the outer edge of the disk if rotational disks are used.
The default TPC-E kit can be changed, and new file groups can be created. That way, file groups can consist of higher frequency access table(s) and they can be placed on the outer edge of the disk for better performance.
Client Systems Tunings
Tune the client systems:
Configure client systems the same way that the SUT is configured. See “Server Under Test (SUT) Tunings” earlier in this guide.
In addition to tuning the client systems, you should monitor client performance and eliminate any bottlenecks. Follow these client performance guidelines:
CPU utilization on clients should not be higher than 80%, to accommodate activity bursts.
If any of the processors has high CPU utilization, consider using CPU affinity for benchmark processes to even out CPU utilization. If CPU utilization is still high, consider upgrading clients to the latest processors, or add more clients.
Verify that time is synchronized between the master client and the SUT.
Monitoring and Data Collection
The following list of performance counters is considered a base set of counters when you monitor the resource usage of the database server for the TPC-E workload. Log the performance counters to a local, raw (blg) performance counter log. It is less expensive to collect all instances (‘*’ wide character) and then extract particular instances while post-processing by using Relog.exe or Perfmon:
Note: If applicable, add the \IPv6\* and \TCPv6\* objects. To monitor overall performance, you can use the performance counter chart displayed in Figure 9 and the throughput chart displayed in Figure 10 to visualize run characteristics. The first part of the run in Figure 9 represents the warm-up stage where I/O consists of mostly reads. As the run progresses, the lazy writer starts flushing caches to the disks and as write I/O increases, read I/O decreases. The beginning of steady state for the run is when the read I/O and write I/O curves seem to be parallel to each other.
Figure 9: TPC-E Perfmon Counters Chart
Figure 10. TPC-E Throughput Chart
You can use other tools such as Xperf to perform additional analysis.