    Performance Tuning Guidelines for Windows Server 2008 R2 (April 12, 2013)





    Tunings on the SAP Application Server


    The ratio of Dialog (D) processes to Update (U) processes in the SAP ERP installation might vary, but a ratio of 1D:1U or 2D:1U per logical processor is usually a good starting point for the SD workload. Ensure that in an SAP dialog instance, the number of worker processes and users does not exceed the capacity of the SAP dispatcher for that dialog instance (the current maximum is approximately 2,000 users per instance). On NUMA-class hardware, consider installing one or more SAP dialog instances per NUMA node, depending on the number of logical processors per NUMA node that you want to use for SAP worker processes. You can refine the D:U ratio, and the overall number of SAP dialog instances per NUMA node or system-wide, by analyzing the results of previous runs.
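    As a rough planning aid, the sizing rules above can be expressed as arithmetic. This is a minimal sketch, assuming the D:U ratio is applied per logical processor (as the text suggests) and that 2,000 users is the practical ceiling per dialog instance. The function names are hypothetical, and the results are only a starting point to be refined by experiment.

```python
import math

# Approximate current maximum number of users per SAP dialog instance (from the text).
DISPATCHER_USER_LIMIT = 2000


def plan_work_processes(logical_processors: int, dialog_per_lp: int = 2,
                        update_per_lp: int = 1) -> tuple:
    """Return (dialog, update) work-process counts for the given logical processors.

    The defaults correspond to the 2D:1U starting point; pass (1, 1) for 1D:1U.
    """
    if logical_processors <= 0:
        raise ValueError("logical_processors must be positive")
    return logical_processors * dialog_per_lp, logical_processors * update_per_lp


def dialog_instances_needed(total_users: int,
                            users_per_instance: int = DISPATCHER_USER_LIMIT) -> int:
    """Smallest number of dialog instances that keeps every dispatcher under its limit."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return math.ceil(total_users / users_per_instance)
```

    For example, for a hypothetical NUMA node with 8 logical processors, plan_work_processes(8) returns (16, 8), and 5,000 simulated users need dialog_instances_needed(5000) == 3 dialog instances.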

    To further partition within an SAP instance, use the processor affinity capabilities in the SAP instance profiles to partition each worker process to a subset of the available logical processors and achieve better CPU and memory locality. Affinity setting in the SAP instance profiles is supported for as many as 64 logical processors.

    Use the FLAT memory model that SAP AG released on November 23, 2006, with the SAP Note No. 1002587 “Flat Memory Model on Windows” for SAP kernel 7.00 Patch Level 87.

    Windows Server 2008 R2 supports more than 64 logical processors. On such NUMA-class systems, consider setting preferred NUMA nodes in addition to setting hard affinities by using the following steps:



    1. Set the preferred NUMA node for the SAP Win32 service and SAP Dialog Instance services (processes instantiated by Sapstartsrv.exe). When you enter commands on the local system, you can omit the server parameter. For the commands below, use the service short name:

    • Use the following command to set the preferred NUMA node:

    %windir%\system32\sc.exe [server] preferrednode <SAP service name> <NUMA node number>
    You need administrator permissions to set the preferred node. Use %windir%\system32\sc.exe preferrednode to display help text.

    • Use the following command to query the setting:

    %windir%\system32\sc.exe [server] qpreferrednode <SAP service name>
    This command fails if the service has no preferred node settings. Use %windir%\system32\sc.exe qpreferrednode to display help text.

    • Use the following command to remove the setting:

    %windir%\system32\sc.exe [server] preferrednode <SAP service name> -1


    2. To allow each SAP worker process in a dialog instance to inherit the ideal NUMA node from its Win32 service, create a registry key under the following key for each of the Sapstartsrv.exe, Msg_server.exe, Gwrd.exe, and Disp+work.exe images, and set the "NodeOptions"=dword:00000100 value in each:

    HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\<IMAGE NAME>\NodeOptions (REG_DWORD)


    3. If the preferred NUMA node is used without hard affinity settings for SAP worker processes, or if you observe time measurement issues as described in SAP Note No. 532350 (released on November 29, 2004), apply the recommendation in that note to let SAP processes use the QueryPerformanceCounter (QPC) timer to stabilize the benchmark environment. Set the following system environment variable:

    %windir%\system32\setx.exe /M SAP_USE_WIN_TIMER YES


    4. If applicable, use the IntPolicy tool as described in the “Interrupt Affinity” section earlier in this guide to set an optimal interrupt affinity for storage or network devices.
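    When many SAP services and images must be configured, generating the commands and registry entries for the steps above reduces typing errors. This is a minimal sketch: the service short names and NUMA node numbers passed in are hypothetical placeholders, and the output is plain text that you still run or import yourself with administrator permissions.

```python
# Sketch: generate the sc.exe commands (step 1), a .reg file (step 2), and the
# optional QPC timer command (step 3). Service names/node numbers are placeholders.

SC_EXE = r"%windir%\system32\sc.exe"
SETX_QPC_COMMAND = r"%windir%\system32\setx.exe /M SAP_USE_WIN_TIMER YES"
IFEO_KEY = (r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
            r"\CurrentVersion\Image File Execution Options")
SAP_IMAGES = ("Sapstartsrv.exe", "Msg_server.exe", "Gwrd.exe", "Disp+work.exe")


def preferred_node_commands(service_nodes: dict) -> list:
    """One 'sc.exe preferrednode' command per service (local system, server omitted)."""
    return [f"{SC_EXE} preferrednode {service} {node}"
            for service, node in service_nodes.items()]


def setup_commands(service_nodes: dict, use_qpc_timer: bool = False) -> list:
    """Preferred-node commands, plus the SAP_USE_WIN_TIMER command if requested."""
    commands = preferred_node_commands(service_nodes)
    if use_qpc_timer:
        commands.append(SETX_QPC_COMMAND)
    return commands


def node_options_reg_text(images=SAP_IMAGES) -> str:
    """Text of a .reg file that sets NodeOptions=dword:00000100 for each image."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for image in images:
        lines.append(f"[{IFEO_KEY}\\{image}]")
        lines.append('"NodeOptions"=dword:00000100')
        lines.append("")
    return "\r\n".join(lines)
```

    One way to use it is to save node_options_reg_text() to a file with encoding="utf-16" (the encoding Regedit itself exports) and import it with reg import, then run the strings returned by setup_commands({"SAPXYZ_00": 0, "SAPXYZ_01": 1}) from an elevated command prompt; "SAPXYZ_00" and "SAPXYZ_01" stand in for your own service short names.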

    You can use the Coreinfo tool from Windows Sysinternals to provide topology details about logical and physical processors, processor sockets, NUMA nodes, and processor cache. For more information, see “Resources” later in this guide.

    Monitoring and Data Collection


    The following performance counters are a base set for monitoring the resource usage of the Application Server while you run the two-tier SAP ERP SD workload. Log the performance counters to a local, raw (.blg) performance counter log. It is less expensive to collect all instances (by using the ‘*’ wildcard character) and then extract particular instances during post-processing by using Relog.exe:

    \Cache\*
    \IPv4\*
    \LogicalDisk(*)\*
    \Memory\*
    \Network Interface(*)\*
    \Paging File(*)\*
    \PhysicalDisk(*)\*
    \Process(*)\*
    \Processor Information(*)\*
    \Synchronization(*)\*
    \System\*
    \TCPv4\*
    \SQLServer:Buffer Manager\Lazy writes/sec

    Note: If applicable, add the \IPv6\* and \TCPv6\* objects.
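    The counter list above can be fed to the built-in Logman.exe and Relog.exe tools. This is a minimal sketch that writes the counters one per line (the format read by logman's -cf option) and builds the argument lists for a binary log and a later CSV extraction. The collector name, output paths, and 5-second sample interval are hypothetical choices, not values from this guide.

```python
# Base counter set for the SAP application server, copied from the list above.
BASE_COUNTERS = (
    r"\Cache\*",
    r"\IPv4\*",
    r"\LogicalDisk(*)\*",
    r"\Memory\*",
    r"\Network Interface(*)\*",
    r"\Paging File(*)\*",
    r"\PhysicalDisk(*)\*",
    r"\Process(*)\*",
    r"\Processor Information(*)\*",
    r"\Synchronization(*)\*",
    r"\System\*",
    r"\TCPv4\*",
    r"\SQLServer:Buffer Manager\Lazy writes/sec",
)
IPV6_COUNTERS = (r"\IPv6\*", r"\TCPv6\*")


def counter_file_text(include_ipv6: bool = False) -> str:
    """One counter path per line, suitable for 'logman create counter -cf'."""
    counters = BASE_COUNTERS + (IPV6_COUNTERS if include_ipv6 else ())
    return "\n".join(counters) + "\n"


def logman_create_args(name: str, counter_file: str, output: str,
                       interval: str = "00:00:05") -> list:
    """Arguments for a counter collector that writes a binary (.blg) log."""
    return ["logman", "create", "counter", name, "-cf", counter_file,
            "-f", "bin", "-si", interval, "-o", output]


def relog_extract_args(blg: str, counter: str, output_csv: str) -> list:
    """Arguments for relog to extract one counter (or instance) as CSV."""
    return ["relog", blg, "-c", counter, "-f", "csv", "-o", output_csv]
```

    Collecting the wildcard paths keeps the logging cheap, and relog_extract_args lets you narrow the data afterward, which matches the post-processing approach recommended above. The argument lists can be passed to subprocess.run on the server, followed by logman start and logman stop around the measurement interval.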

    Performance Tuning for TPC-E Workload


    TPC-E online transaction processing (OLTP) is one of the primary database workloads used to evaluate SQL Server and Windows Server performance. TPC-E uses a central database that executes transactions related to a brokerage firm’s customer accounts. The primary metric for TPC-E is Trade-Result transactions per second (tpsE). Note that Trade-Result transactions account for 10% of the transaction mix. For more information about the TPC-E benchmark, see the TPC-E website listed in “Resources” later in this guide.

    A non-clustered TPC-E benchmark setup consists of two parts: a set of client systems and the server under test (SUT). To achieve maximum system utilization and throughput, you can tune the operating system, SQL Server, storage, memory, processors, and network.



    Important: The tunings in this section are specifically for OLTP benchmarking and should not be treated as general SQL Server tuning guidance.

    Server Under Test (SUT) Tunings


    Use the following SUT tunings:

    Set the power scheme to High Performance.

    Configure pagefiles for best performance:

    Navigate to Performance Settings > Advanced > Virtual memory and configure one or more fixed-size pagefiles with Initial Size equal to Maximum Size. The pagefile size should equal the total virtual memory requirement of the workload. Make sure that no system-managed pagefiles are configured on the SUT.

    Navigate to Performance Settings > Visual Effects and select Adjust for best performance.

    To enable SQL Server to use large pages, enable the Lock pages in memory user right assignment for the account that will run SQL Server:

    From the Group Policy MMC snap-in (Gpedit.msc), navigate to Computer Configuration > Windows Settings > Security Settings > Local Policies > User Rights Assignment. Double-click Lock pages in memory and add the accounts that have credentials to run SQL Server.


    Configure network devices:

    The number of network devices is determined from previous runs. Network device utilization should not be higher than 65%-75% of total NIC bandwidth. Use 1-Gbps NICs at minimum.

    From the Device Manager MMC snap-in (Devmgmt.msc), navigate to Network Adapters and determine the network devices to be used. Disable devices that are not being used.

    If interrupt partitioning is necessary because of high interrupt rates per NIC port, and the device supports interrupt affinity configuration, set network device interrupt affinity:



        • Using the IntPolicy tool, set interrupt affinity in a round-robin fashion starting from processor 0. If the SUT is a multinode system, determine on which nodes the NICs reside and set the affinity to processors that belong to the node on which each NIC resides. For detailed information on the IntPolicy tool, see "Resources" later in this guide.

    For advanced network tuning information, see “Performance Tuning for the Networking Subsystem” earlier in this guide.


    Configure storage devices:

    If the operating system is Windows Server 2008 R2, DPC redirection optimization is available on some storage drivers. If the storage device driver supports DPC redirection optimization, there is no need to set interrupt affinity on storage devices. If the storage device driver does not support DPC redirection, or if storage device driver interrupts are not distributed to processors on the same NUMA node where the device resides, set the interrupt affinity for each device by using IntPolicy as advised for networking devices.

    For advanced storage tuning information, see “Performance Tuning for the Storage Subsystem” earlier in this guide.

    Configure disks for advanced performance:

    From the Disk Management MMC snap-in (Diskmgmt.msc), right-click each disk in use, click Properties > Policies, and select Advanced Performance if the option is available for the disk.
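    The network device guidance above involves two small calculations: how many NIC ports keep utilization within 65%-75% of bandwidth, and which processor each NIC's interrupts should go to when affinity is assigned round-robin per NUMA node. This is a minimal sketch, assuming you already know the peak network throughput from previous runs and the processor list of each node (for example, from Coreinfo). It only produces a plan; the affinity itself is still set with the IntPolicy tool. The function names are hypothetical.

```python
import math


def nics_needed(peak_gbps: float, nic_gbps: float = 1.0,
                max_utilization: float = 0.65) -> int:
    """NIC ports needed so that each stays at or below max_utilization of its bandwidth."""
    if nic_gbps < 1.0:
        raise ValueError("use 1-Gbps NICs at minimum")
    if not 0.0 < max_utilization <= 0.75:
        raise ValueError("keep utilization at or below 65%-75% of NIC bandwidth")
    return math.ceil(peak_gbps / (nic_gbps * max_utilization))


def round_robin_interrupt_plan(nic_nodes: dict, node_processors: dict) -> dict:
    """Map each NIC to a processor on its own NUMA node, round-robin within each node.

    nic_nodes:       {nic name: NUMA node the NIC resides on}
    node_processors: {NUMA node: ordered list of logical processors on that node}
    On node 0 the first assignment is processor 0, matching "starting from processor 0".
    """
    next_index = {node: 0 for node in node_processors}
    plan = {}
    for nic, node in nic_nodes.items():
        processors = node_processors[node]
        plan[nic] = processors[next_index[node] % len(processors)]
        next_index[node] += 1
    return plan
```

    For example, a hypothetical 2.5-Gbps peak on 1-Gbps NICs capped at 65% gives nics_needed(2.5) == 4. The default of 65% is the conservative end of the recommended range.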



    SQL Server Tunings for TPC-E Workload


    The following SQL Server tunings improve performance and scalability in environments such as TPC-E:

    You can use the -T834 startup trace flag to enable SQL Server to use large pages.

    To disable SQL Server performance counters and avoid their potential overhead, start SQL Server as a process instead of a service and use the -x flag:


    1. From the Services MMC snap-in (Services.msc), stop and disable SQL Services.

    2. Execute the following command from the SQL Server Binn directory:

    sqlservr.exe -c -x

    Enable the TCP/IP protocol to allow communication with client systems:

    • Navigate to Start Menu > Programs > Microsoft SQL Server 2008 R2 > Configuration Tools > SQL Server Configuration Manager. Then navigate to SQL Server Network Configuration > Protocols for MSSQLSERVER, right-click TCP/IP, and click Enable.

    Configure SQL Server according to the guidance in the following list. You can configure SQL Server by using the sp_configure stored procedure. Set the “show advanced options” value to 1 to display more available configuration options. Detailed information about the sp_configure stored procedure is available in “Resources” later in this guide:

    You can set CPU affinity for the SQL process to isolate system resources for the SQL Server instance from other SQL Server instances or other applications running on the same system. You can also set CPU affinity for the SQL process to not use a set of logical processors that handle I/O interrupt traffic (network and disk).

    You can set CPU affinity for the SQL process in different ways, depending on the processor count. Use the affinity mask option to partition the SQL process onto specific cores on systems with as many as 32 logical processors. To set affinity on more than 32 but no more than 64 logical processors, also use the affinity64 mask option. Because the sp_configure affinity mask options have been announced for deprecation, starting with SQL Server 2008 R2 you can apply equivalent settings for as many as 256 logical processors by using the ALTER SERVER CONFIGURATION SET PROCESS AFFINITY Data Definition Language (DDL) Transact-SQL statement. Use the ‘alter server configuration set process affinity cpu =’ command to set affinity to the desired range or ranges of processors, separated by commas. For more information on best practices for installations with more than 64 logical processors, and for more information on DDL, see “Resources” later in this guide.

    You can set a fixed amount of memory for the SQL Server process to use. About 3% of the total available memory is used for the system, and roughly another 1% per NUMA node is used for memory management structures. SQL Server can use the rest of the available memory, but not more.

    Use the following equation to calculate the total memory to be used by SQL Server:

    SQLServerMemory = TotalMemory - (1% of TotalMemory * numa_nodes) - (3% of TotalMemory) - 1 GB

    Leave the lightweight pooling value set to the default of 0. This enables SQL Server to run in threads mode. Threads mode performance is comparable to fibers mode.

    If throughput is lower than expected for the system and benchmark configuration, and it appears that the default settings do not allow enough concurrent transactions, set the max worker threads value to approximately the number of connected users. Monitor the sys.dm_os_schedulers DMV to determine whether you need to increase the number of worker threads.

    Set the default trace enabled value to 0.

    Set the priority boost value to 1.
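    The SQL Server settings in the list above can be collected into one script. This is a minimal sketch that applies the memory equation literally (TotalMemory in MB, rounded down), emits an sp_configure batch for the listed options, and folds a CPU list into the comma-separated ranges accepted by ALTER SERVER CONFIGURATION SET PROCESS AFFINITY. The function names and the example sizes are hypothetical; review the generated T-SQL before running it.

```python
from typing import Iterable, Optional


def sql_server_memory_mb(total_memory_mb: int, numa_nodes: int) -> int:
    """TotalMemory - (1% x numa_nodes) - 3% - 1 GB, rounded down to whole MB."""
    if numa_nodes < 1:
        raise ValueError("numa_nodes must be at least 1")
    usable = total_memory_mb * (100 - numa_nodes - 3) // 100 - 1024
    if usable <= 0:
        raise ValueError("not enough memory left for SQL Server")
    return usable


def sp_configure_batch(max_server_memory_mb: int,
                       max_worker_threads: Optional[int] = None) -> str:
    """T-SQL batch applying the sp_configure settings from the list above."""
    settings = [
        ("max server memory (MB)", max_server_memory_mb),
        ("lightweight pooling", 0),      # keep the default threads mode
        ("default trace enabled", 0),
        ("priority boost", 1),
    ]
    if max_worker_threads is not None:   # only if the default proves insufficient
        settings.append(("max worker threads", max_worker_threads))
    # Advanced options must be visible (and reconfigured) before they can be set.
    lines = ["EXEC sp_configure 'show advanced options', 1;", "RECONFIGURE;"]
    lines += [f"EXEC sp_configure '{name}', {value};" for name, value in settings]
    lines.append("RECONFIGURE;")
    return "\n".join(lines)


def process_affinity_statement(cpus: Iterable[int]) -> str:
    """ALTER SERVER CONFIGURATION statement with consecutive CPUs folded into ranges."""
    ordered = sorted(set(cpus))
    if not ordered:
        raise ValueError("at least one CPU is required")
    ranges = []
    start = prev = ordered[0]
    for cpu in ordered[1:]:
        if cpu != prev + 1:              # gap: close the current range
            ranges.append((start, prev))
            start = cpu
        prev = cpu
    ranges.append((start, prev))
    spec = ", ".join(str(a) if a == b else f"{a} TO {b}" for a, b in ranges)
    return f"ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = {spec};"
```

    For example, a hypothetical 256-GB server with 4 NUMA nodes gives sql_server_memory_mb(256 * 1024, 4) == 242769. Some of these options, such as priority boost, take effect only after SQL Server restarts.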

    Disk Storage Tunings


    Tune the disk storage:

    The TPC-E benchmark rules require disk storage redundancy. You can use RAID 1+0 if you have enough storage capacity. If you do not, you can use RAID 5.

    If you use rotational disks, configure logical drives so that all spindles are used for database disks, if possible. Additional spindles improve overall disk subsystem performance.

    The TPC-E workload consists of two disk I/O workloads: random reads and writes in a 9:1 ratio on database tables, and sequential writes on the log. Write caching on the log disk can improve performance, but use it only with battery-backed disk configurations that can prevent data loss in case of a power failure:

    Enable 100% write caching for the log disk.
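    To see how the 9:1 read/write mix interacts with the RAID 1+0 versus RAID 5 choice above, you can estimate the physical disk load behind a given front-end I/O rate. This is a minimal sketch that assumes the commonly cited rule-of-thumb write penalties (2 disk I/Os per write for RAID 1+0, 4 for RAID 5) and ignores controller caching; the penalties and function names are assumptions, not figures from this guide.

```python
# Rule-of-thumb back-end I/Os per front-end write (assumption, not from this guide).
WRITE_PENALTY = {"RAID 1+0": 2, "RAID 5": 4}


def split_iops(total_iops: float, read_parts: int = 9, write_parts: int = 1) -> tuple:
    """Split front-end IOPS into (reads, writes) using the 9:1 table I/O ratio."""
    whole = read_parts + write_parts
    return total_iops * read_parts / whole, total_iops * write_parts / whole


def backend_iops(total_iops: float, raid_level: str) -> float:
    """Approximate physical disk I/Os per second needed for the database volumes."""
    reads, writes = split_iops(total_iops)
    return reads + writes * WRITE_PENALTY[raid_level]
```

    For a hypothetical 10,000 front-end IOPS, the sketch estimates about 11,000 back-end IOPS on RAID 1+0 and about 13,000 on RAID 5, which is one way to see why RAID 1+0 is preferred when capacity allows.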

    TPC-E Database Size and Layout


    Tune the database size and layout:

    The TPC-E database consists of several file groups, and the file group layout can vary between benchmark kits. Database size is measured in number of customers, and for the database to be auditable, the ratio of database size (customers) to throughput (tpsE) should be approximately 500.

    You can further fine-tune the database layout:

    Database tables that have higher access frequency should be placed on the outer edge of the disk if rotational disks are used.

    You can modify the default TPC-E kit to create new file groups. That way, file groups can contain the most frequently accessed tables and can be placed on the outer edge of the disk for better performance.
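    The sizing rule above (about 500 customers per tpsE) and the 10% Trade-Result share of the transaction mix mentioned earlier translate into simple arithmetic. This is a minimal sketch with hypothetical function names; the TPC-E specification may impose additional constraints on allowed customer counts, so treat the result as an estimate.

```python
CUSTOMERS_PER_TPSE = 500          # approximate auditable ratio from the text
TRADE_RESULT_PERCENT_OF_MIX = 10  # Trade-Result share of the transaction mix


def customers_for_target(tpse: float) -> int:
    """Approximate database size (customers) for a target throughput."""
    return round(tpse * CUSTOMERS_PER_TPSE)


def customers_to_tpse_ratio(customers: int, tpse: float) -> float:
    """Ratio to compare against ~500 after a run."""
    return customers / tpse


def total_transactions_per_second(tpse: float) -> float:
    """All transactions in the mix, given Trade-Result transactions per second."""
    return tpse * 100 / TRADE_RESULT_PERCENT_OF_MIX
```

    For a hypothetical target of 2,000 tpsE, the database needs roughly 1,000,000 customers, and the SUT processes about 20,000 transactions per second across the whole mix.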

    Client Systems Tunings


    Tune the client systems:

    Configure client systems the same way that the SUT is configured. See “Server Under Test (SUT) Tunings” earlier in this guide.

    In addition to tuning the client systems, you should monitor client performance and eliminate any bottlenecks. Follow these client performance guidelines:

    CPU utilization on clients should not be higher than 80% so that the clients can accommodate activity bursts.

    If any of the processors has high CPU utilization, consider using CPU affinity for benchmark processes to even out CPU utilization. If CPU utilization is still high, consider upgrading clients to the latest processors, or add more clients.

    Verify that time is synchronized between the master client and the SUT.
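    To check the client guidelines above after a run, you can convert the client counter log to CSV with Relog.exe and look for individual processors that average above 80%. This is a minimal sketch, assuming Relog's CSV layout (timestamp in the first column, full counter paths as column headers); the function name is hypothetical. Processors it reports are candidates for the CPU affinity adjustments described above.

```python
import csv
import io
from statistics import mean

CLIENT_CPU_LIMIT = 80.0  # percent, from the client guideline above


def busy_processors(csv_text: str, limit: float = CLIENT_CPU_LIMIT) -> dict:
    """Return {counter column: average % Processor Time} for instances above the limit.

    The first column is treated as the timestamp; the _Total instance is skipped
    so only individual processors are reported.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    busy = {}
    for col, name in enumerate(header[1:], start=1):
        if not name.endswith(r"\% Processor Time") or "_Total" in name:
            continue
        # Relog can leave blank cells; skip them instead of failing.
        values = [float(row[col]) for row in samples
                  if col < len(row) and row[col].strip()]
        if values and mean(values) > limit:
            busy[name] = mean(values)
    return busy
```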



    Monitoring and Data Collection


    The following performance counters are a base set for monitoring the resource usage of the database server for the TPC-E workload. Log the performance counters to a local, raw (.blg) performance counter log. It is less expensive to collect all instances (by using the ‘*’ wildcard character) and then extract particular instances during post-processing by using Relog.exe or Perfmon:

    \IPv4\*
    \Memory\*
    \Network Interface(*)\*
    \PhysicalDisk(*)\*
    \Processor Information(*)\*
    \Synchronization(*)\*
    \System\*
    \TCPv4\*

    Note: If applicable, add the \IPv6\* and \TCPv6\* objects.

    To monitor overall performance, you can use the performance counter chart displayed in Figure 9 and the throughput chart displayed in Figure 10 to visualize run characteristics. The first part of the run in Figure 9 represents the warm-up stage, in which I/O consists mostly of reads. As the run progresses, the lazy writer starts flushing caches to the disks, and as write I/O increases, read I/O decreases. Steady state begins when the read I/O and write I/O curves become roughly parallel to each other.
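    Judging "roughly parallel" by eye works for a chart, but a numeric rule helps when comparing many runs. This is a minimal sketch of one possible heuristic, not the method used to produce Figure 9: it treats the curves as parallel when, over a window of consecutive samples, the gap between read and write I/O rates varies by no more than a small fraction of the total I/O. The window size, tolerance, and function name are hypothetical; the inputs would be, for example, disk reads/sec and writes/sec samples extracted with Relog.exe.

```python
from statistics import mean


def steady_state_start(reads, writes, window: int = 5, tolerance: float = 0.05):
    """Index of the first sample from which the read/write gap stays roughly constant.

    Over `window` consecutive samples, the spread of (reads - writes) must be at most
    `tolerance` times the mean total I/O. Returns None if no such window exists.
    """
    if len(reads) != len(writes):
        raise ValueError("reads and writes must have the same length")
    for start in range(len(reads) - window + 1):
        r = reads[start:start + window]
        w = writes[start:start + window]
        gaps = [a - b for a, b in zip(r, w)]
        scale = mean(a + b for a, b in zip(r, w))
        if scale > 0 and max(gaps) - min(gaps) <= tolerance * scale:
            return start
    return None
```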



    Figure 9: TPC-E Perfmon Counters Chart


    Figure 10: TPC-E Throughput Chart

    You can use other tools such as Xperf to perform additional analysis.



    Resources

    Web Sites


    Windows Server 2008 R2

    http://www.microsoft.com/windowsserver2008/en/us/R2.aspx



    Windows Server 2008

    http://www.microsoft.com/windowsserver2008/



    Windows Server Performance Team Blog

    http://blogs.technet.com/winserverperformance/



    Windows Server Catalog

    http://www.windowsservercatalog.com/



    SAP Global Benchmark: Sales and Distribution (SD)

    http://www.sap.com/solutions/benchmark/sd.epx



    Windows Sysinternals

    http://technet.microsoft.com/sysinternals/default.aspx



    Transaction Processing Performance Council

    http://www.tpc.org/



    IxChariot

    http://www.ixiacom.com/support/ixchariot/


    Power Management


    Power Policy Configuration and Deployment in Windows

    http://msdn.microsoft.com/windows/hardware/gg463243.aspx



    Using PowerCfg to Evaluate System Energy Efficiency

    http://msdn.microsoft.com/windows/hardware/gg463250.aspx



    Interrupt-Affinity Policy Tool

    http://msdn.microsoft.com/windows/hardware/gg463378.aspx


    Networking Subsystem


    Scalable Networking: Eliminating the Receive Processing Bottleneck—Introducing RSS

    http://download.microsoft.com/download/5/D/6/5D6EAF2B-7DDF-476B-93DC-7CF0072878E6/NDIS_RSS.doc



    Windows Filtering Platform

    http://msdn.microsoft.com/windows/hardware/gg463267.aspx



    Networking Deployment Guide: Deploying High-Speed Networking Features

    http://download.microsoft.com/download/8/E/D/8EDE21BC-0E3B-4E14-AAEA-9E2B03917A09/HSN_Deployment_Guide.doc


    Storage Subsystem


    Disk Subsystem Performance Analysis for Windows

    (Parts of this document are out of date, but many of the general observations and guidelines are still accurate.)

    http://msdn.microsoft.com/windows/hardware/gg463405.aspx

    Web Servers


    10 Tips for Writing High-Performance Web Applications

    http://go.microsoft.com/fwlink/?LinkId=98290


    File Servers


    Performance Tuning Guidelines for Microsoft Services for Network File System

    http://technet.microsoft.com/library/bb463205.aspx



    [MS-FSSO]: File Access Services System Overview

    http://msdn.microsoft.com/library/ee392367(v=PROT.10).aspx



    How to disable the TCP autotuning diagnostic tool

    http://support.microsoft.com/kb/967475


    Active Directory Servers


    Active Directory Performance for 64-bit Versions of Windows Server 2003

    http://www.microsoft.com/downloads/details.aspx?FamilyID=52e7c3bd-570a-475c-96e0-316dc821e3e7



    How to configure Active Directory diagnostic event logging in Windows Server 2003 and in Windows 2000 Server

    http://support.microsoft.com/kb/314980


    Remote Desktop Session Host Capacity Planning


    RD Session Host Capacity Planning in Windows Server 2008 R2

    http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=ca837962-4128-4680-b1c0-ad0985939063



    RD Virtualization Host Capacity Planning in Windows Server 2008 R2

    http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=bd24503e-b8b7-4b5b-9a86-af03ac5332c8


    Virtualization Servers


    Hyper-V Dynamic Memory Configuration Guide

    http://technet.microsoft.com/library/ff817651(WS.10).aspx



    NUMA Node Balancing

    http://blogs.technet.com/b/winserverperformance/archive/2009/12/10/numa-node-balancing.aspx



    Hyper-V WMI Provider

    http://msdn2.microsoft.com/library/cc136992(VS.85).aspx



    Hyper-V WMI Classes

    http://msdn.microsoft.com/library/cc136986(VS.85).aspx



    Requirements and Limits for Virtual Machines and Hyper-V in Windows Server 2008 R2

    http://technet.microsoft.com/library/ee405267(WS.10).aspx


    Network Workload


    Ttcp

    http://en.wikipedia.org/wiki/Ttcp



    How to Use NTttcp to Test Network Performance

    http://msdn.microsoft.com/windows/hardware/gg463264.aspx


    Sales and Distribution Two-Tier Workload and TPC-E Workload


    Setting Server Configuration Options

    http://go.microsoft.com/fwlink/?LinkId=98291



    How to: Configure SQL Server to Use Soft-NUMA

    http://go.microsoft.com/fwlink/?LinkId=98292



    How to: Map TCP/IP Ports to NUMA Nodes

    http://go.microsoft.com/fwlink/?LinkId=98293



    ALTER SERVER CONFIGURATION SET PROCESS AFFINITY (Transact-SQL) (How to Set Process Affinity using DDL)

    http://msdn.microsoft.com/library/ee210585.aspx



    Best Practices for Running SQL Server on Computers That Have More Than 64 CPUs

    http://msdn.microsoft.com/library/ee210547.aspx



    SAP with Microsoft SQL Server 2008 and SQL Server 2005: Best Practices for High Availability, Maximum Performance, and Scalability

    http://www.sdn.sap.com/irj/sdn/sqlserver?rid=/library/uuid/4ab89e84-0d01-0010-cda2-82ddc3548c65


