Reference http://www.microsoft.com/download/en/details.aspx?id=20163
Using SQLIO to Stress Test an I/O Subsystem
Summary
The purpose of this document is to provide guidance on using SQLIO.exe to determine the input/output capacity of a disk subsystem. The document focuses on testing configurations for Storage Area Network (SAN) environments; however, the same concepts apply to more traditional direct attached storage environments.

It is good practice to benchmark the I/O subsystem with an I/O stress tool before deploying SQL Server, both to determine the hardware’s capacity and to ensure the system is tuned for optimal performance. This initial testing helps identify any hardware- or driver-related performance issues early, before the complexity of SQL Server is introduced. SQLIO.exe is a tool that can be used to determine the I/O capacity of a given hardware configuration. The purpose of SQLIO is not to simulate the I/O patterns of SQL Server but rather to test a variety of I/O types and sizes and determine the capacity of an I/O subsystem.
Determining I/O Capacity
Below are some concepts that should be applied when using any I/O stress tool. Here, ‘path’ refers to a unique I/O path (used interchangeably with LUN/disk/volume depending on the system configuration). When testing the I/O system, consider the following:
1. Test a variety of I/O types and sizes. Test a combination of I/O sizes for read/write and sequential/random, and be sure to include I/O sizes of 8 KB, 64 KB, 128 KB, and 256 KB. In addition, test different levels of outstanding I/Os to determine how queue depth affects the overall achievable performance.
2. Make sure the total size of the test files used in each test is significantly larger than the amount of cache on the SAN. Using small test files can skew the results, since the entire file may be cached on the SAN. Consider using test files that are at least 2 to 4 times the size of any data cache on the SAN. Although our focus is on benchmarking SAN storage, this concept applies to any cache offered by other storage technologies (for example, Network Attached Storage and Direct Attached Storage).
3. Make sure you understand your specific hardware configuration and the theoretical limits for the given hardware. Be sure to consider specific details of your configuration which will impact this (ex. using static mapping of LUNs to HBAs vs. using multipathing software).
4. Test each I/O path individually and then test combinations of the I/O paths. Ideally, there should be a linear scale up of throughput (MB/s) as additional I/O paths are added to the test. This, of course, is dependent on the hardware throughput limitations.
5. Short test runs (5-10 minutes) are fine for initial testing; however, if potential problem areas are discovered, longer tests should be run. Many of the larger storage systems are self-tuning and may require that a workload run for a period of time before an optimal performance level is reached. Ask your SAN vendor how long they recommend running tests.
6. Allow some time between each I/O type you test so the I/O system can return to an idle state. Following a large amount of write activity, it may take some time to flush all pending write requests from cache to disk; this can skew the results of subsequent tests if no wait time is included between them. Waiting one minute between tests is acceptable for most systems, but consult your SAN vendor to determine what is appropriate for their specific hardware.
7. Store all of the benchmark data so it can later be compared with the I/O throughput numbers observed once SQL Server is deployed.
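The file-sizing rule in point 2 above is simple arithmetic, but it is easy to get wrong under pressure; here is a minimal Python sketch of it (the function name and 4x default are this sketch's own choices, not part of SQLIO):

```python
def min_test_file_mb(san_cache_mb, multiplier=4):
    """Return a test-file size (MB) that comfortably exceeds the SAN cache.

    Rule of thumb from the guidelines above: total test-file size should be
    2 to 4 times any data cache on the SAN, so results are not skewed by
    the entire file fitting in cache.
    """
    if multiplier < 2:
        raise ValueError("use at least 2x the SAN cache size")
    return san_cache_mb * multiplier

# Example: a SAN with a 4 GB (4096 MB) data cache -> 16 GB test file at 4x
print(min_test_file_mb(4096))  # 16384
```

The same calculation applies to cache on a RAID controller or NAS head; size against whichever cache sits between the host and the spindles.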
Depending on the capabilities of the tool, you can also attempt to simulate SQL Server I/O patterns. This should be done on a case-by-case basis for the workload type you are attempting to simulate.
SQLIO
As previously mentioned, SQLIO.exe is a tool provided by Microsoft which can be used to determine the I/O capacity of a given I/O configuration. SQLIO is provided ‘as is’ and there is no support offered for any problems encountered when using the tool. Please refer to the EULA.doc for the license agreement for using this tool.
Specific Instructions for using SQLIO.exe
Below are instructions on how to use SQLIO to test the hardware capacity of a given configuration. Here, ‘path’ refers to a unique I/O path (used interchangeably with LUN/disk/volume depending on the configuration). For a complete list of the options available to SQLIO, see readme.txt or run ‘SQLIO.exe -?’ from a command line.

When using SQLIO to test a storage configuration, all of the I/O paths should be tested individually, and then combinations of these paths should be tested together. Testing the I/O paths in this manner lets you determine whether there are any problems with a given path and whether the cumulative throughput is what you would expect given the hardware being used.
Below is a sample of the syntax that could be used in a batch file to automate running SQLIO through a series of tests that vary the I/O type and size. Modify this sample as needed for your specific testing needs.
sqlio -kW -s10 -frandom -o8 -b8 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -frandom -o8 -b64 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -frandom -o8 -b128 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -frandom -o8 -b256 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -fsequential -o8 -b8 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -fsequential -o8 -b64 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -fsequential -o8 -b128 -LS -Fparam.txt
timeout /T 60
sqlio -kW -s360 -fsequential -o8 -b256 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -frandom -o8 -b8 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -frandom -o8 -b64 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -frandom -o8 -b128 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -frandom -o8 -b256 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -fsequential -o8 -b8 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -fsequential -o8 -b64 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -fsequential -o8 -b128 -LS -Fparam.txt
timeout /T 60
sqlio -kR -s360 -fsequential -o8 -b256 -LS -Fparam.txt
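Rather than maintaining the repetitive batch file above by hand, the test matrix can be generated; here is a hypothetical Python sketch that emits the same read/write x random/sequential x block-size grid (note the real sample above uses a short -s10 first run, presumably to create the test files quickly; this sketch uses a uniform duration):

```python
def build_sqlio_script(duration=360, outstanding=8, wait=60):
    """Emit a batch-file body covering the same test matrix as the
    sample script: writes then reads, random then sequential, at
    8/64/128/256 KB block sizes, with a timeout between runs so the
    storage cache can drain back to an idle state."""
    lines = []
    for kind in ("W", "R"):                    # -kW (write), -kR (read)
        for pattern in ("random", "sequential"):
            for block_kb in (8, 64, 128, 256):
                lines.append(
                    f"sqlio -k{kind} -s{duration} -f{pattern} "
                    f"-o{outstanding} -b{block_kb} -LS -Fparam.txt"
                )
                lines.append(f"timeout /T {wait}")
    lines.pop()  # no wait needed after the final run
    return "\n".join(lines)

print(build_sqlio_script())
```

Writing the result to a .bat file next to SQLIO.exe reproduces the workflow above; widen the loops (for example, over several -o values) to test different outstanding-I/O depths.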
Some of the more commonly used parameters for SQLIO are discussed below (again refer to the readme.txt or ‘SQLIO.exe -?’ for a complete listing).
Table 1 – Commonly used SQLIO.exe options
Option | Description |
-o | Specify the number of outstanding I/O requests. Increasing the queue depth may result in a higher total throughput. However, increasing this number too high may result in problems (described in more detail below). Common values for this are 8, 32, and 64. |
-LS | Capture disk latency information. Capturing latency data is recommended when testing a system. |
-k | Specify either R or W (read or write). |
-s | Duration of test (seconds). For initial tests, running for 5-10 minutes per I/O size is recommended to get a good idea of I/O performance. |
-b | Size of the IO request in bytes. |
-f | Type of IO to issue. Either ‘random’ or ‘sequential’. |
-F | Name of the file which will contain the test files to run SQLIO against. |
The sample script above will run a series of tests against the drive or drives specified in the “param.txt” file. This file should reside in the same directory as SQLIO.exe. Here is a sample param.txt that tests more than one path (or LUN) at a time:
c:\sqlio_test.dat 4 0x0 100
d:\sqlio_test.dat 4 0x0 100
The options on each line of the param.txt file are as follows:
- <Path to test file> Full path and name of the test file to be used.
- <Number of threads (per test file)> Recommended: set this equal to the number of CPUs on the host. A possible exception is when testing many paths at one time, in which case fewer threads per file may be appropriate to keep the total thread count manageable.
- <Mask> Set to 0x0.
- <Size of test file in MB> Ideally, this should be large enough so that the test file will be larger than any cache resident on the SAN (or RAID controller). Two to four times the size of any cache allocated is a good rule of thumb to follow.
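Given the fields above, a param.txt can be generated for any set of paths; a small Python sketch (the helper name is this sketch's own; adjust paths, thread counts, and sizes for your environment):

```python
def build_param_txt(paths, threads, file_mb, mask="0x0"):
    """Build param.txt contents for SQLIO: one line per test file, with
    the four fields described above -- path, threads per file (typically
    the number of CPUs on the host), affinity mask, and file size in MB."""
    return "\n".join(f"{p} {threads} {mask} {file_mb}" for p in paths)

# Two LUNs, 4 threads each, 16 GB test files (4x a hypothetical 4 GB cache)
content = build_param_txt(
    [r"c:\sqlio_test.dat", r"d:\sqlio_test.dat"], 4, 16384)
print(content)
```

To test each path individually first, call the helper once per path; then call it with the full list to test the paths in combination, as recommended above.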
Interpreting the Results of SQLIO
SQLIO returns and summarizes certain performance data during the test run. When running the test, you can redirect the output of SQLIO to a text file to capture the performance information. Using the options shown above, SQLIO returns performance statistics for the total throughput (MB/sec), the average disk latency (milliseconds), and the number of I/Os per second. In most cases this is enough information to get a basic understanding of the system’s current I/O performance. If more detailed information is needed, Windows Performance Monitor and/or SAN-specific monitoring tools can be used.

When interpreting the results from SQLIO, keep in mind the following:
· Larger I/O sizes normally result in higher latencies.
· Ideally you should see some increase in your throughput as you add more data paths to your test. You may also see increased total throughput as the size of the I/O is increased. This is, however, dependent on the specifics of your particular configuration.
· Sometimes it can be helpful to share the results from your SQLIO tests with your SAN vendor, since they have detailed knowledge about how the equipment should perform for different I/O types, RAID levels, and so on.
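Because the output is usually redirected to text files, a small parser makes it easy to collect the throughput and latency figures across runs for comparison. The field names below (IOs/sec, MBs/sec, Avg_Latency(ms)) are assumptions based on typical SQLIO summary output; verify the patterns against your own captured files:

```python
import re

def parse_sqlio_summary(text):
    """Pull throughput and latency figures out of captured SQLIO output.
    Adjust the patterns if your SQLIO version labels its summary lines
    differently."""
    patterns = {
        "ios_per_sec": r"IOs/sec:\s*([\d.]+)",
        "mbs_per_sec": r"MBs/sec:\s*([\d.]+)",
        "avg_latency_ms": r"Avg_Latency\(ms\):\s*([\d.]+)",
    }
    result = {}
    for name, pat in patterns.items():
        m = re.search(pat, text)
        if m:
            result[name] = float(m.group(1))
    return result

# Hypothetical captured output for one test run
sample = """IOs/sec:  1534.22
MBs/sec:    95.88
Avg_Latency(ms): 5"""
print(parse_sqlio_summary(sample))
```

Storing one parsed record per (I/O type, block size, path set) combination makes the benchmark data easy to compare later against SQL Server's own I/O throughput numbers, per point 7 above.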
SQLIOStress
Another tool which is available to stress the I/O subsystem is SQLIOStress.exe. This tool differs from SQLIO in that it is designed to simulate the I/O patterns of SQL Server. SQLIO should be used to determine the maximum capacity of a given I/O configuration. Then SQLIOStress can be used to determine how SQL Server I/O might perform under this configuration. This tool can also be very useful as a troubleshooting tool if your current SQL Server installation is experiencing I/O failures and you think you may have a hardware problem.
To download this tool and obtain more information on tool usage, see the following article:
231619 HOW TO: Use the SQLIOStress Utility to Stress a Disk Subsystem
http://support.microsoft.com/?id=231619
Monitoring I/O Performance
The Windows Performance Monitor tool can be used to get detailed information about I/O performance on a system. Table 2 describes the I/O-specific performance counters you should consider monitoring regularly and how to interpret them. The counters below can be captured by monitoring either Physical Disk or Logical Disk counters; the choice of object should be based on the specific storage configuration being tested. In some cases, capturing Logical Disk counters will result in more detailed performance data.

Table 2 - I/O related performance counters and their meanings.
Performance Monitor Counter | Description |
Disk Reads/sec & Disk Writes/sec | Number of I/O’s being issued against a particular disk. This number varies based on the size of I/O’s issued. Practical limit of 100-140/sec per spindle, however consult with hardware vendor for more accurate estimation. |
Avg. Disk sec/Read & Avg. Disk sec/Write | Measure of disk latency. Lower values are better, but acceptable values vary with the size of the I/Os being issued and the workload characteristics. Numbers also vary across different storage configurations (SAN cache size/utilization can impact this greatly). On well-tuned OLTP systems deployed on high-performance SANs, ideal values would be in the range of < 2 ms for log and 4-10 ms for data. DSS (decision support system) type workloads may result in higher latencies (30+ ms). Sustained values > 100 ms could be an indication of I/O problems; this, however, depends on the workload’s characteristics and the hardware used. Consider these values in combination with what is normal for your particular system. |
Average Disk Bytes/Read & Average Disk Bytes/Write | Size of I/O’s being issued. Impacts disk latency. Large I/O sizes may result in slightly higher latency. When used with SQLIO this value should correspond to the I/O size being issued during the test. When used to monitor SQL Server, this will tell you the average size of the I/O’s SQL is issuing to fill query requests. |
Average Disk Queue Length | The general rule of thumb is <=2 per spindle but this is hard to measure due to virtualization of storage in most SAN environments. Look for higher than average disk queue length in combination with higher than average disk latencies. This combination could be an indication that the SAN’s cache is being over utilized. |
Disk Read Bytes/sec & Disk Write Bytes/sec | Measure of the total throughput for a particular disk or LUN. |
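The counters in Table 2 combine into simple derived checks; here is a hedged Python sketch (the latency thresholds mirror Table 2's rough guidance and should be tuned for your workload and hardware):

```python
def avg_io_size_bytes(disk_bytes_per_sec, disk_transfers_per_sec):
    """Average I/O size = throughput / IOPS, i.e. Disk Bytes/sec divided
    by Disk Reads/sec (or Writes/sec). Matches the 'Average Disk
    Bytes/Read & Write' counters described above."""
    if disk_transfers_per_sec == 0:
        return 0.0
    return disk_bytes_per_sec / disk_transfers_per_sec

def latency_flag(avg_ms):
    """Rough classification of Avg. Disk sec/Read-Write (converted to ms)
    using Table 2's guidance: single-digit latency is healthy for data
    volumes, 30+ ms may be normal for DSS, sustained > 100 ms warrants
    investigation."""
    if avg_ms < 10:
        return "ok"
    if avg_ms <= 100:
        return "watch"
    return "investigate"

# 8 MB/sec at 128 transfers/sec -> 65536.0 bytes (64 KB) average I/O size
print(avg_io_size_bytes(8_388_608, 128))
print(latency_flag(150))
```

During a SQLIO run, the derived average I/O size should match the -b value of the test in flight; a mismatch suggests the counters are being read from the wrong disk object.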
The ability to monitor the disk array, and the level of detail available, varies from vendor to vendor, but most, if not all, SAN vendors offer some type of monitoring on the back end. Longer test runs may be needed to capture information at the SAN level, depending on the frequency at which those tools capture data. Familiarize yourself with the monitoring capabilities of your specific SAN. Points of interest when monitoring the SAN include utilization of specific ports, write pending levels, percent cache utilization, internal processor utilization, LUN activity, and physical spindle activity. All of these are potential performance bottlenecks.
In addition, you should work closely with your SAN vendor to determine the correct HBA driver and firmware versions as well as the correct settings for the HBA drivers. One of the settings on the HBA level that may need some adjustment is QueueDepth. It is recommended that you work with your SAN vendor and do careful testing to ensure that the optimal HBA values are set.