How to use Diskspd to check IO subsystem performance

One of many responsibilities of the Database Administrator is the installation of new SQL Server instances. However, before you do it, it is best practice to test IO subsystem performance. It does not make any sense to install the new instance in an environment that will have performance issues in a moment. In this blog post, I describe how to use Diskspd tool to perform such disk performance tests.

What is Diskspd?

Diskspd is a storage testing tool created by Microsoft Windows, Windows Server and Cloud Server Infrastructure Engineering teams. It combines robust and granular IO workload definition with flexible runtime and output options. That makes it a perfect tool for storage performance testing, validation and benchmarking.

Where to find Diskspd?

Diskspd is a free and open source utility. Its source code can be found on GitHub. The repository also hosts other frameworks which use Diskspd. You can find them under ‘Frameworks’ directory. A binary release is hosted by Microsoft at the following location: http://aka.ms/diskspd.

How to use Diskspd?

Diskspd doesn’t require any additional tools or frameworks to be used. No .NET or Java required. Everything that is needed is included in a ZIP file. After downloading it and extracting you will have this:

Diskspd - zip archive

Each of three folders contains a Diskspd.exe executable:

  • amd64fre – for 64bit systems
  • armfre – for ARM systems
  • x86fre – for 32bit systems

In addition to that, you get a comprehensive Diskspd documentation with many usage examples and a PowerShell script to convert XML results into CSV.

Diskspd is a command line utility and as a such has a long list of available parameters. However, to use it efficiently you just need to remember a few of them. Let’s analyze parameters from the below example:

DiskSpd.exe -c150G -d300 -r -w40 -t8 -o32 -b64K -Sh -L D:\SpeedTest\testfile.dat

Parameters:

  • -c150G – Create a file of the specified size. Size can be stated in bytes or KiBs, MiBs, GiBs. Here – 150GB.
  • -d300 – Duration of measurement period in seconds, not including cool-down or warm-up time (default = 10 seconds). Here – 5 minutes.
  • -r – Random I/O access (override -s).
  • -s – Sequential I/O access.
  • -w40 – Percentage of write requests to issue (default = 0, 100% read). Here 40% of IO operations are Writes, remaining 60% are Reads. This is a usual load for my SQL Server OLTP databases.
  • -t8 – The number of threads per file. Here – 8. One thread per available core.
  • -o32 – The number of outstanding I/O requests per target per thread. In other words, it is a queue depth. Here – 32.
  • -b46K – Block size in bytes or KiBs, MiBs, or GiBs. Here – 64KB.
  • -Sh – Disable both software caching and hardware write caching.
  • -L – Measure latency statistics.
  • D:\SpeedTest\testfile.dat – My target file used for testing (created with -c).
How to read results?

I executed this command on a system I recently got for testing. My task was to examine whether new SAN disks array is configured properly and has sufficient performance for a load generated by SQL Server. Here is the result:

Command Line: DiskSpd.exe -c150G -d300 -r -w40 -t8 -o32 -b64K -Sh -L D:\SpeedTest\testfile.dat

Input parameters:

    timespan:   1
    -------------
    duration: 300s
    warm up time: 5s
    cool down time: 0s
    measuring latency
    random seed: 0
    path: 'D:\SpeedTest\testfile.dat'
        think time: 0ms
        burst size: 0
        software cache disabled
        hardware write cache disabled, writethrough on
        performing mix test (read/write ratio: 60/40)
        block size: 65536
        using random I/O (alignment: 65536)
        number of outstanding I/O operations: 32
        thread stride size: 0
        threads per file: 8
        using I/O Completion Ports
        IO priority: normal



Results for timespan 1:
*******************************************************************************

actual test time:	300.01s
thread count:		8
proc count:		8

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|   1.01%|   0.13%|    0.87%|  99.00%
   1|   3.04%|   0.14%|    2.90%|  96.96%
   2|   0.71%|   0.14%|    0.57%|  99.29%
   3|   0.74%|   0.13%|    0.61%|  99.26%
   4|   0.62%|   0.10%|    0.53%|  99.38%
   5|   0.90%|   0.09%|    0.81%|  99.10%
   6|   0.66%|   0.06%|    0.60%|  99.34%
   7|   0.62%|   0.05%|    0.57%|  99.38%
-------------------------------------------
avg.|   1.04%|   0.11%|    0.93%|  98.96%

Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      4326424576 |        66016 |      13.75 |     220.04 |  145.375 |    52.576 | testfile.dat (150GB)
     1 |      4338679808 |        66203 |      13.79 |     220.67 |  144.962 |    51.947 | testfile.dat (150GB)
     2 |      4328062976 |        66041 |      13.76 |     220.13 |  145.323 |    52.482 | testfile.dat (150GB)
     3 |      4328128512 |        66042 |      13.76 |     220.13 |  145.308 |    52.563 | testfile.dat (150GB)
     4 |      4336517120 |        66170 |      13.78 |     220.56 |  145.029 |    52.215 | testfile.dat (150GB)
     5 |      4334288896 |        66136 |      13.78 |     220.44 |  145.107 |    52.244 | testfile.dat (150GB)
     6 |      4328718336 |        66051 |      13.76 |     220.16 |  145.289 |    53.204 | testfile.dat (150GB)
     7 |      4339859456 |        66221 |      13.80 |     220.73 |  144.928 |    52.339 | testfile.dat (150GB)
-----------------------------------------------------------------------------------------------------
total:       34660679680 |       528880 |     110.18 |    1762.85 |  145.165 |    52.447

Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      2600796160 |        39685 |       8.27 |     132.28 |  144.647 |    58.771 | testfile.dat (150GB)
     1 |      2616066048 |        39918 |       8.32 |     133.05 |  144.026 |    57.697 | testfile.dat (150GB)
     2 |      2605776896 |        39761 |       8.28 |     132.53 |  144.203 |    58.723 | testfile.dat (150GB)
     3 |      2582183936 |        39401 |       8.21 |     131.33 |  144.362 |    59.017 | testfile.dat (150GB)
     4 |      2592538624 |        39559 |       8.24 |     131.86 |  144.037 |    58.435 | testfile.dat (150GB)
     5 |      2597126144 |        39629 |       8.26 |     132.09 |  144.168 |    58.460 | testfile.dat (150GB)
     6 |      2599616512 |        39667 |       8.26 |     132.22 |  144.713 |    59.531 | testfile.dat (150GB)
     7 |      2601058304 |        39689 |       8.27 |     132.29 |  143.783 |    58.635 | testfile.dat (150GB)
-----------------------------------------------------------------------------------------------------
total:       20795162624 |       317309 |      66.10 |    1057.65 |  144.242 |    58.661

Write IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      1725628416 |        26331 |       5.49 |      87.77 |  146.474 |    41.504 | testfile.dat (150GB)
     1 |      1722613760 |        26285 |       5.48 |      87.61 |  146.384 |    41.685 | testfile.dat (150GB)
     2 |      1722286080 |        26280 |       5.47 |      87.60 |  147.019 |    41.225 | testfile.dat (150GB)
     3 |      1745944576 |        26641 |       5.55 |      88.80 |  146.707 |    41.165 | testfile.dat (150GB)
     4 |      1743978496 |        26611 |       5.54 |      88.70 |  146.505 |    41.226 | testfile.dat (150GB)
     5 |      1737162752 |        26507 |       5.52 |      88.35 |  146.511 |    41.199 | testfile.dat (150GB)
     6 |      1729101824 |        26384 |       5.50 |      87.94 |  146.156 |    41.917 | testfile.dat (150GB)
     7 |      1738801152 |        26532 |       5.53 |      88.44 |  146.641 |    41.101 | testfile.dat (150GB)
-----------------------------------------------------------------------------------------------------
total:       13865517056 |       211571 |      44.08 |     705.20 |  146.550 |    41.379


  %-ile |  Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
    min |     22.132 |     22.095 |     22.095
   25th |    111.940 |    129.764 |    116.858
   50th |    132.568 |    151.406 |    141.026
   75th |    163.575 |    172.090 |    168.590
   90th |    210.856 |    191.478 |    199.258
   95th |    251.512 |    203.980 |    226.995
   99th |    359.454 |    228.761 |    322.688
3-nines |    541.528 |    270.872 |    497.081
4-nines |    727.123 |    317.519 |    695.128
5-nines |    811.054 |    475.358 |    807.394
6-nines |    828.036 |    648.498 |    828.036
7-nines |    828.036 |    648.498 |    828.036
8-nines |    828.036 |    648.498 |    828.036
9-nines |    828.036 |    648.498 |    828.036
    max |    828.036 |    648.498 |    828.036

Let’s analyze it.

In the first line, we have the command which was executed to perform the test. Used parameters and any other settings are listed and explained below.

Command Line: DiskSpd.exe -c150G -d300 -r -w40 -t8 -o32 -b64K -Sh -L D:\SpeedTest\testfile.dat

Input parameters:

    timespan:   1
    -------------
    duration: 300s
    warm up time: 5s
    cool down time: 0s
    measuring latency
    random seed: 0
    path: 'D:\SpeedTest\testfile.dat'
        think time: 0ms
        burst size: 0
        software cache disabled
        hardware write cache disabled, writethrough on
        performing mix test (read/write ratio: 60/40)
        block size: 65536
        using random I/O (alignment: 65536)
        number of outstanding I/O operations: 32
        thread stride size: 0
        threads per file: 8
        using I/O Completion Ports
        IO priority: normal

In next section, we have the time of the test run, number of threads, and number of available processors. Additionally, we have information about the average utilization of every processor during the test run.

actual test time:	300.01s
thread count:		8
proc count:		8

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|   1.01%|   0.13%|    0.87%|  99.00%
   1|   3.04%|   0.14%|    2.90%|  96.96%
   2|   0.71%|   0.14%|    0.57%|  99.29%
   3|   0.74%|   0.13%|    0.61%|  99.26%
   4|   0.62%|   0.10%|    0.53%|  99.38%
   5|   0.90%|   0.09%|    0.81%|  99.10%
   6|   0.66%|   0.06%|    0.60%|  99.34%
   7|   0.62%|   0.05%|    0.57%|  99.38%
-------------------------------------------
avg.|   1.04%|   0.11%|    0.93%|  98.96%

As you can see there was almost no load.

The Total IO section provides statistics (Read+Write) per thread. The last row provides Total values for the whole test run.

Total IO
thread |  bytes     |  I/Os  |  MB/s  | I/O per s | AvgLat  | LatStdDev |  file
------------------------------------------------------------------------------------
     0 | 4326424576 |  66016 |  13.75 |    220.04 | 145.375 |    52.576 | testfile.dat
     1 | 4338679808 |  66203 |  13.79 |    220.67 | 144.962 |    51.947 | testfile.dat
     2 | 4328062976 |  66041 |  13.76 |    220.13 | 145.323 |    52.482 | testfile.dat
     3 | 4328128512 |  66042 |  13.76 |    220.13 | 145.308 |    52.563 | testfile.dat
     4 | 4336517120 |  66170 |  13.78 |    220.56 | 145.029 |    52.215 | testfile.dat
     5 | 4334288896 |  66136 |  13.78 |    220.44 | 145.107 |    52.244 | testfile.dat
     6 | 4328718336 |  66051 |  13.76 |    220.16 | 145.289 |    53.204 | testfile.dat
     7 | 4339859456 |  66221 |  13.80 |    220.73 | 144.928 |    52.339 | testfile.dat
------------------------------------------------------------------------------------
total:  34660679680 | 528880 | 110.18 |   1762.85 | 145.165 |    52.447

In this case, test generated 528880 IO operations, the average throughput was 110.18 MB/s (1762.85 IOPS) and the average latency was 145 ms. In short: Terrible!

Next two sections, Read IO and Write IO provides the same statistics but individually for the Read and Write operations.

The last section provides a detailed information about a latency. It is a summary table of per percentile latencies per operation type.

  %-ile |  Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
    min |     22.132 |     22.095 |     22.095
   25th |    111.940 |    129.764 |    116.858
   50th |    132.568 |    151.406 |    141.026
   75th |    163.575 |    172.090 |    168.590
   90th |    210.856 |    191.478 |    199.258
   95th |    251.512 |    203.980 |    226.995
   99th |    359.454 |    228.761 |    322.688
3-nines |    541.528 |    270.872 |    497.081
4-nines |    727.123 |    317.519 |    695.128
5-nines |    811.054 |    475.358 |    807.394
6-nines |    828.036 |    648.498 |    828.036
7-nines |    828.036 |    648.498 |    828.036
8-nines |    828.036 |    648.498 |    828.036
9-nines |    828.036 |    648.498 |    828.036
    max |    828.036 |    648.498 |    828.036

The ‘nines’ refer to the number of nines: 3-nines is the 99.9th percentile, and so forth.

This is how PerfMon counters looked during this test run.

Diskspd - PerfMon

Testing scenarios

Of course, one single test run is not enough to thoroughly test IO subsystem. Especially in a situation when you have different drives to handle different workload types (or if you work with OLAP databases). One of the most common best practice is to store database data files and database log file separately. Such separation requires a different kind of testing for each drive.

Fortunately, Diskspd documentation provides a lot of helpful examples:

Data file patterns

100% 8KiB/64KiB Random reads – Large area of random concurrent reads of 8KB/64KB blocks. Disable both software caching and hardware write caching.

DiskSpd.exe -c1000G –d300 -r -w0 -t8 –o32 -b8K -h -L F:\testfile.dat

DiskSpd.exe -c1000G –d300 -r -w0 -t8 –o32 -b64K -h -L F:\testfile.dat

100% 8KiB/64KiB Random writes – Large area of random concurrent writes of 8KB/64KB blocks. Disable both software caching and hardware write caching.

DiskSpd.exe -c1000G –d300 -r –w100 -t8 –o32 -b8K -h -L F:\testfile.dat

DiskSpd.exe -c1000G –d300 -r –w100 -t8 –o32 -b64K -h -L F:\testfile.dat

60% 8KiB/64KiB Random READs, 40% 8KiB/64KiB Random writes – Large area of random concurrent 60% reads and 40% writes of 8KB/64KB blocks. Disable both software caching and hardware write caching.

DiskSpd.exe -c1000G –d300 -r –w40 -t8 –o32 -b8K -h -L F:\testfile.dat

DiskSpd.exe -c1000G –d300 -r –w40 -t8 –o32 -b64K -h -L F:\testfile.dat
Log file patterns

100% 64KiB reads – Large area of sequential concurrent reads of 64KB blocks. 8 outstanding IOs. Disable both software caching and hardware write caching.

DiskSpd.exe -c1000G –w0 -b64K –F4 -T1b -s8b -o8 –d300 -h F:\testfile.dat

100% 64KiB writes – Large area of sequential concurrent writes of 64KB blocks. 116 Outstanding IOs. Disable both software caching and hardware write caching.

DiskSpd.exe -c1000G –w100 -b64K –F4 -T1b -s8b –o116 –d300 -h F:\testfile.dat

40% 64KiB reads, 60% 64KiB writes – Large area of sequential concurrent 40% Reads, 60% writes of 64KB blocks. 8 Outstanding IOs. Disable both software caching and hardware write caching.

DiskSpd.exe -c1000G –w60 -b64K –F4 -T1b -s8b –o8 –d300 -h F:\testfile.dat
The results of my testing

If you are curious, I will tell you. I executed six test runs. I tested 8KB and 64KB block size operations against disk formatted with 4KB, 8KB, and 64KB per cluster. Each one with 60% reads / 40% writes ratio. Here you can find the average latency I got from my measurements.

Latency in ms
Test Read Write Bytes per Cluster
8k_1 248.303 397.233 4k
8k_2 137.661 167.228 8k
8k_3 172.608 214.828 64k
64k_1 164.921 215.299 4k
64k_2 144.242 146.55 8k
64K_3 161.367 155.142 64k

If you don’t know yet how to interpret these results, I highly recommend you to read this post from Paul Randal (b|t). Here I will just paste a short cheat sheet:

  • Excellent: < 1ms
  • Very good: < 5ms
  • Good: 5 – 10ms
  • Poor: 10 – 20ms
  • Bad: 20 – 100ms
  • Shockingly bad: 100 – 500ms
  • WOW!: > 500ms

Yeah… Shockingly bad… This disk is trash. I need to write one more email to the storage team.

-Marek

Share it:
Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

One thought on “How to use Diskspd to check IO subsystem performance”

Leave a Reply

Your email address will not be published. Required fields are marked *