Dell PowerEdge R510 Performance Testing
The Dell PowerEdge R510 is the hardware behind the es1001-es1010 hosts in eqiad.
Hardware stats
12 disks behind a hardware RAID card
Performance statistics
These tests were done with the disks configured as follows (a sketch for verifying the settings on the controller follows the list):
- raid 10
- 256k stripe
- no read ahead
- writeback cache
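If these settings need to be re-checked later, they can usually be read back from the controller itself. A minimal sketch, assuming the R510's PERC controller is LSI MegaRAID-based and that the MegaCli utility is installed at the usual vendor path (both are assumptions; adjust for the local install):

# Show logical drive configuration (RAID level, stripe size, read-ahead and
# write cache policy) for all arrays on all adapters.
# The binary path is an assumption; it may simply be "megacli" on some hosts.
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aAll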
Tests were performed with sysbench 0.4.10-1build1, the standard Debian package (installed with 'aptitude install sysbench').
cpu, memory, threads, mutex
root@es1003:/a/tmp/sysbench# for i in cpu memory threads mutex ; do sysbench --test=$i run; done
sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000

Test execution summary:
    total time:                          10.0859s
    total number of events:              10000
    total time taken by event execution: 10.0851
    per-request statistics:
         min:                                  0.96ms
         avg:                                  1.01ms
         max:                                  1.91ms
         approx.  95 percentile:               1.76ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   10.0851/0.00

sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing memory operations speed test
Memory block size: 1K

Memory transfer size: 102400M

Memory operations type: write
Memory scope type: global
Threads started!
Done.

Operations performed: 104857600 (2259393.46 ops/sec)

102400.00 MB transferred (2206.44 MB/sec)

Test execution summary:
    total time:                          46.4096s
    total number of events:              104857600
    total time taken by event execution: 39.3877
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.00ms
         max:                                  0.19ms
         approx.  95 percentile:               0.00ms

Threads fairness:
    events (avg/stddev):           104857600.0000/0.00
    execution time (avg/stddev):   39.3877/0.00

sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing thread subsystem performance test
Thread yields per test: 1000
Locks used: 8

Threads started!
Done.

Test execution summary:
    total time:                          2.0137s
    total number of events:              10000
    total time taken by event execution: 2.0128
    per-request statistics:
         min:                                  0.20ms
         avg:                                  0.20ms
         max:                                  0.33ms
         approx.  95 percentile:               0.20ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   2.0128/0.00

sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Doing mutex performance test

Threads started!
Done.

Test execution summary:
    total time:                          0.0021s
    total number of events:              1
    total time taken by event execution: 0.0020
    per-request statistics:
         min:                                  2.01ms
         avg:                                  2.01ms
         max:                                  2.01ms
         approx.  95 percentile:               10000000.00ms

Threads fairness:
    events (avg/stddev):           1.0000/0.00
    execution time (avg/stddev):   0.0020/0.00
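These runs used sysbench's default of a single thread. To get numbers that exercise all cores, the same loop can be repeated with a higher thread count; a minimal sketch, assuming sysbench 0.4's --num-threads option and one thread per core (the thread count is a choice for illustration, not part of the original run):

# Same four tests as above, but with one thread per CPU core.
for i in cpu memory threads mutex ; do
    sysbench --test=$i --num-threads=$(nproc) run
done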
disks
This machine is also what currently (May 2014) runs graphite, so it makes sense to have a synthetic benchmark of the RAID itself. Graphite (carbon-cache) generates many small writes, so fio was run with a config like the one below. Note that numjobs=8 matches the number of carbon-cache processes that write to disk.
# cat carbon-cache.job
[global]
bs=4k
rw=randwrite
write_bw_log
write_lat_log

[carbon-cache]
group_reporting=1
directory=/a/tmp/
numjobs=8
size=10g
And executed like this:
fio --output carbon-cache.out --bandwidth-log --latency-log carbon-cache.job
# cat carbon-cache.out
carbon-cache: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
...
carbon-cache: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.59
Starting 8 processes

carbon-cache: (groupid=0, jobs=8): err= 0: pid=13789
  write: io=81920MB, bw=29500KB/s, iops=7375 , runt=2843564msec
    clat (usec): min=2 , max=132888 , avg=1065.97, stdev=4886.38
     lat (usec): min=2 , max=132888 , avg=1066.14, stdev=4886.38
    bw (KB/s) : min=    1, max=964232, per=12.68%, avg=3739.50, stdev=18964.43
  cpu          : usr=0.06%, sys=0.82%, ctx=1093250, majf=0, minf=247055
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=0/20971520/0, short=0/0/0
     lat (usec): 4=8.98%, 10=79.06%, 20=6.70%, 50=0.06%, 100=0.03%
     lat (usec): 250=0.01%, 500=0.01%, 750=0.01%
     lat (msec): 2=0.01%, 4=0.01%, 20=4.61%, 50=0.51%, 100=0.04%
     lat (msec): 250=0.01%

Run status group 0 (all jobs):
  WRITE: io=81920MB, aggrb=29500KB/s, minb=30208KB/s, maxb=30208KB/s, mint=2843564msec, maxt=2843564msec

Disk stats (read/write):
  dm-0: ios=1/13680669, merge=0/0, ticks=1768/415097120, in_queue=415108824, util=99.96%, aggrios=330/13701997, aggrmerge=10/329684, aggrticks=474592/413980824, aggrin_queue=414435312, aggrutil=99.96%
    sda: ios=330/13701997, merge=10/329684, ticks=474592/413980824, in_queue=414435312, util=99.96%
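The job file also enables write_bw_log and write_lat_log, whose per-sample logs are not shown above. A minimal sketch for summarizing a bandwidth log, assuming the old fio log format of "time_msec, value, data_direction, block_size" and a file named carbon-cache_bw.log (the exact file name varies between fio versions, so treat it as an assumption):

# Average the per-sample bandwidth values (KB/s) from the fio bandwidth log.
awk -F', ' '{ sum += $2; n++ } END { if (n) printf "avg bw: %.1f KB/s over %d samples\n", sum/n, n }' carbon-cache_bw.log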
So with 4k random writes and 8 processes using the sync ioengine (plain read/write syscalls), it looks like the RAID controller can sustain roughly 7.3k IOPS (29500 KB/s ÷ 4 KB per write ≈ 7375 writes/s, matching the iops figure above).
TODO: more comprehensive tests, with varying block sizes, sync/async IO, etc., and a performance comparison.
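A possible starting point for that TODO, sketched here only as an illustration (the block sizes, queue depth, runtime and file size are arbitrary choices, not an agreed test plan):

# Sweep block size and sync vs. async (libaio) submission, one short
# time-based run per combination, writing each result to its own file.
# iodepth only matters for libaio; adding --direct=1 there would bypass
# the page cache for a truer async measurement.
for bs in 4k 16k 64k 256k 1m ; do
    for engine in sync libaio ; do
        fio --output=sweep-${engine}-${bs}.out \
            --name=sweep-${engine}-${bs} --directory=/a/tmp/ \
            --rw=randwrite --bs=${bs} --ioengine=${engine} --iodepth=32 \
            --size=2g --runtime=120 --time_based
    done
done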