matthewowen01
Guru · Joined May 27, 2011 · Messages: 566
Hello everybody, I'm seeing a growing collection of posts about performance and wanted to make a few general statements about how to get accurate results.
dd is often used to read from and write to a block device, and it will always give you results. Only when used properly will it give you accurate results.
What is the basic structure of the dd command?
Code:
dd if= of= bs= count=
if: the input file; dd copies data from here.
of: the output file; dd copies data to here.
bs: the size of each read or write.
count: the number of reads or writes to perform.
Here are some Bad examples for Writing.
Code:
dd if=/dev/zero of=tmp.dat bs=24 count=50k
Why is this bad? The block size is way too small. Normal hard disks address storage in 512-byte sectors, so if we write in 24-byte chunks we never fill a complete sector. The drive has to read the existing 512-byte sector, slot our 24 bytes into it, and write it back, then repeat the whole dance for the next 24 bytes, and so on. You end up measuring that overhead instead of real write speed.
So what do we do? We need to increase the block size. We could figure out exactly what the minimum block size for the pool is, but that's a waste of time; we just need to pick a number large enough that nearly every write is a full one. There may be one partial write per many thousands of full writes, but that won't be statistically measurable.
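(If you are curious about the real numbers anyway, the commands below are one rough way to peek at them on FreeBSD. The device name ada0 and pool name tank are placeholders for your own, and on FreeNAS you may need to point zdb at /data/zfs/zpool.cache with -U for the second one to work.)
Code:
# sector size as reported by the drive (replace ada0 with your device)
diskinfo -v /dev/ada0 | grep sectorsize

# ashift used by the pool: block size is 2^ashift bytes (replace tank with your pool)
zdb -C tank | grep ashift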
Code:
dd if=/dev/random of=tmp.dat bs=2048k count=50k
Why is this bad? /dev/random is very taxing on the CPU, and that impacts the benchmark significantly. The results will be artificially low because more time is spent generating the data than writing it.
So what do we do? Use /dev/zero; it has extremely low overhead.
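(If you want to see how much the generator itself costs, a quick sanity check is to dump /dev/random straight into /dev/null so no disk is involved at all; the count below is just an arbitrary small number for a short run.)
Code:
# how fast can this system generate random data? (no disk involved)
dd if=/dev/random of=/dev/null bs=2048k count=512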
Code:
dd if=/dev/zero of=tmp.dat bs=2048k count=512
Why is this bad? This one is much harder to see. 512 * 2048k is about 1 GB; sounds big, right? Nope. ZFS is magic, and nearly all of your memory gets used as cache. If you have 8 GB of RAM like most of us, that 1 GB is easily crammed into memory without ever touching the disk, and you'll get very inaccurate write results.
So what can we do? Increase the count to the point where the cache is no longer a significant factor in the benchmark. 50k is a good number.
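(If you want to sanity check how much room the cache has to play with, the sysctls below should show total RAM and the current ARC size on FreeBSD; treat them as a starting point rather than gospel.)
Code:
# total physical memory, in bytes
sysctl hw.physmem

# current size of the ZFS ARC (read cache), in bytes
sysctl kstat.zfs.misc.arcstats.size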
So where does that leave us?
Code:
dd if=/dev/zero of=tmp.dat bs=2048k count=50k
This will write roughly 100 GB (2048 KiB x 51,200 writes) to tmp.dat in the current directory and then report the speed (eventually).
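(Since 100 GB takes a while, note that on FreeBSD you can press Ctrl+T in the terminal running dd to get a progress line without interrupting it, or send the same signal from another shell, roughly like this:)
Code:
# ask the running dd for a progress report (SIGINFO does not stop it)
kill -INFO $(pgrep -x dd)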
The same is mostly true for reading: you need to read a sufficiently large file in large chunks. We can modify the command to read back the file we just created and dump it to /dev/null.
Code:
dd if=tmp.dat of=/dev/null bs=2048k count=50k
I really hope this helps you guys; if you have questions, please let me know.
If you have compression enabled, writing all these zeros will give you artificially high results. I'll cover how to get around this if people need it.
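(If you're not sure whether that caveat applies to you, checking the compression property on the dataset you're writing into should tell you; tank/dataset below is just a placeholder for your own pool/dataset.)
Code:
# "off" means the zero-based numbers above are fine; anything else inflates them
zfs get compression tank/dataset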