Raidz with 4k blocks

Status
Not open for further replies.

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
I'm finding that raidz pools are very inefficient with 4k blocks. Essentially I'm losing 30%-40% of drive space, beyond the regular parity overhead, because of 4k blocks. From some zfs research, this seems to be parity related (1 4k block for data and 2 4k blocks for parity).

One person recommended formatting with 8k blocks. Another person told me to just not use raidz on HDDs with 4k sectors (which is just about every new hard drive 2TB and bigger).

I'm looking for some zfs guru recommendations, because giving up 30%-40% on a raidz volume seems pretty bad.

Thanks
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I have no clue how you even came up with that 30-40%, so I can't even tell you if that's normal or not for your pool configuration. I will tell you that going to bigger blocks isn't going to make it better unless you have a REALLY odd configuration.
 

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
I've created a zpool with 6x1TB drives in raidz1. I then created a 100GB zvol on this zpool to be used as a container for a linux vm. I then installed linux (centos 6) on this zvol and formatted it ext4. Then I issued dd if=/dev/zero of=test.img bs=1M count=15000 to fill up the ext4 file system. After a few minutes, my linux ext4 file system was full (df -h / showed 100%). I then went to the freenas console, ran zfs list, and the zvol showed 153GB of used space. So it grew by 53%.
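
For reference, the FreeNAS-side setup was along these lines (device and pool names approximate; I actually built the pool through the GUI):

zpool create tank raidz1 ada0 ada1 ada2 ada3 ada4 ada5
zfs create -V 100G tank/vmdisk
# install centos on the zvol, fill it from inside the guest,
# then check what zfs thinks is used:
zfs list tank/vmdisk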

I did some google searching, and it sounds like this is a byproduct of parity on a raidz pool with drives that have 4k sectors.

Greg
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
There's a bug filed against that, but I don't recall the number or resolution...
 

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
Do you know if you lose a lot of performance running a 4k drive in 512e mode, assuming the drive supports 512e?

 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you have 4k drives you should be using a pool with ashift=12. As a general rule I recommend everyone use ashift=12, because all future drives are going to be 4k and you can't change the ashift of a vdev once it's created. So if you make it non-4k-friendly now, the only solution later is to destroy and recreate your pool (which is something most people can't do due to limited case space, SATA ports, etc.).
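
If you want to check what your existing pool is using, something like this from the FreeNAS shell should show it (FreeNAS keeps its zpool cache in a non-default spot; swap your pool name in for tank):

zdb -U /data/zfs/zpool.cache -C tank | grep ashift
# ashift: 12 = 4k-aligned, ashift: 9 = 512b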
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Ok, so I updated 2383. Long story short: use volblocksize=32K (a 9/8ths space penalty is probably not unreasonable), or avoid RAIDZ, or determine your tolerance for the space/performance tradeoff per 2383... you WILL pay a penalty of some sort when using zvols.
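
Note that volblocksize can only be set when the zvol is created; you can't change it afterwards, so you'd have to recreate the zvol and copy the data over. Roughly (names made up):

zfs create -V 100G -o volblocksize=32K tank/vmdisk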
 

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
Yup, that does the trick, jgreco. Here's what I found with 3x2TB in raidz (disks are WD 2TB Reds):

zvol with 4k block: 62% waste,
zvol with 8k block: 53% waste,
zvol with 16k block: 0.2% waste,
zvol with 32k block: 2% waste.

Looks like 16k blocks are the sweet spot for raidz. If I get time, I'll try raidz2.
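
The test zvols were created more or less like this, one per block size (pool name changed):

for bs in 4K 8K 16K 32K; do
    zfs create -V 10G -o volblocksize=$bs tank/test_$bs
done
# fill each one up from the client, then compare volsize vs used:
zfs list -o name,volsize,used | grep test_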
 

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
Just a quick follow-up. I tested with 10G zvols and then just filled up the volume with dd (I didn't want to wait forever filling a larger volume). Waste will likely be higher if you're writing a bunch of small files.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Huuuuuuh. You're defining waste to be ... ?

I'm probably addled, but for a 4K volblocksize the required parity block means a 100% increase in space (4K block -> 8K in pool).

For a 16K volblocksize you're only going up to 20K space required in pool.
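
Spelling the same naive arithmetic out for raidz1 on ashift=12 disks (this ignores the padding RAIDZ adds to keep allocations at multiples of parity+1 sectors, which is probably part of why measured numbers drift from it):

volblocksize  4K: 1 data + 1 parity = 2 sectors =  8K on disk (2.000x)
volblocksize  8K: 2 data + 1 parity = 3 sectors = 12K on disk (1.500x)
volblocksize 16K: 4 data + 1 parity = 5 sectors = 20K on disk (1.250x)
volblocksize 32K: 8 data + 1 parity = 9 sectors = 36K on disk (1.125x, the 9/8ths above)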

Too tired to figure out your definition.
 

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
Here's what I did:

- Created a 10G zvol,
- Mounted the zvol on another box and then put ext4 on top of the mounted zvol,
- Wrote data to the mounted volume until it filled up (10G),
- Did a zfs list on FreeNAS and looked at what USED reported. Anything over 10G, I considered waste.

Non-scientific, I know, but it seems to be pretty consistent and reproducible.
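
The check on the FreeNAS side was just (zvol name made up):

zfs list -o name,volsize,used tank/test
# waste % = (USED - volsize) / volsize * 100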

Greg
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
I like simple... one of the things I had (have) trouble with is that a 100G zvol on RAIDZ3 with a 4K volblocksize ought to take 400G... 4K data block plus 3 x 4K parity = 16K, but it doesn't. So I already don't like this whole thing... but I'm so darn tired, which doesn't help.
 

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
I don't completely understand what's going on either. I've read so many forum posts, blogs, and mailing list archives that I'm starting to lose track of what I read where. I do seem to recall reading that a smaller zvol block size with raidz on disks with 4k sectors has issues with sector alignment. There was a formula that was a function of disk count, raid level, and sector size, or some such (I don't remember exactly).

It is interesting because I was closing in on what you found just this afternoon, but you beat me to the punch. I was in the middle of testing a few different scenarios when I saw your post.
 

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
Just a follow-up: jgreco is correct about the 32k block for zvols on drives with 4k sectors. A 16k block worked better for smaller zvols, but once I pushed 4TB of data to the volume, I had problems with the volume growing due to parity again. I switched to a 32k block and all is well for the 4TB volume.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
The downside to that being that you're going to suffer a lot of performance loss from read-modify-write: every time the guest updates a 512B sector, ZFS has to read, modify, and rewrite the whole 32KB block.
 