ZFS - File splitting over multiple drives?

Status
Not open for further replies.

74m

Explorer
Joined
Jul 13, 2013
Messages
66
Hi guys,

This is probably more a general question about ZFS than about FreeNAS itself, but I hope you guys can help me with it.

If I throw a bunch of files (e.g. some very big and some very small ones) onto my ZFS RAIDZ1 or RAIDZ2 pool, will ZFS split the files into equal parts across all my disks? Is this technically comparable to RAID 5/6?

The question behind this is: is it (hypothetically) possible to read usable data from a single disk of a RAIDZ array?

Thank you in advance.

Greetings
74m
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
Hi,

I found this article: http://constantin.glez.de/blog/2010/06/closer-look-zfs-vdevs-and-performance

When writing to a RAID-Z vdev, ZFS may choose to use less than the maximum number of data disks. For example, you may be using a 3+2 (5 disks) RAID-Z2 vdev, but ZFS may choose to write a block as 2+2 because it fits better.
When writing to RAID-Z vdevs, each filesystem block is split up into its own stripe across (potentially) all devices of the RAID-Z vdev.
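To make the quoted behaviour concrete, here is a rough Python sketch of the per-block arithmetic. It is loosely modeled on the allocation math in OpenZFS's vdev_raidz_asize(); the name raidz_alloc, the default ashift=12 (4k sectors), and ignoring compression, gang blocks and metadata are my own assumptions, so treat it as an illustration rather than the real allocator.

```python
import math


def raidz_alloc(block_bytes, disks, parity, ashift=12):
    """Rough model of the sectors one block occupies on a RAID-Z vdev."""
    sector = 1 << ashift
    data_cols = disks - parity                   # disks available for data in a row
    data = math.ceil(block_bytes / sector)       # data sectors for this block
    rows = math.ceil(data / data_cols)           # stripe rows needed
    par = parity * rows                          # parity sectors, one set per row
    total = data + par
    # round the allocation up to a multiple of (parity + 1) sectors
    padded = math.ceil(total / (parity + 1)) * (parity + 1)
    width = min(data, data_cols) + parity        # disks receiving data or parity
    return {"data": data, "parity": par, "pad": padded - total, "width": width}
```

For example, raidz_alloc(8192, 5, 2) returns 2 data sectors, 2 parity sectors and 2 padding sectors on 4 disks: a small block on a 3+2 RAID-Z2 ends up written as 2+2, as the article describes. A full 128k block, raidz_alloc(131072, 5, 2), spans all 5 disks.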

I don't know much more about ZFS internals; maybe somebody else can help. I would assume that it is not possible to recover a file from just one member of the RAIDZ.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
That pretty much sums it up. It splits it up somewhat equally, within some limits. Block sizes are not fixed like they are with standard RAID; they vary from 4k to 128k (if I remember correctly), and your file is broken down accordingly.
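A rough sketch of how a single file gets broken into blocks, assuming the default 128k recordsize and a 512-byte minimum block size. The function name file_blocks is hypothetical, and compression, embedded data and sector (ashift) rounding are ignored.

```python
def file_blocks(file_bytes, recordsize=128 * 1024, min_block=512):
    """Sketch: logical block sizes used for one file (assumptions in lead-in)."""
    if file_bytes == 0:
        return []                                   # empty file: no data blocks
    if file_bytes <= recordsize:
        # a small file is one block, rounded up to a multiple of min_block
        return [-(-file_bytes // min_block) * min_block]
    # a larger file is cut into full recordsize blocks (ceiling division)
    return [recordsize] * -(-file_bytes // recordsize)


print(file_blocks(6))            # [512]
print(file_blocks(300 * 1024))   # [131072, 131072, 131072]
```

Each of those blocks is then spread over the RAIDZ disks on its own, so a tiny file and a huge file end up striped very differently.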

I have no idea how to answer your last question though. "My question behind this is: Is it (hypothetical) possible, to read out suitable data from a single disk from a raidz compound?" doesn't really make sense to me.
 

74m

Explorer
Joined
Jul 13, 2013
Messages
66
First of all, thank you both!

cyberjock: Maybe the question is just stupid... or my English is too bad... ;)
Same thought, in other words: is it possible to reconstruct (small or big) files from just one single drive, without the remaining zpool drives?

As far as I understand the whole thing, this should be impossible, and only file fragments should be readable.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Yeah, if you lose all of your redundancy plus one more disk, you lose the data. There is no reconstructing anything. It's just gone.
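A toy Python illustration of why, using single XOR parity as in RAIDZ1 (RAIDZ2 and RAIDZ3 add further parity columns computed differently, which this does not model). make_stripe and rebuild are hypothetical helpers: one stripe is spread over 4 data disks plus 1 parity disk, any single missing chunk can be rebuilt from the survivors, and a second missing chunk cannot.

```python
from functools import reduce


def xor_bytes(chunks):
    """XOR equal-length byte strings column by column (single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def make_stripe(data, data_disks):
    """Split data into equal chunks, one per data disk, plus a parity chunk."""
    size = -(-len(data) // data_disks)               # ceiling division
    padded = data.ljust(size * data_disks, b"\0")    # pad the last chunk
    chunks = [padded[i * size:(i + 1) * size] for i in range(data_disks)]
    return chunks + [xor_bytes(chunks)]


def rebuild(stripe):
    """Fill in one missing chunk (None) by XOR-ing all surviving chunks."""
    missing = [i for i, chunk in enumerate(stripe) if chunk is None]
    if len(missing) > 1:  # single parity only covers one lost disk
        raise ValueError("lost all redundancy plus one disk: data is gone")
    if not missing:
        return list(stripe)
    survivors = [chunk for chunk in stripe if chunk is not None]
    repaired = list(stripe)
    repaired[missing[0]] = xor_bytes(survivors)
    return repaired


data = b"ZFS stripes each block across disks!"
stripe = make_stripe(data, 4)              # 4 data disks + 1 parity disk

one_lost = list(stripe)
one_lost[2] = None                         # one failed disk
print(b"".join(rebuild(one_lost)[:4]) == data)   # True

two_lost = list(stripe)
two_lost[0] = two_lost[3] = None           # two failed disks
try:
    rebuild(two_lost)
except ValueError as err:
    print(err)
```

Any one disk on its own only holds a fragment such as b"es each b", which is why a lone RAIDZ member cannot give you back whole files.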
 

74m

Explorer
Joined
Jul 13, 2013
Messages
66
Although I'm very surprised that ZFS will not always split a file into equal parts for every disk:
Quote: When writing to a RAID-Z vdev, ZFS may choose to use less than the maximum number of data disks. For example, you may be using a 3+2 (5 disks) RAID-Z2 vdev, but ZFS may choose to write a block as 2+2 because it fits better.
I never expected this. But very interesting...
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
74m said:
I never expected this. But very interesting...

It's expected with variable block sizes.

The block size can vary from 512 bytes (or 4k with gnop'd vdevs?) to 128k. Take the example of a user writing a .txt file that contains the word "hello". Let's assume you have a 5-disk Z1 pool (usable capacity of 4 disks). This file easily fits within a single 512-byte (or 4k) block. Writing this small file only involves writing to 2 different disks: one data sector and one parity sector. With only one data sector, the RAIDZ1 parity is identical to the data, so, for example, disk 1 and disk 2 each contain a copy of the file (of the block, actually). There's no reason to write it to all 5 disks. As long as it's on two different disks, you're protected against a single disk failure (Z1).
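A small sketch of the "hello" example, assuming 512-byte sectors and a 5-disk RAIDZ1. small_write_z1 is a hypothetical helper, not ZFS code, and it only handles payloads that fit in one stripe row. It shows the block needing 1 data sector plus 1 parity sector, and that single-sector parity is byte-for-byte the same as the data.

```python
def small_write_z1(payload, disks=5, sector=512):
    """Split one small block into data sectors and compute its RAIDZ1 parity.

    Assumes the block fits in a single stripe row (at most disks - 1 sectors).
    """
    n = -(-len(payload) // sector)           # data sectors needed (ceiling)
    if not 1 <= n <= disks - 1:
        raise ValueError("payload must fit in one stripe row")
    padded = payload.ljust(n * sector, b"\0")
    cols = [padded[i * sector:(i + 1) * sector] for i in range(n)]
    parity = bytes(sector)                   # start from an all-zero sector
    for col in cols:                         # single parity = XOR of the columns
        parity = bytes(a ^ b for a, b in zip(parity, col))
    return cols, parity


cols, parity = small_write_z1(b"hello")
print(len(cols) + 1, parity == cols[0])     # 2 True
```

A 1500-byte payload would need 3 data sectors, so it touches 4 of the 5 disks, and its parity is no longer a plain copy of any one sector.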

This is a huge advantage over hardware RAID, because ZFS knows about the file system and hardware RAID doesn't. With a fixed stripe size on hardware RAID, writing a small file would involve reading full fixed-size stripes (usually at least 128k) from all disks (5 in the example above), modifying the data in memory, computing new parity, and then writing full stripes back to all disks. So instead of simply writing tiny amounts of data to 2 disks, you have to read and write much larger amounts of data on all disks. That's one advantage of software RAID where the RAID layer also understands the filesystem layer.
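A back-of-the-envelope version of that comparison, using the simplified worst case described above: a full 128k stripe unit read from and written back to each of 5 disks on hardware RAID, versus one 512-byte data sector plus one parity sector for ZFS. The function name and defaults are illustrative assumptions; real controllers often read only the affected chunk and its parity, and caching changes the picture, so real-world ratios can be much smaller.

```python
def small_write_io(disks=5, hw_stripe_unit=128 * 1024, zfs_sector=512):
    """Bytes moved for one tiny write under the simplified model above."""
    # Hardware RAID (as described): read, then rewrite, a full unit on every disk.
    hw_bytes = 2 * disks * hw_stripe_unit
    # ZFS RAIDZ1: write one data sector and one parity sector, nothing read.
    zfs_bytes = 2 * zfs_sector
    return hw_bytes, zfs_bytes


hw, zfs = small_write_io()
print(hw, zfs, hw // zfs)   # 1310720 1024 1280
```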
 