Build advice for dealing with large image sequences + 10GbE


AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
For ZFS, the likelihood that you have QD1 accesses is ... hahahahahahahahah

Sorry, you implied a funny. :smile:

I would definitely be concerned more about other aspects of the system. The CPU can be a limiting factor, in particular.

I follow your concern about the other elements in the system.

QD1 random I/O is the worst case for an SSD, and my remark was inspired by the original post:

"I’d note that it will really only be myself and one other user hitting the NAS hard"

And by the fact that it's about imaging software, which often serializes disk access on purpose or uses only a very moderate queue depth (QD).
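
For anyone who wants to put numbers on this, fio can approximate both patterns. A minimal sketch, assuming a hypothetical test file on the pool at /mnt/tank/fio-test; size it well above RAM, since ZFS will otherwise serve most of the reads from the ARC:

    # QD1 4k random reads -- the SSD worst case described above
    fio --name=qd1-randread --filename=/mnt/tank/fio-test --size=64G \
        --rw=randread --bs=4k --iodepth=1 --numjobs=1 \
        --ioengine=posixaio --runtime=60 --time_based

    # The same workload at QD32, for comparison
    fio --name=qd32-randread --filename=/mnt/tank/fio-test --size=64G \
        --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
        --ioengine=posixaio --runtime=60 --time_based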
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Mirrors do not always deliver the fastest performance. A vaguely wide RAIDZ2 will be faster than mirrors for large sequential writes. This would be the classic counterexample.
Yes, as noted by the author I quoted, who seems to believe he refuted it by pointing out that this would only be true "if you're doing a single read or write of a lot of data at once while absolutely no other activity is going on, if the RAIDZ is completely unfragmented… but the moment you start throwing in other simultaneous reads or writes, fragmentation on the vdev, etc then you start looking for random access IOPS".

Assuming the OP's use case would involve single-user I/O with large blocks of data -- which seems reasonable for video data -- the first hypothetical requirement would be met, but that still leaves other issues that might slow down a RAIDZn pool, namely fragmentation and other simultaneous I/O processes, which again implies that mirrors may very well be faster in the 'Real World'.

Or, as noted below, I could simply be wrong... :smile:

Inquiring minds want to know!
That said, it has been my experience that mirrors deliver better performance than any RAIDZ topology, and wide reading supports this view. For example, this informative article, with key quote:
It’s easy to think that a gigantic RAIDZ vdev would outperform a pool of mirror vdevs, for the same reason it’s got a greater storage efficiency. “Well when I read or write the data, it comes off of / goes onto more drives at once, so it’s got to be faster!” Sorry, doesn’t work that way. You might see results that look kinda like that if you’re doing a single read or write of a lot of data at once while absolutely no other activity is going on, if the RAIDZ is completely unfragmented… but the moment you start throwing in other simultaneous reads or writes, fragmentation on the vdev, etc then you start looking for random access IOPS. But don’t listen to me, listen to one of the core ZFS developers, Matthew Ahrens: “For best performance on random IOPS, use a small number of disks in each RAID-Z group. E.g, 3-wide RAIDZ1, 6-wide RAIDZ2, or 9-wide RAIDZ3 (all of which use ⅓ of total storage for parity, in the ideal case of using large blocks). This is because RAID-Z spreads each logical block across all the devices (similar to RAID-3, in contrast with RAID-4/5/6). For even better performance, consider using mirroring.”
All of which implies that you may be right... but only briefly, when the pool is new, unfragmented, and empty.

And of course, I could simply be wrong! :)
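
For concreteness, here is roughly how the two six-disk layouts under discussion would be built; the pool and device names are hypothetical, so substitute your own:

    # Three striped 2-way mirrors: best random IOPS, 50% of raw capacity usable
    zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5

    # One 6-wide RAIDZ2: best large sequential throughput while unfragmented, ~67% usable
    zpool create tank raidz2 da0 da1 da2 da3 da4 da5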
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Yes, as noted by the author I quoted, who seems to believe he refuted it by pointing out that this would only be true "if you're doing a single read or write of a lot of data at once while absolutely no other activity is going on, if the RAIDZ is completely unfragmented… but the moment you start throwing in other simultaneous reads or writes, fragmentation on the vdev, etc then you start looking for random access IOPS".

Assuming the OP's use case would involve single-user I/O with large blocks of data -- which seems reasonable for video data -- the first hypothetical requirement would be met, but that still leaves other issues that might slow down a RAIDZn pool, namely fragmentation and other simultaneous I/O processes, which again implies that mirrors may very well be faster in the 'Real World'.

Or, as noted below, I could simply be wrong... :)

Inquiring minds want to know!

Fragmentation's more of an issue for HDD. It's not something that you can totally ignore with SSD, but the big thing with fragmentation on HDD is that when you start to have to seek, your throughput plummets. On SSD it also falls, but a lot less.

If someone came to me having run some comparison tests and said "RAIDZ2's totally just as fast for this use model" I'd believe it. What *I* think's likely to happen in RAIDZ2 is that as long as there isn't excessive fragmentation causing the system to work real hard for allocations, there'll be some significant degradation of SSD performance over what you'd get for clean sequential accesses, but if you've got a RAIDZ2 with 6 SSD's and they're still able to manage to pump out 200MB/sec each, you're approaching 10G speeds, and I think you're likely to experience issues with CPU and protocol overhead first.
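
To put rough numbers on that, taking the 200MB/sec per-drive figure above as a back-of-the-envelope assumption: a 6-wide RAIDZ2 has four data-bearing disks per stripe, so large sequential transfers land somewhere near 4 x 200MB/sec = 800MB/sec, against the roughly 1.2GB/sec practical ceiling of a 10GbE link. That's close enough that CPU and SMB/NFS protocol overhead are at least as likely to be the bottleneck as the SSDs themselves.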
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Fragmentation's more of an issue for HDD. It's not something that you can totally ignore with SSD, but the big thing with fragmentation on HDD is that when you start to have to seek, your throughput plummets. On SSD it also falls, but a lot less.

If someone came to me having run some comparison tests and said "RAIDZ2's totally just as fast for this use model" I'd believe it. What *I* think's likely to happen in RAIDZ2 is that as long as there isn't excessive fragmentation causing the system to work real hard for allocations, there'll be some significant degradation of SSD performance over what you'd get for clean sequential accesses, but if you've got a RAIDZ2 with 6 SSD's and they're still able to manage to pump out 200MB/sec each, you're approaching 10G speeds, and I think you're likely to experience issues with CPU and protocol overhead first.
Good points, and I'd sorta lost track of the fact that the OP plans on using SSDs... Dooooh!

Somewhere above I recommended a system with two pools: one based on SSDs for fast 'working' storage and another based on HDDs for archival storage.

I would definitely recommend RAIDZ2 for the archival storage. And based on the intervening discussion, it doesn't seem to really matter whether the SSD pool is made up of mirrors or uses a RAIDZ2 topology.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Oh, absolutely correct for archival... HDD and RAIDZ2 all the way. The more free space you leave, the faster it'll fly.
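
Both variables are easy to keep an eye on once the pool exists; the CAP and FRAG columns of zpool list report fill level and free-space fragmentation (pool name hypothetical):

    # CAP = how full the pool is, FRAG = free-space fragmentation
    zpool list archive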
 
Joined
Jun 3, 2016
Messages
7
700Mbps? 700 megabits per second?

Sorry, MB/s. Thanks for the CPU advice. I had been looking at just going for an i3 with a nice high clock speed but the more I've read about ZFS the more it seems like I should go Xeon and put 64GB+ of RAM in there for caching etc.

I've been doing a bit of reading about ZIL and L2ARC, and my impression is that I wouldn't need them in an all-SSD system with loads of RAM... what are people's thoughts on this?

I think I'll end up with 6 X 1TB Evo 850s (@Robert Trevellyan I'm also a fan) in RAIDZ2 for my working pool and then (in a separate pool) 4 X 5TB HGST NAS disks in RAIDZ2 for archiving. Everything important gets synced to both France and the US via BitTorrent Sync and I run a local mirror every 4 hours, which will mean RAIDZ2 should be more than adequate.
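
For reference, as I understand it that plan translates into two independent pools, something like this (device names are placeholders):

    # Working pool: 6 x 1TB SSD in RAIDZ2 (~4TB usable before overhead)
    zpool create work raidz2 da0 da1 da2 da3 da4 da5

    # Archive pool: 4 x 5TB HDD in RAIDZ2 (~10TB usable before overhead)
    zpool create archive raidz2 da6 da7 da8 da9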

Thanks everyone for your input here, I'm getting some great info.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
L2ARC won't do you any good for your intended use case. For the main SSD pool, it'd only slow things down. For the archival pool, you'd rarely touch it. Might not be true if you were doing something like dedup, but that's just not highly recommended for your intended use.

SLOG (not ZIL) is also probably not necessary. For the types of things you're doing, IF the NAS were to crash catastrophically while you were doing some major data operation, you'd probably already be prepared to examine what had happened and maybe restart part of your process.
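
And if the workload ever changes, both are easy to bolt onto (or pull off) a live pool without rebuilding anything; a sketch with hypothetical pool and device names:

    # Add an L2ARC device, and remove it again later
    zpool add tank cache da7
    zpool remove tank da7

    # Add a SLOG device, and remove it again later
    zpool add tank log da8
    zpool remove tank da8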
 
Joined
Jun 3, 2016
Messages
7
L2ARC won't do you any good for your intended use case. For the main SSD pool, it'd only slow things down. For the archival pool, you'd rarely touch it. Might not be true if you were doing something like dedup, but that's just not highly recommended for your intended use.

SLOG (not ZIL) is also probably not necessary. For the types of things you're doing, IF the NAS were to crash catastrophically while you were doing some major data operation, you'd probably already be prepared to examine what had happened and maybe restart part of your process.

Great news, thanks. That will simplify things a lot.
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
It's a possibility. Part of this involves needing to know just how much trouble there would be if there was a catastrophe and two drives in a two-way mirror vdev failed, taking down the pool. This is an *unlikely* scenario to begin with; it's just something that needs to be considered.

The other thing is that if this is long runs of sequential data (image files), it's totally possible that mirrors are overkill and RAIDZ2 could be just dandy. In that case, RAIDZ2 or even RAIDZ3 would be a better choice.

We have long runs of sequential data. Depending on the application, the file sizes are in the 1GB, 10GB, or 60GB range.
Z2/Z3 is a better choice for us: about four years of operation on a fair number of big zpools with no problems.
We also have no problems consuming 4-5Gb/s on a 10G link during reads or writes.
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
Sorry, MB/s. Thanks for the CPU advice. I had been looking at just going for an i3 with a nice high clock speed but the more I've read about ZFS the more it seems like I should go Xeon and put 64GB+ of RAM in there for caching etc.

I've been doing a bit of reading about ZIL and L2ARC, and my impression is that I wouldn't need them in an all-SSD system with loads of RAM... what are people's thoughts on this?

I think I'll end up with 6 X 1TB Evo 850s (@Robert Trevellyan I'm also a fan) in RAIDZ2 for my working pool and then (in a separate pool) 4 X 5TB HGST NAS disks in RAIDZ2 for archiving. Everything important gets synced to both France and the US via BitTorrent Sync and I run a local mirror every 4 hours, which will mean RAIDZ2 should be more than adequate.

Thanks everyone for your input here, I'm getting some great info.

I would suggest using a 6-drive setup for archiving as well. The 6-drive vdev will not only be faster, it will also give you a higher percentage of usable storage. If I remember correctly, there's a nice, trusted Supermicro motherboard that can be equipped with 14 SATA ports. That said, I would split the working pool and the slower archive into two separate boxes. The archiving box can be rather modest, with for example only 1GbE. I'm rather paranoid about data loss: the day you have a problem with your working box, you'll still be able to get to your archived copy... or even move the SSDs to the other box.
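
The space arithmetic behind that, assuming the 5TB archive drives mentioned above: a 4-wide RAIDZ2 is 2 data + 2 parity drives, so roughly 2 x 5TB = 10TB usable (50% of raw), while a 6-wide RAIDZ2 is 4 data + 2 parity, roughly 4 x 5TB = 20TB usable (~67% of raw), before ZFS overhead and free-space headroom.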
 
Joined
Jun 3, 2016
Messages
7
I would suggest using a 6-drive setup for archiving as well. The 6-drive vdev will not only be faster, it will also give you a higher percentage of usable storage. If I remember correctly, there's a nice, trusted Supermicro motherboard that can be equipped with 14 SATA ports. That said, I would split the working pool and the slower archive into two separate boxes. The archiving box can be rather modest, with for example only 1GbE. I'm rather paranoid about data loss: the day you have a problem with your working box, you'll still be able to get to your archived copy... or even move the SSDs to the other box.

That's a very sensible piece of advice! Thanks.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
QD1 random I/O is the worst case for an SSD, and my remark was inspired by the original post:

"I’d note that it will really only be myself and one other user hitting the NAS hard"

And by the fact that it's about imaging software, which often serializes disk access on purpose or uses only a very moderate queue depth (QD).

I apparently missed answering this. The point is that ZFS doesn't do I/O that way: it builds up a transaction group and sends it all out to the disks at once, and it typically reads large chunks too, not just single sectors. Even when the client may be doing things that you'd think of as QD1-ish from a typical storage perspective, ZFS isn't likely to be doing that.
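
You can watch this aggregation happen on a live system (pool name hypothetical): a client hammering the share with small writes shows up on the pool as periodic bursts as each transaction group commits, not as a steady QD1 trickle.

    # Per-second pool I/O: writes land in txg-sized bursts
    zpool iostat tank 1

    # On FreeBSD/FreeNAS the txg commit interval is a sysctl, 5 seconds by default
    sysctl vfs.zfs.txg.timeout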
 