Best practice

Status
Not open for further replies.

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
  1. It's distributed over several drives
  2. NAND flash reliability has proven itself to be better than estimated
  3. No modern controllers rely on minimizing writes with compression for performance and reliability
 

Joined
Oct 8, 2016
Messages
48
  1. It's distributed over several drives
  2. NAND flash reliability has proven itself to be better than estimated
  3. No modern controllers rely on minimizing writes with compression for performance and reliability

1: False. With mirrors, all disks get the same writes. With parity RAID, all disks get written for every stripe. So with mirrors or parity RAID you are writing the same amount of data, at the same time, in the same way, on each SSD. There is a HUGE probability that multiple failures happen at the same time (or within a very short window).
2: False. No vendor can guarantee the reliability of every SSD. Look at the Intel DC S3700 specs [1]. Intel says 10 drive writes per day, for 5 years. Do you think Intel ran a 5-year-long test writing 24/7 to an SSD? Absolutely not; by the time such a test finished, that SSD would already be obsolete.


[1] http://download.intel.com/newsroom/kits/ssd/pdfs/Intel_SSD_DC_S3700_Product_Specification.pdf
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
With parity RAID, all disks get the write, but only 1/(n-z) of the overall amount of data written. You'll observe that drives are rated based on data written... not the number of writes. So, if you're writing 6GB of data to a 6-drive RAIDZ2, each drive only sees 6/(6-2) = 1.5GB of data written.
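
A rough sketch of that arithmetic (Python, purely illustrative; the function name and parameters are just for this example):

Code:
def per_drive_write(user_gb, n_drives, parity, mirror=False):
    # Mirrors copy every write to every drive; RAIDZ splits each stripe across
    # (n - parity) data drives, and each parity drive sees a similar share.
    # Metadata, padding and small-block overhead are ignored here.
    if mirror:
        return user_gb                        # every drive gets the full write
    return user_gb / (n_drives - parity)      # e.g. 6 / (6 - 2) = 1.5

print(per_drive_write(6, 6, 2))               # 6-wide RAIDZ2: 1.5 GB per drive
print(per_drive_write(6, 2, 0, mirror=True))  # 2-way mirror: 6 GB per drive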

SSDs also typically fail in a predictable way, from a media wearout perspective. They exhibit a slowly increasing relocation count as the weakest cells wear out. The good controllers typically render the drive read-only when the wear reaches a certain level.

Let's assume a 6-drive RAIDZ2 array of 800GB S3700s. Each drive is rated to 10 DWPD, or 8TB/day. You would be able to completely fill the array 10 times in a day for 5 years without going outside the rated drive expectancy... and there's no guarantee that the drive will up and die as soon as one byte over that limit is written. In certain big data workloads, perhaps that might be a limiting factor... but it's unlikely that's your use case.
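
For the endurance budget of that hypothetical pool, the numbers work out like this (all figures taken from the example above):

Code:
drive_gb      = 800        # S3700 capacity
dwpd_rating   = 10         # Intel's rating
n, z          = 6, 2       # 6-wide RAIDZ2
fills_per_day = 10         # "fill the array 10 times a day"

usable_gb        = (n - z) * drive_gb                    # 3200 GB of user space
per_drive_gb_day = fills_per_day * usable_gb / (n - z)   # 8000 GB/day per drive
per_drive_dwpd   = per_drive_gb_day / drive_gb           # 10.0, right at the rating
print(usable_gb, per_drive_gb_day, per_drive_dwpd)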

Finally, RAID isn't backup. If you're trying to build massive redundancy in and don't have an offline, offsite backup, you're doing it wrong.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Intel says 10 drive writes per day, for 5 years. Do you think Intel ran a 5-year-long test writing 24/7 to an SSD?
Of course not. Testing would be done by writing at a much higher rate to a small sample of NAND, then doing arithmetic on the results. But I doubt the real tests are anywhere near as simplistic as that.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Even at an atrociously slow 400MB/s, a (large) 2TB drive would do upwards of 17 DWPD, so endurance testing is easy to accelerate. The limiting factor is going to be the data retention part of the test, since that one would require simulated accelerated aging.
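
The arithmetic, for what it's worth (same assumptions as above):

Code:
write_mb_s = 400     # deliberately slow sustained write speed
drive_tb   = 2       # a large drive

tb_per_day = write_mb_s * 86_400 / 1_000_000   # ~34.6 TB written per day
dwpd       = tb_per_day / drive_tb             # ~17.3 drive writes per day
print(round(tb_per_day, 1), round(dwpd, 1))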
 
Joined
Oct 8, 2016
Messages
48
With parity RAID, all disks get the write, but only 1/(n-z) of the overall amount of data written. You'll observe that drives are rated based on data written... not the number of writes. So, if you're writing 6GB of data to a 6-drive RAIDZ2, each drive only sees 6/(6-2) = 1.5GB of data written.

So, in other words, a RAIDZ (1 or 2) would improve SSD reliability because writes are distributed across multiple disks;
on the other hand, a simple mirror could be risky because all disks would get the same write pattern and thus could fail at the same time.

SSDs also typically fail in a predictable way, from a media wearout perspective. They exhibit a slowly increasing relocation count as the weakest cells wear out. The good controllers typically render the drive read-only when the wear reaches a certain level.

I've always heard the opposite: SSDs tend to fail in a catastrophic way, and most of the time without any "warning" signs.

Let's assume a 6-drive RAIDZ2 array of 800GB S3700s. Each drive is rated to 10 DWPD, or 8TB/day. You would be able to completely fill the array 10 times in a day for 5 years without going outside the rated drive expectancy... and there's no guarantee that the drive will up and die as soon as one byte over that limit is written. In certain big data workloads, perhaps that might be a limiting factor... but it's unlikely that's your use case.

But there is also no guarantee that the drive would be able to reach that value of DWPD.

Finally, RAID isn't backup. If you're trying to build massive redundancy in and don't have an offline, offsite backup, you're doing it wrong.

I know, but I have to host VM images. As you can imagine, telling customers that they have lost a full day of data due to a wrong RAID configuration is not good for any business.

I'm trying to get the most reliable configuration possible, and after that I'll also have backups.
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
So, in other words, a RAIDZ (1 or 2) would improve SSD reliability because writes are distributed across multiple disks;
on the other hand, a simple mirror could be risky because all disks would get the same write pattern and thus could fail at the same time.
No, because when you write a specific sector, you aren't always writing the same cell. The SSD's firmware has quite a bit of magic in it to do wear leveling in real time. Thus, it really doesn't matter what the write pattern is... it's getting evenly distributed across the memory cells.
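
A toy illustration of the idea (nothing like real FTL firmware, just to show why hammering one logical sector doesn't hammer one physical cell):

Code:
# Toy wear leveling: every rewrite of a logical sector is redirected to the
# least-worn erase block, so repeated writes to one LBA spread across the flash.
erase_counts = [0] * 8    # pretend flash with 8 erase blocks
mapping = {}              # logical sector -> physical block

def write(lba):
    target = min(range(len(erase_counts)), key=lambda b: erase_counts[b])
    erase_counts[target] += 1
    mapping[lba] = target

for _ in range(80):       # rewrite the same logical sector 80 times
    write(0)

print(erase_counts)       # wear ends up even: [10, 10, 10, 10, 10, 10, 10, 10]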

I've always heard the opposite: SSDs tend to fail in a catastrophic way, and most of the time without any "warning" signs.
I'm referring specifically to media wear-out. If the controller takes a dump, you're hosed... whether SSD or spinning rust.



But there is also no guarantee that the drive would be able to reach that value of DWPD.
I doubt Intel just pulls the DWPD number out of their collective arse. There's a lot of science and testing that goes into that number... and the reality is, the majority of the drives will probably last FAR longer than the minimum rating. Keep in mind the Tom's Hardware SSD endurance testing... 2.1PB before the drive failed. That's about 4.7 DWPD... for a consumer drive that's not even officially rated.
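
Back-of-the-envelope on that figure (the ~250GB capacity and the 5-year window are my assumptions, not from the article):

Code:
total_written_gb = 2_100_000   # ~2.1 PB before the drive failed
capacity_gb      = 250         # assumed consumer drive size
years            = 5           # assumed rating window

dwpd = total_written_gb / (capacity_gb * years * 365)
print(round(dwpd, 1))          # ~4.6, in the same ballpark as the 4.7 quoted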



I know, but I have to host VM images. As you can imagine, telling customers that they have lost a full day of data due to a wrong RAID configuration is not good for any business.

I'm trying to get the most reliable configuration possible, and after that I'll also have backups.
You have to decide what "good enough" is. As we all know, each "9" is a huge increase in cost. If you're going for 4, 5, 6 nines uptime... you'd better have fully redundant chassis, redundant datacenters, etc.
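
To put rough numbers on the nines (simple availability arithmetic):

Code:
minutes_per_year = 365 * 24 * 60

for nines in (3, 4, 5, 6):
    availability = 1 - 10 ** -nines            # e.g. 5 nines = 99.999%
    downtime_min = minutes_per_year * (1 - availability)
    print(f"{nines} nines: ~{downtime_min:.1f} minutes of downtime per year")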
 
Joined
Oct 8, 2016
Messages
48
OK, so since I have a ten-slot chassis, if I go with a 5-disk RAIDZ2 and, when needed, add another RAIDZ2, that should be OK and at the same time guarantee faster resilvering.

With ZFS it is not possible to grow a RAIDZ2 by adding single disks like with mdadm or a hardware controller, so I have to use multiple RAIDZ2 vdevs to keep costs low (starting immediately with an 8- or 10-SSD RAIDZ2 would be too expensive).
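
Roughly what that staged approach gives in usable space (assuming 800GB SSDs as in the S3700 example; swap in your own drive size):

Code:
ssd_gb = 800                     # assumed drive size

def raidz2_usable(drives, size):
    return (drives - 2) * size   # two drives' worth of parity per RAIDZ2 vdev

stage1 = raidz2_usable(5, ssd_gb)             # one 5-disk RAIDZ2 vdev
stage2 = stage1 + raidz2_usable(5, ssd_gb)    # add a second 5-disk vdev later
print(stage1, stage2)                         # 2400 GB now, 4800 GB after expansion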
 
Joined
Feb 2, 2016
Messages
574
Any suggestion to achieve max reliability?

Let's not even consider 'max reliability'. That's a fool's errand.

1. How much disk space is required?
2. What level of performance is required?
3. What are you promising your clients in terms of reliability?

We went from 10K SAS drives to cheap, consumer-grade SSDs for our XenServer VM storage. Four SSDs, mirrored stripe. Huge performance increase. FreeNAS snapshots and replicates the VMs throughout the day. If we lost all four primary drives, we could be back up and running on the replicated copies in under an hour. Probably a lot less but an hour is what we promise. We have a cold spare so a single drive failure is a non-event. We will replace the drives long before they wear out.

I'm not afraid of mirrors in a well-monitored environment with reliable backups. You could go triple but I wouldn't.

Cheers,
Matt
 
Joined
Oct 8, 2016
Messages
48
I really hate 2-way mirrors.
If you have to replace a drive, another drive failure will bring you down.

Performance-wise, an SSD RAID6 should not be too bad, as I'm moving from a SAS RAID6. Current performance (6x SAS RAID6) is good, so anything better than this would be OK, and even an SSD RAID7 would be way faster than this.

So performance is not an issue, but reliability is, and a RAID6 is much more reliable than a mirror.

Probably a 5-disk RAID6 plus hot spare is the way to go.
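
As a very rough way to compare the two layouts during a rebuild (purely illustrative; p is an assumed per-drive failure probability over the resilver window, and real failures are not independent):

Code:
from math import comb

p = 0.01   # assumed chance that any one drive dies while the pool is degraded

# 2-way mirror: after one drive fails, the pool is lost if its partner also fails.
mirror_loss = p

# 5-wide RAIDZ2: after one drive fails, the pool survives one more failure,
# so data is lost only if 2 or more of the remaining 4 drives also fail.
raidz2_loss = sum(comb(4, k) * p**k * (1 - p)**(4 - k) for k in range(2, 5))

print(mirror_loss, raidz2_loss)   # ~1e-2 vs ~6e-4 with these assumptions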
 