FreeNAS setup and my first hard drive failure - discussion

Status
Not open for further replies.

Francis D

Cadet
Joined
Mar 7, 2016
Messages
6
Hi guys, I manage our company's FreeNAS server and want to share my recent experience. We have only about 900 GB of data, and four to six of us work on documents, drawings, etc., so this is not a very demanding environment. My only concern is data integrity.

Our setup is 4 drives in RAID 1 + RAID 0 (striped mirrors) with a hot spare. In my head, I was bulletproof with this setup: I thought I could lose 3 drives. After only 4,500 hours, one of my hard drives started to have problems, and FreeNAS crashed twice. After the second reboot, da4 was marked as faulty, and I had to manually tell FreeNAS to replace it with the hot spare.

Guess what: during the resilver, the number of checksum errors on da3 climbed to over 1,000. For those who are interested, I have not lost anything, and I had a backup anyway.

Conclusions:
-RAID 0 + RAID 1 with 4 drives is not as good as it seems. If both drives of one mirror fail, having 2 copies of the other half of your data is useless. You are screwed.
-In your setup, plan for multiple hard drives failing at the same time. It is quite possible when the drives are mirrored: they do exactly the same work, were produced on the same day, and were put in service at the same time, so they can share the same defect and fail at the same time.
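
To put numbers on the first conclusion, here is a minimal sketch that enumerates which drive failures destroy a pool made of two striped 2-way mirrors. The device names da1 and da2 and the pairing of drives into mirrors are assumptions for illustration; the post only names da3 and da4.

```python
from itertools import combinations

# Hypothetical layout: two 2-way mirror vdevs striped together.
# The pairing of device names is assumed for illustration.
MIRRORS = [("da1", "da2"), ("da3", "da4")]


def pool_survives(failed, mirrors=MIRRORS):
    """A striped pool survives only if every mirror keeps at least one drive."""
    return all(any(d not in failed for d in mirror) for mirror in mirrors)


def fatal_combinations(n_failures, mirrors=MIRRORS):
    """Return every set of n_failures drives whose loss destroys the pool."""
    drives = [d for mirror in mirrors for d in mirror]
    return [set(c) for c in combinations(drives, n_failures)
            if not pool_survives(set(c), mirrors)]


if __name__ == "__main__":
    for n in (1, 2, 3):
        fatal = fatal_combinations(n)
        total = len(list(combinations(range(4), n)))
        print(f"{n} failed drive(s): {len(fatal)} of {total} combinations lose the pool")
```

In this model the pool always survives one failure, survives two failures only when they land in different mirrors, and never survives three. The hot spare helps only if the resilver finishes before the second drive of the same mirror fails.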

Another consideration: I was surprised that the failing drive took part in the resilver of the hot spare. 40% of the data (80 MB/s) was coming from the failing drive and 60% (120 MB/s) from the remaining good drive. This brings me to my point: if you can, use a machine with more bays than you need, so that you can use a hot spare to replace drives. That reduces the load on your remaining good drives. As my experience shows, the remaining drive could be near the end of its life too, so reducing the load on it is a good idea. In the end I will replace da3 as well, so both of my mirrored drives were about to fail.

Next, I plan to replace my RAID 0 + 1 with a five-drive RAIDZ2 plus 1 hot spare. Any better ideas?

Francis
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
If you have room for a hot spare and you only have one pool with one vdev, I would suggest including that drive in the pool and increasing the protection level (e.g., a 6-drive RAIDZ3).
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You could also have done a 3-drive mirror. For your workflow, though, it sounds like RAIDZ2 or RAIDZ3 is all you need.
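
For comparison, the layouts mentioned so far can be tabulated with a small sketch. It assumes equal-size drives and ignores ZFS metadata and padding overhead, so the "data drives" figure is only a rough guide; the class and field names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    vdevs: int             # number of vdevs striped together
    drives_per_vdev: int   # drives in each vdev
    redundancy: int        # drives each vdev can lose (parity or extra mirror copies)
    spares: int = 0        # hot spares, idle until a drive is replaced

    @property
    def data_drives(self):
        """Drives' worth of usable space, ignoring ZFS overhead."""
        return self.vdevs * (self.drives_per_vdev - self.redundancy)

    @property
    def total_drives(self):
        """Bays needed, hot spares included."""
        return self.vdevs * self.drives_per_vdev + self.spares


LAYOUTS = [
    Layout("2 x 2-way mirror + spare", vdevs=2, drives_per_vdev=2, redundancy=1, spares=1),
    Layout("5-drive RAIDZ2 + spare", vdevs=1, drives_per_vdev=5, redundancy=2, spares=1),
    Layout("6-drive RAIDZ3", vdevs=1, drives_per_vdev=6, redundancy=3),
    Layout("3-way mirror", vdevs=1, drives_per_vdev=3, redundancy=2),
]

if __name__ == "__main__":
    for layout in LAYOUTS:
        print(f"{layout.name:26} bays={layout.total_drives} "
              f"data drives={layout.data_drives} "
              f"survives any {layout.redundancy} failure(s)")
```

The "survives any N failures" column is the worst case: striped mirrors can survive more failures when they fall in different vdevs, but that is luck rather than a guarantee.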
 

fta

Contributor
Joined
Apr 6, 2015
Messages
148
Am I the only one who thinks it's obvious he was running 2 mirror vdevs?

My guess is three things: 1) you weren't running regular (weekly) scrubs, 2) you weren't regularly reading most of the data on disk, and 3) you weren't running regular SMART tests. If you had been doing all three, my guess is the second disk wouldn't have suddenly thrown 1000+ checksum errors. You would have caught it failing much sooner.
 

Francis D

Cadet
Joined
Mar 7, 2016
Messages
6
Search for RAID 10 in Google Images: that is what I had, plus 1 hot spare. So I had 2 vdevs with 2 mirrored hard drives in each.

fta: 1) Maybe there was too much time between my scrubs. 2) 90% of our data may go unused for months. 3) Nothing showed up in the SMART tests. In fact, my failed drive passed its SMART test.

In the end, I changed everything to 6 hard drives in RAIDZ3. Now I have a performance issue. My FreeNAS server was running on ESXi with only 2 cores of my Xeon E5560. With my previous setup, I could reach 90 MB/s. Now, with RAIDZ3 and 4 cores of the E5560, I get only 50% of that performance. CPU usage is always in the 80-90% range when I copy large files, and copying many small files is a real pain: speed can fall to 5-10 MB/s.

After all this, I think the best option would have been, as SweetAndLow said, three or four 3 TB hard drives in mirror mode. I think that would give better performance with my poor E5560.

Does anybody know whether upgrading my CPUs to the X5660 would solve my performance issue?

My machine is a Dell Precision T7500 with 2 CPUs and 24 GB of ECC RAM.
 

fta

Contributor
Joined
Apr 6, 2015
Messages
148
Francis D said: "fta: 1) Maybe there was too much time between my scrubs. 2) 90% of our data may go unused for months. 3) Nothing showed up in the SMART tests. In fact, my failed drive passed its SMART test."

Drives are lazy. If you're not regularly reading data at rest, a drive has no idea that a sector is going bad and that it needs to relocate the data to a spare sector before it's too late. Consequently, you won't know until it's too late either. If I were running mirrors, I'd be scrubbing once a week.
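
Scrubs only help if someone looks at the result. As a rough sketch of that idea, the snippet below scans text shaped like the device table of `zpool status` and flags any row with a nonzero READ, WRITE, or CKSUM counter. The sample text, pool name, and counts are made up for illustration, and real output may differ in detail (for example, FreeNAS often lists devices by gptid).

```python
import re

# Illustrative sample only: shaped like the config table of `zpool status`,
# but the pool name, devices, and counts are made up for this sketch.
SAMPLE = """\
\tNAME        STATE     READ WRITE CKSUM
\ttank        DEGRADED     0     0     0
\t  mirror-0  ONLINE       0     0     0
\t    da1     ONLINE       0     0     0
\t    da2     ONLINE       0     0     0
\t  mirror-1  DEGRADED     0     0     0
\t    da3     ONLINE       0     0 1.02K
\t    da4     FAULTED      3    12     0
"""

# name, state, then three counters; counters stay strings because
# large values may be abbreviated (e.g. "1.02K").
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def devices_with_errors(status_text):
    """Return {device: (read, write, cksum)} for rows with any nonzero counter."""
    flagged = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match or match.group(1) == "NAME":
            continue  # skip the header row and anything that is not a table row
        name, _state, *counts = match.groups()
        if any(count != "0" for count in counts):
            flagged[name] = tuple(counts)
    return flagged


if __name__ == "__main__":
    for device, (read, write, cksum) in devices_with_errors(SAMPLE).items():
        print(f"{device}: read={read} write={write} cksum={cksum}")
```

A check like this could run after each scheduled scrub so that a slowly rising checksum count gets noticed before a resilver depends on that drive.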
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You are running FreeNAS as a virtual machine? Are you passing your disks through to the OS correctly, or are you using VMDK disks?
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215

Francis D

Cadet
Joined
Mar 7, 2016
Messages
6
I have a dedicated LSI 9211-8i card for FreeNAS. ESXi has no control over this card; it only passes it through to my FreeNAS VM.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Francis D said: "I have a dedicated LSI 9211-8i card for FreeNAS. ESXi has no control over this card; it only passes it through to my FreeNAS VM."
That is good to hear.
Francis D said: "Next, I plan to replace my RAID 0 + 1 with a five-drive RAIDZ2 plus 1 hot spare. Any better ideas?"
I would venture to suggest a single 6-drive RAIDZ2 vdev. Personally, I think the only reason to have a hot spare is if the server is located somewhere not easily accessible (remote office, locked server room, co-located, etc.). As long as you have SMART tests (short and long) and scrubs routinely scheduled and are getting e-mail notifications, you should be fine.

To me, having a hot spare on a system I can easily get access to is basically a waste of a drive. I would rather put it to use and gain the extra space. ;) /Not talking about RAIDZ1 or mirrored vdevs though...

P.S. I hope you have a UPS attached and have properly configured ESXi to shut down the FreeNAS VM as well.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Francis D said: "In the end, I changed everything to 6 hard drives in RAIDZ3. Now I have a performance issue."
Oh, and on this: yes, RAIDZ3 is going to be slower. Since you are running it as a VM, here are a couple of suggestions:
  1. Make sure in ESXi that the VM's CPU "Shares" is set to "High"
  2. Make sure you provide it a decent amount of RAM
  3. Make sure in ESXi that the VM's RAM is configured to "Reserve all guest memory (All locked)"
  4. You might want to give thought to providing a SLOG
 

Francis D

Cadet
Joined
Mar 7, 2016
Messages
6
Hi, here is an update. First, with my mirror setup I was able to reach 100 MB/s with only 2 cores at 2.4 GHz and 8 GB of RAM. Now, with 6 cores at 2.93 GHz, I can only get 60 MB/s. I'm still waiting for my order to boost my RAM, and I will let you know...

As you can see, mirrors are a lot faster than RAIDZ3, especially on machines with limited hardware capacity.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Francis D said: "Hi, here is an update. First, with my mirror setup I was able to reach 100 MB/s with only 2 cores at 2.4 GHz and 8 GB of RAM. Now, with 6 cores at 2.93 GHz, I can only get 60 MB/s. I'm still waiting for my order to boost my RAM, and I will let you know...

As you can see, mirrors are a lot faster than RAIDZ3, especially on machines with limited hardware capacity."
If those are your speeds, you have something wrong, because a single disk can stream data over the network at 1000 Mb/s (gigabit line rate).
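
The units matter here, so a quick conversion may help. This sketch assumes the speeds reported earlier in the thread are megabytes per second and that the network is gigabit Ethernet; neither is stated explicitly in the posts.

```python
def megabits_to_megabytes(mbit_per_s):
    """Convert a line rate in megabits/s to megabytes/s (8 bits per byte)."""
    return mbit_per_s / 8


GIGABIT_MBIT = 1000  # nominal gigabit Ethernet line rate, before protocol overhead

if __name__ == "__main__":
    ceiling = megabits_to_megabytes(GIGABIT_MBIT)
    # Reported throughput figures from the thread, assumed to be in MB/s.
    for label, mb_per_s in (("mirrors", 100), ("RAIDZ3", 60)):
        print(f"{label}: {mb_per_s} MB/s is {mb_per_s / ceiling:.0%} "
              f"of a {ceiling:.0f} MB/s gigabit link")
```

Under those assumptions the mirror figure is already close to what a gigabit link can carry once protocol overhead is subtracted, while the RAIDZ3 figure is about half of it.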
 