Raid degraded but volumes show healthy. View disks shows one 8.4 TB disk

Status
Not open for further replies.

OCCI_Dave

Cadet
Joined
Jul 16, 2018
Messages
4
I'm new to FreeNAS and working with an environment set up by someone else. While troubleshooting performance issues with my virtual servers, I noticed that one of the drives in the FreeNAS RAID array has physically failed. I jumped into the FreeNAS GUI and found that the volumes all report healthy, and only one line item shows up under "View Disks": an 8.4 TB volume. As I was configuring e-mail reporting and such as outlined in Uncle Fester's Basic Guide, it dawned on me that the RAID configuration was probably set up on the physical Dell server and not within FreeNAS. Does that sound right? Interestingly, after setting up e-mail and alerts, I did receive an e-mail from FreeNAS telling me the RAID was degraded.

Thanks in advance for any assistance
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
First we need the FULL DETAILS of your system: all specs and model numbers, from the CPU to the case. On top of that, please provide the output of zpool list -v, and if you could put that in code tags, that would be great!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
it dawned on me that the RAID configuration was probably set up on the physical Dell server and not within FreeNAS. Does that sound right?
That would explain what you're seeing, but isn't anywhere close to a "right" configuration. In addition to what @kdragon75 suggests, the output of zpool status would be helpful.
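For reference, both commands can be run from the web GUI Shell or over SSH; a minimal sketch (nothing here is specific to your setup, the pool names are whatever zpool reports):

Code:
# run from the FreeNAS Shell or an SSH session
zpool status      # per-vdev health, plus any scrub/resilver activity
zpool list -v     # capacity and layout, one line per vdev/disk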
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I have a sneaking suspicion that when we find out how he's set up, it's going to be extremely scary :(
 

OCCI_Dave

Cadet
Joined
Jul 16, 2018
Messages
4
zpool list -v, run from the Shell accessed from the GUI, seems to hang, or maybe it is just taking a long time. Should it be run from PuTTY instead?

Server specs are as follows:
FreeNAS 9.2.1.88
Dell PowerEdge R720XD
Dual Xeon Six-Core 2.0 GHz E5-2620
12 Dell 600 GB 10K SAS
PERC H710 512 MB RAID controller
Dell Broadcom 5720 Quad-Port 1 Gb 1000Base-T PCI-e R Series
72 GB PC3-10600R DDR3
Dual 750 Watt PS

Security run output from this morning is as follows:
Code:
freenas..vmware kernel log messages:

> mfi0: 9259 (585066655s/0x0002/WARN) - PD 03(e0x20/s3) Path 50000c0f01048b4e  reset (Type 03)

> mfi0: 9260 (585066655s/0x0002/WARN) - Error on PD 03(e0x20/s3) (Error fa)

> mfi0: 9261 (585066655s/0x0002/info) - State change on PD 03(e0x20/s3) from ONLINE(18) to FAILED(11)

> mfi0: 9262 (585066655s/0x0001/info) - State change on VD 01/1 from OPTIMAL(3) to DEGRADED(2)

> mfi0: 9263 (585066655s/0x0001/CRIT) - VD 01/1 is now DEGRADED

> mfi0: 9264 (585066657s/0x0002/info) - State change on PD 03(e0x20/s3) from FAILED(11) to UNCONFIGURED_BAD(1)

> mfi0: 9265 (585066657s/0x0002/info) - Rebuild automatically started on PD 11(e0x20/s17)

> mfi0: 9266 (585066657s/0x0002/info) - State change on PD 11(e0x20/s17) from HOT SPARE(2) to REBUILD(14)

> mfi0: 9267 (585066657s/0x0002/WARN) - Removed: PD 03(e0x20/s3)

> mfi0: 9268 (585066657s/0x0002/info) - Removed: PD 03(e0x20/s3) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=50000c0f01048b4e,0000000000000000

> mfi0: 9269 (585066674s/0x0002/info) - Inserted: PD 03(e0x20/s3)

> mfi0: 9270 (585066674s/0x0002/info) - Inserted: PD 03(e0x20/s3) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=50000c0f01048b4e,0000000000000000

> mfi0: 9271 (585066674s/0x0002/info) - State change on PD 03(e0x20/s3) from UNCONFIGURED_BAD(1) to UNCONFIGURED_GOOD(0)

> mfi0: 9272 (585066735s/0x0002/WARN) - Predictive failure: PD 03(e0x20/s3)

> mfi0: 9273 (585066735s/0x0002/info) - Unexpected sense: PD 03(e0x20/s3) Path 50000c0f01048b4e, CDB: 4d 00 2f 00 00 00 00 00 0c 00, Sense:(null)

> mfi0: 9354 (585085935s/0x0002/WARN) - Predictive failure: PD 03(e0x20/s3)

> mfi0: 9375 (585092108s/0x0002/info) - Rebuild complete on VD 01/1

> mfi0: 9376 (585092108s/0x0002/info) - Rebuild complete on PD 11(e0x20/s17)

> mfi0: 9377 (585092109s/0x0002/info) - State change on PD 11(e0x20/s17) from REBUILD(14) to ONLINE(18)

> mfi0: 9378 (585092109s/0x0001/info) - State change on VD 01/1 from DEGRADED(2) to OPTIMAL(3)

> mfi0: 9379 (585092109s/0x0001/info) - VD 01/1 is now OPTIMAL
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
OK, so you're right. They set up RAID on the controller, and the controller presents one logical disk to the OS with no SMART data. FreeNAS can see the controller's error messages in the kernel log, but ZFS has no clue about the physical disks. This could also be causing your performance issues: ZFS flushes each transaction group to "disk" (which actually just hits the controller cache), and this can cause disk thrashing. You would see inconsistent, bursty performance followed by high latency while the controller cache is actually flushed to disk.
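If you want to see that pattern for yourself, something along these lines (just a sketch; run it while the VMs are busy) will show writes arriving in bursts every few seconds as each transaction group is flushed. On this box gstat will only show the single logical mfid device the PERC presents, not the individual disks:

Code:
# watch pool I/O once per second; write columns spiking every few seconds
# line up with the transaction-group flushes described above
zpool iostat -v 1
# per-device busy% and latency (only the logical mfid device is visible here)
gstat -I 1s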

You need to find a supported HBA; take a look at the Resources section of the forum (top nav bar). Unfortunately, the data will need to be exported elsewhere or restored from backup, as there is no practical way of migrating from hardware RAID directly to ZFS "RAID" on the same disks.
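If you do end up copying the data to another box before rebuilding the pool, the usual route is a recursive snapshot plus zfs send/receive; a minimal sketch, assuming a pool named tank on this server and a pool named backup on the receiving machine (both pool names and the host are placeholders):

Code:
# on the old server: snapshot every dataset at once
zfs snapshot -r tank@migrate
# stream the whole tree to the other machine over SSH
zfs send -R tank@migrate | ssh backuphost zfs receive -F backup/tank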
 

OCCI_Dave

Cadet
Joined
Jul 16, 2018
Messages
4
Good (and bad) to know my instincts were correct. FedEx just delivered a replacement drive. Replacing the failed disk should automatically rebuild the new disk and put the hot-swap spare back to sleep. The plan for 2018 is to replace this storage server with new hardware and use the old one as a backup. That will let me integrate a bit more easily, restoring exported data from the old server to the new one, which should help minimize downtime.

As stated in my first post, I'm green to FreeNAS, so I will have to read up on the dos and don'ts prior to purchasing new storage for my VMware environment.

"Isn't anywhere close to a 'right' configuration" and "extremely scary :(" are not where I want to end up, and I have this exact same configuration at a second location, so everything will be x2.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Replacing the failed disk should automatically rebuild the new disk and put the hot swap spare back to sleep
Generally, no. The new drive will become the hot spare. Be sure to test/burn in this new drive before using it! One of the most important things with ZFS is to give it direct access to each disk and let ZFS manage the disks/array.
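For the burn-in, at minimum kick off a long SMART self-test and then check the results; a rough sketch, assuming the drive is directly visible (e.g. behind a proper HBA) as da3, which is just a placeholder. Behind the PERC the raw disk isn't exposed, so you may need smartctl's MegaRAID pass-through (-d megaraid,N) or to test the drive in another box:

Code:
# assumes the new drive shows up as /dev/da3 -- adjust to your system
smartctl -t long /dev/da3   # start the long self-test (takes hours)
smartctl -a /dev/da3        # later: check the self-test log and error counters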
"Isn't anywhere close to a 'right' configuration" and "extremely scary :(" are not where I want to end up, and I have this exact same configuration at a second location, so everything will be x2.
But now you can do it the right way and be the hero!
 

OCCI_Dave

Cadet
Joined
Jul 16, 2018
Messages
4
Even though FreeNAS is not handling the RAID, I'm impressed with the automated reporting and the ability to see individual disk status on the Dell RAID controller. Disk 11 did go back to sleep once disk 3 was replaced. Thanks for the help.
Code:
freenas..vmware kernel log messages:

> mfi0: 9381 (585239235s/0x0002/WARN) - PD 03(e0x20/s3) Path 50000c0f01048b4e  reset (Type 03)

> mfi0: 9382 (585239235s/0x0002/WARN) - Removed: PD 03(e0x20/s3)

> mfi0: 9383 (585239235s/0x0002/info) - Removed: PD 03(e0x20/s3) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=50000c0f01048b4e,0000000000000000

> mfi0: 9384 (585239236s/0x0002/info) - State change on PD 03(e0x20/s3) from UNCONFIGURED_GOOD(0) to UNCONFIGURED_BAD(1)

> mfi0: 9385 (585239434s/0x0002/info) - Inserted: PD 03(e0x20/s3)

> mfi0: 9386 (585239434s/0x0002/info) - Inserted: PD 03(e0x20/s3) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=500003988858f706,0000000000000000

> mfi0: 9387 (585239434s/0x0002/info) - State change on PD 03(e0x20/s3) from UNCONFIGURED_BAD(1) to UNCONFIGURED_GOOD(0)

> mfi0: 9388 (585239434s/0x0002/info) - CopyBack automatically started on PD 03(e0x20/s3) from PD 11(e0x20/s17)

> mfi0: 9389 (585239434s/0x0002/info) - State change on PD 03(e0x20/s3) from UNCONFIGURED_GOOD(0) to COPYBACK(20)

> mfi0: 9490 (585248417s/0x0002/info) - CopyBack complete on PD 03(e0x20/s3) from PD 11(e0x20/s17)

> mfi0: 9491 (585248417s/0x0002/info) - State change on PD 03(e0x20/s3) from COPYBACK(20) to ONLINE(18)

> mfi0: 9492 (585248418s/0x0042/info) - Global Hot Spare created on PD 11(e0x20/s17) (global,rev,ea)

> mfi0: 9493 (585248418s/0x0002/info) - State change on PD 11(e0x20/s17) from ONLINE(18) to HOT SPARE(2)
 