Raid degraded but volumes show healthy. View disks shows one 8.4 TB disk

Status
Not open for further replies.

OCCI_Dave

Cadet
Joined
Jul 16, 2018
Messages
4
I'm new to FreeNAS and working with an environment set up by someone else. While troubleshooting performance issues with my virtual servers, I noticed that one of the drives in the FreeNAS RAID array has physically failed. I jumped into the FreeNAS GUI and found that the volumes all report healthy, and only one line item shows up under "View Disks": an 8.4 TB volume. As I was configuring e-mail reporting and such as outlined in Uncle Fester's Basic Guide, it dawned on me that the RAID configuration was probably set up on the physical Dell server and not within FreeNAS. Does that sound right? Interestingly, after setting up e-mail and alerts, I did receive an e-mail from FreeNAS telling me the RAID was degraded.

Thanks in advance for any assistance
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
First we need the FULL DETAILS of your system: all specs and model numbers, from the CPU to the case. On top of that, please provide the output of zpool list -v, and if you could put that in code tags, that would be great!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
it dawned on me that the RAID configuration was probably set up on the physical Dell server and not within FreeNAS. Does that sound right?
That would explain what you're seeing, but isn't anywhere close to a "right" configuration. In addition to what @kdragon75 suggests, the output of zpool status would be helpful.
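For reference, both commands can be run from the web GUI Shell or over SSH; a minimal sketch (nothing here is specific to your setup, the pool names are whatever zpool reports):

Code:
# run from the FreeNAS Shell or an SSH session
zpool status      # per-vdev health, plus any scrub/resilver activity
zpool list -v     # capacity and layout, one line per vdev/disk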
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I have a sneaking suspicion that when we find out how he's set up, it's going to be extremely scary :(
 

OCCI_Dave

Cadet
Joined
Jul 16, 2018
Messages
4
zpool list -v, run from the Shell accessed from the GUI, seems to hang, or maybe it is just taking a long time. Should it be run from PuTTY instead?

Server specs are as follows:
FreeNAS 9.2.1.88
Dell PowerEdge R720XD
Dual Xeon Six-Core 2.0 GHz E5-2620
12 Dell 600 GB 10K SAS
PERC H710 512 MB RAID controller
Dell Broadcom 5720 Quad-Port 1 Gb 1000Base-T PCI-e R Series
72 GB PC3-10600R DDR3
Dual 750 Watt PS

Security run output from this morning is as follows:
Code:
freenas..vmware kernel log messages:

> mfi0: 9259 (585066655s/0x0002/WARN) - PD 03(e0x20/s3) Path 50000c0f01048b4e  reset (Type 03)

> mfi0: 9260 (585066655s/0x0002/WARN) - Error on PD 03(e0x20/s3) (Error fa)

> mfi0: 9261 (585066655s/0x0002/info) - State change on PD 03(e0x20/s3) from ONLINE(18) to FAILED(11)

> mfi0: 9262 (585066655s/0x0001/info) - State change on VD 01/1 from OPTIMAL(3) to DEGRADED(2)

> mfi0: 9263 (585066655s/0x0001/CRIT) - VD 01/1 is now DEGRADED

> mfi0: 9264 (585066657s/0x0002/info) - State change on PD 03(e0x20/s3) from FAILED(11) to UNCONFIGURED_BAD(1)

> mfi0: 9265 (585066657s/0x0002/info) - Rebuild automatically started on PD 11(e0x20/s17)

> mfi0: 9266 (585066657s/0x0002/info) - State change on PD 11(e0x20/s17) from HOT SPARE(2) to REBUILD(14)

> mfi0: 9267 (585066657s/0x0002/WARN) - Removed: PD 03(e0x20/s3)

> mfi0: 9268 (585066657s/0x0002/info) - Removed: PD 03(e0x20/s3) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=50000c0f01048b4e,0000000000000000

> mfi0: 9269 (585066674s/0x0002/info) - Inserted: PD 03(e0x20/s3)

> mfi0: 9270 (585066674s/0x0002/info) - Inserted: PD 03(e0x20/s3) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=50000c0f01048b4e,0000000000000000

> mfi0: 9271 (585066674s/0x0002/info) - State change on PD 03(e0x20/s3) from UNCONFIGURED_BAD(1) to UNCONFIGURED_GOOD(0)

> mfi0: 9272 (585066735s/0x0002/WARN) - Predictive failure: PD 03(e0x20/s3)

> mfi0: 9273 (585066735s/0x0002/info) - Unexpected sense: PD 03(e0x20/s3) Path 50000c0f01048b4e, CDB: 4d 00 2f 00 00 00 00 00 0c 00, Sense:(null)

> mfi0: 9354 (585085935s/0x0002/WARN) - Predictive failure: PD 03(e0x20/s3)

> mfi0: 9375 (585092108s/0x0002/info) - Rebuild complete on VD 01/1

> mfi0: 9376 (585092108s/0x0002/info) - Rebuild complete on PD 11(e0x20/s17)

> mfi0: 9377 (585092109s/0x0002/info) - State change on PD 11(e0x20/s17) from REBUILD(14) to ONLINE(18)

> mfi0: 9378 (585092109s/0x0001/info) - State change on VD 01/1 from DEGRADED(2) to OPTIMAL(3)

> mfi0: 9379 (585092109s/0x0001/info) - VD 01/1 is now OPTIMAL
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
OK, so you're right. They set up RAID on the controller, and the controller presents one logical disk to the OS with no SMART data. FreeNAS can see the controller's error messages in the kernel log, but ZFS has no clue about the physical disks. This could also be causing your performance issues: ZFS flushes each transaction group to "disk" (which actually just hits the controller cache), and this can cause disk thrashing. You would see inconsistent, bursty performance followed by high latency while the controller cache is actually flushed to disk.
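If you want to see that pattern for yourself, something along these lines (just a sketch; run it while the VMs are busy) will show writes arriving in bursts every few seconds as each transaction group is flushed. On this box gstat will only show the single logical mfid device the PERC presents, not the individual disks:

Code:
# watch pool I/O once per second; write columns spiking every few seconds
# line up with the transaction-group flushes described above
zpool iostat -v 1
# per-device busy% and latency (only the logical mfid device is visible here)
gstat -I 1s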

You need to find a supported HBA; take a look at the Resources section of the forum (top nav bar). Unfortunately, the data will need to be exported elsewhere or restored from backup, as there is no practical way of migrating from hardware RAID directly to ZFS "RAID" on the same disks.
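If you do end up copying the data to another box before rebuilding the pool, the usual route is a recursive snapshot plus zfs send/receive; a minimal sketch, assuming a pool named tank on this server and a pool named backup on the receiving machine (both pool names and the host are placeholders):

Code:
# on the old server: snapshot every dataset at once
zfs snapshot -r tank@migrate
# stream the whole tree to the other machine over SSH
zfs send -R tank@migrate | ssh backuphost zfs receive -F backup/tank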
 

OCCI_Dave

Cadet
Joined
Jul 16, 2018
Messages
4
Good (and bad) to know my instincts were correct. FedEx just delivered a replacement drive. Replacing the failed disk should automatically rebuild the new disk and put the hot-swap spare back to sleep. The plan for 2018 is to replace this storage server with new hardware and use the old one as a backup. That will let me integrate a bit more easily, restoring exported data from the old server to the new one, which should help minimize downtime.

As stated in my first post, I'm green to FreeNAS, so I will have to read up on the dos and don'ts prior to purchasing new storage for my VMware environment.

"Isn't anywhere close to a 'right' configuration" and "extremely scary :(" are not where I want to end up, and I have this exact same configuration at a second location, so everything will be x2.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Replacing the failed disk should automatically rebuild the new disk and put the hot swap spare back to sleep
Generally, no. The new drive will become the hot spare. Be sure to test/burn in this new drive before using it! One of the most important things with ZFS is to give it direct access to each disk and let ZFS manage the disks/array.
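For the burn-in, at minimum kick off a long SMART self-test and then check the results; a rough sketch, assuming the drive is directly visible (e.g. behind a proper HBA) as da3, which is just a placeholder. Behind the PERC the raw disk isn't exposed, so you may need smartctl's MegaRAID pass-through (-d megaraid,N) or to test the drive in another box:

Code:
# assumes the new drive shows up as /dev/da3 -- adjust to your system
smartctl -t long /dev/da3   # start the long self-test (takes hours)
smartctl -a /dev/da3        # later: check the self-test log and error counters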
"Isn't anywhere close to a 'right' configuration" and "extremely scary :(" are not where I want to end up, and I have this exact same configuration at a second location, so everything will be x2.
But now you can do it the right way and be the hero!
 

OCCI_Dave

Cadet
Joined
Jul 16, 2018
Messages
4
Even though FreeNAS is not handling the RAID, I'm impressed with the automated reporting and the ability to see individual disk status on the Dell RAID controller. Disk 11 did go back to sleep once disk 3 was replaced. Thanks for the help.
Code:
freenas..vmware kernel log messages:

> mfi0: 9381 (585239235s/0x0002/WARN) - PD 03(e0x20/s3) Path 50000c0f01048b4e  reset (Type 03)

> mfi0: 9382 (585239235s/0x0002/WARN) - Removed: PD 03(e0x20/s3)

> mfi0: 9383 (585239235s/0x0002/info) - Removed: PD 03(e0x20/s3) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=50000c0f01048b4e,0000000000000000

> mfi0: 9384 (585239236s/0x0002/info) - State change on PD 03(e0x20/s3) from UNCONFIGURED_GOOD(0) to UNCONFIGURED_BAD(1)

> mfi0: 9385 (585239434s/0x0002/info) - Inserted: PD 03(e0x20/s3)

> mfi0: 9386 (585239434s/0x0002/info) - Inserted: PD 03(e0x20/s3) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=500003988858f706,0000000000000000

> mfi0: 9387 (585239434s/0x0002/info) - State change on PD 03(e0x20/s3) from UNCONFIGURED_BAD(1) to UNCONFIGURED_GOOD(0)

> mfi0: 9388 (585239434s/0x0002/info) - CopyBack automatically started on PD 03(e0x20/s3) from PD 11(e0x20/s17)

> mfi0: 9389 (585239434s/0x0002/info) - State change on PD 03(e0x20/s3) from UNCONFIGURED_GOOD(0) to COPYBACK(20)

> mfi0: 9490 (585248417s/0x0002/info) - CopyBack complete on PD 03(e0x20/s3) from PD 11(e0x20/s17)

> mfi0: 9491 (585248417s/0x0002/info) - State change on PD 03(e0x20/s3) from COPYBACK(20) to ONLINE(18)

> mfi0: 9492 (585248418s/0x0042/info) - Global Hot Spare created on PD 11(e0x20/s17) (global,rev,ea)

> mfi0: 9493 (585248418s/0x0002/info) - State change on PD 11(e0x20/s17) from ONLINE(18) to HOT SPARE(2)
 