I think a drive is bad. Please confirm

Status
Not open for further replies.

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
OK, so the drive is logging errors internally, most recently during a short SMART test.

The only way I know for sure to tell if it's the drive, the cable, or a power issue, given no apparent catastrophic failure of the drive, is to replace each separately and see which replacement fixes it. Since you've tried replacing the cable, it's probably time to try replacing the drive.
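To see what the drive itself has logged, smartmontools (which ships with FreeNAS) can start a short self-test with `smartctl -t short /dev/daN`. It can then print the results and the error log with `smartctl -a /dev/daN`, where `daN` is a placeholder for the disk in question. The helper below is a hypothetical sketch that pulls the logged error count out of that output. It assumes a SATA drive, whose report contains either an `ATA Error Count: N` line or `No Errors Logged`. SAS drives report errors in a different format, which this sketch does not handle.

```sh
#!/bin/sh
# Hypothetical helper (not part of smartmontools): read `smartctl -a` output
# on stdin and print the drive's logged ATA error count.
# Assumes SATA-style output: "ATA Error Count: N ..." or "No Errors Logged".
# Prints "unknown" if neither line is present (e.g. a SAS drive).
ata_error_count() {
    awk '
        /^ATA Error Count:/ { print $4; found = 1; exit }
        /^No Errors Logged/ { print 0; found = 1; exit }
        END { if (!found) print "unknown" }
    '
}
```

For example, `smartctl -a /dev/da3 | ata_error_count`. If the number keeps climbing between runs, the drive is still logging new errors.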
 

Fuganater

Patron
Joined
Sep 28, 2015
Messages
477
Woke up this morning with lots of new write errors. I am going to swap the cable and move more data.

Code:
[root@Rick_James] ~# zpool status Vol1
  pool: Vol1
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 25.4M in 0h0m with 0 errors on Sat Nov 14 03:12:41 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        Vol1                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/f59a6e80-87c2-11e5-9257-0cc47a6bd0ac  ONLINE       0     0     0
            gptid/f6a0ed65-87c2-11e5-9257-0cc47a6bd0ac  ONLINE       0     6     0
            gptid/f7a889f1-87c2-11e5-9257-0cc47a6bd0ac  ONLINE       0     0     0
            gptid/f8ad1426-87c2-11e5-9257-0cc47a6bd0ac  ONLINE       0    15     0
            gptid/f9788d6d-87c2-11e5-9257-0cc47a6bd0ac  ONLINE       0    13     0
            gptid/fa86dfdf-87c2-11e5-9257-0cc47a6bd0ac  ONLINE       0     0     0
            gptid/fb895231-87c2-11e5-9257-0cc47a6bd0ac  ONLINE       0    13     0
            gptid/fc8ac078-87c2-11e5-9257-0cc47a6bd0ac  ONLINE       0     6     0
          raidz2-1                                      ONLINE       0     5     0
            gptid/fd5cb19c-87c2-11e5-9257-0cc47a6bd0ac  ONLINE       0    14     0
            gptid/fe6f518a-87c2-11e5-9257-0cc47a6bd0ac  ONLINE       0    13     0
            gptid/ff728308-87c2-11e5-9257-0cc47a6bd0ac  ONLINE       0    27     0
            gptid/007e4b9a-87c3-11e5-9257-0cc47a6bd0ac  ONLINE       0     0     0
            gptid/0189e083-87c3-11e5-9257-0cc47a6bd0ac  ONLINE       0    27     0
            gptid/028825dd-87c3-11e5-9257-0cc47a6bd0ac  ONLINE       0     7     0
            gptid/0385f501-87c3-11e5-9257-0cc47a6bd0ac  ONLINE       0     7     0
            gptid/0450883e-87c3-11e5-9257-0cc47a6bd0ac  ONLINE       0     6     0
        spares
          gptid/051cb5fd-87c3-11e5-9257-0cc47a6bd0ac    AVAIL

errors: No known data errors
 

random003

Dabbler
Joined
Sep 5, 2015
Messages
15
Errors like that, spread across most of the drives in both vdevs, indicate a problem further upstream from the drives. I would swap hardware in the following order. Unfortunately, this requires a lot of extra hardware and time. :(

1- Swap cable(s)
2- Move the HBA to a different PCIe slot
3- Swap the HBA
4- Swap the motherboard
5- Swap the backplane (chassis)
6- Swap the power supply, if it wasn't replaced along with the backplane/chassis


Can you give us the model of your HBA?
 

Fuganater

Patron
Joined
Sep 28, 2015
Messages
477
I can do all of that except the chassis/backplane. I've already tried 3 cables, so unless they are all bad, that is not the issue.

I am using an LSI 9211-8i.
 

Fuganater

Patron
Joined
Sep 28, 2015
Messages
477
Tried 2 other HBAs and different PCIe slots, same results. I think it might be the backplane. I've contacted the seller on eBay to see if they will send me a new one.
 

Fuganater

Patron
Joined
Sep 28, 2015
Messages
477
I can transfer small and large files with no errors coming up, but when I throw several TB at it to copy, I get all the write errors. No read errors so far. I'm going to get Plex going, have several clients stream, and see if I can get a read error.

Could the errors just be because I was moving so much data?
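One way to test whether the errors track transfer volume is to clear the counters with `zpool clear Vol1` (the command the status output itself suggests), run a large copy, and then list which devices picked up write errors. The filter below is a hypothetical sketch that reads `zpool status` text and prints every config row (pool, vdev, or disk) whose WRITE column is non-zero. It assumes the NAME / STATE / READ / WRITE / CKSUM column layout shown in the output above.

```sh
#!/bin/sh
# Hypothetical filter: read `zpool status` output on stdin and print
# "<name> <write-errors>" for each config row with a non-zero WRITE count.
# Assumes the NAME/STATE/READ/WRITE/CKSUM column layout.
write_errors() {
    awk '
        # Header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        # The "errors:" summary line ends the table.
        $1 == "errors:" { in_cfg = 0 }
        # Data rows have at least 5 fields; zpool may abbreviate large
        # counts (e.g. "1.2K"), so accept an optional unit suffix.
        # Spare rows ("name AVAIL") have only 2 fields and are skipped.
        in_cfg && NF >= 5 && $4 ~ /^[0-9.]+[KMGT]?$/ && $4 != "0" { print $1, $4 }
    '
}
```

Usage would be `zpool status Vol1 | write_errors` after each big copy. Comparing the list across runs shows whether the errors stay on particular disks or move around.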
 

random003

Dabbler
Joined
Sep 5, 2015
Messages
15
The number of errors you are encountering is not acceptable in my opinion. This is an enterprise-grade backplane; they are designed to handle tons of data. If you suspect the backplane, there are two things that may be worth trying before replacing it.

1- After transferring a lot of data for a while, check the temperature of the SAS expander chip by touching its heatsink. If it is burning hot to the touch, I would try cooling it down with some extra airflow and see whether the error counts keep increasing.
2- The backplane should have three SFF-8087 SAS ports. Try using a different port.
 

Fuganater

Patron
Joined
Sep 28, 2015
Messages
477
So the seller is sending me a new backplane. If I get the same results then something is up with all my brand new hardware...
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Also make sure all the power is properly connected.
 

Fuganater

Patron
Joined
Sep 28, 2015
Messages
477
New backplane came in. Is it safe to swap it out? Like will FreeNAS go apeshit if I do this?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No. Unlike Windows, FreeNAS really doesn't go apeshit at much of anything as long as you're being reasonable. Like, make sure you verify that all your disks are actually visible before you try bringing up the pool.

What you can do is boot a fresh install of FreeNAS from a different boot device, go into single-user mode, and use that to validate your new hardware. I suggest that you do that, and run "dd if=/dev/daX of=/dev/null bs=1048576 &" as a read test for each of your disks (do them all in parallel). If all your disks are visible and everything seems happy-happy, then try booting your normal NAS image.
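jgreco's read test can be wrapped in a small function. This is a hypothetical sketch: it starts one `dd if=DEVICE of=/dev/null bs=1048576` per argument in the background, waits for all of them, and prints OK or FAIL per device. It only ever reads, because each disk is always dd's `if=`; swapping `if` and `of` would overwrite the disk. Device names such as `/dev/da0` are examples; `camcontrol devlist` lists the disks FreeBSD actually sees.

```sh
#!/bin/sh
# Hypothetical wrapper around the parallel dd read test.
# Reads every device given as an argument end to end, all at the same time,
# discarding the data, and prints "OK   <dev>" or "FAIL <dev>" for each.
# Read-only: each device appears only as dd's input (if=).
parallel_read_test() {
    for dev in "$@"; do
        (
            # 1 MiB blocks; dd's transfer statistics are discarded.
            if dd if="$dev" of=/dev/null bs=1048576 2>/dev/null; then
                echo "OK   $dev"
            else
                echo "FAIL $dev"
            fi
        ) &
    done
    # Block until every background read has finished.
    wait
}
```

For example, `parallel_read_test /dev/da0 /dev/da1 /dev/da2`. Results print in completion order, and a full read of a multi-terabyte disk takes hours; any controller or disk errors during the run would show up on the console and in `dmesg`.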
 

Fuganater

Patron
Joined
Sep 28, 2015
Messages
477
Ya I have no idea what you just said lol.
 

Fuganater

Patron
Joined
Sep 28, 2015
Messages
477
I swapped in the new backplane and I still get the same write errors. It is fine for transfers up to about 30GB, but when I move more than that, I start getting write errors.

Could it be the firmware on the HBA? It is an LSI 9211-8i with P20 firmware; would P19 be any better?
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
Firmware should match the version of FreeNAS you're running. If you're current, P20 is correct.
 

Fuganater

Patron
Joined
Sep 28, 2015
Messages
477
It is.... I gotta find out what the issue is here. Any other suggestions?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
What SPECIFIC revision of firmware is on there? 20.00.04?
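On FreeNAS the exact firmware revision is printed by the mps(4) driver at boot, so it can be read from `dmesg` or `/var/run/dmesg.boot` without any flashing tools. The filter below is a hypothetical sketch that assumes the boot line has the form `mps0: Firmware: 20.00.04.00, Driver: ...`; if the format differs on a given release, it prints nothing.

```sh
#!/bin/sh
# Hypothetical filter: print "<controller> <firmware>" for each mps(4)
# controller found in boot messages read on stdin.
# Assumed line format: "mps0: Firmware: 20.00.04.00, Driver: 20.00.00.00-fbsd"
mps_firmware() {
    sed -n 's/^\(mps[0-9][0-9]*\): Firmware: \([0-9.]*\),.*/\1 \2/p'
}
```

Usage would be `mps_firmware < /var/run/dmesg.boot`. LSI's own `sas2flash -listall` also reports the firmware version of each controller.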
 

Fuganater

Patron
Joined
Sep 28, 2015
Messages
477
[Attached screenshot: HBA SS.jpg]
 