Hi all,
I have a FreeNAS system with a degraded RAID-Z2 volume, and I'm not sure of the best way to proceed. I'm hoping someone can advise me on how to do this without putting my data at risk. It's been through various upgrades and hardware changes over its lifetime, but it currently consists of:
- MSI Z270M Mortar motherboard
- Core i7-7700K CPU
- 64GB DDR4 (not ECC)
- Raidmax RX-1000AE PSU
- Norco RPC-3216 chassis
- 2x LSI 9211-8i controllers (originally Dell M1015s) in IT mode
- Crucial CT120BX300SSD1 boot drive
- 8x 6TB WD Red data drives in raidz2:
Code:
[root@booboo ~]# for i in {0..7}; do smartctl -a /dev/da$i | egrep "Device Model|Power_On_Hours"; done
Device Model:     WDC WD60EFRX-68MYMN1
  9 Power_On_Hours          0x0032   045   045   000    Old_age   Always       -       40765
Device Model:     WDC WD60EFRX-68MYMN1
  9 Power_On_Hours          0x0032   054   054   000    Old_age   Always       -       34144
Device Model:     WDC WD60EFRX-68L0BN1
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7981
Device Model:     WDC WD60EFRX-68MYMN1
  9 Power_On_Hours          0x0032   045   045   000    Old_age   Always       -       40765
Device Model:     WDC WD60EFRX-68MYMN1
  9 Power_On_Hours          0x0032   045   045   000    Old_age   Always       -       40765
Device Model:     WDC WD60EFRX-68MYMN1
  9 Power_On_Hours          0x0032   045   045   000    Old_age   Always       -       40765
Device Model:     WDC WD60EFRX-68MYMN1
  9 Power_On_Hours          0x0032   045   045   000    Old_age   Always       -       40712
Device Model:     WDC WD60EFAX-68JH4N0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       40
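To make the ages and drive types in that output easier to compare, here's a rough sketch that post-processes it. It is a hypothetical helper, not part of FreeNAS: it assumes the layout shown above (a "Device Model:" line followed by that disk's Power_On_Hours line, with the raw hour count in field 10), and it only knows the two models discussed in this post (WD60EFRX = PMR, WD60EFAX = SMR), reporting anything else as "unknown". For scale, 40,765 power-on hours is roughly 4.7 years of continuous running.

```shell
# summarize_disks: hypothetical helper. Reads the smartctl/egrep loop output
# on stdin and prints one line per disk: model, recording type, age in years.
# Recording type covers only the two models named in this post; the age uses
# 8766 hours per year (365.25 days * 24 hours).
summarize_disks() {
  awk '
    /^Device Model:/ {
      model = $NF                      # e.g. WD60EFRX-68MYMN1
      type = "unknown"
      if (model ~ /^WD60EFRX/) type = "PMR"
      else if (model ~ /^WD60EFAX/) type = "SMR"
    }
    $2 == "Power_On_Hours" {
      # RAW_VALUE (field 10) holds the power-on hour count
      printf "%-20s %-7s %5.1f years\n", model, type, $10 / 8766
    }
  '
}
```

Usage: append it to the original loop, e.g. `for i in {0..7}; do smartctl -a /dev/da$i | egrep "Device Model|Power_On_Hours"; done | summarize_disks`.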
It had been running fine for quite a while until one of the data drives recently started throwing SMART errors, so I replaced it. The problem is that WD has "upgraded" the EFRX to the EFAX, which now uses SMR instead of PMR, and I cannot get the array to accept the new drive. Any time I attempt a resilver, the new drive gets faulted, like this:
Code:
[root@booboo ~]# zpool status Storage
  pool: Storage
 state: DEGRADED
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub in progress since Thu Mar 19 13:29:09 2020
        6.14T scanned at 1.10G/s, 1.21T issued at 1.00G/s, 32.7T total
        0 repaired, 3.69% done, 0 days 08:56:29 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        Storage                                           DEGRADED     0     0     0
          raidz2-0                                        DEGRADED     0     0     0
            gptid/835dedeb-d1b8-11e4-afe0-6c626d4af35e    ONLINE       0     0     0
            gptid/4feeb1c8-a914-11e5-93c0-00259028b247    ONLINE       0     0     0
            gptid/84461997-d1b8-11e4-afe0-6c626d4af35e    ONLINE       0     0     0
            gptid/84b98196-d1b8-11e4-afe0-6c626d4af35e    ONLINE       0     0     0
            gptid/e39a940b-0569-11e9-bb96-000c2959bfd4    ONLINE       0     0     0
            gptid/85aae6f2-d1b8-11e4-afe0-6c626d4af35e    ONLINE       0     0     0
            replacing-6                                   UNAVAIL      0     0     6
              1065334894515511882                         UNAVAIL      0     0     0  was /dev/gptid/862982d0-d1b8-11e4-afe0-6c626d4af35e
              gptid/eac356b2-67ff-11ea-89d8-4ccc6ad6a297  FAULTED      0    69     0  too many errors
            gptid/86a13781-d1b8-11e4-afe0-6c626d4af35e    ONLINE       0     0     0

errors: No known data errors
and dmesg fills up with this:
Code:
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 80 b7 47 10 00 00 30 00
(da7:mps1:0:3:0): CAM status: SCSI Status Error
(da7:mps1:0:3:0): SCSI status: Check Condition
(da7:mps1:0:3:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mps1:0:3:0): Info: 0x80b74710
(da7:mps1:0:3:0): Error 22, Unretryable error
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 80 bb aa 58 00 00 60 00 length 49152 SMID 1050 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 80 bb aa 58 00 00 60 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 80 bb a9 58 00 01 00 00
(da7:mps1:0:3:0): CAM status: SCSI Status Error
(da7:mps1:0:3:0): SCSI status: Check Condition
(da7:mps1:0:3:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mps1:0:3:0): Info: 0x80bba958
(da7:mps1:0:3:0): Error 22, Unretryable error
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 80 bb ab c0 00 00 60 00 length 49152 SMID 1057 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00 length 8192 SMID 227 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): READ(16). CDB: 88 00 00 00 00 02 ba a0 f0 90 00 00 00 10 00 00 length 8192 SMID 638 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 80 bb ab c0 00 00 60 00
(da7:mps1:0:3:0): READ(16). CDB: 88 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00 length 8192 SMID 760 terminated ioc 804b log
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00 info 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): READ(16). CDB: 88 00 00 00 00 02 ba a0 f0 90 00 00 00 10 00 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): READ(16). CDB: 88 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 80 bb aa c0 00 01 00 00
(da7:mps1:0:3:0): CAM status: SCSI Status Error
(da7:mps1:0:3:0): SCSI status: Check Condition
(da7:mps1:0:3:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mps1:0:3:0): Info: 0x80bbaac0
(da7:mps1:0:3:0): Error 22, Unretryable error
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 45 e2 90 00 00 60 00 length 49152 SMID 395 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 45 e2 90 00 00 60 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 45 e1 90 00 01 00 00
(da7:mps1:0:3:0): CAM status: SCSI Status Error
(da7:mps1:0:3:0): SCSI status: Check Condition
(da7:mps1:0:3:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mps1:0:3:0): Info: 0x45e190
(da7:mps1:0:3:0): Error 22, Unretryable error
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 40 02 90 00 00 10 00 length 8192 SMID 462 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f0 90 00 00 00 10 00 00 length 8192 SMID 808 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00 length 8192 SMID 305 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 40 02 90 00 00 10 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f0 90 00 00 00 10 00 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 45 c2 d8 00 00 88 00
(da7:mps1:0:3:0): CAM status: SCSI Status Error
(da7:mps1:0:3:0): SCSI status: Check Condition
(da7:mps1:0:3:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mps1:0:3:0): Info: 0x45c2d8
(da7:mps1:0:3:0): Error 22, Unretryable error
(da7:mps1:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00 length 8192 SMID 587 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 45 e2 f8 00 01 00 00 length 131072 SMID 223 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 45 e3 f8 00 00 60 00 length 49152 SMID 852 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 45 e2 f8 00 01 00 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 45 e3 f8 00 00 60 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f0 90 00 00 00 10 00 00
(da7:mps1:0:3:0): CAM status: SCSI Status Error
(da7:mps1:0:3:0): SCSI status: Check Condition
(da7:mps1:0:3:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mps1:0:3:0): Info: 0x2baa0f090
(da7:mps1:0:3:0): Error 22, Unretryable error
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 45 e3 f8 00 00 60 00 length 49152 SMID 981 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 45 e3 f8 00 00 60 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 45 e2 f8 00 01 00 00
(da7:mps1:0:3:0): CAM status: SCSI Status Error
(da7:mps1:0:3:0): SCSI status: Check Condition
(da7:mps1:0:3:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mps1:0:3:0): Info: 0x45e2f8
(da7:mps1:0:3:0): Error 22, Unretryable error
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 4a 31 98 00 00 b0 00
(da7:mps1:0:3:0): CAM status: SCSI Status Error
(da7:mps1:0:3:0): SCSI status: Check Condition
(da7:mps1:0:3:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mps1:0:3:0): Info: 0x4a3198
(da7:mps1:0:3:0): Error 22, Unretryable error
(da7:mps1:0:3:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00 length 8192 SMID 112 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): READ(16). CDB: 88 00 00 00 00 02 ba a0 f0 90 00 00 00 10 00 00 length 8192 SMID 913 terminated ioc 804b log
(da7:mps1:0:3:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00 info 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): READ(16). CDB: 88 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00 length 8192 SMID 1077 terminated ioc 804b lo
(da7:mps1:0:3:0): READ(16). CDB: 88 00 00 00 00 02 ba a0 f0 90 00 00 00 10 00 00 ginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): READ(16). CDB: 88 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 4a 33 50 00 00 60 00
(da7:mps1:0:3:0): CAM status: SCSI Status Error
(da7:mps1:0:3:0): SCSI status: Check Condition
(da7:mps1:0:3:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mps1:0:3:0): Info: 0x4a3350
(da7:mps1:0:3:0): Error 22, Unretryable error
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 40 02 90 00 00 10 00 length 8192 SMID 326 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f0 90 00 00 00 10 00 00 length 8192 SMID 214 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00 length 8192 SMID 493 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 40 02 90 00 00 10 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f0 90 00 00 00 10 00 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00
(da7:mps1:0:3:0): CAM status: CCB request completed with an error
(da7:mps1:0:3:0): Retrying command
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 4a 2b 80 00 00 38 00
(da7:mps1:0:3:0): CAM status: SCSI Status Error
(da7:mps1:0:3:0): SCSI status: Check Condition
(da7:mps1:0:3:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mps1:0:3:0): Info: 0x4a2b80
(da7:mps1:0:3:0): Error 22, Unretryable error
(da7:mps1:0:3:0): WRITE(10). CDB: 2a 00 00 4e 73 e8 00 00 b0 00
(da7:mps1:0:3:0): CAM status: SCSI Status Error
(da7:mps1:0:3:0): SCSI status: Check Condition
(da7:mps1:0:3:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mps1:0:3:0): Info: 0x4e73e8
(da7:mps1:0:3:0): Error 22, Unretryable error
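Each CDB in that log can be decoded to see which sectors were being accessed when the drive answered "Logical block address out of range". Below is a minimal sketch, assuming the standard SCSI block-command layouts for the 10- and 16-byte READ/WRITE opcodes (0x28/0x2a and 0x88/0x8a) and 512-byte logical sectors. The sector size is consistent with the log itself: `2a 00 80 bb aa 58 00 00 60 00` requests 0x60 = 96 blocks, and the driver reports `length 49152` for it. The decoded LBA should also line up with the kernel's `Info:` field, e.g. CDB `2a 00 80 bb a9 58 ...` and `Info: 0x80bba958`. `decode_cdb` and `lba_in_range` are hypothetical helper names.

```shell
# decode_cdb: decode a READ/WRITE(10) or READ/WRITE(16) CDB, passed as the
# hex byte tokens printed after "CDB:", into opcode, starting LBA and length.
#   10-byte CDB: LBA in bytes 2-5, transfer length in bytes 7-8
#   16-byte CDB: LBA in bytes 2-9, transfer length in bytes 10-13
# bytes= assumes 512-byte logical sectors (matches the logged "length" values).
decode_cdb() {
  local -a b=("$@")
  local op lba blocks
  case "${b[0]}" in
    28|2a)
      [ "${#b[@]}" -ge 10 ] || return 1
      if [ "${b[0]}" = 28 ]; then op="READ(10)"; else op="WRITE(10)"; fi
      lba=$((16#${b[2]}${b[3]}${b[4]}${b[5]}))
      blocks=$((16#${b[7]}${b[8]}))
      ;;
    88|8a)
      [ "${#b[@]}" -ge 16 ] || return 1
      if [ "${b[0]}" = 88 ]; then op="READ(16)"; else op="WRITE(16)"; fi
      lba=$((16#${b[2]}${b[3]}${b[4]}${b[5]}${b[6]}${b[7]}${b[8]}${b[9]}))
      blocks=$((16#${b[10]}${b[11]}${b[12]}${b[13]}))
      ;;
    *)
      echo "unsupported opcode: ${b[0]}" >&2
      return 1
      ;;
  esac
  echo "$op lba=$lba blocks=$blocks bytes=$((blocks * 512))"
}

# lba_in_range: succeed if an I/O of <blocks> sectors starting at <lba>
# ends within a disk of <capacity> sectors.
lba_in_range() {
  local lba=$1 blocks=$2 capacity=$3
  [ $((lba + blocks)) -le "$capacity" ]
}
```

For example, `decode_cdb 2a 00 00 45 e1 90 00 01 00 00` decodes one of the rejected writes; the result can then be checked with `lba_in_range` against the disk's real sector count, which `diskinfo -v /dev/da7` reports on FreeBSD as "mediasize in sectors".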
Note: The new drive passes a SMART long self-test without errors.
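For anyone wanting to repeat that self-test check, here's a small sketch that reads the self-test log and succeeds only if the most recent entry is a long test that finished cleanly. It assumes smartctl's usual self-test log layout, where the newest test is entry `# 1` and a test started with `-t long` is listed as "Extended offline"; `latest_long_test_ok` is a hypothetical helper name.

```shell
# latest_long_test_ok: hypothetical helper. Reads `smartctl -l selftest`
# output on stdin; exits 0 only if entry "# 1" (the most recent test) is an
# extended (long) test whose status is "Completed without error".
latest_long_test_ok() {
  awk '
    $1 == "#" && $2 == "1" {
      found = 1
      ok = ($0 ~ /Extended offline/ && $0 ~ /Completed without error/)
    }
    END { exit !(found && ok) }
  '
}
```

Usage: start the test with `smartctl -t long /dev/da7`, wait for the duration smartctl estimates, then run `smartctl -l selftest /dev/da7 | latest_long_test_ok && echo "long test passed"`.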
So far I have tried:
- Moving the drive to a different slot
- Upgrading FreeNAS (was 11.2 U7, now 11.3 U1; this is why the zpool status above suggests a zpool upgrade)
- Updating controller firmware and BIOS (was 20.00.04.00 / 07.39.00.00, now 20.00.07.00 / 07.39.02.00)
Nothing has made any difference. As far as I can tell, my options now are:
- Find a WD60EFRX, accept that it's going to be pricey, and run degraded while I wait for it
- Swap for another type of HDD (WD Red Pro?)
- Swap the PSU (it's fairly recent, so this feels like a bit of a Hail Mary)
I'm hesitant to keep stressing such old drives with resilvers without any confidence that it'll work. I can't find anything about the WD60EFAX that indicates why it would be an issue, so these all just seem like shots in the dark. I found some reports of the WD60EFRX-68L0BN1 being a problem, and I have one of those in the pool, but it hasn't caused any issues so far, and I'm skeptical that it could be affecting the new drive because they're on different controllers. I'm also hesitant to switch too far away from the EFRX: if the change to the EFAX is what's causing this, moving even further from the EFRX seems likely to bring more of the same, so I'm left without a convincing option.
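On the "different controllers" point: the `(da7:mps1:0:3:0)` tag on every dmesg line is a FreeBSD CAM path, which encodes the disk, the controller instance it hangs off, and the bus/target/LUN. A small hypothetical helper to split it apart:

```shell
# parse_cam_path: hypothetical helper. Splits a CAM tag such as
# "(da7:mps1:0:3:0)" into peripheral device, controller (SIM) instance,
# bus, target and LUN.
parse_cam_path() {
  local tag=${1#"("}      # drop the leading parenthesis
  tag=${tag%")"}          # drop the trailing parenthesis
  local dev sim bus target lun
  IFS=: read -r dev sim bus target lun <<< "$tag"
  echo "device=$dev controller=$sim bus=$bus target=$target lun=$lun"
}
```

For the new disk this gives `controller=mps1`. To compare with the WD60EFRX-68L0BN1 (da2, going by the order of the smartctl loop), its attach line in the boot log, e.g. `grep '^da2 at' /var/run/dmesg.boot`, or the per-bus listing from `camcontrol devlist -v` should show which mps instance it sits on.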
Any suggestions?