Had a system running for a while now (it started life with 2TB disks and 9.01). The last upgrade took it to:
U-NAS 800 case
X10SDV-4C-TLN2F board
64GB ECC Ram
LSI 9211 flashed with latest IT firmware
6TB Seagate NAS drives
Had no issues whatsoever - stable as could be!
Upgraded the 6TB disks to 10TB Seagate IronWolf disks, and ever since, disks have been getting marked as failed. I swap them with spares and the issue comes back on another disk / another slot. The errors are randomly Read or Write, and every now and then a Checksum.
[root@ZFS] ~# zpool status
  pool: DATA1
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Nov 21 15:48:30 2016
        1.05T scanned out of 26.3T at 419M/s, 20h35m to go
        131G resilvered, 3.44% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        DATA1                                           DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/b581e923-902e-11e6-b082-0cc47ac34350  FAULTED      9   279     0  too many errors
            gptid/7c107e59-8918-11e6-a4f4-0cc47ac34350  ONLINE       0     0     0
            gptid/564188b4-8f7a-11e6-b082-0cc47ac34350  ONLINE       0     0     0
            gptid/6e411002-adf7-11e6-ad6a-0cc47ac34350  ONLINE       0     0     0
            gptid/f3218024-af94-11e6-ad6a-0cc47ac34350  ONLINE       0     0     0  (resilvering)
            gptid/db56fee2-8ea8-11e6-88a3-0cc47ac34350  ONLINE       0     0     0
            gptid/c1f58b6f-8c4e-11e6-88a3-0cc47ac34350  ONLINE       0     0     0
            gptid/3064b301-ad3b-11e6-ad6a-0cc47ac34350  ONLINE       0     0     0

errors: No known data errors
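As an aside, since I've been comparing these counters across several disk swaps, here is a minimal (hypothetical, not part of my setup) sketch of pulling the per-device READ/WRITE/CKSUM counts out of the zpool status text so they can be logged and compared between slots over time:

```python
# Hypothetical helper: parse per-device error counters out of the text
# printed by `zpool status`, assuming the stock FreeBSD/FreeNAS layout
# shown above (name, state, read, write, cksum, optional note).
import re

# One leaf-vdev line: device name, STATE word, then three integer counters.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>\d+)\s+(?P<write>\d+)\s+(?P<cksum>\d+)"
)

def error_counters(zpool_status_text):
    """Return {device_name: (read, write, cksum)} for gptid leaf vdevs."""
    counters = {}
    for line in zpool_status_text.splitlines():
        m = ROW.match(line)
        # Only keep leaf devices (gptid/...), skipping pool and raidz rows.
        if m and m.group("name").startswith("gptid/"):
            counters[m.group("name")] = (
                int(m.group("read")),
                int(m.group("write")),
                int(m.group("cksum")),
            )
    return counters

# Two rows taken from the output above, as sample input.
sample = """\
            gptid/b581e923-902e-11e6-b082-0cc47ac34350  FAULTED      9   279     0  too many errors
            gptid/7c107e59-8918-11e6-a4f4-0cc47ac34350  ONLINE       0     0     0
"""
print(error_counters(sample))
```

Feeding it the output of `zpool status` on a schedule makes it easy to see whether the counters follow a disk or stay with a slot after a swap.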
I've replaced the SAS cables.
I've replaced disks (and made sure they are PMR, not SMR like the 8TB Archive disks).
I've swapped the LSI card for a 9300-8i, along with new cables again.
SMART tests always come back clean (short or extended).
The disks always work and check out 100% in other (non-ZFS) systems.
The only thing I have not changed is the chassis / disk backplane. Do you guys think this could be the problem? I've been chasing this issue for almost three weeks now and it is really annoying! I really don't think it is the disks, but then again I cannot find anyone else who is running these disks with ZFS.
Any direction would be appreciated!
(Oh, and I do live in New Zealand, but in the far north where we have had no earthquakes... so it's not a physical/vibration issue either ;) )