Tracking a weird issue with disks


Py7h0n

Cadet
Joined
Nov 20, 2016
Messages
8
Had a system running for a while now (it started life with 2TB disks and 9.01). The last upgrade took it to:

U-NAS 800 case
X10SDV-4C-TLN2F board
64GB ECC RAM
LSI 9211 flashed with latest IT firmware
6TB Seagate NAS drives

Had no issues whatsoever - stable as could be!

Upgraded the 6TB disks to 10TB Seagate IronWolf disks, and ever since I have been getting disks marked as failed. I swap them with spares and the issue comes back on another disk in another slot. Randomly it is a write or read error, and every now and then a checksum error.

[root@ZFS] ~# zpool status
  pool: DATA1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Nov 21 15:48:30 2016
        1.05T scanned out of 26.3T at 419M/s, 20h35m to go
        131G resilvered, 3.44% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        DATA1                                           DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/b581e923-902e-11e6-b082-0cc47ac34350  FAULTED      9   279     0  too many errors
            gptid/7c107e59-8918-11e6-a4f4-0cc47ac34350  ONLINE       0     0     0
            gptid/564188b4-8f7a-11e6-b082-0cc47ac34350  ONLINE       0     0     0
            gptid/6e411002-adf7-11e6-ad6a-0cc47ac34350  ONLINE       0     0     0
            gptid/f3218024-af94-11e6-ad6a-0cc47ac34350  ONLINE       0     0     0  (resilvering)
            gptid/db56fee2-8ea8-11e6-88a3-0cc47ac34350  ONLINE       0     0     0
            gptid/c1f58b6f-8c4e-11e6-88a3-0cc47ac34350  ONLINE       0     0     0
            gptid/3064b301-ad3b-11e6-ad6a-0cc47ac34350  ONLINE       0     0     0

errors: No known data errors
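
For anyone following along, this is roughly how I've been mapping the gptid labels back to physical devices and swapping out the faulted disk (da0 below is just a placeholder; use whatever device glabel actually reports):

# map the faulted gptid to a device node (e.g. da0)
[root@ZFS] ~# glabel status | grep b581e923

# the serial number tells you which physical slot the disk is in
[root@ZFS] ~# smartctl -i /dev/da0

# after physically swapping in the spare (normally done via the GUI,
# which handles the partitioning for you)
[root@ZFS] ~# zpool replace DATA1 gptid/b581e923-902e-11e6-b082-0cc47ac34350 gptid/<new-disk-gptid>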


I've replaced the SAS cables.
I've replaced disks (and made sure they are PMR, not SMR like the 8TB Archive disks).
I've swapped the LSI card for a 9300-8i, along with new cables again.
SMART tests always come back good (quick or extended - example commands below).
The disks always work and check out 100% in other, non-ZFS systems.
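
For completeness, this is roughly what I ran against each disk (da0 stands in for each device in turn):

[root@ZFS] ~# smartctl -t short /dev/da0     # quick self-test
[root@ZFS] ~# smartctl -t long /dev/da0      # extended self-test (takes hours on a 10TB disk)
[root@ZFS] ~# smartctl -a /dev/da0           # results and error counters once the tests finish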

The only thing I have not changed is the chassis / disk backplane. Do you guys think this is the problem? I've been chasing this issue for almost three weeks now and it is really annoying! I really don't think it is the disks, but then again I cannot find anyone else who is running these disks with ZFS.

Any direction would be appreciated!

(Oh, and I do live in New Zealand, but in the far north where we have had no earthquakes... so it's not a physical issue either ;) )
 

Pitfrr

Wizard
Joined
Feb 10, 2014
Messages
1,531
Well, I don't have much experience with that kind of system, so I can't say anything about the chassis... but what about the SMART data of the disks? Anything suspicious there? Even though you're confident it's not the disks, it's worth checking.
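
If it were cabling or the backplane rather than the disks themselves, I'd expect it to show in the link error counters rather than the media attributes. Something like this (with da0 as a placeholder for each disk) would be worth a look:

[root@ZFS] ~# smartctl -A /dev/da0 | egrep 'Reallocated|Pending|Uncorrectable|CRC'
# Reallocated/pending/uncorrectable sectors point at the media;
# UDMA_CRC_Error_Count points at the cable, backplane or controller.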
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
What is your power supply? The disks could be dropping out because there isn't enough power.

 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Those disks might need a bigger PSU.

You can probably verify by using a normal big PSU with the case off.
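
Back-of-envelope, assuming something like 2A on the 12V rail per drive at spin-up (check the IronWolf datasheet for the real figure):

# 8 drives x ~2A x 12V = ~190W on the 12V rail alone at power-on,
# before the board, HBA and fans are counted. A 400W unit with a
# weak 12V rail could sag enough to make good disks drop off the bus.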
 

Py7h0n

Cadet
Joined
Nov 20, 2016
Messages
8
Thanks, all.

SMART results are all good for sure (checked on 3 different systems with different tools).

The PSU is a 400W unit purchased from U-NAS. I will put a standard big 600W unit in front of it and see what happens - honestly, I did not think about checking the PSU, so thank you for the idea!
 

gimpbully

Cadet
Joined
Feb 23, 2017
Messages
1
Py7h0n said:
PSU is a 400W unit purchased from U-NAS. I will put a standard big 600W unit in front of it and see what happens...

Did this end up being a power supply issue? I've started having the same strange issues with IronWolf 10TB drives in a case strikingly similar to that U-NAS one.
 

Py7h0n

Cadet
Joined
Nov 20, 2016
Messages
8
No.

I swapped boards, power supplies and cases without any success.

Upgrading to version 10 (Corral) fixed it for me.

I now have 8 spare 10TB disks :p
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Unfortunately, that's not going to be viable long-term. The good news is that 9.10.3 should behave the same way, since it's also FreeBSD 11.
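
If you want to confirm which FreeBSD base a given build is on, it's easy to check from a shell (output will obviously vary by build):

[root@ZFS] ~# uname -a            # shows the FreeBSD release the kernel is built from
[root@ZFS] ~# freebsd-version -ku # kernel and userland versions on FreeBSD 10+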
 

Py7h0n

Cadet
Joined
Nov 20, 2016
Messages
8
What is not viable long-term? Running Corral?
 

Py7h0n

Cadet
Joined
Nov 20, 2016
Messages
8
Well, I did run version 7 for a VERY long time before upgrading, so I guess Corral could be the same :P.

It does 'just work' with the 10TB disks, without any issues. I spent a VERY long time on the frustrating issue in this thread, and it almost pushed me to native FreeBSD, so I am rather happy with Corral (supported or not :P).
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It's a low-level issue, so 9.10.3 is sure to solve it the same way Corral did.
 