Note: I'm not physically able to access this hardware right now due to COVID, so my ability to try different things is limited. I'm mostly trying to confirm my suspicion that this is the worst case: physical hardware degradation.
I've recently been trying to diagnose really poor dd and SMB performance on a remote server of mine. dd zero writes have been atrocious at 25-30 MB/s (the exact command is sketched at the bottom of this post), and Samba is the same. While diagnosing, I decided to try scrubbing my drives, and near the middle of the scrub I got a chain of emails within a 12-minute span alerting me that all 4 drives were causing slow I/O on the pool.
I've seen this error in the past, usually on just one drive every 2 months or so, and I originally chalked it up to the drives being the recently discovered SMR models. I didn't suspect it was behind the write performance issues, since performance over the past 2 years has been very good, at 110+ MB/s over SMB. Now I'm very suspicious that this was the root cause all along, but I can't find many posts about this error or what typically causes it.
New alerts:
* Device /dev/gptid/90518e3a-f397-11e8-9303-0cc47ac2d7cc.eli is causing slow I/O on pool vol1.
Current alerts:
* Device /dev/gptid/926767f0-f397-11e8-9303-0cc47ac2d7cc.eli is causing slow I/O on pool vol1.
* Device /dev/gptid/8e3610d3-f397-11e8-9303-0cc47ac2d7cc.eli is causing slow I/O on pool vol1.
* Device /dev/gptid/90518e3a-f397-11e8-9303-0cc47ac2d7cc.eli is causing slow I/O on pool vol1.
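In case the device names matter, this is roughly how I'd map those gptid labels back to physical disks remotely (a sketch; the grep fragments are just copied from the alerts above):

```
# Map the gptid labels from the alerts to adaX devices (FreeBSD/FreeNAS)
glabel status | egrep '8e3610d3|90518e3a|926767f0'

# Cross-check which vdev members those are
zpool status -v vol1
```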
All my long SMART tests show healthy results, and my pool has had no issues, now or in the past. I fear this may be the backplane going out, which would be the worst case since I can't do anything about that while I'm away from this server. I just wanted to get the community's thoughts on anything else I can try remotely before I give up until I can travel back to fix the physical issue.
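The remote checks I can think of in the meantime are watching per-disk latency under load and the raw SMART counters rather than just the overall test verdict (a sketch; /dev/ada0 is an example, substitute the devices from camcontrol). If all four disks stall in lockstep under gstat, a shared component like the backplane, cabling, or controller looks more likely than any single drive, and UDMA CRC errors in particular tend to point at cabling/backplane rather than the platters:

```
# Per-disk busy%/latency while a scrub or dd test runs
gstat -p

# Enumerate disks, then pull raw error counters, not just PASSED
camcontrol devlist
smartctl -a /dev/ada0 | egrep -i 'crc|pending|uncorrect|reallocated'
```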
Setup:
U-NAS NSC-400 Enclosure
X10SDV-6C+-TLN4F-O (Xeon D-1528)
64GB Registered ECC RAM
4x Seagate Barracuda 8TB SMR Drives (ST8000DM004) in RAIDZ2
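For reference, the dd test behind the 25-30 MB/s numbers is roughly the following (a sketch; the target path and sizes are placeholders). One caveat I'm aware of: with lz4 compression on the dataset, zeros compress to almost nothing, so a zero-write test exercises the software pipeline more than the disks themselves:

```
# Zero-write throughput test (FreeBSD dd prints bytes/sec at the end)
dd if=/dev/zero of=/mnt/vol1/ddtest bs=1m count=8192

# Less compressible variant that forces real disk work
dd if=/dev/random of=/mnt/vol1/ddtest bs=1m count=2048
```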