sterlingphoenix
Cadet
Joined: Jan 3, 2018
Messages: 3
Hi all,
I built my first FreeNAS box about a month ago (I'd been building it for a while, but the final version came together about then). It's based on a Supermicro X11SSM-F-O motherboard. There are four 4TB WD Red drives in the case, and four more in an external enclosure (a Lenovo SA120) connected to an LSI SAS9200-8E.
This is an upgrade from an 8x3TB WD Red array (which was running under a somewhat unstable ZFS-on-Linux build), and the plan was to add those drives to the new array as soon as it seemed stable.
Which, after 3 weeks, it did. Completely stable, no errors anywhere, clean test scrubs, etc.
So I took four drives off the old array and plugged them into the SA120.
Immediately I started getting dozens of SCSI Status Errors and Aborted Commands. I'm talking several every minute, up to thousands per day.
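For context, I'm counting them by grepping the system log; this is just a rough sketch, assuming everything lands in the default /var/log/messages:
Code:
# Count how many CAM/SCSI error lines the external controller (mps0) has logged
grep "mps0" /var/log/messages | grep -c "CAM status"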
So naturally I removed the "new" drives.... except the errors keep happening. Thousands a day.
I ran a scrub and, other than making the error rate jump to hundreds a minute, it found no errors at all.
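Here's roughly how I'm checking the pool itself ("tank" is just a stand-in for my actual pool name):
Code:
# Scrub progress plus per-vdev READ/WRITE/CKSUM counters (all zero for me)
zpool status -v tank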
From what I can tell, these are errors between the LSI card and the array, not actual drive issues. I've tried upgrading the firmware, moving the card to a different slot, changing cables, etc. Nothing has helped.
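In case it's relevant, this is roughly how I've been verifying the HBA firmware against the mps driver (sas2flash is the LSI flash utility, which I believe ships with FreeNAS):
Code:
# List firmware/BIOS versions for every attached LSI SAS2 controller
sas2flash -listall
# The mps driver prints its own version and the card's firmware at attach time
dmesg | grep -i mps0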
Naturally having my large data pool running with all these errors - real or not - seems bad. Does anyone have any ideas or suggestions?
Here's what the errors look like; they show up on all the external drives (and none of the internal ones):
Code:
Jan 3 14:23:58 squirrel (da3:mps0:0:24:0): READ(10). CDB: 28 00 fe 6e 2b 30 00 00 b0 00 length 90112 SMID 952 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
Jan 3 14:23:58 squirrel (da3:mps0:0:24:0): READ(10). CDB: 28 00 fe 6e 2b 30 00 00 b0 00
Jan 3 14:23:58 squirrel (da3:mps0:0:24:0): CAM status: CCB request completed with an error
Jan 3 14:23:58 squirrel (da3:mps0:0:24:0): Retrying command
Jan 3 14:23:58 squirrel (da3:mps0:0:24:0): READ(10). CDB: 28 00 fe 6e 2a 30 00 01 00 00
Jan 3 14:23:58 squirrel (da3:mps0:0:24:0): CAM status: SCSI Status Error
Jan 3 14:23:58 squirrel (da3:mps0:0:24:0): SCSI status: Check Condition
Jan 3 14:23:58 squirrel (da3:mps0:0:24:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Jan 3 14:23:58 squirrel (da3:mps0:0:24:0): Retrying command (per sense data)