Bad drive or cable?: "(da9:mps1:0:2:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC

Status
Not open for further replies.

esamett

Patron
Joined
May 28, 2011
Messages
345
Security output email today:

Code:

freenas.domain kernel log messages:
>   (da9:mps1:0:2:0): READ(10). CDB: 28 00 1d 98 2c e0 00 00 20 00 length 16384 SMID 875 terminated ioc 804b scsi 0 state 0 xfer 0
> (da9:mps1:0:2:0): READ(10). CDB: 28 00 1d 98 2c c0 00 00 20 00
> (da9:mps1:0:2:0): CAM status: SCSI Status Error
> (da9:mps1:0:2:0): SCSI status: Check Condition
> (da9:mps1:0:2:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da9:mps1:0:2:0): Retrying command (per sense data)

-- End of security output --


Smart error email message:
Code:
Subject: SMART error (ErrorCount) detected on host: freenas

This message was generated by the smartd daemon running on:

host name:  freenas
DNS domain: domain

The following warning/error was logged by the smartd daemon:

Device: /dev/da9 [SAT], ATA error count increased from 0 to 1

Device info:
HGST HDN724040ALE640, S/N:PK2381PBHATTLR, WWN:5-000cca-23dd30049, FW:MJAOA5E0, 4.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
No additional messages about this problem will be sent.

smartctl -a /dev/da9 output:

Code:
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p10 amd64] (local build)  
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org  
  
=== START OF INFORMATION SECTION ===  
Device Model:  HGST HDN724040ALE640  
Serial Number:  PK2381PBHATTLR  
LU WWN Device Id: 5 000cca 23dd30049  
Firmware Version: MJAOA5E0  
User Capacity:  4,000,787,030,016 bytes [4.00 TB]  
Sector Sizes:  512 bytes logical, 4096 bytes physical  
Rotation Rate:  7200 rpm  
Device is:  Not in smartctl database [for details use: -P showall]  
ATA Version is:  ATA8-ACS T13/1699-D revision 4  
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)  
Local Time is:  Fri Oct 31 18:14:17 2014 PDT  
SMART support is: Available - device has SMART capability.  
SMART support is: Enabled  
  
=== START OF READ SMART DATA SECTION ===  
SMART overall-health self-assessment test result: PASSED  
  
General SMART Values:  
Offline data collection status:  (0x82) Offline data collection activity
  was completed without error.  
  Auto Offline Data Collection: Enabled. 
Self-test execution status:  (  0) The previous self-test routine completed
  without error or no self-test has ever 
  been run.  
Total time to complete Offline  
data collection:  (  24) seconds.  
Offline data collection  
capabilities:  (0x5b) SMART execute Offline immediate.  
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new  
  command.  
  Offline surface scan supported.  
  Self-test supported.  
  No Conveyance Self-test supported.  
  Selective Self-test supported.  
SMART capabilities:  (0x0003) Saves SMART data before entering  
  power-saving mode.  
  Supports SMART auto save timer.  
Error logging capability:  (0x01) Error logging supported.  
  General Purpose Logging supported.  
Short self-test routine  
recommended polling time:  (  1) minutes.  
Extended self-test routine  
recommended polling time:  ( 556) minutes.  
SCT capabilities:  (0x003d) SCT Status supported.  
  SCT Error Recovery Control supported.  
  SCT Feature Control supported.  
  SCT Data Table supported.  
  
SMART Attributes Data Structure revision number: 16  
Vendor Specific SMART Attributes with Thresholds:  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   136   136   054    Pre-fail  Offline      -       80
  3 Spin_Up_Time            0x0007   131   131   024    Pre-fail  Always       -       605 (Average 576)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   124   124   020    Pre-fail  Offline      -       33
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1981
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       26
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       97
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       97
194 Temperature_Celsius     0x0002   171   171   000    Old_age   Always       -       35 (Min/Max 24/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       2
  
SMART Error Log Version: 1  
ATA Error Count: 2  
  CR = Command Register [HEX]  
  FR = Features Register [HEX]  
  SC = Sector Count Register [HEX]  
  SN = Sector Number Register [HEX]  
  CL = Cylinder Low Register [HEX]  
  CH = Cylinder High Register [HEX]  
  DH = Device/Head Register [HEX]  
  DC = Device Command Register [HEX]  
  ER = Error register [HEX]  
  ST = Status register [HEX]  
Powered_Up_Time is measured from power on, and printed as  
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,  
SS=sec, and sss=millisec. It "wraps" after 49.710 days.  
  
Error 2 occurred at disk power-on lifetime: 1974 hours (82 days + 6 hours)  
  When the command that caused the error occurred, the device was active or idle.
  
  After command completion occurred, registers were:  
  
ER ST SC SN CL CH DH  
  -- -- -- -- -- -- --  
  84 51 01 27 e1 6d 00  Error: ICRC, ABRT at LBA = 0x006de127 = 7201063  
  
  Commands leading to the command that caused the error were:  
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name  
  -- -- -- -- -- -- -- --  ----------------  --------------------  
  60 20 00 08 e1 6d 40 00  1d+03:06:22.004  READ FPDMA QUEUED  
  60 20 00 e8 e0 6d 40 00  1d+03:06:22.004  READ FPDMA QUEUED  
  60 20 00 c8 e0 6d 40 00  1d+03:06:22.004  READ FPDMA QUEUED  
  60 20 00 a8 e0 6d 40 00  1d+03:06:22.002  READ FPDMA QUEUED  
  60 20 00 88 e0 6d 40 00  1d+03:06:22.001  READ FPDMA QUEUED  
  
Error 1 occurred at disk power-on lifetime: 1965 hours (81 days + 21 hours)  
  When the command that caused the error occurred, the device was active or idle.
  
  After command completion occurred, registers were:  
  ER ST SC SN CL CH DH  
  -- -- -- -- -- -- --  
  84 51 01 df 2c 98 0d  Error: ICRC, ABRT at LBA = 0x0d982cdf = 228076767  
  
  Commands leading to the command that caused the error were:  
  
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name  
  -- -- -- -- -- -- -- --  ----------------  --------------------  
  60 20 18 e0 2c 98 40 00  18:14:52.246  READ FPDMA QUEUED  
  60 20 10 c0 2c 98 40 00  18:14:52.246  READ FPDMA QUEUED  
  60 20 08 a0 2c 98 40 00  18:14:52.246  READ FPDMA QUEUED  
  60 20 00 80 2c 98 40 00  18:14:52.246  READ FPDMA QUEUED  
  60 08 00 48 e8 58 40 00  18:14:52.246  READ FPDMA QUEUED  
  
SMART Self-test log structure revision number 1  
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline  Completed without error  00%  1057  - 
# 2  Short offline  Completed without error  00%  1015  - 
# 3  Short offline  Completed without error  00%  967  - 
# 4  Short offline  Completed without error  00%  919  - 
# 5  Short offline  Completed without error  00%  871  - 
# 6  Short offline  Completed without error  00%  823  - 
# 7  Short offline  Completed without error  00%  775  - 
# 8  Extended offline  Completed without error  00%  762  - 
# 9  Short offline  Completed without error  00%  727  - 
#10  Short offline  Completed without error  00%  679  - 
#11  Short offline  Completed without error  00%  633  - 
#12  Short offline  Completed without error  00%  585  - 
#13  Short offline  Completed without error  00%  561  - 
#14  Short offline  Completed without error  00%  513  - 
  
SMART Selective self-test log data structure revision number 1  
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS  
  1  0  0  Not_testing  
  2  0  0  Not_testing  
  3  0  0  Not_testing  
  4  0  0  Not_testing  
  5  0  0  Not_testing  
Selective self-test flags (0x0):  
  After scanning selected spans, do NOT read-scan remainder of disk.  
If Selective self-test is pending on power-up, resume after 0 minute delay.  
   


Long smart test started:
Code:
[root@freenas ~]# smartctl -t long /dev/da9
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 556 minutes for test to complete.
Test will complete after Sat Nov 1 03:41:18 2014

Use smartctl -X to abort test.
[root@freenas ~]#



Question: Bad drive, bad cable, or Halloween gremlins? I searched this forum for the error, and the one result I found diagnosed a bad backplane after errors appeared on multiple disks. I am using SFF-8087 to SATA fan-out cables. This is the only error that has come up since I replaced a bad cable and cleared out the old errors from an old 2TB drive. Drive temperatures briefly reached around 40°C while my room heated up during PRIME95 testing on my desktop. Otherwise I have had no FreeNAS issues at all for about two months.

Thanks...as always.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Could be the cable, I see two UDMA CRC errors.
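
To make that check concrete, here is a minimal Python sketch that pulls raw attribute values out of pasted `smartctl -a` text and applies a rough rule of thumb: CRC errors with no reallocated or pending sectors point at the link rather than the platters. The function names and the three-row sample are illustrative assumptions, not part of smartmontools, and the heuristic is only first-pass triage.

```python
import re

# One attribute row: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
# \s+ also matches newlines, so rows that a forum paste wrapped onto two
# lines are still recognised.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+"
    r"\S+\s+\S+\s+\S+\s+(\d+)",
    re.MULTILINE,
)

def smart_raw_values(smartctl_text):
    """Map attribute name -> first integer of its RAW_VALUE column."""
    return {m.group(2): int(m.group(3)) for m in ROW.finditer(smartctl_text)}

def likely_link_problem(raw):
    """Heuristic only: CRC errors present, no media-error counters raised."""
    media = raw.get("Reallocated_Sector_Ct", 0) + raw.get("Current_Pending_Sector", 0)
    return raw.get("UDMA_CRC_Error_Count", 0) > 0 and media == 0

# Three rows copied from the da9 output in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct  0x0033  100  100  005  Pre-fail  Always  -  0
197 Current_Pending_Sector  0x0022  100  100  000  Old_age  Always  -  0
199 UDMA_CRC_Error_Count  0x000a  200  200  000  Old_age  Always  -  2
"""

if __name__ == "__main__":
    values = smart_raw_values(SAMPLE)
    print(values["UDMA_CRC_Error_Count"], likely_link_problem(values))
```

A raw CRC count that keeps climbing after a cable swap would be a reason to look further along the path (backplane, HBA port, firmware).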
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Basically the communication between your SAS/SATA controller and hard drive had a problem. So anything that could impact that communication is suspect. The data cable is the easiest, so that's the first thing we recommend people replace. It's cheap and easy so why not? ;)
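
One way to see that the kernel messages and the SMART error log describe the same events is to decode the READ(10) CDBs from the kernel log by hand. The sketch below is a hedged illustration in Python: bytes 2-5 of a READ(10)/WRITE(10) CDB hold the LBA and bytes 7-8 the block count. The helper names are made up for this example, and the comparison assumes that the SMART summary error log shows only the low 28 bits of the LBA.

```python
def decode_rw10(cdb_hex):
    """Decode a 10-byte READ(10)/WRITE(10) CDB into (opcode, lba, blocks)."""
    b = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(b) != 10 or b[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(b[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(b[7:9], "big")  # bytes 7-8: transfer length in blocks
    return b[0], lba, blocks

def smart_lba_in_request(smart_lba28, lba, blocks):
    """True if a 28-bit SMART error LBA falls inside [lba, lba + blocks)."""
    mask = (1 << 28) - 1  # assumption: the SMART log keeps only the low 28 bits
    return any(((lba + i) & mask) == smart_lba28 for i in range(blocks))

if __name__ == "__main__":
    # The two retried READ(10) commands from the kernel logs in this thread.
    for cdb, smart_lba in (
        ("28 00 1d 98 2c c0 00 00 20 00", 0x0D982CDF),  # SMART Error 1
        ("28 00 20 6d e1 08 00 00 20 00", 0x006DE127),  # SMART Error 2
    ):
        op, lba, blocks = decode_rw10(cdb)
        print(hex(lba), blocks, smart_lba_in_request(smart_lba, lba, blocks))
```

Each request is 32 blocks, which at 512 bytes per logical sector is 16384 bytes. Under the 28-bit assumption, both SMART error LBAs fall on the last block of the corresponding retried read, which suggests (but does not prove) that the CAM errors and the drive's ICRC entries are the same two events.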
 

esamett

Patron
Joined
May 28, 2011
Messages
345
These guys are really fragile!! Any recommendation for a "sturdy" one?

More messages:

Security output:
Code:
freenas.domain kernel log messages:
> (da9:mps1:0:2:0): READ(10). CDB: 28 00 20 6d e1 08 00 00 20 00
> (da9:mps1:0:2:0): CAM status: SCSI Status Error
> (da9:mps1:0:2:0): SCSI status: Check Condition
> (da9:mps1:0:2:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da9:mps1:0:2:0): Retrying command (per sense data)
>   (da9:mps1:0:2:0): WRITE(10). CDB: 2a 00 38 24 5e 48 00 00 18 00 length 12288 SMID 654 terminated ioc 804b scsi 0 state c xfer 0
> (da9:mps1:0:2:0): WRITE(10). CDB: 2a 00 38 24 5e 48 00 00 18 00
> (da9:mps1:0:2:0): CAM status: SCSI Status Error
> (da9:mps1:0:2:0): SCSI status: Check Condition
> (da9:mps1:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da9:mps1:0:2:0): Retrying command (per sense data)
> (da9:mps1:0:2:0): READ(16). CDB: 88 00 00 00 00 01 0f b3 31 b0 00 00 00 20 00 00
> (da9:mps1:0:2:0): CAM status: SCSI Status Error
> (da9:mps1:0:2:0): SCSI status: Check Condition
> (da9:mps1:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da9:mps1:0:2:0): Retrying command (per sense data)

-- End of security output --


smartctl -a /dev/da9 output after overnight long test:
Code:
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p10 amd64] (local build)   
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org   
   
=== START OF INFORMATION SECTION ===   
Device Model:  HGST HDN724040ALE640   
Serial Number:  PK2381PBHATTLR   
LU WWN Device Id: 5 000cca 23dd30049   
Firmware Version: MJAOA5E0   
User Capacity:  4,000,787,030,016 bytes [4.00 TB]   
Sector Sizes:  512 bytes logical, 4096 bytes physical   
Rotation Rate:  7200 rpm   
Device is:  Not in smartctl database [for details use: -P showall]   
ATA Version is:  ATA8-ACS T13/1699-D revision 4   
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)   
Local Time is:  Sat Nov  1 08:11:05 2014 PDT   
SMART support is: Available - device has SMART capability.   
SMART support is: Enabled   
   
=== START OF READ SMART DATA SECTION ===   
SMART overall-health self-assessment test result: PASSED   
   
General SMART Values:   
Offline data collection status:  (0x82) Offline data collection activity
  was completed without error.   
  Auto Offline Data Collection: Enabled.  
Self-test execution status:  (  38) The self-test routine was interrupted   
  by the host with a hard or soft reset.  
Total time to complete Offline   
data collection:  (  24) seconds.   
Offline data collection   
capabilities:  (0x5b) SMART execute Offline immediate.   
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new   
  command.   
  Offline surface scan supported.   
  Self-test supported.   
  No Conveyance Self-test supported.   
  Selective Self-test supported.   
SMART capabilities:  (0x0003) Saves SMART data before entering   
  power-saving mode.   
  Supports SMART auto save timer.   
Error logging capability:  (0x01) Error logging supported.   
  General Purpose Logging supported.   
Short self-test routine   
recommended polling time:  (  1) minutes.   
Extended self-test routine   
recommended polling time:  ( 556) minutes.   
SCT capabilities:  (0x003d) SCT Status supported.   
  SCT Error Recovery Control supported.   
  SCT Feature Control supported.   
  SCT Data Table supported.   
   
SMART Attributes Data Structure revision number: 16   
Vendor Specific SMART Attributes with Thresholds:   
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   136   136   054    Pre-fail  Offline      -       80
  3 Spin_Up_Time            0x0007   131   131   024    Pre-fail  Always       -       605 (Average 576)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   124   124   020    Pre-fail  Offline      -       33
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1995
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       26
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       98
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       98
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       36 (Min/Max 24/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       2
   
SMART Error Log Version: 1   
ATA Error Count: 2   
  CR = Command Register [HEX]   
  FR = Features Register [HEX]   
  SC = Sector Count Register [HEX]   
  SN = Sector Number Register [HEX]   
  CL = Cylinder Low Register [HEX]   
  CH = Cylinder High Register [HEX]   
  DH = Device/Head Register [HEX]   
  DC = Device Command Register [HEX]   
  ER = Error register [HEX]   
  ST = Status register [HEX]   
Powered_Up_Time is measured from power on, and printed as   
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,   
SS=sec, and sss=millisec. It "wraps" after 49.710 days.   
   
Error 2 occurred at disk power-on lifetime: 1974 hours (82 days + 6 hours)   
  When the command that caused the error occurred, the device was active or idle.
   
  After command completion occurred, registers were:   
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --   
  84 51 01 27 e1 6d 00  Error: ICRC, ABRT at LBA = 0x006de127 = 7201063   
   
  Commands leading to the command that caused the error were:   
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name   
  -- -- -- -- -- -- -- --  ----------------  --------------------   
  60 20 00 08 e1 6d 40 00  1d+03:06:22.004  READ FPDMA QUEUED   
  60 20 00 e8 e0 6d 40 00  1d+03:06:22.004  READ FPDMA QUEUED   
  60 20 00 c8 e0 6d 40 00  1d+03:06:22.004  READ FPDMA QUEUED   
  60 20 00 a8 e0 6d 40 00  1d+03:06:22.002  READ FPDMA QUEUED   
  60 20 00 88 e0 6d 40 00  1d+03:06:22.001  READ FPDMA QUEUED   
   
Error 1 occurred at disk power-on lifetime: 1965 hours (81 days + 21 hours)   
  When the command that caused the error occurred, the device was active or idle.
   
  After command completion occurred, registers were:   
  ER ST SC SN CL CH DH   
  -- -- -- -- -- -- --   
  84 51 01 df 2c 98 0d  Error: ICRC, ABRT at LBA = 0x0d982cdf = 228076767   
   
  Commands leading to the command that caused the error were:   
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name   
 -- -- -- -- -- -- -- --  ----------------  --------------------   
  60 20 18 e0 2c 98 40 00  18:14:52.246  READ FPDMA QUEUED   
  60 20 10 c0 2c 98 40 00  18:14:52.246  READ FPDMA QUEUED   
  60 20 08 a0 2c 98 40 00  18:14:52.246  READ FPDMA QUEUED   
  60 20 00 80 2c 98 40 00  18:14:52.246  READ FPDMA QUEUED   
  60 08 00 48 e8 58 40 00  18:14:52.246  READ FPDMA QUEUED   
   
SMART Self-test log structure revision number 1   
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Interrupted (host reset)  60%  1985  -  
# 2  Short offline  Completed without error  00%  1057  -  
# 3  Short offline  Completed without error  00%  1015  -  
# 4  Short offline  Completed without error  00%  967  -  
# 5  Short offline  Completed without error  00%  919  -  
# 6  Short offline  Completed without error  00%  871  -  
# 7  Short offline  Completed without error  00%  823  -  
# 8  Short offline  Completed without error  00%  775  -  
# 9  Extended offline  Completed without error  00%  762  -  
#10  Short offline  Completed without error  00%  727  -  
#11  Short offline  Completed without error  00%  679  -  
#12  Short offline  Completed without error  00%  633  -  
#13  Short offline  Completed without error  00%  585  -
#14  Short offline  Completed without error  00%  561  -  
#15  Short offline  Completed without error  00%  513  -  
   
SMART Selective self-test log data structure revision number 1   
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS   
  1  0  0  Not_testing   
  2  0  0  Not_testing   
  3  0  0  Not_testing   
  4  0  0  Not_testing   
  5  0  0  Not_testing   
Selective self-test flags (0x0):   
  After scanning selected spans, do NOT read-scan remainder of disk.   
If Selective self-test is pending on power-up, resume after 0 minute delay.   
   
[root@freenas ~]# 


Please let me know any further input. I will update as appropriate.

Thanks,

e
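
One detail in the output above: the overnight long test did not finish. Entry #1 of the self-test log reads "Interrupted (host reset)" with 60% remaining, which may be related to the UNIT ATTENTION reset in the kernel log. A small Python sketch for spotting unfinished tests in pasted output follows. The function names are hypothetical, and the pattern assumes columns separated by two or more spaces, as in this thread's paste.

```python
import re

# One self-test log row, e.g.
# "# 1  Extended offline  Interrupted (host reset)  60%  1985  -"
ROW = re.compile(
    r"^#\s*(\d+)\s{2,}(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$",
    re.MULTILINE,
)

def self_test_entries(log_text):
    """Parse self-test log rows into dicts (columns split on 2+ spaces)."""
    return [
        {
            "num": int(m.group(1)),
            "test": m.group(2),
            "status": m.group(3),
            "remaining_pct": int(m.group(4)),
            "lifetime_hours": int(m.group(5)),
        }
        for m in ROW.finditer(log_text)
    ]

def unfinished_tests(log_text):
    """Return entries whose status is anything other than a clean completion."""
    return [e for e in self_test_entries(log_text)
            if e["status"] != "Completed without error"]

# Three rows copied from the self-test log above.
SAMPLE = """\
# 1  Extended offline  Interrupted (host reset)  60%  1985  -
# 2  Short offline  Completed without error  00%  1057  -
# 9  Extended offline  Completed without error  00%  762  -
"""

if __name__ == "__main__":
    for entry in unfinished_tests(SAMPLE):
        print(entry["num"], entry["test"], entry["status"])
```

If a long test keeps getting interrupted, rerunning it once the link errors are resolved gives a cleaner verdict on the drive itself.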
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, keeping cables as short as possible (and always below one meter) helps. Otherwise, it's just a matter of luck.
 

esamett

Patron
Joined
May 28, 2011
Messages
345
Dagnabbit...
Same cable location that I replaced before. I hope it's not the HBA. I will swap in a spare cable and reroute it.

Looking forward, can you suggest an affordable case (used is fine) that fits 22 drives? I have two nice ATX PSUs and lots of quiet 12/14 cm fans to swap out the server "jet engines." :D
 

Aitor

Cadet
Joined
Nov 18, 2014
Messages
4
Good Afternoon:

I am running FreeNAS 9.2.1.8 and get the same error. This did not happen with version 8.3.
I have upgraded my LSI SAS adapter firmware to the latest version, and the FreeNAS mpslsi driver as well, but I still have problems.

Thanks.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, if you upgraded to the latest version you upgraded too far. The firmware version should match the driver version. This has been discussed to death in the forums. Please search the forums if you are interested in more information.
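
As a rough illustration of what "match" means here, the sketch below compares the major phase numbers (the 16 in p16, the 20 in p20) of the firmware and the driver. It assumes the driver reports both versions on one boot-log line shaped like `mps0: Firmware: 20.00.07.00, Driver: 16.00.00.00-fbsd`. That line format and the function names are assumptions for this example, so check the actual dmesg output before relying on the pattern.

```python
import re

# Assumed log line shape: "mps0: Firmware: 20.00.07.00, Driver: 16.00.00.00-fbsd"
VERSIONS = re.compile(r"Firmware:\s*(\d+)\.[\d.]+,\s*Driver:\s*(\d+)\.[\d.]+")

def phases(dmesg_line):
    """Return (firmware_phase, driver_phase) as integers."""
    m = VERSIONS.search(dmesg_line)
    if not m:
        raise ValueError("no firmware/driver versions found")
    return int(m.group(1)), int(m.group(2))

def phases_match(dmesg_line):
    """True when firmware and driver share the same major phase number."""
    fw, drv = phases(dmesg_line)
    return fw == drv

if __name__ == "__main__":
    for line in (
        "mps0: Firmware: 20.00.07.00, Driver: 16.00.00.00-fbsd",
        "mps0: Firmware: 16.00.00.00, Driver: 16.00.00.00-fbsd",
    ):
        print(phases(line), phases_match(line))
```

Only the leading number is compared, because the thread talks about whole phases (p16, p20) rather than point releases.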
 

Aitor

Cadet
Joined
Nov 18, 2014
Messages
4
Yeah, I read that before I upgraded the firmware, but I have also upgraded the FreeNAS driver, and I still have the problem.

upload_2014-11-18_17-49-9.png
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you upgraded the FreeNAS driver you are now in the realm of totally unsupported and on your own. We don't support custom builds of FreeNAS with custom drivers. Sorry. The p16 is used because it's the most stable and most tested version out for FreeBSD.
 

RobertNAS

Cadet
Joined
Jan 16, 2015
Messages
1
cyberjock said:
If you upgraded the FreeNAS driver you are now in the realm of totally unsupported and on your own. We don't support custom builds of FreeNAS with custom drivers. Sorry. The p16 is used because it's the most stable and most tested version out for FreeBSD.

Hi, I'm a new face here, and a n00b when it comes to FreeBSD/FreeNAS. I do have a science and engineering BS, and I've been working as a software developer and systems engineer for almost 10 years. So I'm liking the challenge of learning FreeNAS. I've necro'd this thread to pile on my recent experience with LSI drivers and to acknowledge the point made by cyberjock.

Downgrading firmware on a new LSI 9201-16e naturally goes against everything I've been taught in the Windows Server domain. So, I ended up "experimenting" with the p20 firmware and the p20 driver in FreeNAS 9.3. This didn't work out so good. It seems to have caused a few thousand UDMA_CRC_Error_Count ticks on every drive, and even caused the zpool to fault on two different drives, though not at the same time. This all happened in the first couple days of building and testing the system.

One of the many signs of trouble was a ton of SCSI status errors in the logs. Here's an example of the log spam:
Code:
(da1:mpslsi0:0:1:0): CAM status: SCSI Status Error
(da1:mpslsi0:0:1:0): SCSI status: Check Condition
(da1:mpslsi0:0:1:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
(da1:mpslsi0:0:1:0): Retrying command (per sense data)
(da1:mpslsi0:0:1:0): READ(10). CDB: 28 00 08 ac d3 e8 00 00 78 00 length 61440 SMID 919 terminated ioc 804b scsi 0 state 0 xfer 0
(da1:mpslsi0:0:1:0): READ(10). CDB: 28 00 08 ac e0 40 00 00 20 00


Bottom line, if you've come across this thread with the same issues, get the p16 firmware and use the included driver. Google whatever it takes to figure out how to flash a downgrade. What finally worked for me was simply creating a DOS boot disk with Rufus on a USB drive and then downloading the p16 firmware package for MSDOS & Windows from the LSI archives.

From the DOS prompt, it was two simple commands:
Code:
sas2flsh.exe -o -e 6
sas2flsh.exe -o -f 9201-16e.bin -b mptsas2.rom


Other attempts to downgrade the firmware from various FreeBSD/Linux/UEFI shells with their respective versions of the sas2flash tool failed for one reason or another. "Go DOS, or go home," I guess.
 

isamudysan

Dabbler
Joined
Feb 25, 2015
Messages
11
i'm currently getting the same errors as described here. i'll do a scrub and a smart test. if that all checks out then i'll downgrade. i have a supermicro x9scm mobo, and after researching prior to cross flashing my m1015 card, the flash via uefi went fine. after cross flashing with p20 fw, i had a problem post install: i got an alert from my nas about the firmware not matching the driver. after researching it, i came across this: https://bugs.freenas.org/issues/6678. i followed phoenix one's post in creating a tunable, and the alert did go away. since then i never had any issues until recently, when i started seeing these errors after a few updates. so, i guess i'll need to delete the tunable and the mpslsi.ko, downgrade the firmware and see what happens, right?

i hope that downgrading via uefi will be the same, hopefully. i'll keep robertnas' last statement, "Other attempts to downgrade the firmware from various FreeBSD/Linux/UEFI shells with their respective versions of the sas2flash tool failed for one reason or another. 'Go DOS, or go home,' I guess," in mind.

EDIT 1 02/26/2015
i downgraded from p20 drivers to p16 -- had to google the p16 drivers and installers (just in case). took a while to find the much needed files since i think the files got archived by avago (fka lsi). in any case, the downgrade so far has worked and i have not seen the aforementioned error since then. i will, however, keep monitoring in case it pops up. and, yes, i did disable the p20 drivers via tunable. also, i did a scrub and a smart test, and everything prior to the downgrade was fine -- it's a new build after all...only been like 6 days old lol. hope this helps. a great thanks to the freenas community :D

EDIT 2:
i also note that with the p20 drivers my cpu and memory load was very high when no one was accessing it. with the downgrade to p16 everything sort of leveled off to normalcy. thank you freenas community :D
 

esamett

Patron
Joined
May 28, 2011
Messages
345
It was a bad cable for me. The HBA works great with the new cable and p16 firmware.
 