unreadable (pending) sectors?

Status
Not open for further replies.

kamal juck

Dabbler
Joined
May 13, 2014
Messages
18
Hello, I'm a bit of a newbie.

Today I got a new message that I don't understand. "Device: /dev/ada4, 80 Currently unreadable (pending) sectors"

Is the message telling me that 80 sectors are unreadable? If so, is that a lot? (It doesn't seem like a lot on a 3TB HDD.)

On my system I have 9 physical drives in 3 vaults, and it seems half of the drives show some type of SMART error when I use the "smartctl" command. All of the drives are less than a year old and were purchased in three lots as I grew my system. I don't know how much I should be concerned. (Just in case I need to swap out the drive, I have ordered a replacement.)

So can someone please point me to a "simple" explanation of how to understand this listing and error message, or post a reply?

Thanks in advance.
K


The following is the listing from "smartctl":

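    [root@freenas] ~# smartctl -a /dev/ada4
    smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p13 amd64] (local build)
    Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Family:     Toshiba 3.5" HDD DT01ACA...
    Device Model:     TOSHIBA DT01ACA300
    Serial Number:    Y3UDK86GS
    LU WWN Device Id: 5 000039 ff4d3ccc4
    Firmware Version: MX6OABB0
    User Capacity:    3,000,592,982,016 bytes [3.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    7200 rpm
    Form Factor:      3.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA8-ACS T13/1699-D revision 4
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Sat May 16 20:04:25 2015 EDT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status:  (0x84) Offline data collection activity
                                            was suspended by an interrupting command from host.
                                            Auto Offline Data Collection: Enabled.
    Self-test execution status:      (  33) The self-test routine was interrupted
                                            by the host with a hard or soft reset.
    Total time to complete Offline
    data collection:                (22078) seconds.
    Offline data collection
    capabilities:                    (0x5b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            No Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   1) minutes.
    Extended self-test routine
    recommended polling time:        ( 368) minutes.
    SCT capabilities:              (0x003d) SCT Status supported.
                                            SCT Error Recovery Control supported.
                                            SCT Feature Control supported.
                                            SCT Data Table supported.

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0005   140   140   054    Pre-fail  Offline      -       69
      3 Spin_Up_Time            0x0007   177   177   024    Pre-fail  Always       -       337 (Average 311)
      4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       76
      5 Reallocated_Sector_Ct   0x0033   091   091   005    Pre-fail  Always       -       334
      7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   124   124   020    Pre-fail  Offline      -       33
      9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       7967
     10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       76
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       280
    193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       280
    194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 21/43)
    196 Reallocated_Event_Count 0x0032   084   084   000    Old_age   Always       -       455
    197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       80
    198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

    SMART Error Log Version: 1
    ATA Error Count: 23 (device log contains only the most recent five errors)
            CR = Command Register [HEX]
            FR = Features Register [HEX]
            SC = Sector Count Register [HEX]
            SN = Sector Number Register [HEX]
            CL = Cylinder Low Register [HEX]
            CH = Cylinder High Register [HEX]
            DH = Device/Head Register [HEX]
            DC = Device Command Register [HEX]
            ER = Error register [HEX]
            ST = Status register [HEX]
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
    SS=sec, and sss=millisec. It "wraps" after 49.710 days.

    Error 23 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 58 a8 6a 79 0c  Error: UNC at LBA = 0x0c796aa8 = 209283752

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 00 00 00 6b 79 40 00   3d+21:24:07.350  READ FPDMA QUEUED
      60 00 f8 00 6a 79 40 00   3d+21:24:04.245  READ FPDMA QUEUED
      60 00 f0 00 69 79 40 00   3d+21:24:04.137  READ FPDMA QUEUED
      60 00 e8 00 68 79 40 00   3d+21:24:03.696  READ FPDMA QUEUED
      60 00 e0 00 67 79 40 00   3d+21:24:03.604  READ FPDMA QUEUED

    Error 22 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 e0 98 2e 79 0c  Error: UNC at LBA = 0x0c792e98 = 209268376

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 00 58 78 2f 79 40 00   3d+21:23:45.781  READ FPDMA QUEUED
      60 00 50 78 2e 79 40 00   3d+21:23:45.781  READ FPDMA QUEUED
      2f 00 01 10 00 00 00 00   3d+21:23:45.709  READ LOG EXT
      60 00 50 78 2f 79 40 00   3d+21:23:41.873  READ FPDMA QUEUED
      60 00 48 78 2e 79 40 00   3d+21:23:41.873  READ FPDMA QUEUED

    Error 21 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 e0 98 2e 79 0c  Error: UNC at LBA = 0x0c792e98 = 209268376

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 00 50 78 2f 79 40 00   3d+21:23:41.873  READ FPDMA QUEUED
      60 00 48 78 2e 79 40 00   3d+21:23:41.873  READ FPDMA QUEUED
      2f 00 01 10 00 00 00 00   3d+21:23:41.791  READ LOG EXT
      60 00 48 78 2f 79 40 00   3d+21:23:34.852  READ FPDMA QUEUED
      60 00 40 78 2e 79 40 00   3d+21:23:34.852  READ FPDMA QUEUED

    Error 20 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 e0 98 2e 79 0c  Error: UNC at LBA = 0x0c792e98 = 209268376

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 00 48 78 2f 79 40 00   3d+21:23:34.852  READ FPDMA QUEUED
      60 00 40 78 2e 79 40 00   3d+21:23:34.852  READ FPDMA QUEUED
      2f 00 01 10 00 00 00 00   3d+21:23:34.778  READ LOG EXT
      60 00 40 78 2f 79 40 00   3d+21:23:27.747  READ FPDMA QUEUED
      60 00 38 78 2e 79 40 00   3d+21:23:27.747  READ FPDMA QUEUED

    Error 19 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 f0 88 2e 79 0c  Error: UNC at LBA = 0x0c792e88 = 209268360

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 00 40 78 2f 79 40 00   3d+21:23:27.747  READ FPDMA QUEUED
      60 00 38 78 2e 79 40 00   3d+21:23:27.747  READ FPDMA QUEUED
      2f 00 01 10 00 00 00 00   3d+21:23:27.665  READ LOG EXT
      60 00 38 78 2f 79 40 00   3d+21:23:21.211  READ FPDMA QUEUED
      60 00 30 78 2e 79 40 00   3d+21:23:20.380  READ FPDMA QUEUED

    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Interrupted (host reset)     10%          7960         -
    # 2  Short offline       Completed without error      00%          7952         -
    # 3  Extended offline    Completed without error      00%          7840         -
    # 4  Short offline       Completed without error      00%          7832         -
    # 5  Extended offline    Completed without error      00%          7720         -
    # 6  Short offline       Completed without error      00%          7712         -
    # 7  Extended offline    Completed without error      00%          7600         -
    # 8  Short offline       Completed without error      00%          7592         -
    # 9  Extended offline    Completed without error      00%          7480         -
    #10  Short offline       Completed without error      00%          7472         -
    #11  Extended offline    Completed without error      00%          7360         -
    #12  Short offline       Completed without error      00%          7352         -
    #13  Extended offline    Completed without error      00%          7240         -
    #14  Short offline       Completed without error      00%          7232         -
    #15  Extended offline    Completed without error      00%          7120         -
    #16  Short offline       Completed without error      00%          7112         -
    #17  Extended offline    Completed without error      00%          7000         -
    #18  Short offline       Completed without error      00%          6992         -
    #19  Extended offline    Completed without error      00%          6880         -
    #20  Short offline       Completed without error      00%          6872         -
    #21  Extended offline    Completed without error      00%          6856         -

    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.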

    77. SN = Sector Number Register [HEX]

    78. CL = Cylinder Low Register [HEX]

    79. CH = Cylinder High Register [HEX]

    80. DH = Device/Head Register [HEX]

    81. DC = Device Command Register [HEX]

    82. ER = Error register [HEX]

    83. ST = Status register [HEX]

    84. Powered_Up_Time is measured from power on, and printed as

    85. DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

    86. SS=sec, and sss=millisec. It "wraps" after 49.710 days.



    87. Error 23 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)

    88. When the command that caused the error occurred, the device was active or idle.



    89. After command completion occurred, registers were:

    90. ER ST SC SN CL CH DH

    91. -- -- -- -- -- -- --

    92. 40 51 58 a8 6a 79 0c Error: UNC at LBA = 0x0c796aa8 = 209283752



    93. Commands leading to the command that caused the error were:

    94. CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

    95. -- -- -- -- -- -- -- -- ---------------- --------------------

    96. 60 00 00 00 6b 79 40 00 3d+21:24:07.350 READ FPDMA QUEUED

    97. 60 00 f8 00 6a 79 40 00 3d+21:24:04.245 READ FPDMA QUEUED

    98. 60 00 f0 00 69 79 40 00 3d+21:24:04.137 READ FPDMA QUEUED

    99. 60 00 e8 00 68 79 40 00 3d+21:24:03.696 READ FPDMA QUEUED

    100. 60 00 e0 00 67 79 40 00 3d+21:24:03.604 READ FPDMA QUEUED



    101. Error 22 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)

    102. When the command that caused the error occurred, the device was active or idle.



    103. After command completion occurred, registers were:

    104. ER ST SC SN CL CH DH

    105. -- -- -- -- -- -- --

    106. 40 51 e0 98 2e 79 0c Error: UNC at LBA = 0x0c792e98 = 209268376



    107. Commands leading to the command that caused the error were:

    108. CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

    109. -- -- -- -- -- -- -- -- ---------------- --------------------

    110. 60 00 58 78 2f 79 40 00 3d+21:23:45.781 READ FPDMA QUEUED

    111. 60 00 50 78 2e 79 40 00 3d+21:23:45.781 READ FPDMA QUEUED

    112. 2f 00 01 10 00 00 00 00 3d+21:23:45.709 READ LOG EXT

    113. 60 00 50 78 2f 79 40 00 3d+21:23:41.873 READ FPDMA QUEUED

    114. 60 00 48 78 2e 79 40 00 3d+21:23:41.873 READ FPDMA QUEUED



    115. Error 21 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)

    116. When the command that caused the error occurred, the device was active or idle.



    117. After command completion occurred, registers were:

    118. ER ST SC SN CL CH DH

    119. -- -- -- -- -- -- --

    120. 40 51 e0 98 2e 79 0c Error: UNC at LBA = 0x0c792e98 = 209268376



    121. Commands leading to the command that caused the error were:

    122. CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

    123. -- -- -- -- -- -- -- -- ---------------- --------------------

    124. 60 00 50 78 2f 79 40 00 3d+21:23:41.873 READ FPDMA QUEUED

    125. 60 00 48 78 2e 79 40 00 3d+21:23:41.873 READ FPDMA QUEUED

    126. 2f 00 01 10 00 00 00 00 3d+21:23:41.791 READ LOG EXT

    127. 60 00 48 78 2f 79 40 00 3d+21:23:34.852 READ FPDMA QUEUED

    128. 60 00 40 78 2e 79 40 00 3d+21:23:34.852 READ FPDMA QUEUED



    129. Error 20 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)

    130. When the command that caused the error occurred, the device was active or idle.



    131. After command completion occurred, registers were:

    132. ER ST SC SN CL CH DH

    133. -- -- -- -- -- -- --

    134. 40 51 e0 98 2e 79 0c Error: UNC at LBA = 0x0c792e98 = 209268376



    135. Commands leading to the command that caused the error were:

    136. CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

    137. -- -- -- -- -- -- -- -- ---------------- --------------------

    138. 60 00 48 78 2f 79 40 00 3d+21:23:34.852 READ FPDMA QUEUED

    139. 60 00 40 78 2e 79 40 00 3d+21:23:34.852 READ FPDMA QUEUED

    140. 2f 00 01 10 00 00 00 00 3d+21:23:34.778 READ LOG EXT

    141. 60 00 40 78 2f 79 40 00 3d+21:23:27.747 READ FPDMA QUEUED

    142. 60 00 38 78 2e 79 40 00 3d+21:23:27.747 READ FPDMA QUEUED



    143. Error 19 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)

    144. When the command that caused the error occurred, the device was active or idle.



    145. After command completion occurred, registers were:

    146. ER ST SC SN CL CH DH

    147. -- -- -- -- -- -- --

    148. 40 51 f0 88 2e 79 0c Error: UNC at LBA = 0x0c792e88 = 209268360



    149. Commands leading to the command that caused the error were:

    150. CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

    151. -- -- -- -- -- -- -- -- ---------------- --------------------

    152. 60 00 40 78 2f 79 40 00 3d+21:23:27.747 READ FPDMA QUEUED

    153. 60 00 38 78 2e 79 40 00 3d+21:23:27.747 READ FPDMA QUEUED

    154. 2f 00 01 10 00 00 00 00 3d+21:23:27.665 READ LOG EXT

    155. 60 00 38 78 2f 79 40 00 3d+21:23:21.211 READ FPDMA QUEUED

    156. 60 00 30 78 2e 79 40 00 3d+21:23:20.380 READ FPDMA QUEUED



    157. SMART Self-test log structure revision number 1

    158. Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error

    159. # 1 Extended offline Interrupted (host reset) 10% 7960 -

    160. # 2 Short offline Completed without error 00% 7952 -

    161. # 3 Extended offline Completed without error 00% 7840 -

    162. # 4 Short offline Completed without error 00% 7832 -

    163. # 5 Extended offline Completed without error 00% 7720 -

    164. # 6 Short offline Completed without error 00% 7712 -

    165. # 7 Extended offline Completed without error 00% 7600 -

    166. # 8 Short offline Completed without error 00% 7592 -

    167. # 9 Extended offline Completed without error 00% 7480 -

    168. #10 Short offline Completed without error 00% 7472 -

    169. #11 Extended offline Completed without error 00% 7360 -

    170. #12 Short offline Completed without error 00% 7352 -

    171. #13 Extended offline Completed without error 00% 7240 -

    172. #14 Short offline Completed without error 00% 7232 -

    173. #15 Extended offline Completed without error 00% 7120 -

    174. #16 Short offline Completed without error 00% 7112 -

    175. #17 Extended offline Completed without error 00% 7000 -

    176. #18 Short offline Completed without error 00% 6992 -

    177. #19 Extended offline Completed without error 00% 6880 -

    178. #20 Short offline Completed without error 00% 6872 -

    179. #21 Extended offline Completed without error 00% 6856 -



    180. SMART Selective self-test log data structure revision number 1

    181. SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS

    182. 1 0 0 Not_testing

    183. 2 0 0 Not_testing

    184. 3 0 0 Not_testing

    185. 4 0 0 Not_testing

    186. 5 0 0 Not_testing

    187. Selective self-test flags (0x0):

    188. After scanning selected spans, do NOT read-scan remainder of disk.

    189. If Selective self-test is pending on power-up, resume after 0 minute delay.
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
It should show as a red alert. Replace that drive; it will die eventually or develop worse problems. I have an old Green with a bad sector, and even one drives me nuts.

 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Agreed, that drive has some serious issues. Attribute #5 shows that it has already reallocated 334 sectors, and attribute #197 shows that there are 80 more sectors waiting to be reallocated. It's not a large percentage of a 3TB drive, but it's still way too many for comfort in my opinion.
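
For anyone who would rather not read the attribute table by eye, here is a minimal Python sketch that pulls the raw values out of a few rows copied from the ada4 listing and flags the sector-related ones. The choice of attributes 5, 197 and 198 as the ones to watch is an assumption of this sketch, not an official smartmontools rule, and the function names are made up for illustration.

```python
# Rows copied from the "smartctl -a /dev/ada4" listing earlier in the thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   091   091   005    Pre-fail  Always       -       334
196 Reallocated_Event_Count 0x0032   084   084   000    Old_age   Always       -       455
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       80
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
"""

# Assumption of this sketch: these IDs are the ones that describe bad sectors.
SECTOR_ATTRS = {5, 197, 198}


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for each attribute row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # RAW_VALUE is the 10th column; trailing notes such as
        # "(Average 311)" are ignored.
        if not fields[9].isdigit():
            continue
        attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def sector_warnings(attrs):
    """Return {name: raw_value} for sector attributes with a nonzero raw value."""
    return {
        name: raw
        for attr_id, (name, raw) in attrs.items()
        if attr_id in SECTOR_ATTRS and raw > 0
    }


warnings = sector_warnings(parse_attributes(SAMPLE))
for name, raw in warnings.items():
    print(f"{name}: {raw}")
```

On the sample rows this flags the reallocated and pending sector counts and leaves the zero-valued attributes alone.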

The SMART output also shows 23 command errors. I don't know much about that, but it seems like a lot. Most of my drives show none, one or two show single digits (and I have a lot more than the 6 drives you see in my signature).

By the way, if you're using drives in groups of 3 per vdev, that implies each vdev is set up as RAIDZ1. That's a pretty risky configuration for 3TB drives. Usually it's not recommended to use RAIDZ1 for drives any larger than 1TB, due to the high chance of unrecoverable errors while resilvering.
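
To put a rough number on that resilver risk, here is a back-of-the-envelope sketch in Python. It assumes an unrecoverable read error rate of 1 per 1e14 bits, a figure commonly quoted on consumer drive datasheets and not something taken from this thread, and it assumes both surviving 3TB drives are read in full with independent errors. Real drives often do better than the datasheet figure, and ZFS only resilvers allocated blocks, so treat the result as a pessimistic estimate rather than a prediction.

```python
import math

# Capacity reported by smartctl for the 3TB drive in this thread.
DRIVE_BYTES = 3_000_592_982_016

# Assumption: datasheet-style unrecoverable read error rate, per bit read.
URE_PER_BIT = 1e-14


def p_unrecoverable_read(bytes_read, ure_per_bit=URE_PER_BIT):
    """Probability of at least one unrecoverable read error.

    Computes 1 - (1 - p)^bits with log1p/expm1 so the tiny per-bit
    probability is not lost to floating-point rounding.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))


# A 3-drive RAIDZ1 vdev has 2 surviving drives to read during a resilver.
surviving_drives = 2
p = p_unrecoverable_read(surviving_drives * DRIVE_BYTES)
print(f"chance of at least one URE during the resilver: {p:.0%}")
```

Under these assumptions the figure comes out somewhere above one in three, which is the kind of arithmetic behind the usual advice to prefer RAIDZ2 or mirrors for large drives.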
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
The drive in question is *WAY* past the point that I would have thrown it in the garbage.

You say "half" your drives show a problem like this? That's certainly very strange. Why don't we have a look at things?

Can you do: "smartctl -x /dev/adaX" for each appropriate X in your pool, upload the results to pastebin (or similar), and post the links for us so we can give everything a good once over?
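
If typing that for every drive gets tedious, a small Python sketch along these lines could generate the commands and, optionally, save each report to a file for pasting. The device names (ada0 to ada5, da0 to da2) are taken from the table posted later in this thread; the output directory name and the function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Device names as listed later in the thread: six ada drives, three da drives.
DEVICES = [f"ada{i}" for i in range(6)] + [f"da{i}" for i in range(3)]


def smartctl_commands(devices, flag="-x"):
    """Build one smartctl argument list per device."""
    return [["smartctl", flag, f"/dev/{dev}"] for dev in devices]


def save_reports(devices, outdir="smart-reports"):
    """Run smartctl for each device and write the output to <outdir>/<dev>.txt.

    check=False on purpose: smartctl uses nonzero exit status bits to
    report disk problems, so a failing disk must not abort the loop.
    """
    out = Path(outdir)
    out.mkdir(exist_ok=True)
    for argv in smartctl_commands(devices):
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        name = argv[-1].rsplit("/", 1)[-1]
        (out / f"{name}.txt").write_text(result.stdout)


# Print the commands only; call save_reports(DEVICES) as root to run them.
for argv in smartctl_commands(DEVICES):
    print(" ".join(argv))
```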
 

kamal juck

Dabbler
Joined
May 13, 2014
Messages
18
First let me thank you guys for your help.

I hope this isn't a red herring, but I noticed something odd. If you look at the table, the three drives on my IBM M1015 card all reported an error in the same hour. This seems very odd to me. (FYI, those drives are in an external cage.)

Do you know if the error message "Device: /dev/ada4, 80 Currently unreadable (pending) sectors" was changed recently in one of the updates, or has it always been around in previous versions of FreeNAS? I just want to know if I missed a crucial warning in the past.

Thanks again
K

Drive  Size (TB)  Make  Pool    SMART errors  Last error hour  Connector
-----  ---------  ----  ------  ------------  ---------------  -----------------------
ada0   3          HGST  vault1  0             0                internal SATA connector
ada1   3          HGST  vault1  0             0                internal SATA connector
ada2   3          HGST  vault1  0             0                internal SATA connector

ada3   3          HGST  vault2  0             0                internal SATA connector
ada4   3          HGST  vault2  23            7812             internal SATA connector
ada5   3          HGST  vault2  0             0                internal SATA connector

da0    4          HGST  vault2  73            3964             IBM ServeRAID M1015
da1    4          HGST  vault2  74            3964             IBM ServeRAID M1015
da2    4          HGST  vault2  70            3964             IBM ServeRAID M1015
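
The pattern in the table above can be checked mechanically. This is a minimal Python sketch that groups the drives with errors by the power-on hour of their last error and by connector; the row data is copied from the table, while the function name is made up for illustration. One caveat: power-on hours are counted per drive, so matching hours only line up in wall-clock time if the drives were installed and powered together, which is an assumption here.

```python
from collections import defaultdict

# (drive, SMART error count, power-on hour of last error, connector),
# copied from the table in this post.
ROWS = [
    ("ada0", 0, 0, "internal SATA"),
    ("ada1", 0, 0, "internal SATA"),
    ("ada2", 0, 0, "internal SATA"),
    ("ada3", 0, 0, "internal SATA"),
    ("ada4", 23, 7812, "internal SATA"),
    ("ada5", 0, 0, "internal SATA"),
    ("da0", 73, 3964, "IBM ServeRAID M1015"),
    ("da1", 74, 3964, "IBM ServeRAID M1015"),
    ("da2", 70, 3964, "IBM ServeRAID M1015"),
]


def shared_error_hours(rows):
    """Return {(hour, connector): [drives]} where 2+ drives erred in the same hour."""
    by_hour = defaultdict(list)
    for drive, errors, hour, connector in rows:
        if errors > 0:
            by_hour[(hour, connector)].append(drive)
    return {key: drives for key, drives in by_hour.items() if len(drives) > 1}


shared = shared_error_hours(ROWS)
for (hour, connector), drives in shared.items():
    print(f"hour {hour} on {connector}: {', '.join(drives)}")
```

Several drives failing in the same hour behind the same controller points at something they share (controller, cable, cage, power) rather than three independent disk faults, while ada4 stands alone.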



here are links for the smartctl -a /dev/xxxx

ada0 http://pastebin.com/0hRMp49B
ada1 http://pastebin.com/k08qZBVJ
ada2 http://pastebin.com/23PaNF2P

ada3 http://pastebin.com/vzkVWU1f
ada4 http://pastebin.com/cmeDvXCj
ada5 http://pastebin.com/5qtatgj1

da0 http://pastebin.com/bkZMrEbp

da1 http://pastebin.com/zPVc8dCR
da2 http://pastebin.com/PPf5agq7




My FreeNAS System as of May 2015
FreeNAS-9.3-STABLE-201505130355

Motherboard: Asus Z87-A
CPU: Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
Memory: 16229MB

Cards
- IBM ServeRAID M1015


Harddrives
- 6 HGST 3TB each ????
- 3 HGST 4TB each
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
You'll note I asked you to do a smartctl -x

Not a smartctl -a. But that's OK, don't worry about it. I'll work with these.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
First of all, let me compliment you on an aggressive SMART maintenance regimen. You appear to have read the docs and posts :)


  • ada0 looks good
  • ada1 looks good
  • ada2 looks good
  • ada3 looks good
  • ada4 is completely toasted. Controller and surface issues. Garbage.
  • ada5 looks good
  • da0-da2 have a problem. This problem is almost CERTAINLY controller related. You have some kind of firmware mismatch or something. Your M1015 is *NOT* set up correctly. Are you using version 16 firmware and driver? Is it flashed to IT mode?
 

kamal juck

Dabbler
Joined
May 13, 2014
Messages
18
Hi DrKK

Ooops! I see that smartctl -x results in a different output from smartctl -a. Thanks for working with the info that I provided anyway.


re: ada4 is what triggered my post, so I'm glad I ordered a replacement drive, which just arrived today.

I have a quick question about swapping out the drive. I've looked over: http://doc.freenas.org/9.3/freenas_storage.html#replacing-a-failed-drive.

The way I understand the docs (link above), they assume that the drive has failed completely and is unreadable. My question: is there a way to replace the failing drive without taking it offline first? In other words, is there a way to keep some redundancy as a safety margin while the new drive is being built up? My understanding is that it can take a day for the RAID to rebuild, and I don't want to be up the creek if another drive fails during that time. Sorry if I'm not making myself clear.

If the answer is: "I should have used Mirror", well I had to go with RAIDZ1 due to my limited pocketbook.



re: M1015

- first, the card was flashed to IT mode (100% sure about this).

- Looking over my notes, I can't be 100% sure that v16 of the firmware was installed; I didn't write that down. I do recall going to the LSI web site and downloading some files, but I'm not sure which version I installed. Is there a way to find out for sure which version I have?

- Also, in the past I received an alert from FreeNAS about a driver not being up to date (or something like that; I'm not sure what the exact message was). I can't find the emails from the server now, but when I first got the message I googled it and learned that it could be ignored. Maybe I got some bad info.

Thanks again for your help.
K
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
kamal juck said:
My question: is there a way to replace the failing drive without taking it offline first? In other words, is there a way to keep some redundancy as a safety margin while the new drive is being built up?

Yes, there is.
http://doc.freenas.org/9.3/freenas_storage.html#replacing-drives-to-grow-a-zfs-pool
The autoexpand stuff is irrelevant in your case.

kamal juck said:
re: M1015
- Is there a way to find out which version I have installed for sure?
- Also, in the past I've received an alert from FreeNAS about a driver not up to date (or something like that). [...] I googled the alert message and learned that it could be ignored. Maybe I got some bad info.

So it seems.

The GUI should be displaying a warning if there's a version mismatch between the mps driver and the card's firmware. The driver is P16, so you must have the card at P16, as well.

sas2flash -listall (or similar, dunno exact syntax) should list the mps controllers on your system. It should tell you the current firmware version.
 

kamal juck

Dabbler
Joined
May 13, 2014
Messages
18
I installed the drive and it is in the process of resilvering; zpool status gives me the following listing. Does this look OK so far?
      pool: vault2
     state: DEGRADED
    status: One or more devices is currently being resilvered. The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Mon May 18 19:06:18 2015
            123G scanned out of 7.24T at 142M/s, 14h37m to go
            40.6G resilvered, 1.65% done
    config:

            NAME                                              STATE     READ WRITE CKSUM
            vault2                                            DEGRADED     0     0     0
              raidz1-0                                        DEGRADED     0     0     0
                gptid/863543f1-eeb0-11e3-86d5-e03f49ea0e7e    ONLINE       0     0     0
                gptid/868a7e4d-eeb0-11e3-86d5-e03f49ea0e7e    ONLINE       0     0     0
                replacing-2                                   OFFLINE      0     0     0
                  4759880931749812709                         OFFLINE      0     0     0  was /dev/gptid/86df6a7b-eeb0-11e3-86d5-e03f49ea0e7e
                  gptid/7cf2e64a-fdb2-11e4-aa9c-e03f49ea0e7e  ONLINE       0     0     0  (resilvering)
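
As a sanity check on the "to go" figure in that listing, here is a short Python sketch of the arithmetic: remaining data divided by the current scan rate. It assumes the sizes are binary units (1G = 1024^3 bytes), which is how zpool prints them, and that the rate stays constant, which it usually does not. The helper names are hypothetical.

```python
# Binary multipliers, as used by zpool when it abbreviates sizes.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size):
    """Convert a zpool-style size such as '7.24T' to bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


def resilver_eta_seconds(scanned, total, rate_per_second):
    """Seconds left if the remaining data is scanned at a constant rate."""
    return (to_bytes(total) - to_bytes(scanned)) / to_bytes(rate_per_second)


# Values from the "scan:" lines of the zpool status output above.
eta = resilver_eta_seconds("123G", "7.24T", "142M")
hours, remainder = divmod(int(eta), 3600)
print(f"about {hours}h{remainder // 60:02d}m to go")
```

The result lands within a minute or so of the 14h37m that zpool reports; the small gap is presumably rounding in the displayed figures.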

Ericloewe, you were right on the mark with your syntax for sas2flash -listall

here is what I got.

    ~# sas2flash -listall
    LSI Corporation SAS2 Flash Utility
    Version 16.00.00.00 (2013.03.01)
    Copyright (c) 2008-2013 LSI Corporation. All rights reserved

    Adapter Selected is a LSI SAS: SAS2008(B1)

    Num   Ctlr          FW Ver        NVDATA        x86-BIOS      PCI Addr
    ----------------------------------------------------------------------------
    0     SAS2008(B1)   20.00.00.00   14.00.00.08   07.39.00.00   00:01:00:00

    Finished Processing Commands Successfully.
    Exiting SAS2Flash.

    [root@freenas] ~#


Thanks guys (gals?) for the hand holding
K
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Right. So, that resilver looks OK. Resilvers tend to speed up as they go, so you'll find that clock time for completion should in most cases be sooner than you were originally told.

Also, I don't mess with HBAs, but it looks to me like you have a version 16 versus version 20 mismatch, which is precisely what I expected your problem would be based upon listening to Cyberjock for the past year ;)
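
For reference, the comparison can be sketched in a few lines of Python. The controller row is copied from the sas2flash -listall output above, and the driver version (P16) is the one stated earlier in this thread. Note that the "Version 16.00.00.00" in the banner of that output is the version of the sas2flash utility itself, not of the driver or the card; the card's firmware is in the "FW Ver" column. The function name is hypothetical.

```python
import re

# Controller row copied from the sas2flash -listall output in this thread.
SAS2FLASH_ROW = "0 SAS2008(B1) 20.00.00.00 14.00.00.08 07.39.00.00 00:01:00:00"

# The mps driver version stated in this thread (P16).
DRIVER_MAJOR = 16


def firmware_major(row):
    """Extract the major firmware version from a controller row.

    Columns are: Num, Ctlr, FW Ver, NVDATA, x86-BIOS, PCI Addr.
    """
    fields = row.split()
    if len(fields) < 3 or not re.fullmatch(r"\d+(\.\d+){3}", fields[2]):
        raise ValueError(f"unexpected controller row: {row!r}")
    return int(fields[2].split(".")[0])


fw = firmware_major(SAS2FLASH_ROW)
matched = fw == DRIVER_MAJOR
print(f"firmware P{fw}, driver P{DRIVER_MAJOR}: {'match' if matched else 'MISMATCH'}")
```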
 

kamal juck

Dabbler
Joined
May 13, 2014
Messages
18
DrKK
So are you saying that I have to upgrade the LSI SAS2 Flash Utility to v20 as well, right?

As I write this I'm up to 36.97% resilvered.
cheers,
K
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Negative sir. Now, I am not an LSI guy. I just know what I hear from the people that know what they're talking about: The best thing to do is have BOTH things on version 16. I don't know why, but that is definitely considered best for now, even though version 20 exists.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
In a nutshell, LSI only supports matched versions. The stable FreeBSD driver has been at P16 for a while now, so that's the firmware version you need at the moment.
There's talk about moving to P20, but I've yet to hear anything official.

What bugs me is that the GUI isn't giving you the warning and blinking light it should be.
 

kamal juck

Dabbler
Joined
May 13, 2014
Messages
18
OK, my drive is installed and resilvered. I'm not sure if I need to scrub it, just in case.

Ericloewe, after I rebooted the server and logged into the GUI, I saw this message: "WARNING: Firmware version 20 does not match driver version 16 for /dev/mps0". I assume this is the LSI mismatch you guys have been talking about. My server runs 24x7, so maybe the alert is only generated when the system is rebooted. Could that be the reason I wasn't able to find it before?

In the user docs, chapter 22 (Alert), http://doc.freenas.org/9.3/freenas_alert.html, I came across the following, but I'm not sure about all the steps. I don't want to mess it up.

"An alert will also be generated when the LSI HBA firmware version does not match the driver version. To resolve this alert, download the IT (integrated target) firmware, not the IR (integrated RAID) firmware, from the LSI website. Then, specify the name of the firmware image and bios as well as the controller to flash:

sas2flash -f firmwareimagename -b biosname -c controllernumber

When finished, reboot the system. The new firmware version should appear in the system messages and the alert will be cleared."
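
To make the shape of that command concrete, here is a minimal Python sketch that only assembles and prints it; nothing is flashed. The three flags are the ones quoted from the docs above, and controller number 0 is the "Num" shown by sas2flash -listall earlier in the thread. The two file names are placeholders, not the real names of the LSI images, and the function name is made up.

```python
import shlex


def sas2flash_command(firmware_image, bios_image, controller=0):
    """Build the sas2flash argument list in the form quoted from the docs:
    sas2flash -f firmwareimagename -b biosname -c controllernumber
    """
    return [
        "sas2flash",
        "-f", firmware_image,
        "-b", bios_image,
        "-c", str(controller),
    ]


# Placeholder file names: substitute the P16 IT-mode firmware image and
# BIOS image actually downloaded from the vendor.
cmd = sas2flash_command("p16-it-firmware.bin", "bios-image.rom", controller=0)
print(shlex.join(cmd))
```

Printing the command first and reading it over before running it by hand is a cheap safeguard when a typo could mean flashing the wrong image.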

Searching for sas2flash, I see that others have had this problem, but I'm not sure if their solutions pertain to me.

Has anyone here (who has been following my questions) gone through this process of downgrading a card? Can you point me to a link with step-by-step instructions that matches my situation, or should I post a new question? Please remember that I'm a newbie and don't want to brick the card or lose my data.

Thanks for all your help in getting through my drive replacement.
Cheers,
K
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It should be the exact same process. Just flash P16.

I'd rather do it from UEFI, though (detailed instructions elsewhere on the forum).
 