unreadable (pending) sectors?

kamal juck · May 16, 2015

Hello, I'm a bit of a newbie.

Today I got a new message that I don't understand: "Device: /dev/ada4, 80 Currently unreadable (pending) sectors"

Is the message telling me that 80 sectors are unreadable? If so, is that a lot? (It doesn't seem like a lot on a 3TB HDD.)

On my system I have 9 physical drives in 3 vaults. It seems half of the drives show some type of SMART error when I use the smartctl command. All of the drives are less than a year old and were purchased in three lots as I grew my system. I don't know how concerned I should be. (Just in case I need to swap out the drive, I just ordered a replacement.)

So can someone please point me to a "simple" explanation of how to read this listing and error message? Or just post a reply.

Thanks in advance.
K

The following is the listing from smartctl:

[root@freenas] ~# smartctl -a /dev/ada4
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p13 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 3.5" HDD DT01ACA...
Device Model:     TOSHIBA DT01ACA300
Serial Number:    Y3UDK86GS
LU WWN Device Id: 5 000039 ff4d3ccc4
Firmware Version: MX6OABB0
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat May 16 20:04:25 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  33) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline
data collection:                (22078) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 368) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   140   140   054    Pre-fail  Offline      -       69
  3 Spin_Up_Time            0x0007   177   177   024    Pre-fail  Always       -       337 (Average 311)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       76
  5 Reallocated_Sector_Ct   0x0033   091   091   005    Pre-fail  Always       -       334
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   124   124   020    Pre-fail  Offline      -       33
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       7967
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       76
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       280
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       280
194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 21/43)
196 Reallocated_Event_Count 0x0032   084   084   000    Old_age   Always       -       455
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       80
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 23 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 23 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 58 a8 6a 79 0c  Error: UNC at LBA = 0x0c796aa8 = 209283752

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 6b 79 40 00   3d+21:24:07.350  READ FPDMA QUEUED
  60 00 f8 00 6a 79 40 00   3d+21:24:04.245  READ FPDMA QUEUED
  60 00 f0 00 69 79 40 00   3d+21:24:04.137  READ FPDMA QUEUED
  60 00 e8 00 68 79 40 00   3d+21:24:03.696  READ FPDMA QUEUED
  60 00 e0 00 67 79 40 00   3d+21:24:03.604  READ FPDMA QUEUED

Error 22 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 e0 98 2e 79 0c  Error: UNC at LBA = 0x0c792e98 = 209268376

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 58 78 2f 79 40 00   3d+21:23:45.781  READ FPDMA QUEUED
  60 00 50 78 2e 79 40 00   3d+21:23:45.781  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+21:23:45.709  READ LOG EXT
  60 00 50 78 2f 79 40 00   3d+21:23:41.873  READ FPDMA QUEUED
  60 00 48 78 2e 79 40 00   3d+21:23:41.873  READ FPDMA QUEUED

Error 21 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 e0 98 2e 79 0c  Error: UNC at LBA = 0x0c792e98 = 209268376

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 50 78 2f 79 40 00   3d+21:23:41.873  READ FPDMA QUEUED
  60 00 48 78 2e 79 40 00   3d+21:23:41.873  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+21:23:41.791  READ LOG EXT
  60 00 48 78 2f 79 40 00   3d+21:23:34.852  READ FPDMA QUEUED
  60 00 40 78 2e 79 40 00   3d+21:23:34.852  READ FPDMA QUEUED

Error 20 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 e0 98 2e 79 0c  Error: UNC at LBA = 0x0c792e98 = 209268376

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 48 78 2f 79 40 00   3d+21:23:34.852  READ FPDMA QUEUED
  60 00 40 78 2e 79 40 00   3d+21:23:34.852  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+21:23:34.778  READ LOG EXT
  60 00 40 78 2f 79 40 00   3d+21:23:27.747  READ FPDMA QUEUED
  60 00 38 78 2e 79 40 00   3d+21:23:27.747  READ FPDMA QUEUED

Error 19 occurred at disk power-on lifetime: 7812 hours (325 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 f0 88 2e 79 0c  Error: UNC at LBA = 0x0c792e88 = 209268360

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 78 2f 79 40 00   3d+21:23:27.747  READ FPDMA QUEUED
  60 00 38 78 2e 79 40 00   3d+21:23:27.747  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+21:23:27.665  READ LOG EXT
  60 00 38 78 2f 79 40 00   3d+21:23:21.211  READ FPDMA QUEUED
  60 00 30 78 2e 79 40 00   3d+21:23:20.380  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)       10%             7960  -
# 2  Short offline       Completed without error        00%             7952  -
# 3  Extended offline    Completed without error        00%             7840  -
# 4  Short offline       Completed without error        00%             7832  -
# 5  Extended offline    Completed without error        00%             7720  -
# 6  Short offline       Completed without error        00%             7712  -
# 7  Extended offline    Completed without error        00%             7600  -
# 8  Short offline       Completed without error        00%             7592  -
# 9  Extended offline    Completed without error        00%             7480  -
#10  Short offline       Completed without error        00%             7472  -
#11  Extended offline    Completed without error        00%             7360  -
#12  Short offline       Completed without error        00%             7352  -
#13  Extended offline    Completed without error        00%             7240  -
#14  Short offline       Completed without error        00%             7232  -
#15  Extended offline    Completed without error        00%             7120  -
#16  Short offline       Completed without error        00%             7112  -
#17  Extended offline    Completed without error        00%             7000  -
#18  Short offline       Completed without error        00%             6992  -
#19  Extended offline    Completed without error        00%             6880  -
#20  Short offline       Completed without error        00%             6872  -
#21  Extended offline    Completed without error        00%             6856  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
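For readers decoding the error log above: each "Error: UNC at LBA = ..." value (UNC meaning the drive reported an uncorrectable read) is assembled from the register snapshot on the same line. The sketch below rebuilds that address, assuming the 28-bit register layout of the summary error log that smartctl -a prints (SN = bits 0-7, CL = bits 8-15, CH = bits 16-23, low nibble of DH = bits 24-27). decode_lba28 and physical_sector are illustrative helpers, not part of smartmontools; the second one uses the drive's reported 512-byte logical / 4096-byte physical sector sizes to find which 4K physical sector holds a given logical LBA.

```python
def decode_lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Rebuild a 28-bit LBA from the register snapshot of a SMART error log entry.

    Assumed LBA28 layout: SN = bits 0-7, CL = bits 8-15, CH = bits 16-23,
    and the low nibble of DH = bits 24-27 (the high nibble holds device/mode bits).
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def physical_sector(lba: int, logical: int = 512, physical: int = 4096) -> int:
    """Index of the physical sector containing a logical LBA on a 512e drive."""
    return lba // (physical // logical)


if __name__ == "__main__":
    # "Error 23" registers: ER=40 ST=51 SC=58 SN=a8 CL=6a CH=79 DH=0c
    lba = decode_lba28(sn=0xA8, cl=0x6A, ch=0x79, dh=0x0C)
    print(f"LBA = {lba:#010x} = {lba}")  # same as smartctl's 0x0c796aa8 = 209283752
    print(f"physical sector = {physical_sector(lba)}")
```

Applied to the other entries, errors 22, 21 and 20 decode to the same LBA (209268376) and error 19 to one 16 sectors below it, so the drive kept failing in one small spot rather than all over the surface.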

Yatti420 · May 16, 2015

It should show as a red alert. Replace that drive; it will die eventually or have worse problems. I have an old Green drive with a bad sector, and even one drives me nuts.

Robert Trevellyan · May 17, 2015

Agreed, that drive has some serious issues. Attribute #5 shows that it has already reallocated 334 sectors, and attribute #197 shows that there are 80 more sectors waiting to be reallocated. It's not a large percentage of a 3TB drive, but it's still way too many for comfort in my opinion.
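To make this reading of the attribute table concrete, here is a minimal sketch that pulls the raw counts of the reallocation-related attributes out of smartctl -a text. It assumes the row layout shown in the listing above (ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); WATCH_IDS, parse_attributes and flag_reallocation are hypothetical names. Flagging any non-zero raw count follows the advice in this thread rather than a vendor rule: the drive itself compares the normalized VALUE against THRESH, and attribute 5 is still at 091 against a threshold of 005, which is why the overall self-assessment still says PASSED.

```python
import re

# Reallocation-related attributes discussed in the replies (IDs as printed by smartctl).
WATCH_IDS = {
    5: "Reallocated_Sector_Ct",
    196: "Reallocated_Event_Count",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, then the leading
# integer of RAW_VALUE (trailing extras such as "(Average 311)" are ignored).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text: str) -> dict:
    """Return {attribute ID: raw value} for every attribute row found in smartctl -a text."""
    raw = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            raw[int(m.group(1))] = int(m.group(3))
    return raw


def flag_reallocation(raw: dict) -> list:
    """Names of watched attributes whose raw count is non-zero."""
    return [name for attr_id, name in WATCH_IDS.items() if raw.get(attr_id, 0) > 0]


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   091   091   005    Pre-fail  Always       -       334
196 Reallocated_Event_Count 0x0032   084   084   000    Old_age   Always       -       455
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       80
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    raw = parse_attributes(SAMPLE)
    print(raw)
    print(flag_reallocation(raw))
```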

The SMART output also shows 23 command errors. I don't know much about that, but it seems like a lot. Most of my drives show none; one or two show single digits (and I have a lot more than the 6 drives you see in my signature).

By the way, if you're using drives in groups of 3 per vdev, that implies each vdev is set up as RAIDZ1. That's a pretty risky configuration for 3TB drives. RAIDZ1 is usually not recommended for drives larger than 1TB, because of the high chance of an unrecoverable read error while resilvering.
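The rule of thumb comes from a back-of-the-envelope model. The sketch assumes a datasheet rating of one unrecoverable read error (URE) per 10^14 bits read (a common consumer-drive figure, usually stated as "less than"; check your own drives' datasheet) and independent errors, and estimates the chance of reading the surviving disks of a degraded 3-disk RAIDZ1 end to end without a single URE. p_clean_read is an illustrative helper. ZFS only resilvers allocated blocks, so a partly full pool reads less than this worst case.

```python
import math


def p_clean_read(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of reading `bytes_read` bytes with no URE, assuming independent bit errors."""
    bits = bytes_read * 8
    # (1 - p) ** bits, computed via log1p so a tiny p is not lost to rounding
    return math.exp(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    TB = 1e12  # drive capacities are quoted in decimal terabytes
    # Degraded 3-disk RAIDZ1: a resilver may have to read both surviving disks in full.
    for drive_tb in (1, 3):
        p = p_clean_read(2 * drive_tb * TB)
        print(f"2 x {drive_tb}TB read with no URE: {p:.0%}")
```

Under these assumptions it prints about 85% for 1TB drives and 62% for 3TB drives, which is the gap the advice above is pointing at.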

DrKK · May 17, 2015

The drive in question is *WAY* past the point that I would have thrown it in the garbage.

You say "half" your drives show a problem like this? That's certainly very strange. Why don't we have a look at things?

Can you do: "smartctl -x /dev/adaX" for each appropriate X in your pool, upload the results to pastebin (or similar), and post the links for us so we can give everything a good once over?
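With nine drives, a small script saves some typing. A minimal sketch, assuming smartctl is on the PATH and the script runs as root on the NAS; collect_smart and run_command are illustrative names, and the run parameter exists only so the collection logic can be exercised without real disks.

```python
import subprocess
from typing import Callable, Dict, List, Optional, Sequence


def run_command(args: List[str]) -> str:
    """Run a command and return its stdout.

    smartctl's exit status is a bit mask that is non-zero for conditions such as
    entries in the device error log, so a non-zero status is not treated as failure.
    """
    return subprocess.run(args, capture_output=True, text=True, check=False).stdout


def collect_smart(devices: Sequence[str],
                  run: Optional[Callable[[List[str]], str]] = None) -> Dict[str, str]:
    """Return {device path: full `smartctl -x` output} for each device."""
    run = run or run_command
    return {dev: run(["smartctl", "-x", dev]) for dev in devices}
```

For example, collect_smart([f"/dev/ada{i}" for i in range(6)] + [f"/dev/da{i}" for i in range(3)]) would return one report per drive on this system, each ready to paste into its own pastebin entry.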

kamal juck · May 17, 2015

First let me thank you guys for your help.

I hope this isn't a red herring, but I noticed something odd. If you look at the table, the three drives on my IBM M1015 card all reported an error in the same hour. This seems very odd to me. (FYI, the drives are in an external cage.)

Do you know if the error message "Device: /dev/ada4, 80 Currently unreadable (pending) sectors" was added recently in one of the updates, or has it always been around in previous versions of FreeNAS? I just want to know if I missed a crucial warning in the past.

Thanks again
K

Drive  Size (TB)  Make  Pool    SMART errors  Last error (power-on hour)  Connector
-----  ---------  ----  ------  ------------  --------------------------  -----------------------
ada0   3          HGST  vault1  0             0                           internal SATA connector
ada1   3          HGST  vault1  0             0                           internal SATA connector
ada2   3          HGST  vault1  0             0                           internal SATA connector

ada3   3          HGST  vault2  0             0                           internal SATA connector
ada4   3          HGST  vault2  23            7812                        internal SATA connector
ada5   3          HGST  vault2  0             0                           internal SATA connector

da0    4          HGST  vault2  73            3964                        IBM ServeRAID M1015
da1    4          HGST  vault2  74            3964                        IBM ServeRAID M1015
da2    4          HGST  vault2  70            3964                        IBM ServeRAID M1015
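A quick way to check the pattern spotted here is to group the drives that logged errors by connector and by the power-on hour of their last error. A minimal sketch, with the data copied from the table above; group_errors is a hypothetical helper. One caveat: the hour in the SMART error log is each drive's own power-on lifetime, not a wall-clock time, so identical values only mean "the same moment" for drives that have run for the same number of hours, as three drives bought and installed together in a 24x7 server plausibly have (an assumption).

```python
from collections import defaultdict

# (drive, ATA error count, power-on hour of last logged error, connector) from the table above
DRIVES = [
    ("ada0", 0, 0, "internal SATA"), ("ada1", 0, 0, "internal SATA"),
    ("ada2", 0, 0, "internal SATA"), ("ada3", 0, 0, "internal SATA"),
    ("ada4", 23, 7812, "internal SATA"), ("ada5", 0, 0, "internal SATA"),
    ("da0", 73, 3964, "M1015"), ("da1", 74, 3964, "M1015"), ("da2", 70, 3964, "M1015"),
]


def group_errors(drives):
    """Group drives that have logged errors by (connector, power-on hour of last error)."""
    groups = defaultdict(list)
    for name, count, hour, connector in drives:
        if count > 0:
            groups[(connector, hour)].append(name)
    return dict(groups)


if __name__ == "__main__":
    for (connector, hour), names in group_errors(DRIVES).items():
        note = "shared component?" if len(names) > 1 else "single drive"
        print(f"{connector:>13} @ {hour:>5} h: {', '.join(names)} ({note})")
```

Several drives failing together behind one connector points at what they share (controller, cabling, cage), which is the direction the replies below take.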

Here are links to the smartctl -a /dev/xxxx output for each drive:

ada0 http://pastebin.com/0hRMp49B
ada1 http://pastebin.com/k08qZBVJ
ada2 http://pastebin.com/23PaNF2P

ada3 http://pastebin.com/vzkVWU1f
ada4 http://pastebin.com/cmeDvXCj
ada5 http://pastebin.com/5qtatgj1

da0 http://pastebin.com/bkZMrEbp
da1 http://pastebin.com/zPVc8dCR
da2 http://pastebin.com/PPf5agq7

My FreeNAS System as of May 2015
FreeNAS-9.3-STABLE-201505130355

Motherboard: Asus Z87-A
CPU: Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
Memory: 16229MB

Cards
- IBM ServeRAID M1015

Harddrives
- 6 HGST 3TB each ????
- 3 HGST 4TB each

DrKK · May 17, 2015

You'll note I asked you to do a smartctl -x.

Not a smartctl -a. But that's OK, don't worry about it. I'll work with these.

DrKK · May 17, 2015

kamal juck said:
First let me thank you guys for your help.

I hope this isn't a red herring, but I noticed something odd. If you look at the table, the three drives on my IBM M1015 card all reported an error in the same hour. This seems very odd to me. (FYI, the drives are in an external cage.)

Do you know if the error message "Device: /dev/ada4, 80 Currently unreadable (pending) sectors" was added recently in one of the updates, or has it always been around in previous versions of FreeNAS? I just want to know if I missed a crucial warning in the past.

Thanks again
K

First of all, let me compliment you on an aggressive SMART maintenance regimen. You appear to have read the docs and posts :)

ada0 looks good
ada1 looks good
ada2 looks good
ada3 looks good
ada4 is completely toasted. Controller and surface issues. Garbage.
ada5 looks good
da0-da2 have a problem. This problem is almost CERTAINLY controller-related. You have some kind of firmware mismatch or something. Your M1015 is *NOT* set up correctly. Are you using version 16 firmware and driver? Is it flashed to IT mode?

kamal juck · May 18, 2015

Hi DrKK

Oops! I see that smartctl -x produces different output from smartctl -a. Thanks for working with the info I provided anyway.

re: ada4 is what triggered my post, so I'm glad I ordered a replacement drive, which just arrived today.

I have a quick question about swapping out the drive. I've looked over: http://doc.freenas.org/9.3/freenas_storage.html#replacing-a-failed-drive.

The way I understand the docs (link above), they assume that the drive has failed completely and is unreadable. My question: is there a way to replace the failing drive without taking it offline completely? In other words, is there a way to keep some redundancy as a safety margin while the new drive is being built up? My understanding is that it can take a day for the RAID to rebuild, and I don't want to be up the creek if another drive fails during that time. Sorry if I'm not making myself clear.

If the answer is: "I should have used Mirror", well I had to go with RAIDZ1 due to my limited pocketbook.

re: M1015

- first, the card was flashed to IT mode (100% sure about this).

- I've looked over my notes but can't be 100% sure that v16 of the firmware was installed; I didn't write that down. I do recall going to the LSI website and downloading some files, but I'm not sure which version was installed. Is there a way to find out for sure which version I have installed?

- Also, in the past I received an alert from FreeNAS about a driver not being up to date (or something like that; I'm not sure what the message was). I can't find the emails from the server now, but when I first got the message I googled it and learned that it could be ignored. Maybe I got some bad info.

Thanks again for your help.
K

kamal juck · May 18, 2015

I think I just found out how to safely replace the drive. http://doc.freenas.org/9.3/freenas_storage.html#replacing-drives-to-grow-a-zfs-pool

K

Ericloewe · May 18, 2015

kamal juck said:
I have a quick question about swapping out the drive. I've looked over: http://doc.freenas.org/9.3/freenas_storage.html#replacing-a-failed-drive.

The way I understand the docs (link above), they assume that the drive has failed completely and is unreadable. My question: is there a way to replace the failing drive without taking it offline completely? In other words, is there a way to keep some redundancy as a safety margin while the new drive is being built up? My understanding is that it can take a day for the RAID to rebuild, and I don't want to be up the creek if another drive fails during that time. Sorry if I'm not making myself clear.

Yes, there is.
http://doc.freenas.org/9.3/freenas_storage.html#replacing-drives-to-grow-a-zfs-pool
The autoexpand stuff is irrelevant in your case.

kamal juck said:
re: M1015

- first, the card was flashed to IT mode (100% sure about this).

- I've looked over my notes but can't be 100% sure that v16 of the firmware was installed; I didn't write that down. I do recall going to the LSI website and downloading some files, but I'm not sure which version was installed. Is there a way to find out for sure which version I have installed?

- Also, in the past I received an alert from FreeNAS about a driver not being up to date (or something like that; I'm not sure what the message was). I can't find the emails from the server now, but when I first got the message I googled it and learned that it could be ignored. Maybe I got some bad info.

So it seems.

The GUI should be displaying a warning if there's a version mismatch between the mps driver and the card's firmware. The driver is P16, so you must have the card at P16, as well.

sas2flash -listall (or similar, dunno exact syntax) should list the mps controllers on your system. It should tell you the current firmware version.

kamal juck · May 18, 2015

I installed the drive and it is in the process of resilvering, and zpool status gives me the following listing. Does this look OK so far?

  pool: vault2
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon May 18 19:06:18 2015
        123G scanned out of 7.24T at 142M/s, 14h37m to go
        40.6G resilvered, 1.65% done
config:

        NAME                                              STATE     READ WRITE CKSUM
        vault2                                            DEGRADED     0     0     0
          raidz1-0                                        DEGRADED     0     0     0
            gptid/863543f1-eeb0-11e3-86d5-e03f49ea0e7e    ONLINE       0     0     0
            gptid/868a7e4d-eeb0-11e3-86d5-e03f49ea0e7e    ONLINE       0     0     0
            replacing-2                                   OFFLINE      0     0     0
              4759880931749812709                         OFFLINE      0     0     0  was /dev/gptid/86df6a7b-eeb0-11e3-86d5-e03f49ea0e7e
              gptid/7cf2e64a-fdb2-11e4-aa9c-e03f49ea0e7e  ONLINE       0     0     0  (resilvering)

Ericloewe, you were right on the mark with your syntax for sas2flash -listall

Here is what I got:

~# sas2flash -listall
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved
Adapter Selected is a LSI SAS: SAS2008(B1)
Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------
0     SAS2008(B1)     20.00.00.00   14.00.00.08   07.39.00.00      00:01:00:00
Finished Processing Commands Successfully.
Exiting SAS2Flash.
[root@freenas] ~#

Thanks guys (gals?) for the hand-holding
K

DrKK · May 18, 2015

Right. So, that resilver looks OK. Resilvers tend to speed up as they go, so in most cases it should finish sooner than the initial estimate.
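The estimate zpool prints is the data left to scan divided by the average rate so far, which is why it shrinks when the rate picks up. A minimal sketch that recomputes it from the scan line in the status above, assuming zpool's sizes use binary multiples (1K = 1024 bytes) and the scan-line wording of this FreeNAS 9.3 release (later ZFS versions changed that line); eta_hours and to_bytes are illustrative helpers.

```python
import re

# zpool prints sizes with binary multiples (1K = 1024 bytes).
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}

# Matches scan lines such as "123G scanned out of 7.24T at 142M/s, 14h37m to go"
SIZE = r"(\d+(?:\.\d+)?[KMGTP])"
SCAN = re.compile(SIZE + r" scanned out of " + SIZE + r" at " + SIZE + r"/s")


def to_bytes(size: str) -> float:
    """Convert a zpool size string such as '123G' or '7.24T' to bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


def eta_hours(scan_line: str) -> float:
    """Hours remaining if the average scan rate so far holds for the rest of the resilver."""
    m = SCAN.search(scan_line)
    if m is None:
        raise ValueError(f"unrecognized scan line: {scan_line!r}")
    done, total, rate = (to_bytes(s) for s in m.groups())
    return (total - done) / rate / 3600


if __name__ == "__main__":
    hours = eta_hours("123G scanned out of 7.24T at 142M/s, 14h37m to go")
    h, m = divmod(round(hours * 60), 60)
    print(f"{h}h{m:02d}m")
```

For the status shown above it prints 14h36m, within a minute of zpool's own 14h37m (the 142M/s figure is itself rounded).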

Also, I don't mess with HBAs, but it looks to me like you have a version 16 versus version 20 mismatch, which is precisely what I expected your problem would be, based on listening to Cyberjock for the past year ;)
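One detail worth knowing when reading that output: the "Version 16.00.00.00" line is the sas2flash utility's own version, while the controller's firmware is the FW Ver column (20.00.00.00). The driver side is P16 per Ericloewe's reply, so the mismatch is firmware 20 against driver 16. A minimal sketch that extracts the firmware phase per controller and compares it with an assumed driver phase; firmware_phases and mismatched are hypothetical helpers, and the sample text is the output posted above.

```python
import re

# A controller-table row: Num, Ctlr, then FW Ver as four dotted fields.
ROW = re.compile(r"^\s*(\d+)\s+\S+\s+(\d+)\.\d+\.\d+\.\d+\s")


def firmware_phases(listall: str) -> dict:
    """Map controller number -> firmware major version ("phase") from `sas2flash -listall`.

    The banner line "Version 16.00.00.00" is the flash utility's own version and
    does not match ROW, so only controller-table rows are parsed.
    """
    phases = {}
    for line in listall.splitlines():
        m = ROW.match(line)
        if m:
            phases[int(m.group(1))] = int(m.group(2))
    return phases


def mismatched(phases: dict, driver_phase: int) -> list:
    """Controller numbers whose firmware phase differs from the mps driver's phase."""
    return sorted(n for n, fw in phases.items() if fw != driver_phase)


SAMPLE = """\
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------
0     SAS2008(B1)     20.00.00.00   14.00.00.08   07.39.00.00      00:01:00:00
"""

if __name__ == "__main__":
    phases = firmware_phases(SAMPLE)
    # Assumed driver phase: P16, the mps driver version reported for this FreeNAS release.
    print(phases, "mismatched:", mismatched(phases, driver_phase=16))
```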

kamal juck · May 18, 2015

DrKK
So are you saying that I have to upgrade the LSI SAS2 Flash Utility to v20 as well, right?

As I write this I'm up to 36.97% resilvered.
cheers,
K

DrKK · May 18, 2015

kamal juck said:
DrKK
So are you saying that I have to upgrade the LSI SAS2 Flash Utility to v20 as well, right?

As I write this I'm up to 36.97% resilvered.
cheers,
K

Negative sir. Now, I am not an LSI guy. I just know what I hear from the people that know what they're talking about: The best thing to do is have BOTH things on version 16. I don't know why, but that is definitely considered best for now, even though version 20 exists.

Ericloewe · May 19, 2015

DrKK said:
Negative sir. Now, I am not an LSI guy. I just know what I hear from the people that know what they're talking about: The best thing to do is have BOTH things on version 16. I don't know why, but that is definitely considered best for now, even though version 20 exists.

In a nutshell, LSI only supports matched versions. The stable FreeBSD driver has been at P16 for a while now, so that's the firmware version you need at the moment.
There's talk about moving to P20, but I've yet to hear anything official.

What bugs me is that the GUI isn't giving you the warning and blinking light it should be.

kamal juck · May 19, 2015

OK, my drive is installed and resilvered. I'm not sure whether I need to scrub the pool, just in case.

Ericloewe, after I rebooted the server and logged into the GUI, I saw this message: "WARNING: Firmware version 20 does not match driver version 16 for /dev/mps0". I assume this is the LSI mismatch you guys have been talking about. My server runs 24x7; maybe the alert is only generated when the system is rebooted. Could that be the reason I wasn't able to find it before?

In the User Guide, chapter 22 (Alert), http://doc.freenas.org/9.3/freenas_alert.html, I came across the following, but I'm not sure about all the steps and I don't want to mess it up.

"An alert will also be generated when the LSI HBA firmware version does not match the driver version. To resolve this alert, download the IT (integrated target) firmware, not the IR (integrated RAID) firmware, from the LSI website. Then, specify the name of the firmware image and bios as well as the controller to flash:

sas2flash -f firmwareimagename -b biosname -c controllernumber

When finished, reboot the system. The new firmware version should appear in the system messages and the alert will be cleared."
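The command in the docs takes three values: the firmware image (-f), the BIOS image (-b) and the controller number (-c), which is the Num column of sas2flash -listall (0 on this system). A minimal dry-run sketch that only assembles and prints the command; the file names are placeholders for whatever the P16 IT download actually contains, and sas2flash_cmd is an illustrative helper.

```python
import shlex


def sas2flash_cmd(firmware: str, bios: str, controller: int) -> list:
    """Build the documented form: sas2flash -f <firmware> -b <bios> -c <controller>."""
    return ["sas2flash", "-f", firmware, "-b", bios, "-c", str(controller)]


if __name__ == "__main__":
    # Placeholder file names (hypothetical) for the P16 IT firmware and BIOS images;
    # controller 0 comes from the "Num" column of `sas2flash -listall`.
    cmd = sas2flash_cmd("P16_IT_FIRMWARE.bin", "P16_BIOS.rom", controller=0)
    # Dry run only: print the command for review instead of executing it.
    print(shlex.join(cmd))
```

Printing rather than executing keeps the flash step a deliberate, manual action, whether it is then run from FreeNAS or, as suggested in the reply below, from a UEFI shell.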

Searching for sas2flash, I see that others have had this problem, but I'm not sure if their solutions apply to me.

Has anyone here (who has been following my questions) gone through this process of downgrading a card? Could you point me to a link with step-by-step instructions for a situation similar to mine, or should I post a new question? Please remember that I'm a newbie and don't want to brick the card or lose my data.

Thanks for all your help in getting through my drive replacement.
Cheers,
K

Ericloewe · May 20, 2015

It should be the exact same process. Just flash P16.

I'd rather do it from UEFI, though (detailed instructions elsewhere on the forum).
