Questions about removing and updating drives

Status
Not open for further replies.

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
One of my pools consists of 5 desktop drives that I re-purposed when I built my NAS last year. One day while I was waiting for FreeNAS to boot I saw a bunch of warning messages about those drives. Apparently there is a bug in the firmware that could cause the drive to fail if S.M.A.R.T. diagnostics were used on them. It gave me two links for more information, http://knowledge.seagate.com/articles/en_US/FAQ/223571en and http://www.smartmontools.org/wiki/SamsungF4EGBadBlocks. My question is, can I update the firmware on the drives in the NAS while they are part of a pool? If not, what do I need to do in order to remove the drives and update them on another machine? I'm backing everything up now, but I would prefer not to create a new pool and start over if I don't have to. The update instructions aren't clear either: I am supposed to connect the drive to the primary master position, which doesn't exist with SATA drives as far as I know.
 

diedrichg

Wizard
Joined
Dec 4, 2012
Messages
1,319

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I think the first thing to do here, after posting your system specs, is to include a listing of the specific error messages you are getting so we don't confuse them with other messages. Even if you have the fixed firmware, SMART will still show the warning for that drive model and tell you to check for the firmware update. I have several of these drives myself, not in a NAS but I have them.

If it turns out that you really do need to update the firmware then, depending on your hardware, it's very possible to reprogram the hard drives installed in your FreeNAS system. The real factor will be whether you are using the motherboard SATA connectors or an add-on controller, which might complicate it.

If you feel the instructions are a bit complicated, maybe you shouldn't be doing this kind of work, but I'll explain it to you...

The Master will be the first drive listed in your BIOS. If it were me, I'd try to upgrade all the drives at once just to see whether the software can do a batch; if not, it will only do the first hard drive. If you are using your FreeNAS computer, disconnect all but one SATA cable, boot DOS and run the flash program. Shut it down, unplug the drive you just upgraded, connect the next one and repeat the process for the remaining drives. I would use the SATA0 connector for the entire process but that is just me. As for the bootable DOS flash drive, FreeDOS and MS-DOS images are out on the internet for free download; you just need to create a bootable flash drive or, hey, if you have a floppy disk drive lying around, even easier. Then add the reprogramming exe file and BAM! you got it.

Seriously, post the output of your SMART message; if it's just a warning message then it may be nothing.
 

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
SUPERMICRO MBD-X10SL7-F-O uATX Server Motherboard LGA 1150 Intel C222 DDR3 1600
Intel Core i3-4150 Haswell Dual-Core 3.5GHz LGA 1150 54W Desktop Processor Intel HD Graphics 4400 BX80646I34150
2x Crucial 16GB (2 x 8GB) 240-Pin DDR3 SDRAM DDR3 1600 (PC3 12800) ECC Unbuffered Server Memory Model CT2KIT102472BD160B
SeaSonic G Series SSR-550RM 550W ATX12V / EPS12V SLI Ready CrossFire Ready 80 PLUS GOLD Certified Modular Active PFC Power Supply New 4th Gen CPU Certified Haswell Ready
NZXT Source 210 Elite Black Steel with painted interior ATX Mid Tower Computer Case
Norco 5.25" to 3.5" HDD Cage
9x WD Red WD60EFRX 6TB IntelliPower 64MB Cache SATA 6.0Gb/s 3.5" NAS Hard Drive Bare Drive
5x Samsung HD204UI 2TB Hard Drives

The warning messages always appeared during a boot, but now apparently they don't. All I got from a short test on one of the drives is:
Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 6
2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0
3 Spin_Up_Time 0x0023 067 066 025 Pre-fail Always - 10009
4 Start_Stop_Count 0x0032 094 094 000 Old_age Always - 6436
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 5359
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 2
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 875
181 Program_Fail_Cnt_Total 0x0022 100 100 000 Old_age Always - 8962060
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 8763
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 059 000 Old_age Always - 31 (Min/Max 8/41)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 374
223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 2
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 6461


The reason I'm doing this is my pool isn't healthy. Back in January, when I first noticed the message, my pool became degraded and it removed one drive. Eventually it added it back in without my intervention and it continued working. As of last week a drive in that pool has been removed again and the pool is once more operating in a degraded state.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I was actually looking for all the SMART data, not just what you posted. In the first few lines there will be a message about the specific drive and the warning if it exists.

Based on your posting, looks like you are having a few Multi-Zone errors but that is it. It doesn't look like it will fail today or anything but those errors are likely what took it offline so I'd replace it. What about your other drives, gonna post those SMART results too?
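
For anyone reading along, here is a minimal Python sketch of how an attribute table like the one posted above could be checked by script instead of by eye. The function names and the list of watched attributes are my own assumptions, not anything smartmontools defines, and a non-zero raw value is only a hint to look closer, not a verdict on the drive.

```python
# Attributes chosen for illustration only; adjust the list to taste.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
    "Multi_Zone_Error_Rate",
)

# A few rows copied from the first drive's output in this thread.
SAMPLE_TABLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
194 Temperature_Celsius 0x0002 064 059 000 Old_age Always - 31 (Min/Max 8/41)
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 374
"""


def parse_attributes(text):
    """Map attribute name -> raw value for every attribute row in the text."""
    attrs = {}
    for line in text.splitlines():
        # Ten columns; the last one (RAW_VALUE) may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header or unrelated line
        raw = fields[9].split()[0]
        attrs[fields[1]] = int(raw) if raw.isdigit() else raw
    return attrs


def nonzero_watched(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {
        name: attrs[name]
        for name in WATCHED
        if isinstance(attrs.get(name), int) and attrs[name] > 0
    }


print(nonzero_watched(parse_attributes(SAMPLE_TABLE)))
# {'Multi_Zone_Error_Rate': 374}
```

Only the first token of RAW_VALUE is kept, so the temperature row reads as 31 and the "(Min/Max ...)" part is dropped.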
 

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
That isn't the drive that it took offline. It is still offline so I can't do any tests on it. Is there another command to see results? I used smartctl -A and that was all it posted. The results of the other three drives still in the pool are:
Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 15
2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0
3 Spin_Up_Time 0x0023 067 059 025 Pre-fail Always - 10199
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1402
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 11005
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 1
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1325
181 Program_Fail_Cnt_Total 0x0022 100 100 000 Old_age Always - 16892880
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 2512
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 055 000 Old_age Always - 31 (Min/Max 8/45)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 1026
223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 1
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 1504

Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 2
2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0
3 Spin_Up_Time 0x0023 068 066 025 Pre-fail Always - 9818
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1414
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 10761
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1325
181 Program_Fail_Cnt_Total 0x0022 100 100 000 Old_age Always - 140799
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 2090
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 053 000 Old_age Always - 30 (Min/Max 7/47)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 100 100 000 Old_age Always - 64
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 366
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 1502

Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 14
2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0
3 Spin_Up_Time 0x0023 068 066 025 Pre-fail Always - 9991
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1478
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 10776
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1323
181 Program_Fail_Cnt_Total 0x0022 100 100 000 Old_age Always - 482791
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 1276
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 053 000 Old_age Always - 33 (Min/Max 7/48)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 304
223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 1491
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Use a lower case "a": "smartctl -a /dev/ada0". I'm not certain you should trust ID 200 Multi_Zone_Error_Rate; on some drives those values are not very accurate. You sure have a lot of power cycles.

What do you mean you cannot run the command on the offline drive? Even if it's offline I would think it could be checked from the command line. Are you using the shell or SSH to run the command?
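
A small Python sketch of the difference between the two flags, in case it helps: `-A` prints only the attribute table, while `-a` prints everything, including the information section where any warning shows up. The helper names and the idea of saving the report to a file are my own additions, and the sketch assumes a Python 3.7+ interpreter and `smartctl` on the PATH of whatever machine runs it.

```python
import subprocess


def smartctl_command(device, full=True):
    """Build the smartctl argument list for one device.

    -a asks for all SMART information; -A asks for the attribute table only.
    """
    return ["smartctl", "-a" if full else "-A", device]


def save_report(device, path):
    """Run smartctl -a on the device and write the output to a file.

    Saving to a file avoids losing the top of the output when it scrolls
    past in a terminal window. The exit status is returned unchanged;
    smartctl uses it as a bit mask, so non-zero is not always a failure.
    """
    result = subprocess.run(
        smartctl_command(device), capture_output=True, text=True
    )
    with open(path, "w") as handle:
        handle.write(result.stdout)
    return result.returncode


print(smartctl_command("/dev/ada0"))
print(smartctl_command("/dev/ada0", full=False))
```

The device path and output file are whatever applies on your system; `/dev/ada0` is just the example used in this post.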
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Also report the results of "camcontrol devlist".
 

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
Ok now the warning message is there. This is for the first drive I posted results for earlier.

Code:
=== START OF INFORMATION SECTION ===  
Model Family:  SAMSUNG SpinPoint F4 EG (AF)  
Device Model:  SAMSUNG HD204UI  
Serial Number:  S2H7J1BZA26680  
LU WWN Device Id: 5 0024e9 00440e337  
Firmware Version: 1AQ10001  
User Capacity:  2,000,398,934,016 bytes [2.00 TB]  
Sector Size:  512 bytes logical/physical  
Rotation Rate:  5400 rpm  
Form Factor:  3.5 inches  
Device is:  In smartctl database [for details use: -P show]  
ATA Version is:  ATA8-ACS T13/1699-D revision 6  
SATA Version is:  SATA 2.6, 3.0 Gb/s  
Local Time is:  Tue Aug  4 16:08:56 2015 PDT  
  
==> WARNING: Using smartmontools or hdparm with this  
drive may result in data loss due to a firmware bug.  
****** THIS DRIVE MAY OR MAY NOT BE AFFECTED! ******  
Buggy and fixed firmware report same version number!  
See the following web pages for details:  
http://knowledge.seagate.com/articles/en_US/FAQ/223571en  
http://www.smartmontools.org/wiki/SamsungF4EGBadBlocks  
  
SMART support is: Available - device has SMART capability.  
SMART support is: Enabled  

SMART Attributes Data Structure revision number: 16  
Vendor Specific SMART Attributes with Thresholds:  
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE  
  1 Raw_Read_Error_Rate  0x002f  100  100  051  Pre-fail  Always  -  6  
  2 Throughput_Performance  0x0026  055  055  000  Old_age  Always  -  18643  
  3 Spin_Up_Time  0x0023  067  066  025  Pre-fail  Always  -  10009  
  4 Start_Stop_Count  0x0032  094  094  000  Old_age  Always  -  6489  
  5 Reallocated_Sector_Ct  0x0033  252  252  010  Pre-fail  Always  -  0  
  7 Seek_Error_Rate  0x002e  252  252  051  Old_age  Always  -  0  
  8 Seek_Time_Performance  0x0024  252  252  015  Old_age  Offline  -  0  
  9 Power_On_Hours  0x0032  100  100  000  Old_age  Always  -  5359  
10 Spin_Retry_Count  0x0032  252  252  051  Old_age  Always  -  0  
11 Calibration_Retry_Count 0x0032  100  100  000  Old_age  Always  -  2  
12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  875  
181 Program_Fail_Cnt_Total  0x0022  100  100  000  Old_age  Always  -  8962060  
191 G-Sense_Error_Rate  0x0022  100  100  000  Old_age  Always  -  8763  
192 Power-Off_Retract_Count 0x0022  252  252  000  Old_age  Always  -  0  
194 Temperature_Celsius  0x0002  064  059  000  Old_age  Always  -  29 (Min/Max 8/41)  
195 Hardware_ECC_Recovered  0x003a  100  100  000  Old_age  Always  -  0  
196 Reallocated_Event_Count 0x0032  252  252  000  Old_age  Always  -  0  
197 Current_Pending_Sector  0x0032  252  252  000  Old_age  Always  -  0  
198 Offline_Uncorrectable  0x0030  252  252  000  Old_age  Offline  -  0  
199 UDMA_CRC_Error_Count  0x0036  200  200  000  Old_age  Always  -  0  
200 Multi_Zone_Error_Rate  0x002a  100  100  000  Old_age  Always  -  374  
223 Load_Retry_Count  0x0032  100  100  000  Old_age  Always  -  2  
225 Load_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  6514  
  
SMART Error Log Version: 1  
No Errors Logged  
  
SMART Self-test log structure revision number 1  
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error  
# 1  Extended offline  Aborted by host  90%  5359  -  
# 2  Extended offline  Completed without error  00%  5359  -  
# 3  Short offline  Completed without error  00%  5359  -  
  
SMART Selective self-test log data structure revision number 0  
Note: revision number not 1 implies that no selective self-test has ever been run  
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS  
  1  0  0  Aborted_by_host [90% left] (0-65535)  
  2  0  0  Not_testing  
  3  0  0  Not_testing  
  4  0  0  Not_testing  
  5  0  0  Not_testing  
Selective self-test flags (0x0):  
  After scanning selected spans, do NOT read-scan remainder of disk.  
If Selective self-test is pending on power-up, resume after 0 minute delay. 


I'm using the shell in the webgui. I'm not positive that was the entire output though, as I don't know how to copy all the data. That drive took a few seconds to display everything, so I was able to copy the initial data and then what looked like the rest once it was displayed. The other three drives spit it out all at once, and I can no longer see what may be the entire first half in order to copy it. When I try to run a test on the offline drive, I get the message "/dev/ada5: Unable to detect device type Please specify device type with the -d option."

Code:
<ATA WDC WD60EFRX-68M 0A82> at scbus0 target 0 lun 0 (pass0,da0)
<ATA SAMSUNG HD204UI 0001> at scbus0 target 1 lun 0 (pass1,da1)
<ATA SAMSUNG HD204UI 0001> at scbus0 target 2 lun 0 (pass2,da2)
<ATA SAMSUNG HD204UI 0001> at scbus0 target 3 lun 0 (pass3,da3)
<ATA WDC WD60EFRX-68M 0A82> at scbus0 target 4 lun 0 (pass4,da4)
<ATA WDC WD60EFRX-68M 0A82> at scbus0 target 5 lun 0 (pass5,da5)
<ATA WDC WD60EFRX-68M 0A82> at scbus0 target 6 lun 0 (pass6,da6)
<ATA WDC WD60EFRX-68M 0A82> at scbus0 target 7 lun 0 (pass7,da7)
<WDC WD60EFRX-68MYMN1 82.00A82> at scbus1 target 0 lun 0 (pass8,ada0)
<WDC WD60EFRX-68MYMN1 82.00A82> at scbus2 target 0 lun 0 (pass9,ada1)
<WDC WD60EFRX-68MYMN1 82.00A82> at scbus3 target 0 lun 0 (pass10,ada2)
<WDC WD60EFRX-68MYMN1 82.00A82> at scbus4 target 0 lun 0 (pass11,ada3)
<SAMSUNG HD204UI 1AQ10001> at scbus6 target 0 lun 0 (pass12,ada4)
<ADATA USB Flash Drive 1100> at scbus8 target 0 lun 0 (pass13,da8)
<ADATA USB Flash Drive 1100> at scbus9 target 0 lun 0 (pass14,da9)


As you can see ada5 is nowhere to be found. The high power cycles are attributed to their former use prior to constructing the NAS. One drive collected dust for a year before joining the other 4 in a machine that was powered on/off multiple times a week for probably 4 years.
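
The check described above, looking for a device node in the `camcontrol devlist` output, can be sketched in Python like this. The function names and the list of expected devices are hypothetical, and the regular expression assumes lines shaped like the ones in the listing above.

```python
import re

# Matches lines such as:
# <SAMSUNG HD204UI 1AQ10001> at scbus6 target 0 lun 0 (pass12,ada4)
DEVLIST_LINE = re.compile(
    r"^<(?P<desc>[^>]*)>\s+at\s+scbus\d+\s+target\s+\d+\s+lun\s+\d+\s+"
    r"\((?P<names>[^)]*)\)"
)

# Three lines copied from the listing in this post.
SAMPLE_DEVLIST = """\
<WDC WD60EFRX-68MYMN1 82.00A82> at scbus4 target 0 lun 0 (pass11,ada3)
<SAMSUNG HD204UI 1AQ10001> at scbus6 target 0 lun 0 (pass12,ada4)
<ADATA USB Flash Drive 1100> at scbus8 target 0 lun 0 (pass13,da8)
"""


def attached_devices(devlist_text):
    """Map device node name (ada4, da8, ...) -> description string."""
    devices = {}
    for line in devlist_text.splitlines():
        match = DEVLIST_LINE.match(line.strip())
        if not match:
            continue
        for name in match.group("names").split(","):
            name = name.strip()
            # Skip the passN passthrough node; keep the disk node.
            if name and not name.startswith("pass"):
                devices[name] = match.group("desc")
    return devices


def missing_devices(devlist_text, expected):
    """Return the expected device names that do not appear in the list."""
    present = attached_devices(devlist_text)
    return [name for name in expected if name not in present]


print(missing_devices(SAMPLE_DEVLIST, ["ada3", "ada4", "ada5"]))
# ['ada5']
```

A device that is missing here is not visible to the operating system at all, which is why smartctl cannot talk to it either.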
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Your drive ada5 is dead. Check the physical connections to the drive. If it will not show up then it's completely dead and you need to replace the drive following the user manual procedure.

EDIT: BTW, this has absolutely nothing to do with the firmware update which is likely already updated. Samsung released the bug fix using the same version number, a bad way to do things.
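
Because the buggy and fixed firmware report the same version string, a script cannot tell them apart from the `smartctl -a` output; about all it can do is notice that the warning was printed. A minimal Python sketch of that, with a hypothetical function name and a sample trimmed from the report posted earlier:

```python
# Trimmed from the smartctl -a report posted earlier in the thread.
SAMPLE_REPORT = """\
Firmware Version: 1AQ10001
Local Time is:  Tue Aug  4 16:08:56 2015 PDT

==> WARNING: Using smartmontools or hdparm with this
drive may result in data loss due to a firmware bug.
****** THIS DRIVE MAY OR MAY NOT BE AFFECTED! ******
Buggy and fixed firmware report same version number!

SMART support is: Available - device has SMART capability.
"""


def extract_warning(report_text):
    """Return the lines of the first '==> WARNING:' block, or an empty list.

    The block is taken to run from the warning marker to the next blank
    line, which matches the layout of the report shown in this thread.
    """
    block = []
    collecting = False
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("==> WARNING:"):
            collecting = True
        elif collecting and not stripped:
            break  # blank line ends the warning block
        if collecting:
            block.append(stripped)
    return block


for warning_line in extract_warning(SAMPLE_REPORT):
    print(warning_line)
```

An empty result only means smartctl printed no warning for that drive; a non-empty one says nothing about whether the firmware has actually been patched.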
 

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
Everything felt tight but apparently something was loose as it is now showing up. I'm pretty sure I purchased my drives 2-3 months before the firmware was released so they did need a firmware update. I did have to do it one at a time with the drives plugged into the sata0 port. Now I can ignore those warning messages. Thanks for all your help, I really appreciate it.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
You could have a failing SATA cable; it happens more than you would think.
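
One way to keep an eye on a suspect cable is to watch UDMA_CRC_Error_Count (ID 199), which is commonly associated with link or cabling problems; one of the drives posted earlier in this thread shows a raw value of 64 there. Since the raw value is a running total, what matters is whether it keeps rising. The Python sketch below compares two sets of readings; the function name, drive labels and numbers are all hypothetical and only illustrate the comparison.

```python
def crc_error_growth(earlier, later):
    """Return drives whose CRC error count rose between two readings.

    Both arguments map a drive label to its UDMA_CRC_Error_Count raw value.
    Drives with no earlier reading are treated as unchanged.
    """
    growth = {}
    for drive, count in later.items():
        delta = count - earlier.get(drive, count)
        if delta > 0:
            growth[drive] = delta
    return growth


# Hypothetical readings taken some days apart.
first_reading = {"drive_a": 64, "drive_b": 0}
second_reading = {"drive_a": 71, "drive_b": 0}

print(crc_error_growth(first_reading, second_reading))
# {'drive_a': 7}
```

A count that stays flat after reseating or swapping the cable suggests the old errors were historical; one that keeps climbing points at the cable, the port or the drive's connector.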
 

DGenerateKane

Explorer
Joined
Sep 4, 2014
Messages
95
That's true. I have had to replace a couple over time in other machines.
 