Scrub frozen

Status
Not open for further replies.

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Is the failed drive the only one connected? If yes, then the culprit could be the drive itself, a poorly seated (or defective) SATA cable, or a defective SATA port.

Only moving the suspected disk to another system (known to be working) can help determine which component has actually failed.

If the hard drive is the only defective component, it appears to be getting progressively worse, so next time the drive might not even be recognized (in any system). Please check its SATA cable (preferably use a locking SATA cable, and check that it has clicked in at both ends).
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
No, all 6 of the drives are connected. Obviously the failed drive is ada1. It is firmly connected, and the cable has been swapped with a known good one.
Tried to reboot and it still hangs at:
(ada1):ata3:0:0:0) READ_DMA48. ACB:25 00 00 a0 50 40 5d 01 00 00 00 01
(ada1):ata3:0:0:0) CAM status: ATA status Error
(ada1):ata3:0:0:0) ATA status 51 (DRDY SERV ERR), error: 40(UNC)
(ada1):ata3:0:0:0) RES: 51 40 88 a0 50 5d 5d 01 00 6f 00
(ada1):ata3:0:0:0) Error 5, Retries exhausted

Should I just wait until my replacement drive arrives? And should I boot with only 5 of the pool's 6 drives connected, or with the 5 old drives and the new (blank) one connected too?
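For readers puzzling over the log lines above: the status and error bytes are bit fields, and the names in parentheses are simply the bits that are set. A minimal sketch of that decoding follows. The bit names are the commonly used ATA register names and the tables are deliberately partial; the function name `decode_bits` is made up for this illustration.

```python
# Partial tables of ATA status/error register bits (commonly used names).
# Only the bits needed to read the log above are listed; this is a sketch,
# not a complete reference.
STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "SERV"),  # service / seek complete, depending on ATA revision
    (0x08, "DRQ"),   # data request
    (0x01, "ERR"),   # error, see the error register
]
ERROR_BITS = [
    (0x80, "ICRC"),  # interface CRC error
    (0x40, "UNC"),   # uncorrectable data error
    (0x10, "IDNF"),  # sector ID not found
    (0x04, "ABRT"),  # command aborted
]

def decode_bits(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]

# Values taken from the log: "ATA status 51 (DRDY SERV ERR), error: 40(UNC)"
print(decode_bits(0x51, STATUS_BITS))
print(decode_bits(0x40, ERROR_BITS))
```

UNC means the drive could not read the requested sector even after its own retries, which points at the disk surface rather than at the cable.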
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
OK, I got the system up and running with what I believe is a faulty drive still in.

I ran: smartctl -A /dev/ada1

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f  200   199   051    Pre-fail Always  -           189
  3 Spin_Up_Time            0x0027  154   144   021    Pre-fail Always  -           9258
  4 Start_Stop_Count        0x0032  100   100   000    Old_age  Always  -           289
  5 Reallocated_Sector_Ct   0x0033  155   155   140    Pre-fail Always  -           647
  7 Seek_Error_Rate         0x002e  200   200   000    Old_age  Always  -           132
  9 Power_On_Hours          0x0032  097   097   000    Old_age  Always  -           2706
 10 Spin_Retry_Count        0x0032  100   100   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032  100   100   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032  100   100   000    Old_age  Always  -           262
192 Power-Off_Retract_Count 0x0032  200   200   000    Old_age  Always  -           235
193 Load_Cycle_Count        0x0032  178   178   000    Old_age  Always  -           68452
194 Temperature_Celsius     0x0022  129   115   000    Old_age  Always  -           23
196 Reallocated_Event_Count 0x0032  001   001   000    Old_age  Always  -           300
197 Current_Pending_Sector  0x0032  200   200   000    Old_age  Always  -           10
198 Offline_Uncorrectable   0x0030  200   200   000    Old_age  Offline -           27
199 UDMA_CRC_Error_Count    0x0032  200   200   000    Old_age  Always  -           42
200 Multi_Zone_Error_Rate   0x0008  200   200   000    Old_age  Offline -           7

No idea what this means.

I received two emails when the system came back online:

This message was generated by the smartd daemon running on:

host name: freenas
DNS domain: local

The following warning/error was logged by the smartd daemon:

Device: /dev/ada1, 10 Currently unreadable (pending) sectors

Device info:
WDC WD30EZRX-00MMMB0, S/N:WD-WCAWZ1566267, WWN:5-0014ee-2b10eddf7, FW:80.00A80, 3.00 TB

For details see host's SYSLOG.

And:

This message was generated by the smartd daemon running on:

host name: freenas
DNS domain: local

The following warning/error was logged by the smartd daemon:

Device: /dev/ada1, 27 Offline uncorrectable sectors

Device info:
WDC WD30EZRX-00MMMB0, S/N:WD-WCAWZ1566267, WWN:5-0014ee-2b10eddf7, FW:80.00A80, 3.00 TB

For details see host's SYSLOG.

This is my ada1 device, the one I'm waiting on a replacement for.

So now that it is back up and I have a WebGUI, what is next?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, if you search the forums you should find the answers to your questions on what those mean. ;)
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Attributes 5, 197, and 198 are indicators of a pending failure; any raw value above zero (0) is bad. Your emails are clearly warning you about attributes 197 and 198. If you don't have your replacement drive handy, I have no clue why you are running your machine. You are taking a chance on another drive failure by running your system. You didn't post it here, but I'm assuming you are running a RAIDZ2 system using 2TB drives. I could be wrong, but if you are running a RAIDZ1, turn your system off and wait for the new drive.
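The rule of thumb above (attributes 5, 197, and 198 should have a raw value of zero) is easy to check mechanically. Here is a minimal sketch that scans `smartctl -A` style text for those three IDs. The sample rows are copied from the output posted earlier in this thread; the function name `failing_attributes` is made up for this illustration, and the parser assumes the ten-column layout shown in that output.

```python
# Attribute IDs singled out in the post above.
WATCHED_IDS = {5, 197, 198}

# A few rows copied from the smartctl -A output earlier in the thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 155 155 140 Pre-fail Always - 647
9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 2706
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 10
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 27
"""

def failing_attributes(smart_text):
    """Map attribute name -> raw value for watched IDs with a raw value > 0."""
    found = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a ten-column attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found

for name, raw in failing_attributes(SAMPLE).items():
    print(f"{name}: raw value {raw} (should be 0)")
```

All three watched attributes are non-zero for this drive, which matches the two smartd warning emails.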
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
In addition to what joeschmuck wrote 30 minutes ago, based upon the S.M.A.R.T. output, you should be able to get a warranty replacement from Western Digital (or whoever you have the drive warranty with).
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Please use code tags! It makes it much easier for us to help you...

That drive is failing, as has been mentioned. However, the Load Cycle Count for your other drives will be similar to this drive's - and it's pretty high, considering the drive's age. Search the forums for how to change the idle timer from 8 seconds to 300 seconds.
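To put "pretty high" into numbers, the Load_Cycle_Count and Power_On_Hours raw values from the posted SMART output can be divided to get an average parking rate. This is only back-of-the-envelope arithmetic on the two posted values; it assumes the drive parked at a roughly constant rate over its powered-on life.

```python
# Raw values copied from the smartctl -A output posted earlier in the thread.
load_cycle_count = 68452   # attribute 193
power_on_hours = 2706      # attribute 9

# Average number of head load/unload cycles per powered-on hour.
cycles_per_hour = load_cycle_count / power_on_hours

# Average time between cycles, in seconds.
seconds_between_cycles = 3600 / cycles_per_hour

print(f"about {cycles_per_hour:.1f} load cycles per hour")
print(f"about one cycle every {seconds_between_cycles:.0f} seconds")
```

That works out to roughly 25 cycles per hour, which is the kind of rate a short idle timer produces on a drive that is accessed every few minutes.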
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Good catch.
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Thanks again.

The failing drive is now "offline" and pulled, and I'm awaiting the arrival of the RMA replacement.

The system is shut down, and today's project is running wdidle3 to change the idle timer on the remaining 5 drives.

I like making improvements.
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
I'm having trouble getting wdidle3 to make changes to the timer.

When I run the program all I get is a Model # Serial # and timer set to 8.000 seconds.

How do I get it to change the timer to 300 seconds?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
We have a guide on this....
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
I read the guide, watched the video. But it did not show the syntax to change to 300 seconds. Only to disable the park.

Trial and error and I figured out the syntax.

Exciting using trial and error syntax on $600 worth of drives.

5 drives changed to 300 seconds. Once the RMA drive arrives I'll change that one. Then try to resilver my volume.

Thanks
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
You should have in the ZIP archive a file named wdidle3.txt that, although very short, would answer some of your questions. You will find that manufacturer's instructions are seldom copied into the forum, as they are subject to change.
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
No questions. Trial and error revealed what needs to be placed in the fields and how spaces and characters are used, since none of these things are covered in the how-to guide, the text file that comes with the exe, the video, or WD's usage instructions on the page associated with the wdidle3 download.

The answer was: D:wdidle3 /s300

wdidle3 /r -----> issues a report
and
wdidle3 /d -----> disables the timer

And no one, not the guide, the video, the text file, or WD's usage instructions, will tell you how to space and format the syntax for the 300 second timer. Imagine if you're a total noob.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Your file did not include this?
Code:
WDIDLE3 [/S[<Timer>]] [/D] [/R] [/?]
where:
/S[<Timer>] Set timer, units in seconds. Default=8.0 (8.0 seconds).
            Resolution is 0.1 seconds from 8.0 to 12.7 seconds.
            Resolution is 30 seconds from 30 seconds to 300 seconds.
            Note, times between 12.8 and 30 seconds will be set to 30 seconds.
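The range rules in that help text can be restated as a small check. The sketch below only classifies a requested value against the documented ranges; it does not guess how the tool rounds a value that falls between two steps, because the help text does not say. The function name `classify_timer` is made up for this illustration.

```python
from decimal import Decimal

def classify_timer(seconds):
    """Describe how a requested timer value fits the documented ranges."""
    t = Decimal(str(seconds))  # Decimal avoids float artifacts such as 12.7 % 0.1
    if t < Decimal("8.0") or t > Decimal("300"):
        return "outside the documented 8.0 to 300 second range"
    if t < Decimal("12.8"):
        # 8.0 to 12.7 seconds: resolution is 0.1 seconds.
        on_step = t % Decimal("0.1") == 0
        return "on a 0.1 s step" if on_step else "between 0.1 s steps"
    if t < Decimal("30"):
        # 12.8 up to 30 seconds: the help text says these become 30 seconds.
        return "will be set to 30 s"
    # 30 to 300 seconds: resolution is 30 seconds.
    on_step = t % Decimal("30") == 0
    return "on a 30 s step" if on_step else "between 30 s steps"

for value in (8, 12.7, 20, 100, 300):
    print(value, "->", classify_timer(value))
```

The value used in this thread, 300, sits exactly on a 30 second step and is the largest value the help text documents.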
 

climb2bhi

Contributor
Joined
Feb 1, 2012
Messages
108
Yes, but when you are a total computer noob like me, you have no idea what kind of spaces or characters are needed to replace or include things like [ or <. So you hammer away including these symbols, eventually replacing them with spaces, and then finding out that it takes a combination of spaces and no spaces to make it work.

It would be handy if the guides and instructions said: type the following exact syntax.

X:wdidle3 /s300

If you want to reset your timer to 300 seconds. Where X is the drive letter for your CD drive.

X:wdidle3 /r

If you want to issue a report on your drives.

X:wdidle3 /d

If you want to disable the timer.
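For anyone else tripped up by the notation: in a usage line such as `WDIDLE3 [/S[<Timer>]] [/D] [/R] [/?]`, square brackets mark optional parts and angle brackets mark a placeholder you replace with a value. Neither kind of bracket is typed. A minimal sketch of that expansion follows; the helper name `wdidle3_command` and its action names are made up for this illustration, and it only builds the text of the command, it does not run anything. Lowercase switches are used because that is what worked in this thread.

```python
def wdidle3_command(action, timer=None, prefix="wdidle3"):
    """Build the literal command text for one of the switches in the usage line."""
    if action == "set":
        # /S<Timer>: the value follows the switch directly, with no space
        # and no brackets. Requiring a value is a choice made by this sketch.
        if timer is None:
            raise ValueError("a timer value in seconds is required here")
        return f"{prefix} /s{timer}"
    switches = {"report": "/r", "disable": "/d", "help": "/?"}
    return f"{prefix} {switches[action]}"

print(wdidle3_command("set", 300))   # set the idle timer to 300 seconds
print(wdidle3_command("report"))     # report the current setting
print(wdidle3_command("disable"))    # disable the timer
```

The only space in each command is the one between the program name and the switch.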

The experts here are very helpful and we noobs appreciate all of the help provided on these forums. But they are so advanced that they can't imagine someone not knowing the most basic things about DOS or UNIX syntax.

Imagine: all it takes to end up here is to connect some hard drives to a motherboard, search Google for NAS operating systems, download FreeNAS, install it, set up the most basic credentials, and get your Windows machine to recognize a network drive. For us computer noobs, DOS and UNIX are like taking a vacation to Russia and trying to buy something at a hardware store.

BTW what are "Code tags"?
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
The documentation problem you describe is a serious one, and unfortunately a widespread one.

Too much documentation and people do not read it. Too terse, and some people are lost.

WDC could have included examples. It is not an iDevice ;) so it is not necessarily intuitive ;)

Over time, you will find that some documentation almost always has examples and some almost never does. The dividing line does not run between commercial and open source software.
 