Degraded volume after update 11.0-U4

Status
Not open for further replies.

Junicast

Patron
Joined
Mar 6, 2015
Messages
206
Hi,

I just updated to 11.0-U4, and since then my raidz2 has been in a degraded state.
The strange thing is that all 5 disks are available.

Code:
camcontrol devlist
<ST4000DM000-1F2168 CC54>  at scbus1 target 0 lun 0 (pass1,ada1)
<ST4000DM000-1F2168 CC54>  at scbus2 target 0 lun 0 (pass2,ada2)
<ST4000DM000-1F2168 CC54>  at scbus3 target 0 lun 0 (pass3,ada3)
<ST4000DM005-2DP166 0001>  at scbus4 target 0 lun 0 (pass4,ada4)
<ST4000DM000-1F2168 CC54>  at scbus5 target 0 lun 0 (pass5,ada5)


Code:
NAME                                                STATE     READ WRITE CKSUM
fileserver                                          DEGRADED     0     0     0
  raidz2-0                                          DEGRADED     0     0     0
    gptid/3c38d05a-2ac4-11e7-97e2-001b21c1a8c0.eli  ONLINE       0     0     0
    gptid/3d2f6210-2ac4-11e7-97e2-001b21c1a8c0.eli  ONLINE       0     0     0
    gptid/3e53f7a2-2ac4-11e7-97e2-001b21c1a8c0.eli  ONLINE       0     0     0
    6114655274444198586                             UNAVAIL      0     0     0  was /dev/gptid/3f7f0196-2ac4-11e7-97e2-001b21c1a8c0.eli
    gptid/4076a268-2ac4-11e7-97e2-001b21c1a8c0.eli  ONLINE       0     0     0
logs
  gptid/885b2f74-2ce6-11e7-918f-001b21c1a8c0        ONLINE       0     0     0
cache
  gptid/93074bd7-2ce6-11e7-918f-001b21c1a8c0        ONLINE       0     0     0


So why is my volume in a degraded state in the first place, and should I just replace the UNAVAIL entry with the actual drive?

FreeNAS-11.0-U4 (54848d13b)
Intel(R) Core(TM) i5-3470S CPU @ 2.90GHz
Memory 32713MB
 

Attachments

  • Screenshot_2017-10-10_12-48-13.png

Mihalich

Patron
Joined
Mar 14, 2017
Messages
297
What do you mean? The fourth disk is not available. Unplug it and test it; if it works, put it back. That is what I would do.
 

Junicast

Patron
Joined
Mar 6, 2015
Messages
206
camcontrol devlist clearly shows that all 5 drives are there. dmesg says the same:
Code:
egrep 'da[0-9]|cd[0-9]' /var/run/dmesg.boot
ada0: <MZ-5EA1000-0D3 AXM17D3Q> ATA8-ACS SATA 2.x device
ada0: Serial Number S0SENEAC602092
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 4096bytes)
ada0: Command Queueing enabled
ada0: 95396MB (195371568 512 byte sectors)
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <ST4000DM000-1F2168 CC54> ACS-2 ATA SATA 3.x device
ada1: Serial Number AAAAAAAA
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 3815447MB (7814037168 512 byte sectors)
ada1: quirks=0x1<4K>
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <ST4000DM000-1F2168 CC54> ACS-2 ATA SATA 3.x device
ada2: Serial Number AAAAAAAA
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 3815447MB (7814037168 512 byte sectors)
ada2: quirks=0x1<4K>
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <ST4000DM000-1F2168 CC54> ACS-2 ATA SATA 3.x device
ada3: Serial Number AAAAAAAA
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 3815447MB (7814037168 512 byte sectors)
ada3: quirks=0x1<4K>
ada4 at ahcich4 bus 0 scbus4 target 0 lun 0
ada4: <ST4000DM005-2DP166 0001> ACS-3 ATA SATA 3.x device
ada4: Serial Number AAAAAAAA
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 3815447MB (7814037168 512 byte sectors)
ada4: quirks=0x1<4K>
ada5 at ahcich5 bus 0 scbus5 target 0 lun 0
ada5: <ST4000DM000-1F2168 CC54> ACS-2 ATA SATA 3.x device
ada5: Serial Number AAAAAAAA
ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 3815447MB (7814037168 512 byte sectors)
ada5: quirks=0x1<4K>
 

Mihalich

Patron
Joined
Mar 14, 2017
Messages
297
Why do your drives have different transfer rates?
Please post the SMART output of the failed drive.
 

Junicast

Patron
Joined
Mar 6, 2015
Messages
206

Mihalich

Patron
Joined
Mar 14, 2017
Messages
297
Rotation Rate: 5980 rpm
1 Raw_Read_Error_Rate 0x000f 067 064 006 Pre-fail Always - 5470990
7 Seek_Error_Rate 0x000f 080 060 045 Pre-fail Always - 8774489937

Aren't you worried? Compare these values with those of your other drives.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Here's the smart output of the failed drive:
https://paste.debian.net/990036/
You have never run a SMART test on this drive. That's very negligent. Run a SMART test; I bet this drive will fail. While you are at it, run short and long SMART tests on all your drives. I hope you don't find out they are all failing.

Read the manual.
 

Junicast

Patron
Joined
Mar 6, 2015
Messages
206
Naively, I thought FreeNAS would check SMART for me and warn in the GUI if something was wrong.
Well, I compared my drives and could see that two of them look critical.
Seek Error rates:
ada1: 1,726,712,645,781
ada2: 141,181,410
ada3: 139,579,117
ada4: 8,774,502,726
ada5: 135,292,440

Just ordered two replacement drives. Thanks for your help.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
The RRER (Raw_Read_Error_Rate) and SER (Seek_Error_Rate) on Seagate drives are expressed on a logarithmic scale, so it's normal to have very high values in those fields.
 

Junicast

Patron
Joined
Mar 6, 2015
Messages
206
A small but important update.
Seagate Seek Error rates (raw) need to be converted in order to get the actual value.
http://www.users.on.net/~fzabkar/HDD/Seagate_SER_RRER_HEC.html
What I did was convert the raw value to hex, take the first 16 bits (i.e. the first 4 characters of the 12-character, 48-bit hex string) and convert them back to an integer.
See my results attached.
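
For anyone who wants to reproduce the conversion, here is a minimal Python sketch. It assumes the layout described on the linked page: the raw value is a 48-bit number whose upper 16 bits count seek errors and whose lower 32 bits count total seeks. The function name `split_seagate_raw` is made up for illustration and is not part of smartctl or FreeNAS.

```python
def split_seagate_raw(raw):
    """Split a Seagate 48-bit raw SER/RRER value into (errors, operations).

    Assumption: upper 16 bits = error count, lower 32 bits = operation count.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    errors = raw >> 32              # the first 4 of the 12 hex digits
    operations = raw & 0xFFFFFFFF   # the remaining 8 hex digits
    return errors, operations


# Raw Seek_Error_Rate values posted earlier in this thread
raw_values = {
    "ada1": 1726712645781,
    "ada2": 141181410,
    "ada3": 139579117,
    "ada4": 8774502726,
    "ada5": 135292440,
}

for disk, raw in raw_values.items():
    errors, seeks = split_seagate_raw(raw)
    print(f"{disk}: 0x{raw:012X} -> {errors} seek errors in {seeks} seeks")
```

Under that assumption, ada1 decodes to 402 seek errors and ada4 to 2, while ada2, ada3 and ada5 decode to 0.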
I just talked to someone who is responsible for hard disks in a datacenter. He told me that the most important values for detecting a failing Seagate are these:
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
Those values state that there are no errors at all on the designated disk. So I have to assume that the disk is actually just fine.
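
The same check can be sketched in Python, assuming the attribute lines use the column layout shown above (ID, name, flag, value, worst, threshold, type, updated, when failed, raw value). The names `WATCHED_IDS`, `parse_attribute` and `nonzero_watched` are hypothetical, and a zero in these three counters is only the rule of thumb quoted here, not proof that a drive is healthy.

```python
# Attribute IDs named above: 5 Reallocated_Sector_Ct,
# 187 Reported_Uncorrect, 197 Current_Pending_Sector
WATCHED_IDS = {5, 187, 197}


def parse_attribute(line):
    """Return (id, name, raw_value) from one SMART attribute line."""
    fields = line.split()
    # fields[9] is the first token of the raw value column
    return int(fields[0]), fields[1], int(fields[9])


def nonzero_watched(lines):
    """List (name, raw_value) for watched attributes whose raw value is not 0."""
    problems = []
    for line in lines:
        attr_id, name, raw = parse_attribute(line)
        if attr_id in WATCHED_IDS and raw != 0:
            problems.append((name, raw))
    return problems


lines = [
    "187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0",
    "5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0",
    "197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0",
]
print(nonzero_watched(lines) or "no errors reported by the watched attributes")
```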

Any errors in my logic?
 

Attachments

  • Screenshot_2017-10-11_15-11-19.png

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Yes, you didn't run a SMART test. That's literally all you have to do. No converting, no interpreting values, just pass or fail. And yes, FreeNAS does this automatically and will email you and flash an alert in the GUI if something goes wrong. You just have to set it up.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
And yes freenas does this automatically and will email you and flash in the GUI if something goes wrong. You just have to set it up.
...and that's the key. FreeNAS really should set a default SMART test schedule, but it doesn't.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I think the devs are open to changing that. I don't think anybody has actually filed a feature request, though. I'd probably add a few things in a single SMART overhaul ticket.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

diedrichg

Wizard
Joined
Dec 4, 2012
Messages
1,319
It could be one of the questions in the setup wizard. "Do you want to set up some SMART tests? Yes/No?" That way the user isn't forced to run a SMART test if they despise SMART tests.
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
If they actively hate them then they can hunt down how to disable them. Other people who don't care will just get them for free.
 

Junicast

Patron
Joined
Mar 6, 2015
Messages
206
I ran a long self test via smartctl yesterday on all disks without any errors. I will replace the missing disk with the actual disk now.
The initial question of why my volume became degraded in the first place is still unanswered, though.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You had a hardware failure of some kind. Or at least that should be the first thing you check. You didn't provide much hardware info, so we can't give you any ideas of what to look at next. Let us know if you figure it out.
 