RAIDZ - degraded - investigation

phier

Patron
Joined
Dec 4, 2012
Messages
400
Have you checked your TLER settings for this drive?
@rvassar, that looks like a great point.

The ada2 Seagate is "kind of problematic" in my case: it was removed from the pool multiple times due to timeouts.
Maybe because of that ERC?

The other two drives are WD, and both seem to have ERC disabled by default.

Even though I read the above-mentioned article, I'm still not sure whether it's a good idea to have it enabled or disabled.

Thanks

Code:
root@truenas[~]# smartctl -l scterc /dev/ada2
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p1 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read:    100 (10.0 seconds)
          Write:    100 (10.0 seconds)

root@truenas[~]# smartctl -l scterc /dev/ada1
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p1 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

root@truenas[~]# smartctl -l scterc /dev/ada0
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p1 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled
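For comparing the drives side by side, a small parser over this output can help. A minimal Python sketch, assuming the `smartctl -l scterc` output format shown above (a raw value in units of 100 ms followed by the seconds in parentheses, or the word `Disabled`); `parse_scterc` is a hypothetical helper, not part of smartmontools:

```python
import re
from typing import Dict, Optional

# Matches "Read:    100 (10.0 seconds)" or "Write: Disabled" lines.
_SCTERC_LINE = re.compile(
    r"^\s*(Read|Write):\s*(?:(\d+)\s*\([\d.]+ seconds\)|(Disabled))",
    re.MULTILINE,
)


def parse_scterc(output: str) -> Dict[str, Optional[float]]:
    """Return {'read': seconds or None, 'write': seconds or None}.

    None means ERC is disabled. The raw value is in units of 100 ms,
    so a raw value of 100 means 10.0 seconds.
    """
    result: Dict[str, Optional[float]] = {}
    for match in _SCTERC_LINE.finditer(output):
        direction = match.group(1).lower()
        if match.group(3):  # the "Disabled" alternative matched
            result[direction] = None
        else:
            result[direction] = int(match.group(2)) / 10.0
    return result


ada2 = """SCT Error Recovery Control:
           Read:    100 (10.0 seconds)
          Write:    100 (10.0 seconds)
"""
ada1 = """SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled
"""
print(parse_scterc(ada2))  # {'read': 10.0, 'write': 10.0}
print(parse_scterc(ada1))  # {'read': None, 'write': None}
```

Fed the three outputs above, it would report ada2 at 10.0 s for reads and writes, and ada0/ada1 as `None` (disabled).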
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
I would try the problematic drive with TLER enabled. If the array can recover the data from the mirror peer or parity calc, you're basically giving ZFS the option to bail out of a read transaction without waiting for the error timeout and keep moving. The data is safe, but that drive has an intermittent problem. Writes have to get re-issued and succeed, or the device will get punted out of the pool.
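To make the timing argument concrete, here is a toy Python model. It assumes the OS gives up on a command after a fixed timeout (FreeBSD's ada(4) driver exposes this as `kern.cam.ada.default_timeout`, 30 seconds by default as far as I know) and that a drive with ERC enabled stops its internal retries at the ERC limit. The function name, the outcome labels and the recovery times are illustrative assumptions, not how ZFS or any drive firmware is actually implemented:

```python
from typing import Optional


def read_outcome(
    erc_seconds: Optional[float],
    recovery_seconds: float,
    os_timeout_seconds: float = 30.0,
) -> str:
    """Toy model of one hard-to-read sector.

    erc_seconds: the drive's ERC/TLER limit, or None if disabled.
    recovery_seconds: how long the drive's internal retries would take.
    os_timeout_seconds: how long the OS waits before resetting the device.
    """
    # With ERC enabled, the drive stops retrying at the ERC limit.
    if erc_seconds is None:
        busy = recovery_seconds
    else:
        busy = min(recovery_seconds, erc_seconds)
    if busy >= os_timeout_seconds:
        # The OS gives up first and resets the device; ZFS may then fault or remove it.
        return "timeout"
    if erc_seconds is not None and recovery_seconds > erc_seconds:
        # The drive reports an error quickly; ZFS can rebuild the data from redundancy.
        return "quick error"
    # The drive eventually recovered the sector on its own.
    return "recovered"


print(read_outcome(7.0, 60.0))   # quick error
print(read_outcome(None, 60.0))  # timeout
print(read_outcome(None, 5.0))   # recovered
```

In this model, with ERC at 7 s a sector that would take the drive a minute to recover comes back as a quick read error that ZFS can repair from the mirror peer or parity, while with ERC disabled the same sector outlives the OS timeout and the device gets reset, which would be consistent with a drive being dropped from the pool.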

In my experience, there's probably a firmware glitch in the drive. Something in the firmware holding a spinlock deadlocks (think two toddlers grabbing a toy and yelling "mine!" at each other...), the drive resets to avoid going "phy bad", and the issued command gets dropped. If the drive has updatable firmware, you may find a fix in an update, but this is often not possible with retail devices. It could also be something simple like a stress bend or a worn/bad cable.
 

phier

Patron
Joined
Dec 4, 2012
Messages
400
@rvassar Got it,
I will also try replacing the cable.

My question is whether I also have to enable TLER on the other two WD drives.

Thanks
 
Joined
Jan 18, 2017
Messages
525
Code:
193 Load_Cycle_Count        0x0032   096   096   000    Old_age   Always       -       8011


Just out of curiosity, what is your load cycle count now?
 

phier

Patron
Joined
Dec 4, 2012
Messages
400
@cobrakiller58 It seems to have increased:
193 Load_Cycle_Count 0x0032 096 096 000 Old_age Always - 9618

Why?


@rvassar Is it a good idea to enable TLER for all drives in the RAID/pool?


Thanks
 
Joined
Jan 18, 2017
Messages
525
That is increasing quickly. I saw it mentioned here that Exos drives have an idle timer that parks the heads, causing timeouts when TrueNAS goes to fetch data.
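The two readings in this thread (8011, then 9618) can be turned into a rate if you note when each was taken. A minimal Python sketch; the helper names are hypothetical, and the 48-hour interval in the example is a made-up placeholder, since the thread doesn't say how much time passed between the two readings:

```python
def load_cycle_raw(smart_line: str) -> int:
    """Return the raw value from a `smartctl -A` attribute 193 line."""
    fields = smart_line.split()
    if not fields or fields[0] != "193":
        raise ValueError("not a Load_Cycle_Count (ID 193) line")
    # For this attribute format, the raw value is the last column.
    return int(fields[-1])


def cycles_per_day(before: int, after: int, hours_elapsed: float) -> float:
    """Average head load/unload cycles per day between two readings."""
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    return (after - before) * 24.0 / hours_elapsed


old = load_cycle_raw("193 Load_Cycle_Count        0x0032   096   096   000    Old_age   Always       -       8011")
new = load_cycle_raw("193 Load_Cycle_Count 0x0032 096 096 000 Old_age Always - 9618")
print(new - old)                       # 1607
print(cycles_per_day(old, new, 48.0))  # 803.5 (48 h is a made-up interval)
```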
 

phier

Patron
Joined
Dec 4, 2012
Messages
400
@cobrakiller58 I'm not sure what you're pointing at... is it bad that it's increasing quickly?

Regarding the idle timer parking the heads... does that mean it's OK that I am getting timeouts on that drive?
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Well... some of your drives have it enabled by default, but for the ones that don't, you should enable it. Our resident Grinch has a resource on it:

 

phier

Patron
Joined
Dec 4, 2012
Messages
400
@rvassar Okay, I will enable it on all remaining drives with a 10-second value. Thanks
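smartctl takes the ERC limits in units of 100 ms, so 10 seconds is written as 100 (matching the `100 (10.0 seconds)` reported for ada2 earlier). A minimal Python sketch that only builds the command lines rather than running them; `scterc_command` is a hypothetical helper, and on many drives the setting does not survive a power cycle, so it is usually reapplied at boot (for example from an init script):

```python
from typing import List


def scterc_command(device: str, seconds: float) -> List[str]:
    """Build a smartctl command that sets both read and write ERC to `seconds`."""
    if seconds <= 0:
        raise ValueError("ERC timeout must be positive")
    # smartctl expects scterc,READTIME,WRITETIME in units of 100 ms.
    tenths = round(seconds * 10)
    return ["smartctl", "-l", f"scterc,{tenths},{tenths}", device]


for dev in ["/dev/ada0", "/dev/ada1"]:
    print(" ".join(scterc_command(dev, 10)))
# smartctl -l scterc,100,100 /dev/ada0
# smartctl -l scterc,100,100 /dev/ada1
```

The returned list could be passed to `subprocess.run` as-is; keeping it as a list avoids shell quoting issues.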

 
Joined
Jan 18, 2017
Messages
525
Personally, I would not accept the timeouts. Sadly, I could not find the exact thread that was talking about the Seagate PowerChoice (EPC) settings; I only have the bookmark I made for when I get these drives, on disabling the head parking: https://christoph-jahn.com/seagate-exos-disks-in-truenas-core-disable-parking-of-heads/ I cannot say whether this will be of any use to you, but it might be worth investigating.
 

phier

Patron
Joined
Dec 4, 2012
Messages
400
@cobrakiller58 I will check the article, thanks.
By not accepting timeouts... do you mean disabling ERC on all drives? Or am I mixing apples and oranges?
 
Joined
Jan 18, 2017
Messages
525
I meant timeouts as in the drive being dropped from the pool repeatedly (which I believe Seagate's EPC can cause). I cannot comment on ERC due to insufficient knowledge, lol.
I'll leave that subject to rvassar and people more experienced.
 

phier

Patron
Joined
Dec 4, 2012
Messages
400
@cobrakiller58 Okay, I see.
I went through the article, so I just want to confirm: is it a good idea to execute that EPC-related command?
The strange thing is that the drive was kicked out of the pool multiple times a while ago, but that issue is not happening anymore.

@rvassar Am I right in assuming that it's a good idea to set ERC to ~7-10 sec on all drives in the pool/TrueNAS?


Thanks!
 
Joined
Jan 18, 2017
Messages
525
@cobrakiller58 https://www.truenas.com/community/t...een-epc-idle_a-and-idle_b-power-states.90751/
Maybe it would be good to just keep idle_b set to 25 min?

Disabling it completely will increase the drive's power consumption... that part is still not clear to me from the above-mentioned thread.
You may want to check the manual to see if the amount of power saved is relevant to you or not.
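To see why the idle timer value matters for the load cycle count, here is a toy Python model. It assumes the heads park once whenever the gap between two disk accesses exceeds the idle timer (roughly what the idle_b timer discussion is about), and the list of gaps is made up for illustration, not measured on this system; `count_head_parks` is a hypothetical helper:

```python
from typing import Iterable


def count_head_parks(gaps_minutes: Iterable[float], idle_timer_minutes: float) -> int:
    """Count parks, assuming one park per idle gap longer than the timer."""
    return sum(1 for gap in gaps_minutes if gap > idle_timer_minutes)


# Made-up gaps (in minutes) between consecutive disk accesses.
gaps = [1, 3, 3, 10, 30, 0.5, 5]
print(count_head_parks(gaps, 2))   # 5
print(count_head_parks(gaps, 25))  # 1
```

Under these assumptions, a longer timer trades fewer load cycles (and fewer wake-up delays) for less time spent in the lower-power state, which is why the power figures in the manual are the deciding factor.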
I'm curious why it has seemingly stopped being an issue, though...
 

phier

Patron
Joined
Dec 4, 2012
Messages
400
Hello,
Is it a good idea to buy Exos drives for a NAS in that case?
Based on the drive behavior and the articles posted here, I am confused... should I go for Exos, or is WD Ultrastar a better choice?

Thanks!
 