I have a disk that has just started to show "Completed: read failure" errors, and apparently as a result TrueNAS has put it in a FAULTED state, which in turn has put my pool in a DEGRADED state.
The pool is RAIDZ2 and a replacement has been ordered; I'm just trying to understand exactly how this happened, in terms of the signal received and the automatic action taken by TrueNAS.
My SMART tests are scheduled as follows:
SHORT weekly
LONG monthly
(I realize now that OFFLINE tests can also be scheduled.)
When I look at the SMART test results for the affected disk, I see only `Short offline` and `Extended Offline` entries.
I suppose `Extended Offline` corresponds to the scheduled LONG tests, but that's a bit confusing. How would a scheduled OFFLINE test be displayed?
In any case, for the disk in question, the first FAILED `Extended Offline` result dates back 16 days, and the first FAILED `Short offline` result is from this morning. (I also manually started an OFFLINE test yesterday evening when I discovered the issue; it should still be running now.)
BUT the TrueNAS alert that triggered the FAULTED status on the disk dates from a day ago, i.e. before the first failed short test and long after the failed extended test, so it does not line up with either SMART result:
Pool subramanya state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
- Disk WDC WD30EFRX-68EUZN0 WD-WCC4N7HZ39J7 is FAULTED
Hence my question: how did TrueNAS decide to mark this disk as FAULTED, and what signal triggered that action?
Also, AFAIK, offline uncorrectable sectors, though a very bad sign, do not directly cause a run-time problem for a disk. An unreadable sector should be marked as such, and the disk should continue to operate.
Should I put the disk back ONLINE, or run a full resilver as a last test before replacing it?