SOLVED FreeNAS Failure & Resilver Inspection

Alpha-Inc.

Dabbler
Joined
Feb 15, 2021
Messages
25
Hello everybody,

Every Monday night, my FreeNAS runs a scrub of my whole pool (11 drives of 10 TB each, configured as RAIDZ3). Today I woke up to these messages from my FreeNAS server:
Code:
- smartd is not running
- Pool Server state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state..

After I got home from work, I shut down my server, checked all the cables, and unplugged and reseated them on the HDDs. It appears that one cable had come slightly loose, because when I booted the server the whole pool was ONLINE again, but I got a new message:
Code:
Pool Server state is ONLINE: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.

This brings me to my first question: why did a resilver take place? IIRC, a resilver only happens when an HDD is replaced and the data has to be written to the new disk.

Also, after running 'zpool status', I got this message:
Code:
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: resilvered 360M in 0 day 00:01:25 with 0 errors on Mon Feb 15 17:17:40 2021

One disk had 10 CKSUM errors. After another reboot (because of an IP change made by my router), the zpool status output no longer shows them.
This brings me to my next question: are all my files safe, or is it possible that some files have been silently corrupted?

Also, do you think I need to replace the failing disk, or did those checksum errors only appear because my HDD was disconnected?
 
Joined
Jan 7, 2015
Messages
1,155
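The disk that was missing has out-of-sync data, so it gets resilvered; only the data written while it was offline has to be copied. The dropout could have been caused by cabling. Run zpool clear and start another scrub once the resilver completes. If the scrub still finds errors on that disk, you'll know the problem is more than a loose cable. If the disks are healthy, a scrub should fix it.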
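
As a rough sketch, the clear-then-scrub sequence could be wrapped like this. The pool name Server comes from the alert messages above; the function name recheck_pool is made up for illustration, and the commands need root.

```shell
#!/bin/sh
# Hypothetical helper: clear the error counters, then start a fresh scrub.
recheck_pool() {
    pool="$1"
    # Reset the READ/WRITE/CKSUM counters so that any new errors stand out.
    zpool clear "$pool" || return 1
    # Start a scrub; the command returns immediately and the scrub runs in the background.
    zpool scrub "$pool" || return 1
    # Show scrub progress and the per-device error counters.
    zpool status -v "$pool"
}

# Usage, once the resilver has finished:
#   recheck_pool Server
```

Running zpool status -v again after the scrub finishes shows whether the CKSUM column stayed at 0 and whether any files are listed as damaged.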
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Also, do you think I need to replace the failing disk, or did those checksum errors only appear because my HDD was disconnected?
What's important to look at in deciding that is the SMART data.

Are you running SMART long and short tests?

Looking at smartctl -a /dev/adaX for that disk would be a start.
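
If no tests are scheduled yet, a manual run might look like the sketch below. The function name smart_selftest is hypothetical, and /dev/adaX is a placeholder for the real device node.

```shell
#!/bin/sh
# Hypothetical helper: start a long self-test and show the self-test log.
smart_selftest() {
    dev="$1"
    # Kick off a long (extended) self-test; the drive runs it in the background.
    smartctl -t long "$dev"
    # List earlier self-test results; the new one appears once it has finished.
    smartctl -l selftest "$dev"
}

# Usage (as root), replacing adaX with the actual device:
#   smart_selftest /dev/adaX
```

A long test on a 10 TB drive can take many hours, so the log has to be checked again later for the result.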
 

Alpha-Inc.

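The disk that was missing has out-of-sync data, so it gets resilvered; only the data written while it was offline has to be copied. The dropout could have been caused by cabling. Run zpool clear and start another scrub once the resilver completes. If the scrub still finds errors on that disk, you'll know the problem is more than a loose cable. If the disks are healthy, a scrub should fix it.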

Alright, thank you.
 

Alpha-Inc.

What's important to look at in deciding that is the SMART data.

Are you running SMART long and short tests?

Looking at smartctl -a /dev/adaX for that disk would be a start.

I run SMART tests every week, and they have always completed without any errors.
 

sretalla

they have always completed without any errors
Do you mean result = PASSED?

That doesn't mean there are no problems with the disk.

Check the individual attribute values in the report to see whether any errors have been recorded.
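
As a hedged example of what to look for, the sketch below filters smartctl output for a few attributes that commonly point at trouble and prints those with a non-zero raw value. The function name smart_suspects is made up, the attribute list is an assumption (names vary by vendor), and the raw value is assumed to sit in the tenth column of the attribute table.

```shell
#!/bin/sh
# Hypothetical filter: read `smartctl -a` output on stdin and print
# selected attributes whose raw value (column 10) is greater than zero.
smart_suspects() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ && $10 + 0 > 0 {
            print $2, $10
        }
    '
}

# Usage (as root), replacing adaX with the actual device:
#   smartctl -a /dev/adaX | smart_suspects
```

A rising UDMA_CRC_Error_Count typically points at the cable or connector, whereas reallocated or pending sectors suggest the disk itself is wearing out.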
 