RefereeBeau
Short version: Immediately after a fresh install of 11.1-U7, I boot and run "zpool scrub freenas-boot", and the scrub finds errors. This is repeatable across multiple reinstalls. The boot SSD has been verified good.
Not clear where to look next. Help/pointers appreciated.
Longer description:
After installing 11.1-U7, upgrading to 11.3-U4, creating a mirrored pool for client backups, and getting the SMB config and ACLs right so that the client backups ran smoothly, I noticed a critical alert on the web console. The alert reported a degraded boot pool. Web searches mostly turned up some variant of "bad boot hardware, replace the USB and reinstall".
The server was installed with a brand new WD internal SSD as the boot drive. No USB. "dmesg" showed no errors related to any disk or SSD.
First step: backed up the config, then reinstalled 11.1-U7, booted, and ran a scrub. Multiple errors reported. Repeated several times.
Second step: booted the WD diagnostics from a boot CD. Zero errors on the quick SMART test, the extended SMART test, and writing zeros to the entire SSD. The drive appears to be perfect.
Third step: downloaded the 11.1-U7 ISO again and did a binary compare against the original. No corruption in the ISO image. Burned a new installation DVD just in case.
Fourth step: installed 11.1-U7 from the DVD again. The final message was (again) "Installation completed. No errors reported."
Fifth step: Booted FreeNAS. No messages during the boot indicating disk errors. Immediately started shell on console and ran "zpool scrub freenas-boot" then "zpool status". Again, the same errors!
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
...
scan: scrub repaired 3.50K in 0 days 00:00:04 with 14 errors on <date>
...
errors: 9 data errors, use "-v" for a list
and "zpool status -v" showed:
<metadata>:<0x24>
<metadata>:<0x3f>
freenas-boot/ROOT/default:<0x0>
//
freenas-boot/ROOT/default@3030-09-06-00:56:07:<0x0>
plus four files in the python3.6 "__pycache__" directory.
The errors are not exactly the same each time I redo the install, and the error count fluctuates slightly (11, 14, 12, ...). But so far the files affected are always related to python.
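To pin down which entries really change between installs, the "zpool status -v" output from each attempt could be saved to a text file and compared. The following is only a minimal sketch, not something I have run against the real captures. It assumes the damaged entries follow the standard "errors: Permanent errors have been detected in the following files:" header; the function names damaged_entries and compare_runs are made up for illustration.

```python
from typing import Dict, List, Set

# Header line that "zpool status -v" prints before the list of damaged entries.
HEADER = "errors: Permanent errors have been detected in the following files:"


def damaged_entries(status_text: str) -> Set[str]:
    """Return the non-blank lines listed after the permanent-errors header."""
    entries: Set[str] = set()
    in_list = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(HEADER):
            in_list = True
            continue
        if in_list and stripped:
            entries.add(stripped)
    return entries


def compare_runs(runs: List[str]) -> Dict[str, Set[str]]:
    """Split entries into those damaged in every run and those damaged in only some."""
    sets = [damaged_entries(text) for text in runs]
    if not sets:
        return {"always": set(), "sometimes": set()}
    always = set.intersection(*sets)
    seen = set.union(*sets)
    return {"always": always, "sometimes": seen - always}


if __name__ == "__main__":
    # Illustrative input only: entry names are taken from the output above,
    # but the split between the two runs is invented for the example.
    run_a = HEADER + "\n\n        <metadata>:<0x24>\n        <metadata>:<0x3f>\n"
    run_b = HEADER + "\n\n        <metadata>:<0x24>\n        freenas-boot/ROOT/default:<0x0>\n"
    result = compare_runs([run_a, run_b])
    print("damaged in every run:", sorted(result["always"]))
    print("damaged in some runs:", sorted(result["sometimes"]))
```

Entries that land in "always" would point at something deterministic in the install, while entries that move around would look more like a flaky data path.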
I'm stumped. Help? Suggestions for next steps to debug (or for a fix)?
Thanks.