I have about 10 scheduled scrubs (default threshold, 35 days) that were working fine on 11.0-U2 but started emailing me errors after I updated to 11.0-U3. The errors persist on 11.0-U4.
What is strange is that the weekly scrub for the boot drive is working fine. The other scrubs send me two types of error emails:
Subject: Cron <root@freenas> PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 35 d4tb4-backups
Code:
Traceback (most recent call last):
  File "/usr/local/bin/midclt", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py", line 325, in main
    with Client(uri=args.uri) as c:
  File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py", line 117, in __init__
    raise ClientException('Failed connection handshake')
middlewared.client.client.ClientException: Failed connection handshake
or
Subject: Cron <root@freenas> PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 35 backup_vm
Code:
Traceback (most recent call last):
  File "/usr/local/bin/midclt", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py", line 325, in main
    with Client(uri=args.uri) as c:
  File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py", line 114, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py", line 51, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.6/site-packages/ws4py/client/__init__.py", line 216, in connect
    bytes = self.sock.recv(128)
socket.timeout: timed out
starting first scrub (since reboot) of pool 'backup_vm'
As I mentioned, the same scrubs were working perfectly fine in 11.0-U2 and were only sending an email with "starting scrub of...".
I saw another thread with tracebacks related to scrubs but the issue seems to be different in my case so I decided to open a new thread.
Any idea what might be the issue? Any suggestion on how to fix it or how to proceed?
Seeing that it appears to be a timeout of some sort, could it be related to the fact that some drives are spun down at that particular time by the configured power management? Note that the same error also shows up for a drive that is guaranteed to be active at the time, but I guess the timeout would still happen if middlewared is waiting for some other drive to spin up (seeing that all the scrubs start at the same time). It could be that the weekly scrub works fine because it doesn't start at the same time as the other scrubs and doesn't require powering anything up.
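One way I could test the spin-up theory is to time a middleware client command a few times in a row, once while the drives are idle and once while they are active, and compare how long each attempt takes. Below is a rough sketch of such a probe, not anything that ships with FreeNAS. The helper names `time_command` and `probe` are made up for illustration, and the command to run is left as a parameter since I don't know which midclt invocation is the most representative.

```python
import subprocess
import time


def time_command(cmd, timeout=30.0):
    """Run cmd once and return (elapsed_seconds, return_code).

    return_code is None if the command did not finish within `timeout`.
    Hypothetical helper, written for Python 3.6 (no capture_output).
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        code = proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        code = None
    return time.monotonic() - start, code


def probe(cmd, attempts=3, delay=5.0, timeout=30.0):
    """Run cmd `attempts` times, pausing `delay` seconds between runs.

    Returns a list of (elapsed_seconds, return_code) tuples, one per attempt.
    A slow or failed first attempt followed by fast ones would be consistent
    with something (for example a drive spinning up) blocking the first call.
    """
    results = []
    for i in range(attempts):
        results.append(time_command(cmd, timeout=timeout))
        if i < attempts - 1:
            time.sleep(delay)
    return results
```

Calling `probe([...])` with the command split into a list of arguments returns the timings for each attempt. If only the first attempt is slow or times out while the drives are spun down, that would support the spin-up explanation. If every attempt is slow regardless of drive state, the cause is probably elsewhere.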