Scheduled scrub errors

birt · Oct 15, 2017

I have about 10 scheduled scrubs (default setting, 35 days) that were working fine with 11.0-U2 but started emailing me errors after I updated to 11.0-U3. The errors persist with 11.0-U4.

What is strange is that the weekly scrub for the boot drive is working fine. The other scrubs send me two types of error emails:
Subject: Cron <root@freenas> PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 35 d4tb4-backups

Code:

Traceback (most recent call last):
  File "/usr/local/bin/midclt", line 10, in <module>
	sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py", line 325, in main
	with Client(uri=args.uri) as c:
  File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py", line 117, in __init__
	raise ClientException('Failed connection handshake')
middlewared.client.client.ClientException: Failed connection handshake

or
Subject: Cron <root@freenas> PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 35 backup_vm

Code:

Traceback (most recent call last):
  File "/usr/local/bin/midclt", line 10, in <module>
	sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py", line 325, in main
	with Client(uri=args.uri) as c:
  File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py", line 114, in __init__
	self._ws.connect()
  File "/usr/local/lib/python3.6/site-packages/middlewared/client/client.py", line 51, in connect
	rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.6/site-packages/ws4py/client/__init__.py", line 216, in connect
	bytes = self.sock.recv(128)
socket.timeout: timed out
   starting first scrub (since reboot) of pool 'backup_vm'

As I mentioned, the same scrubs were working perfectly fine in 11.0-U2 and were only sending an email with "starting scrub of...".

I saw another thread with tracebacks related to scrubs but the issue seems to be different in my case so I decided to open a new thread.

Any idea what might be the issue? Any suggestion on how to fix it or how to proceed?

Since this looks like a timeout of some sort, could it be related to some drives being spun down at that time by the configured power management? Note that the same error also shows up for a drive that is guaranteed to be active at the time, but I guess the timeout could still happen if middlewared is waiting for some other drive to spin up (all the scrubs start at the same time). It could be that the weekly scrub works fine because it doesn't start at the same time as the other scrubs and doesn't require powering anything up.
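To illustrate the kind of behaviour I would expect if the timeout theory is right, here is a minimal sketch of retrying a connection attempt with exponential backoff when it raises `socket.timeout`. This is not how the FreeNAS scrub script or `midclt` actually works; `retry_on_timeout` and its parameters are hypothetical names made up for the example, and `connect` stands in for whatever call opens the connection.

```python
import socket
import time


def retry_on_timeout(connect, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    Hypothetical helper: `connect` is any zero-argument callable that may
    raise socket.timeout (for example while a drive is still spinning up).
    """
    for attempt in range(attempts):
        try:
            return connect()
        except socket.timeout:
            # Give up and re-raise after the last allowed attempt.
            if attempt == attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
```

If a second or third attempt succeeded where the first one timed out, that would point towards slow wake-up (or too many simultaneous connections) rather than a hard failure in the handshake itself.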

dlavigne · Oct 16, 2017

Please create a report at bugs.freenas.org and post the issue number here.

birt · Oct 16, 2017

https://bugs.freenas.org/issues/26204
