Reoccurring middlewared failures - smartd service failures - ServiceService.result():140

crfranck

Cadet
Joined
Jun 5, 2019
Messages
5
Looking for help/advice. For the last week EVERY night just after 0300 EST my server has produced Warnings, blocked tasks, LOCKED files, etc. This started while running 11.2-U4 and continued after upgrading to 11.2-U5. No system changes were made prior to this daily event taking place. My jails continue to function (PLEX, syncthing) but the core system is extremely sluggish (SMB, SSH, FreeNAS webgui, etc). I have only been able to recover by rebooting. No issues seen on reboot. Everything runs fine until just after 0300.

I have attached my middlewared.log file containing the entire event. Included you will see a date timestamp every minute (inserted via a running cron job to help track). I have also attached a process list from before the event (clean, no issues) and another after (you will see many periodic daily events LOCKED). Crontab is attached. I disabled the periodic daily within cron and still had the same issue so it does not seem to come from there.

FreeBSD Sherlock.local 11.2-STABLE FreeBSD 11.2-STABLE #0 r325575+6aad246318c(HEAD): Mon Jun 24 17:25:47 UTC 2019
Static IP is configured, DNS is set to an internal router and Google DNS. Not experiencing any network connectivity issues that I am aware of.
All drives report no errors with smartctl. Values are within normal limits.
Mirror 2x512GB SSD for jails, metadata, cache.
ZFS2 6x4TB drives in one pool.
Mirror 2x8TB in another pool. (This will be removed after an upgrade to just one 6x8TB pool).

Could there be a hardware issue with the motherboard? There is a 4-port SATA III PCIe controller in there as well.

Examples are below:

Code:
[2019/07/19 03:09:09] (WARNING) middlewared._loop_monitor_thread():1205 - Task seems blocked:
  File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/service.py", line 265, in _get_status
    running, pids = await f()
  File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/service.py", line 566, in _started_smartd
    result = await self._started("smartd")
  File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/service.py", line 372, in _started
    proc = await Popen(pgrep, shell=True, stdout=PIPE, stderr=PIPE, close_fds=True)
  File "/usr/local/lib/python3.6/asyncio/subprocess.py", line 210, in create_subprocess_shell
    stderr=stderr, **kwds)
  File "/usr/local/lib/python3.6/asyncio/base_events.py", line 1180, in subprocess_shell
    protocol, cmd, True, stdin, stdout, stderr, bufsize, **kwargs)
  File "/usr/local/lib/python3.6/asyncio/unix_events.py", line 200, in _make_subprocess_transport
    **kwargs)
  File "/usr/local/lib/python3.6/asyncio/base_subprocess.py", line 39, in __init__
    stderr=stderr, bufsize=bufsize, **kwargs)
  File "/usr/local/lib/python3.6/asyncio/unix_events.py", line 706, in _start
    universal_newlines=False, bufsize=bufsize, **kwargs)
  File "/usr/local/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.6/subprocess.py", line 1318, in _execute_child
    part = os.read(errpipe_read, 50000)

[2019/07/19 03:09:10] (WARNING) ServiceService.result():140 - Failed to get status for smartd

[2019/07/19 03:17:39] (WARNING) ServiceService.result():140 - Failed to get status for snmp

[2019/07/19 03:55:07] (WARNING) middlewared._loop_monitor_thread():1205 - Task seems blocked:
  File "/usr/local/lib/python3.6/site-packages/middlewared/schema.py", line 664, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/system.py", line 304, in info
    serial = await self._system_serial()
  File "/usr/local/lib/python3.6/site-packages/middlewared/plugins/system.py", line 356, in _system_serial
    stdout=subprocess.PIPE,
  File "/usr/local/lib/python3.6/asyncio/subprocess.py", line 225, in create_subprocess_exec
    stderr=stderr, **kwds)
  File "/usr/local/lib/python3.6/asyncio/base_events.py", line 1211, in subprocess_exec
    bufsize, **kwargs)
  File "/usr/local/lib/python3.6/asyncio/unix_events.py", line 200, in _make_subprocess_transport
    **kwargs)
  File "/usr/local/lib/python3.6/asyncio/base_subprocess.py", line 39, in __init__
    stderr=stderr, bufsize=bufsize, **kwargs)
  File "/usr/local/lib/python3.6/asyncio/unix_events.py", line 706, in _start
    universal_newlines=False, bufsize=bufsize, **kwargs)
  File "/usr/local/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.6/subprocess.py", line 1318, in _execute_child
    part = os.read(errpipe_read, 50000)

 

Attachments

  • middleware.d.log.txt
    11.8 KB · Views: 377
  • ps_after.txt
    11.1 KB · Views: 356
  • ps_before.txt
    7.7 KB · Views: 360
  • crontab.txt
    2.3 KB · Views: 360
D

dlavigne

Guest
It's probably worth creating a report at bugs.ixsystems.com. If you do, post the issue number here.
 

crfranck

Cadet
Joined
Jun 5, 2019
Messages
5
Just as an update... I disabled the smartd service and have NOT had the reoccurring issue. I am now in a a bit of trial and error to narrow down what exactly is happening. This will include conducting smart tests against each drive, trying to determine exactly which script is causing issues with smartd (if that is truly part of the problem), as well as removing a pool from the server to narrow down motherboard and PCIe controllers.

I will post any valuable updates that may help others as well as post any bug reports that result of the testing.
 

marcvk

Cadet
Joined
Dec 28, 2019
Messages
3
Hi,

did you ever manage to solve this problem? Since updating to 11.2-U7 I have exactly the same problems as you describe. I have "solved" it by rebooting. I'll try disabling smartd, but of course this should not count as a permanent solution.

Cheers,
Marc.
 

marcvk

Cadet
Joined
Dec 28, 2019
Messages
3
I disabled smartd, I don't get any smartd errors anymore (which seems logical), however the behaviour has not changed, in the morning the system is again very sluggish. I also get the same middleware errors as before. I'll make a bug report for this.
 

crfranck

Cadet
Joined
Jun 5, 2019
Messages
5
I don't have a specific answer but for now my errors have gone away. Rebooting always resolved the issue but it would normally come back. I found that certain periodic security tasks would never seem to finish and my guess is that this put such a strain on the system that other processes started to report issues. I ended up permanently deleting the various tasks that were causing issues (if it was still running by about ~6am I assumed it was never going to finish) - all of which were non-critical system audit related. My server is running two jails, and each jails shares a main data pool. It is a large media server and I realized that both jails and the root system were all running the same pkg checks with their respective cron jobs. Just too many files to deal with. Killing the tasks and restarting each cron would improve system performance... just like restarting middlewared... but it never was as good as rebooting. I then upgraded to 11.2-U7. I have been happy with U7.. U5 and U6 never seemed as stable for me as U4. NOTE: My issues with middlewared started out of nowhere... no system changes had taken place... just one morning I found things unresponsive after months (years?) of never having the issue... the cron jobs ran for years without me noticing... then one day it all blew up. Currently, I have been running without the middlewared issue since mid-November (1.5 months ago).
 
Top