SOLVED TrueNAS SCALE (Bluefin) smartd is not running

DonZalmrol

Dabbler
Joined
Oct 6, 2021
Messages
14
Hi,

TLDR : SMART simply won't start.

Full story: I recently upgraded from CORE to SCALE Angelfish and tested everything out. I then upgraded from Angelfish to Bluefin and ran into an issue with S.M.A.R.T. not running/starting. I then did a clean install from scratch with Bluefin (22.12.0), and I keep getting the same issue.

Everything worked without any issues on CORE (running stable for about 2 years) and then, this weekend, on Angelfish.

Hardware:
  • HP DL380 Gen9
  • HP P440AR controller in HBA mode
  • HPE 12Gb SAS Expander Card
  • 128GB RAM DDR4 ECC
  • Intel Xeon E5-2630L v3 @ 1.80GHz
  • Nvidia Quadro P2000 5GB
  • 2x Intel SSD 120GB boot-pool
  • 2x Samsung SSD 512 mirrored pool for APPS storage
  • 5x 14TB HGST DC HC530 for DATA storage
Errors:
[EFAULT] Jan 08 16:48:37 systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
Jan 08 16:48:37 smartd[72413]: smartd 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Jan 08 16:48:37 smartd[72413]: Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
Jan 08 16:48:37 smartd[72413]: Opened configuration file /etc/smartd.conf
Jan 08 16:48:37 smartd[72413]: Configuration file /etc/smartd.conf parsed but has no entries
Jan 08 16:48:37 smartd[72413]: Unable to monitor any SMART enabled devices. Try debug (-d) option. Exiting...
Jan 08 16:48:37 systemd[1]: smartmontools.service: Main process exited, code=exited, status=17/n/a
Jan 08 16:48:37 systemd[1]: smartmontools.service: Failed with result 'exit-code'.
Jan 08 16:48:37 systemd[1]: Failed to start Self Monitoring and Reporting Technology (SMART) Daemon.

Error: Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 181, in call_method
result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1255, in _call
return await methodobj(*prepared_call.args)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1284, in nf
return await func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1152, in nf
res = await f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/plugins/service.py", line 168, in start
raise CallError(await service_object.failure_logs() or 'Service not running after start')
middlewared.service_exception.CallError: [EFAULT] Jan 08 16:48:37 systemd[1]: Starting Self Monitoring and Reporting Technology (SMART) Daemon...
Jan 08 16:48:37 smartd[72413]: smartd 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Jan 08 16:48:37 smartd[72413]: Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
Jan 08 16:48:37 smartd[72413]: Opened configuration file /etc/smartd.conf
Jan 08 16:48:37 smartd[72413]: Configuration file /etc/smartd.conf parsed but has no entries
Jan 08 16:48:37 smartd[72413]: Unable to monitor any SMART enabled devices. Try debug (-d) option. Exiting...
Jan 08 16:48:37 systemd[1]: smartmontools.service: Main process exited, code=exited, status=17/n/a
Jan 08 16:48:37 systemd[1]: smartmontools.service: Failed with result 'exit-code'.
Jan 08 16:48:37 systemd[1]: Failed to start Self Monitoring and Reporting Technology (SMART) Daemon.

I have no idea what could be causing this; nothing special is set in my SMART configuration or on my disks. And everything worked without any issues these past years with the exact same HW & controller.

EDIT: Could it be a missing driver in the newest release of Bluefin?

Thanks in advance!
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Let's hear from other users... is SMART working for you or not?
 

DonZalmrol

Dabbler
Joined
Oct 6, 2021
Messages
14
There seems to be a related bug scheduled for fixing in 22.12.1

Hmm might be related, though I'm getting a different message
2023-01-09_11h19_10.png


And my storage view
2023-01-09_11h18_43.png


I also just noticed that my IPMI is no longer recognized (HP ILO4):
2023-01-09_11h22_40.png


I'm really starting to think some drivers for the HPE DL380 Gen9 are not included in the Linux image :)
 

rafi_1

Cadet
Joined
Aug 14, 2020
Messages
1
I have exactly the same problem. The smartd service can't start because the config file /etc/smartd.conf is empty, so smartd finds no entries in it. I have had this problem since updating to SCALE 22.12.0.
I'm using an HPE ProLiant DL380p Gen8 server with an HPE Smart Array P822 controller in HBA mode.

Like DonZalmrol, I also can't see the IPMI in the TrueNAS GUI anymore.
 

troyh

Cadet
Joined
Feb 16, 2023
Messages
3
I am having exactly the same problem(s) with a fresh install of SCALE 22.12.0 on a DL380 Gen9. The S.M.A.R.T. service (smartmontools.service) will not start because the /etc/smartd.conf file is empty. Adding proper contents to the file manually simply results in it being overwritten with an empty file when trying to start the service. Additionally, IPMI does not show up in the web UI, but "ipmitool sensors" from the shell works fine.
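
For anyone wanting to check quickly whether the generated file really has nothing in it, here is a minimal sketch. It assumes that lines starting with `#` are comments and that anything else non-blank counts as an entry; the function name `has_smartd_entries` is made up for this example and is not part of smartmontools or TrueNAS.

```python
def has_smartd_entries(conf_text: str) -> bool:
    """Return True if any line contains something other than whitespace or a '#' comment."""
    for line in conf_text.splitlines():
        # Drop everything after a '#', then see whether anything is left.
        if line.split("#", 1)[0].strip():
            return True
    return False
```

Feeding it the contents of /etc/smartd.conf (for example via `pathlib.Path("/etc/smartd.conf").read_text()`) should return False on an affected system, matching the "parsed but has no entries" line in the log.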
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
If it's any help
Code:
/dev/sdc -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdc -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sda -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sda -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sde -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sde -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdf -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdf -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdy -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdy -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdx -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdx -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdw -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdw -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdv -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdv -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/nvme0n1 -d nvme -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/nvme0n1 -d nvme -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/nvme1n1 -d nvme -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/nvme1n1 -d nvme -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdd -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdd -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdo -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdo -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdu -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdu -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sds -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sds -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdn -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdn -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdh -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdh -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdj -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdj -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdi -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdi -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdk -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdk -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdq -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdq -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdr -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdr -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdl -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdl -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdm -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdm -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdg -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdg -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdb -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdb -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdp -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdp -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdt -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdt -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

is my smartd.conf, but it's kinda system specific.
Interestingly, it seems to contain duplicates (mostly)
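
On the duplicates: the two entries per device in the listing above differ only in the `-s` directive. In smartd.conf that regular expression has the form T/MM/DD/d/HH, so `S/../.././(00)` asks for a short self-test every day at hour 00 and `L/../(08|16)/./(02)` for a long self-test on the 8th and 16th at hour 02. A rough sketch that groups the schedules per device makes this visible. It assumes a simplified reading of the file (backslash at end of line continues the entry, a blank line ends it), and `schedules_by_device` is a name invented for this example.

```python
from collections import defaultdict

# Two devices taken from the listing above.
SAMPLE = r"""
/dev/sdc -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdc -a -d removable -n never -W 0,0,0 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\

/dev/sdd -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s S/../.././(00)\

/dev/sdd -a -d removable -n never -W 0,40,45 -m root -M exec /usr/local/libexec/smart_alert.py\
-s L/../(08|16)/./(02)\
"""


def schedules_by_device(conf_text: str) -> dict:
    """Map each device path to the list of -s schedule expressions found for it."""
    result = defaultdict(list)
    logical = ""
    # The extra "" makes sure a trailing continued entry is flushed.
    for raw in conf_text.splitlines() + [""]:
        line = raw.split("#", 1)[0].strip()
        if line.endswith("\\"):
            # Continuation: keep collecting into the same logical entry.
            logical += line[:-1] + " "
            continue
        logical += line
        tokens = logical.split()
        logical = ""
        if not tokens:
            continue
        device = tokens[0]
        for i, tok in enumerate(tokens[:-1]):
            if tok == "-s":
                result[device].append(tokens[i + 1])
    return dict(result)


for device, schedules in schedules_by_device(SAMPLE).items():
    print(device, schedules)
```

Each device ends up with one S (short) and one L (long) schedule, so the paired lines are separate test schedules rather than true duplicates.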
 

troyh

Cadet
Joined
Feb 16, 2023
Messages
3
It looks like the reason that the SMART issue is only hitting HP / HPE ProLiant servers with Smart Array controllers is because the smartctl command does not seem to be picking up the drives using the "auto" device type. Explicitly using the cciss,N device type does allow smartctl to see the drives. After making that discovery I was able to find this post in the forum:

https://www.truenas.com/community/threads/21-08-smartd-smartctl-s-m-a-r-t-extra-options.95185/

In my case I have a P440ar and a P840 with a mix of SAS and SATA disks, and using the device type "cciss,0" seems to work for all of them. Based on the information from the linked post, I did a quick hack to hard-code cciss,0 as the device type in the args value in /usr/lib/python3/dist-packages/middlewared/common/smart/smartctl.py

Code:
    args = args + ["-d", "cciss,0"]
    return args


I then restarted the middlewared service (systemctl restart middlewared), and I now have a fully populated /etc/smartd.conf file and am able to start the SMART services.
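
To make the idea behind the hack clearer outside of middlewared, here is a small standalone sketch of building a smartctl command line with an optional explicit device type. This is not the actual middlewared function; `build_smartctl_args` is a hypothetical name, and only the `-a` and `-d TYPE` options of smartctl are used.

```python
from typing import List, Optional


def build_smartctl_args(device: str, device_type: Optional[str] = None) -> List[str]:
    """Build a smartctl argument list, adding -d only when a device type is given."""
    args = ["smartctl", "-a", device]
    if device_type is not None:
        # e.g. "cciss,0" for a disk behind an HP Smart Array controller
        args += ["-d", device_type]
    return args


print(build_smartctl_args("/dev/sda"))
print(build_smartctl_args("/dev/sda", "cciss,0"))
```

The resulting list can be handed to `subprocess.run` as is. Unlike the hard-coded hack, passing the type as a parameter leaves disks on other controllers on smartctl's default auto-detection.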

Still no clue why IPMI is not being detected.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
HPE Smart Array P822 Controller in HBA mode.

This isn't compatible with TrueNAS, please refer to


The CCISS driver is not particularly robust and putting it in a lobotomized "HBA mode" does not fix this; it just makes it a moronic RAID controller. You should replace the Smart Array controller with a true HBA if you want problem-free operation.
 

kongster

Cadet
Joined
Sep 14, 2022
Messages
2
Let's hear from other users... is SMART working for you or not?
I have the same error.
1677427260215.png


I am running TrueNAS-SCALE-22.12.1. I noticed this warning message only after upgrading to Bluefin, and the recent update to 22.12.1 didn't help.
I have PC/desktop-type hardware: an Alder Lake Core i5-12600K CPU, 64GB DDR4, an NVMe boot drive, and the motherboard's SATA ports connecting 4 WD 12TB WD Red Plus NAS internal HDDs running ZFS. The motherboard model is GIGABYTE Z690 UD AX DDR4 (LGA 1700/ Intel Z690/ ATX/ DDR4/ Triple M.2/ PCIe 5.0/ USB 3.2 Gen2X2/ Type-C/WiFi 6/2.5GbE LAN/Motherboard).

Hope someone can point me to whether this warning is serious and what I can do to address it, thanks!

Anthony
 

DonZalmrol

Dabbler
Joined
Oct 6, 2021
Messages
14
Strange, I'm subscribed to my own topic but hadn't received any updates... I can confirm that with the latest update (Valentine's upgrade) the issue has disappeared and SMART seems to be running again.

I'll confirm whether it's now actually using the disks from my P440AR (HBA) correctly.
@jgreco, noted about the HBA; it's currently not in my budget to purchase two HBAs to drive my DL380 Gen9. And it has worked without any issues (3 years so far) for my TrueNAS setup.
 

DonZalmrol

Dabbler
Joined
Oct 6, 2021
Messages
14
Checked yesterday evening and noticed that while SMART is running again, it is not detecting my disks. It seems I'm not able to change the SMART config file, as my sudo permissions are not working for some reason. I'm also somewhat against changing it, as I would probably need to update the file after each upgrade, and the controller is not a true HBA...

Will create a separate thread for a suitable HBA as I'm doubting between 2 types.
 

DonZalmrol

Dabbler
Joined
Oct 6, 2021
Messages
14
Solved by replacing the HP P440AR with the LSI SAS 9300-16i.
Funny side note, once replaced I could view the "old" SMART results...
 