smartd failing to start on FreeNAS-9.10.1 running on a Dell R730xd

Status
Not open for further replies.

chownp

Dabbler
Joined
Oct 6, 2016
Messages
15
I would greatly appreciate some help diagnosing an issue with S.M.A.R.T. (smartd) failing to start.

FreeNAS Version: FreeNAS-9.10.1 (d989edd)

Hardware: Dell R730xd fitted with a PERC H730 Mini controller set to HBA mode.

After installation, the Alert indicator in the UI reports that smartd is not running.

On the Services page, sliding the “S.M.A.R.T.” switch on makes an animated progress bar appear; when it disappears, the switch is still in the off position.

/var/log/messages contains:
Code:
Oct  6 13:38:38 grey notifier: Starting smartd.
Oct  6 13:38:38 grey root: /usr/local/etc/rc.d/smartd: WARNING: failed to start smartd
Oct  6 13:38:38 grey notifier: /usr/local/etc/rc.d/smartd: WARNING: failed to start smartd


I have configured a single short S.M.A.R.T. test, and smartd.conf contains:
Code:
# cat /usr/local/etc/smartd.conf
################################################
# smartd.conf generated by /etc/ix.rc.d/ix-smartd
################################################
/dev/mfisyspd0 -a -n never -W 0,0,0 -m root -M exec /usr/local/www/freenasUI/tools/smart_alert.py
/dev/mfisyspd1 -a -n never -W 0,0,0 -m root -M exec /usr/local/www/freenasUI/tools/smart_alert.py
/dev/mfisyspd4 -a -n never -W 0,0,0 -m root -M exec /usr/local/www/freenasUI/tools/smart_alert.py
/dev/mfisyspd5 -a -n never -W 0,0,0 -m root -M exec /usr/local/www/freenasUI/tools/smart_alert.py
/dev/mfisyspd6 -a -n never -W 0,0,0 -m root -M exec /usr/local/www/freenasUI/tools/smart_alert.py
/dev/mfisyspd7 -a -n never -W 0,0,0 -m root -M exec /usr/local/www/freenasUI/tools/smart_alert.py
/dev/mfisyspd8 -a -n never -W 0,0,0 -m root -M exec /usr/local/www/freenasUI/tools/smart_alert.py
/dev/mfisyspd9 -a -n never -W 0,0,0 -m root -M exec /usr/local/www/freenasUI/tools/smart_alert.py
/dev/mfisyspd10 -a -n never -W 0,0,0 -m root -M exec /usr/local/www/freenasUI/tools/smart_alert.py
/dev/mfisyspd11 -a -n never -W 0,0,0 -m root -M exec /usr/local/www/freenasUI/tools/smart_alert.py
/dev/mfisyspd2 -a -n never -W 0,0,0 -m root -M exec /usr/local/www/freenasUI/tools/smart_alert.py -s S/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1|2|3|4|5|6|7)/(19)
/dev/mfisyspd3 -a -n never -W 0,0,0 -m root -M exec /usr/local/www/freenasUI/tools/smart_alert.py -s S/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1|2|3|4|5|6|7)/(19)
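For anyone reading the generated file: the `-s` directive on the last two lines is what schedules the short test. As I read smartd.conf(5), smartd builds a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour) and starts the test when the regular expression matches that whole string. The sketch below assumes that whole-string matching and checks a few time slots with grep -E; `would_run` is a hypothetical helper, not part of FreeNAS or smartmontools.

```shell
#!/bin/sh
# Regular expression copied from the -s directive in smartd.conf above.
SCHEDULE='S/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1|2|3|4|5|6|7)/(19)'

# would_run TYPE MM DD WEEKDAY HH (hypothetical helper)
# Builds the T/MM/DD/d/HH string and succeeds if SCHEDULE matches all of it
# (anchored with ^...$ to mimic the assumed whole-string match).
would_run() {
    printf '%s/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5" |
        grep -Eq "^(${SCHEDULE})\$"
}

# Thursday 6 October 2016 (weekday 4) at 19:00: the short test is due.
if would_run S 10 06 4 19; then echo "S/10/06/4/19 matches"; fi
# Same day at 13:00: no test.
if ! would_run S 10 06 4 13; then echo "S/10/06/4/13 does not match"; fi
```

In other words, the schedule asks for a short test every day at 19:00.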
 

chownp

The HDDs report that they are SMART capable:

Code:
# camcontrol devlist
<HGST HUS726060AL5214 KK06>  at scbus0 target 0 lun 0 (pass0)
<HGST HUS726060AL5214 KK06>  at scbus0 target 1 lun 0 (pass1)
<HGST HUS726060AL5214 KK06>  at scbus0 target 2 lun 0 (pass2)
<HGST HUS726060AL5214 KK06>  at scbus0 target 3 lun 0 (pass3)
<HGST HUS726060AL5214 KK06>  at scbus0 target 4 lun 0 (pass4)
<HGST HUS726060AL5214 KK06>  at scbus0 target 5 lun 0 (pass5)
<HGST HUS726060AL5214 KK06>  at scbus0 target 6 lun 0 (pass6)
<HGST HUS726060AL5214 KK06>  at scbus0 target 7 lun 0 (pass7)
<HGST HUS726060AL5214 KK06>  at scbus0 target 8 lun 0 (pass8)
<HGST HUS726060AL5214 KK06>  at scbus0 target 9 lun 0 (pass9)
<HGST HUS726060AL5214 KK06>  at scbus0 target 10 lun 0 (pass10)
<HGST HUS726060AL5214 KK06>  at scbus0 target 11 lun 0 (pass11)
<SanDisk LT0200MO D40Z>  at scbus0 target 12 lun 0 (pass12)
<SanDisk LT0200MO D40Z>  at scbus0 target 13 lun 0 (pass13)
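Since the passN names are what smartctl was happy with, it can help to pull them out of `camcontrol devlist` in a script. A minimal sketch in POSIX sh with sed; `list_pass_devices` is a hypothetical name, and the parsing assumes the parenthesised peripheral list format shown in the listing above.

```shell
#!/bin/sh
# list_pass_devices (hypothetical helper): read `camcontrol devlist` output
# on stdin and print the passN name of every device, one per line.
# The peripheral list may be "(pass0)", "(da0,pass0)" or "(pass0,da0)",
# so the pass name is matched between "(" or "," and "," or ")".
list_pass_devices() {
    sed -n 's/.*[(,]\(pass[0-9][0-9]*\)[,)].*/\1/p'
}

# Example with two lines taken from the listing above.
printf '%s\n' \
    '<HGST HUS726060AL5214 KK06>  at scbus0 target 0 lun 0 (pass0)' \
    '<SanDisk LT0200MO D40Z>  at scbus0 target 12 lun 0 (pass12)' |
    list_pass_devices
```

On the server itself this would be `camcontrol devlist | list_pass_devices`.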


Code:
# smartctl -a /dev/pass0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:  HGST
Product:  HUS726060AL5214
Revision:  KK06
Compliance:  SPC-4
User Capacity:  6,001,175,126,016 bytes [6.00 TB]
Logical block size:  512 bytes
Physical block size:  4096 bytes
Formatted with type 2 protection
LU is fully provisioned
Rotation Rate:  7200 rpm
Form Factor:  3.5 inches
Logical Unit id:  0x5000cca24d1eb0cc
Serial number:  NCGJW7HT
Device type:  disk
Transport protocol:  SAS (SPL-3)
Local Time is:  Thu Oct  6 11:15:49 2016 BST
SMART support is:  Available - device has SMART capability.
SMART support is:  Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:  31 C
Drive Trip Temperature:  50 C

Manufactured in week 51 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  48
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  56
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 2571175460864

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0        680          7.077           0
write:         0        0         0         0        758          2.567           0
verify:        0        0         0         0      25336          1.706           0

Non-medium error count:  0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                  80         6                 - [-   -    -]
# 2  Reserved(7)       Completed                  64         6                 - [-   -    -]

Long (extended) Self Test duration: 45749 seconds [762.5 minutes]
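The verdict that matters most in that output is the health line. A small sketch for extracting it from `smartctl -a` or `smartctl -H` output; `smart_health` is a hypothetical helper, and it assumes the two wordings smartctl uses for SAS/SCSI drives ("SMART Health Status") and ATA drives ("SMART overall-health self-assessment test result").

```shell
#!/bin/sh
# smart_health (hypothetical helper): read smartctl output on stdin and
# print the overall health verdict, or UNKNOWN if no health line is found.
smart_health() {
    awk -F':' '
        /^SMART Health Status:/ ||
        /^SMART overall-health self-assessment test result:/ {
            verdict = $2
            sub(/^[ \t]+/, "", verdict)   # drop the padding after the colon
            print verdict
            found = 1
        }
        END { if (!found) print "UNKNOWN" }'
}

# Example using the health line from the output above.
printf 'SMART Health Status:  OK\n' | smart_health
```

On the server this would be something like `smartctl -H /dev/pass0 | smart_health`.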


I have also used smartctl to start a test on a drive and then read back the results:
Code:
# smartctl -t short /dev/pass0

Code:
# smartctl -l selftest /dev/pass0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                  80       201                 - [-   -    -]
# 2  Background short  Completed                  80       200                 - [-   -    -]
# 3  Background short  Completed                  80         6                 - [-   -    -]
# 4  Reserved(7)       Completed                  64         6                 - [-   -    -]

Long (extended) Self Test duration: 45749 seconds [762.5 minutes]
 

chownp

If I start the daemon from the command line in debug mode, I get the following:
Code:
# smartd -d
smartd 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org


Opened configuration file /usr/local/etc/smartd.conf
Configuration file /usr/local/etc/smartd.conf parsed.
Device: /dev/mfisyspd0, unable to autodetect device type
Device: /dev/mfisyspd1, unable to autodetect device type
Device: /dev/mfisyspd4, unable to autodetect device type
Device: /dev/mfisyspd5, unable to autodetect device type
Device: /dev/mfisyspd6, unable to autodetect device type
Device: /dev/mfisyspd7, unable to autodetect device type
Device: /dev/mfisyspd8, unable to autodetect device type
Device: /dev/mfisyspd9, unable to autodetect device type
Device: /dev/mfisyspd10, unable to autodetect device type
Device: /dev/mfisyspd11, unable to autodetect device type
Device: /dev/mfisyspd2, unable to autodetect device type
Device: /dev/mfisyspd3, unable to autodetect device type
Unable to monitor any SMART enabled devices. Try debug (-d) option. Exiting...


So it appears that smartd cannot determine the device type even though smartctl will happily control and interrogate the devices.
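With twelve drives, the debug output can be reduced to the list of devices smartd rejected. A minimal sketch; `undetected_devices` is a hypothetical helper that matches the exact message text above. Running `smartctl -i` against one of the same /dev/mfisyspd nodes (rather than /dev/pass0) would be a quick way to see whether the difference lies in the device node rather than in smartd itself.

```shell
#!/bin/sh
# undetected_devices (hypothetical helper): read smartd debug output on stdin
# and print each device that smartd could not autodetect, one per line.
undetected_devices() {
    sed -n 's/^Device: \([^,]*\), unable to autodetect device type$/\1/p'
}

# Example with two lines from the output above.
printf '%s\n' \
    'Device: /dev/mfisyspd0, unable to autodetect device type' \
    'Device: /dev/mfisyspd1, unable to autodetect device type' |
    undetected_devices
```

On the server, `smartd -q onecheck 2>&1 | undetected_devices` should work too, since `-q onecheck` makes smartd check once and exit instead of staying in the foreground.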
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
> PERC H730 Mini controller set to HBA mode.
This may be your issue... I am unsure if the HBA mode is truly HBA.

Will have to do a little checking and see what Chipset this has, but maybe others will chime in as well...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
> So it appears that smartd cannot determine the device type even though smartctl will happily control and interrogate the devices.
Well, there's your problem. You're using mfi, meaning hardware RAID. Stop doing that.

> Will have to do a little checking and see what Chipset this has, but maybe others will chime in as well...
SAS3008, most likely.

To be fair, LSI SAS3 stuff is supposed to be capable of using direct-attach drives even in hardware RAID mode, I think. However, effort will be needed to debug and validate this, and it will have to come from interested users.

If you can prove that SAS3 stuff is supposed to allow for direct-attach, I recommend you file a bug report to try and coax smartctl into properly detecting drives.
 

chownp

> This may be your issue... I am unsure if the HBA mode is truly HBA.
>
> Will have to do a little checking and see what Chipset this has, but maybe others will chime in as well...

Thanks. I was constrained in what I could buy when we acquired this server, but before purchasing I did check that the PERC H730 was in the FreeBSD hardware compatibility list: https://www.freebsd.org/relnotes/9-STABLE/hardware/support.html#disk

I could understand it if nothing worked, but being able to run a test using smartctl suggests that most of it is working.
 

chownp

> Good catch, I totally overlooked the "mfi" info in the output.

So the catch there is knowing that "mfi" in the device string means "hardware RAID"; why would I not know that! o_O

OK, next dumb question coming; excuse the newbie. Should the smartd.conf file have entries matching those used with smartctl, e.g. /dev/pass0? NB: I didn't populate the smartd.conf file.
 

Mirfster

> So the catch there is knowing that "mfi" in the device string means "hardware RAID"; why would I not know that!
That was actually a poke at myself for overlooking the entry, not at you... :P

As far as HBA mode goes, maybe check this article from Dell to make sure you followed the steps outlined?
 

Ericloewe

> OK, next dumb question coming; excuse the newbie. Should the smartd.conf file have entries matching those used with smartctl, e.g. /dev/pass0? NB: I didn't populate the smartd.conf file.
I think so, but it's something that needs to be fixed at the middleware level, since smartd.conf can be overwritten whenever the middleware feels like it, hence the bug ticket suggestion.
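To try the idea outside the middleware, the ix-smartd directives can be regenerated for the passN nodes. A minimal sketch, assuming (not verified in this thread) that smartd accepts /dev/passN in smartd.conf the same way smartctl accepted /dev/pass0; `smartd_conf_lines` is a hypothetical helper.

```shell
#!/bin/sh
# smartd_conf_lines (hypothetical helper): print smartd.conf lines for the
# given device names, reusing the directives ix-smartd generated above but
# pointing them at passN nodes instead of mfisyspdN nodes.
ALERT_SCRIPT=/usr/local/www/freenasUI/tools/smart_alert.py

smartd_conf_lines() {
    for dev in "$@"; do
        printf '/dev/%s -a -n never -W 0,0,0 -m root -M exec %s\n' \
            "$dev" "$ALERT_SCRIPT"
    done
}

smartd_conf_lines pass0 pass1
```

For a one-off test, `smartd_conf_lines pass0 pass1 > /tmp/smartd.test.conf` followed by `smartd -q onecheck -c /tmp/smartd.test.conf` should show whether smartd registers the devices, without touching the live /usr/local/etc/smartd.conf that the middleware manages.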
 

Mirfster

> SAS3008, most likely.
I am thinking that it actually may be a SAS3108, but still too lazy to check.

OP, can you post the output of sas2flash -listall (In CODE Tags please)? That should tell us the chipset detected.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Hmm, while I agree this is related to a HW RAID passthrough issue, it's strange that camcontrol devlist shows /dev/passXX devices, which show SMART data, but the SMART configuration is using /dev/mfiXX devices, which do not.
 

Ericloewe

> I am thinking that it actually may be a SAS3108, but still too lazy to check.
Indeed, 3108.

I hope someone takes interest in validating the use of these things as HBAs, since it'd be nice to be able to say "all LSI SAS3 products are fine and dandy".
 

Ericloewe

> Hmm, while I agree this is related to a HW RAID passthrough issue, it's strange that camcontrol devlist shows /dev/passXX devices, which show SMART data, but the SMART configuration is using /dev/mfiXX devices, which do not.
I'm thinking that might be some trickery to allow the new HBA functionality to work in the mfi stack without breaking older SAS1 and SAS2 devices which don't support anything but RAID.

Edit: And without a complete rewrite that keeps only the mfi name for SAS3 devices while using completely different code, which would break older drivers with new hardware...
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Submit a bug report.

The drives support SMART. smartd is either flipping out, needs educating, or is misconfigured.
 

chownp

> That was actually a poke at myself for overlooking the entry, not at you... :p
>
> As far as HBA mode goes, maybe check this article from Dell to make sure you followed the steps outlined?

Sorry, I wasn't having a crack; I was more commenting on the significance of the device naming convention. I'm evidently going to have to educate myself!

I was fully of the opinion that hardware RAID was disabled, and I suppose what I was lacking was a way to ask FreeNAS/FreeBSD what it thought.

I hadn't read the Dell article, but on first boot, when we acquired the machine, I had a good dig about in the BIOS, and the options I set then match those given in the article.

The POST reports:
Code:
14 Non-RAID Disk(s) found on the host adapter
14 Non-RAID Disk(s) handled by the BIOS


I suppose what is confusing me is the difference of opinion between smartctl and whatever populated the smartd.conf file (/etc/ix.rc.d/ix-smartd?).

I have a number of options:
  • If there's just an issue with the auto-population of smartd.conf, then maybe I can hack that and put "correct" values in there, assuming I can work out what they should be.
  • Or I write some scripts to run smartctl and schedule them from cron.
  • If the whole FreeNAS setup (rather than just the smartd config) still believes there is hardware RAID in force, then this would make me question the prudence of continuing with FreeNAS, and I'd have to resort to another server OS.
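For the cron option, a minimal sketch of a health check over the passN devices; `check_health` is a hypothetical helper. The SMARTCTL override is there because cron's PATH may not include /usr/local/sbin, where the smartmontools port normally installs smartctl.

```shell
#!/bin/sh
# check_health (hypothetical helper): run `smartctl -H` on each named device
# and print "<dev> OK" or "<dev> FAILED". Returns 1 if any device did not
# report "SMART Health Status: OK" (SAS/SCSI) or "... test result: PASSED" (ATA).
# SMARTCTL may be overridden, e.g. SMARTCTL=/usr/local/sbin/smartctl under cron.
SMARTCTL=${SMARTCTL:-smartctl}

check_health() {
    failed=0
    for dev in "$@"; do
        if "$SMARTCTL" -H "/dev/$dev" 2>/dev/null |
            grep -Eq 'SMART Health Status: +OK|test result: +PASSED'; then
            echo "$dev OK"
        else
            echo "$dev FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

A wrapper script ending in `check_health "$@"` could then be scheduled with a crontab entry such as `0 20 * * * /root/check_health.sh pass0 pass1` (path hypothetical), alongside scheduled `smartctl -t short` runs for the self-tests themselves.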
I shall continue to investigate and haul myself up the FreeBSD learning curve.
 

chownp

> I am thinking that it actually may be a SAS3108, but still too lazy to check.
>
> OP, can you post the output of sas2flash -listall (In CODE Tags please)? That should tell us the chipset detected.

It would appear that sas2flash expects LSI adapters, and the PERC is not one!
Code:
# sas2flash -listall
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

   No LSI SAS adapters found! Limited Command Set Available!
   ERROR: Command Not allowed without an adapter!
   ERROR: Couldn't Create Command -listall
   Exiting Program.


I know the community strongly recommends LSI adapters, but unfortunately I didn't have that option when purchasing this machine. The PERC H730 is listed in the FreeBSD HCL as supported by the mrsas(4) driver.
 

chownp

Following up on my previous post, apparently the mrsas(4) driver (https://www.freebsd.org/cgi/man.cgi?query=mrsas&sektion=4&manpath=freebsd-release-ports) is "substantially different than the old "MegaRAID" Driver mfi(4)".

I clearly need to work out which driver is in use on my machine. I'm assuming (and could well be wrong) that the "mfi" prefix on the device names in smartd.conf implies that the mfi driver is being used rather than mrsas: https://www.freebsd.org/cgi/man.cgi...opos=0&manpath=FreeBSD+10.3-RELEASE+and+Ports
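One way to check which driver actually claimed the controller is to look for its attach line in the boot messages (`dmesg`, or /var/run/dmesg.boot on FreeBSD). A minimal sketch; `attached_raid_drivers` is a hypothetical helper, and it assumes the usual `name0: <description> ... on pciN` attach format, since a driver can print its version banner without attaching to anything.

```shell
#!/bin/sh
# attached_raid_drivers (hypothetical helper): read dmesg output on stdin and
# print the controller instances (e.g. mfi0, mrsas0) that actually attached.
# Only "mfiN: <...>" / "mrsasN: <...>" attach lines are counted, so a version
# banner such as "... mrsas driver version: ..." is ignored.
attached_raid_drivers() {
    awk '/^(mfi|mrsas)[0-9]+: </ { sub(/:.*/, ""); print }'
}

# Example: banner printed by mrsas, but the controller attached as mfi0.
printf '%s\n' \
    'AVAGO MegaRAID SAS FreeBSD mrsas driver version: 06.709.07.00-fbsd' \
    'mfi0: <Invader> port 0x2000-0x20ff irq 26 at device 0.0 on pci2' |
    attached_raid_drivers
```

On the server: `attached_raid_drivers < /var/run/dmesg.boot`.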
 

Ericloewe

SAS3 needs sas3flash, not sas2flash.
 

chownp

The mrsas(4) man page confirms: "The mrsas driver exposes devices as /dev/da?, whereas mfi(4) exposes devices as /dev/mfid?"

OK, I'm learning a little more. Running dmesg produced the following:
Code:
pci2: <ACPI PCI bus> on pcib3
AVAGO MegaRAID SAS FreeBSD mrsas driver version: 06.709.07.00-fbsd
mfi0: <Invader> port 0x2000-0x20ff mem 0x92000000-0x9200ffff,0x91f00000-0x91ffffff irq 26 at device 0.0 on pci2
mfi0: Using MSI
mfi0: Megaraid SAS driver Ver 4.23
mfi0: FW MaxCmds = 928, limiting to 128
mfi0: MaxCmd = 928, Drv MaxCmd = 128, MaxSgl = 70, state = 0xb73c03a0
mfip0: <SCSI Passthrough Bus> on mfi0
pcib4: <ACPI PCI-PCI bridge> irq 47 at device 2.0 on pci0
pci3: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> irq 47 at device 3.0 on pci0
pci1: <ACPI PCI bus> on pcib5


I *think* that means the mrsas driver is loading.

Later on, the following is repeated for each drive:
Code:
mfisyspd0 on mfi0
mfisyspd0: 5723166MB (11721045168 sectors) SYSPD volume (deviceid: 0)
mfisyspd0:  SYSPD volume attached
mfisyspd1 on mfi0
mfisyspd1: 5723166MB (11721045168 sectors) SYSPD volume (deviceid: 1)
mfisyspd1:  SYSPD volume attached


Does the "mfi*" above imply that, although the mrsas driver is loading, it is not being used?

These are followed by
Code:
mfi0: 2783 (boot + 46s/0x0002/info) - Inserted: PD 00(e0x20/s0)
mfi0: 2784 (boot + 46s/0x0002/info) - Inserted: PD 00(e0x20/s0) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=5000cca24d1eb0cd,0000000000000000

for each drive, and then a batch of
Code:
pass0 at mfi0 bus 0 scbus0 target 0 lun 0
pass0: <HGST HUS726060AL5214 KK06> Fixed Uninstalled SPC-4 SCSI device
pass0: Serial Number NCGJW7HT
pass0: 150.000MB/s transfers


The mrsas(4) man page states "A disk (virtual disk/physical disk) attached to the mrsas driver will be visible to the user through camcontrol(8) as /dev/da? device nodes.", but at present I can't get camcontrol to list anything other than the passthrough devices, e.g. pass<n>.

If I ls /dev, I get entries of the form mfisyspd<n> and pass<n>, but no da<n>.
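As I understand the mfi(4) and mrsas(4) man pages for FreeBSD 10.x, mfi(4) claims Invader-class controllers by default, and setting the loader tunable hw.mfi.mrsas_enable to 1 lets mrsas(4) attach instead, which is when da<n> nodes would be expected to appear (and the mfisyspd<n> names to go away). Treat this as an assumption to verify against those man pages for the installed release; on FreeNAS, loader tunables are normally added through the Tunables page rather than by editing /boot/loader.conf directly. A minimal idempotent sketch with a hypothetical function name and an overridable file path:

```shell
#!/bin/sh
# enable_mrsas_tunable (hypothetical helper): append the assumed loader
# tunable hw.mfi.mrsas_enable="1" to LOADER_CONF unless some value for it is
# already present. Only takes effect after a reboot.
LOADER_CONF=${LOADER_CONF:-/boot/loader.conf}

enable_mrsas_tunable() {
    if grep -q '^hw\.mfi\.mrsas_enable=' "$LOADER_CONF" 2>/dev/null; then
        echo "hw.mfi.mrsas_enable already set in $LOADER_CONF"
    else
        echo 'hw.mfi.mrsas_enable="1"' >> "$LOADER_CONF"
        echo "added hw.mfi.mrsas_enable=\"1\" to $LOADER_CONF"
    fi
}
```

Pointing LOADER_CONF at a scratch file first is a cheap way to check what would be written.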
 