Smartd fails to start at boot, but manually starts ok.

Status
Not open for further replies.

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
I'm not sure if this is a bug, or something not configured right on my end.

System specs: i5-3570, 32 GB RAM, 10 drives in RAIDZ3, M1015 controller, FreeNAS 8.3.1-p2.

On bootup, smartd appears to fail to start. Here are the syslog entries:

Code:
May  6 03:40:28 <console.info> nas May  6 03:41:41 kernel: Starting smartd.
May  6 03:40:29 <daemon.crit> nas May  6 03:41:41 smartd[2324]: Device: /dev/ada3, can't monitor Temperature, ignoring -W Directive
May  6 03:40:29 <console.info> nas May  6 03:41:41 kernel: May  6 03:41:41 nas smartd[2324]: Device: /dev/ada3, can't monitor Temperature, ignoring -W Directive
May  6 03:40:29 <daemon.crit> nas May  6 03:41:41 smartd[2324]: Unable to register ATA device /dev/ada3 at line 14 of file /usr/local/etc/smartd.conf
May  6 03:40:29 <console.info> nas May  6 03:41:41 kernel: May  6 03:41:41 nas smartd[2324]: Unable to register ATA device /dev/ada3 at line 14 of file /usr/local/etc/smartd.conf
May  6 03:40:29 <daemon.crit> nas May  6 03:41:41 smartd[2324]: Unable to register device /dev/ada3 (no Directive -d removable). Exiting.
May  6 03:40:29 <console.info> nas May  6 03:41:41 kernel: May  6 03:41:41 nas smartd[2324]: Unable to register device /dev/ada3 (no Directive -d removable). Exiting.
May  6 03:40:29 <user.notice> nas May  6 03:41:41 root: /etc/rc: WARNING: failed to start smartd
May  6 03:40:29 <console.info> nas May  6 03:41:41 kernel: /etc/rc: WARNING: failed to start smartd
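
A minimal sketch for pulling the offending device out of messages like the ones above. The function name `failed_smartd_devices` is made up for illustration, and the pattern is based only on the two "Unable to register" message shapes shown in this log, so other smartd versions may word it differently.

```shell
#!/bin/sh
# Read syslog text on stdin and print every device that smartd said it
# could not register, one per line, without duplicates.
# Matches both "Unable to register device /dev/X" and
# "Unable to register ATA device /dev/X" as seen in the log above.
failed_smartd_devices() {
    sed -n 's|.*Unable to register .*device \(/dev/[^ ]*\).*|\1|p' | sort -u
}
```

Assuming the messages end up in /var/log/messages, it could be run as `failed_smartd_devices < /var/log/messages`.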



Confirming smartd is not running:

Code:
root@nas ~ # ps aux | grep smart
root  3133  0.0  0.0  8980  1352   0  R+    3:46AM   0:00.00 grep smart


/dev/ada3 is a small SATA SSD I'm using to boot from instead of USB flash. I got tired of the delay when changing settings and writing to the flash drive. I had this SSD handy and have lots of free SATA ports, so I used it.

Here's smartd.conf:

Code:
root@nas ~ # cat /usr/local/etc/smartd.conf
################################################
# smartd.conf generated by /etc/rc.d/ix-smartd
################################################
/dev/da0 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/da4 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/da1 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/da2 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/da3 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/ada2 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/da6 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/ada1 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/ada0 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/da5 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/ada3 -n never -W 5,35,45 -m my.email@nowhere.com


It is showing that ada3 is configured in smartd.
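
Note that the ada3 line is also the only one without a `-s` test schedule. A small sketch that makes this kind of difference easy to spot; `smartd_conf_devices` is a hypothetical helper name, and it only looks at lines that start with /dev/, as in the file above.

```shell
#!/bin/sh
# Print "<device> scheduled" or "<device> unscheduled" for every device
# line in the given smartd.conf, depending on whether a -s directive
# appears on that line. Comment lines are skipped.
smartd_conf_devices() {
    awk '/^\/dev\// {
        sched = "unscheduled"
        for (i = 2; i <= NF; i++)
            if ($i == "-s") sched = "scheduled"
        print $1, sched
    }' "$1"
}
```

Usage would be `smartd_conf_devices /usr/local/etc/smartd.conf`.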

However, under "view disks", ada3 is not listed so I can't see if the "enable smart" option is checked or not.

Also, in the scheduled smart tests section, the list of drives you get upon editing the schedule doesn't show ada3. All other drives are listed, and I have long smart tests scheduled weekly. Those long tests work fine if smartd is running.

Here's logs when I disable smartd, then re-enable it under "control services":

Code:
May  6 03:48:39 <user.debug> nas May  6 03:49:52 manage.py: [middleware.notifier:177] Calling: restart(smartd)
May  6 03:48:39 <user.debug> nas May  6 03:49:52 manage.py: [middleware.notifier:135] Executing: /usr/sbin/service ix-smartd quietstart
May  6 03:48:39 <user.debug> nas May  6 03:49:52 manage.py: [middleware.notifier:149] Executed: /usr/sbin/service ix-smartd quietstart
May  6 03:48:39 <user.debug> nas May  6 03:49:52 manage.py: [middleware.notifier:135] Executing: /usr/sbin/service smartd forcestop
May  6 03:48:39 <user.debug> nas May  6 03:49:52 manage.py: [middleware.notifier:149] Executed: /usr/sbin/service smartd forcestop
May  6 03:48:39 <user.debug> nas May  6 03:49:52 manage.py: [middleware.notifier:135] Executing: /usr/sbin/service smartd restart
May  6 03:48:39 <user.debug> nas May  6 03:49:52 manage.py: [middleware.notifier:149] Executed: /usr/sbin/service smartd restart
May  6 03:48:42 <user.debug> nas May  6 03:49:55 manage.py: [middleware.notifier:170] Popen()ing: /bin/pgrep -F /var/run/smartd.pid smartd
May  6 03:48:44 <user.debug> nas May  6 03:49:56 manage.py: [middleware.notifier:177] Calling: restart(smartd)
May  6 03:48:44 <user.debug> nas May  6 03:49:56 manage.py: [middleware.notifier:135] Executing: /usr/sbin/service ix-smartd quietstart
May  6 03:48:44 <user.debug> nas May  6 03:49:56 manage.py: [middleware.notifier:149] Executed: /usr/sbin/service ix-smartd quietstart
May  6 03:48:44 <user.debug> nas May  6 03:49:56 manage.py: [middleware.notifier:135] Executing: /usr/sbin/service smartd forcestop
May  6 03:48:44 <user.debug> nas May  6 03:49:56 manage.py: [middleware.notifier:149] Executed: /usr/sbin/service smartd forcestop
May  6 03:48:44 <user.debug> nas May  6 03:49:56 manage.py: [middleware.notifier:135] Executing: /usr/sbin/service smartd restart
May  6 03:48:48 <user.debug> nas May  6 03:50:01 manage.py: [middleware.notifier:149] Executed: /usr/sbin/service smartd restart
May  6 03:48:48 <user.debug> nas May  6 03:50:01 manage.py: [middleware.notifier:170] Popen()ing: /bin/pgrep -F /var/run/smartd.pid smartd



Confirming it's running:

Code:
root@nas ~ # ps aux | grep smart
root  3985  0.0  0.0 13292  3316  ??  I     3:50AM   0:00.00 /usr/local/sbin/smartd -i 1800 -c /usr/local/etc/smartd.conf -p /var/run/smartd.pid
root  4081  0.0  0.0  8980  1404   0  S+    3:53AM   0:00.00 grep smart



Checking smartd.conf again after manually restarting it:

Code:
root@nas ~ # cat /usr/local/etc/smartd.conf
################################################
# smartd.conf generated by /etc/rc.d/ix-smartd
################################################
/dev/da0 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/da4 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/da1 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/da2 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/da3 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/ada2 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/da6 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/ada1 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/ada0 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
/dev/da5 -n never -W 5,35,45 -m my.email@nowhere.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)


It started now because ada3 is no longer in the config file. The only thing that changed was manually restarting the service from the gui.
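
To confirm which devices differ between the boot-time file and the regenerated one, the two versions can be compared. This is only a sketch: `conf_device_diff` is a hypothetical name, and it assumes a copy of the boot-time smartd.conf was saved somewhere before restarting the service.

```shell
#!/bin/sh
# Compare the device lists of two smartd.conf files.
# Prints "- /dev/X" for devices only in the first file and
# "+ /dev/X" for devices only in the second.
conf_device_diff() {
    old_list=$(mktemp)
    new_list=$(mktemp)
    awk '/^\/dev\// {print $1}' "$1" | sort > "$old_list"
    awk '/^\/dev\// {print $1}' "$2" | sort > "$new_list"
    comm -23 "$old_list" "$new_list" | sed 's/^/- /'
    comm -13 "$old_list" "$new_list" | sed 's/^/+ /'
    rm -f "$old_list" "$new_list"
}
```

For the two listings in this post, the expected result would be the single line `- /dev/ada3`.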

Smartctl 'info' for the ssd in question:

Code:
root@nas ~ # smartctl -i -q noserial /dev/ada3
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     SanDisk SSD P4 8GB
Firmware Version: SSD 8.10
User Capacity:    8,012,390,400 bytes [8.01 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 2d
Local Time is:    Mon May  6 04:07:42 2013 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled



How is ada3 making it into smartd.conf anyway? There's no mention of it in the GUI (that I can see), so I'm at a loss as to how to correct this. It is of course listed in "sysctl kern.disks". Why is there a difference between how smartd.conf is generated initially and how it's generated upon manually restarting the service?

Having to manually kick over smartd each time the server is rebooted is a pain. If I forget, it won't warn me about high drive temperature. It also won't run any scheduled smart tests. And if a drive drops out, it won't warn me that it can't open that drive. I don't understand what's going on 'behind the scenes'.
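
As a stopgap until the cause is found, a periodic check could do the manual restart automatically. This is a rough sketch, not a FreeNAS feature: the function names are invented, the pidfile path and the two service commands are taken from the GUI log above, and running it from cron is an assumption.

```shell
#!/bin/sh
# Return success if the pidfile exists and names a live process.
# kill -0 only tests for existence; run as root so that a process owned
# by another user is not reported as dead.
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Restart smartd the same way the GUI does if it is not running.
ensure_smartd() {
    pidfile=${1:-/var/run/smartd.pid}
    if pid_alive "$pidfile"; then
        echo "smartd running"
        return 0
    fi
    echo "smartd not running, restarting"
    /usr/sbin/service ix-smartd quietstart && /usr/sbin/service smartd restart
}
```

Calling `ensure_smartd` a few minutes after boot would cover the forgotten-restart case, but it only hides the underlying configuration problem.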

Any additional information that's needed, let me know, and I'll post it.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
titan_rw said: "It is showing that ada3 is configured in smartd. However, under "view disks", ada3 is not listed so I can't see if the "enable smart" option is checked or not."

Seems like it ought to be listed.

titan_rw said: "How is ada3 making it into smartd.conf anyway? There's no mention of it in the GUI (that I can see), so I'm at a loss as how to correct this. It is of course listed in "sysctl kern.disks". Why is there a difference between how smartd.conf is generated initially, and how it's generated upon manually restarting the service?"

Did you use to have a disk at ada3 in the past? See what the following shows:
Code:
sqlite3 /data/freenas-v1.db '.dump storage_disk'

sqlite3 /data/freenas-v1.db '.dump system_smarttest'

sqlite3 /data/freenas-v1.db '.dump system_smarttest_smarttest_disks'
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Ok, here we go:

Code:
root@nas ~ # sqlite3 /data/freenas-v1.db '.dump storage_disk'
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE "storage_disk" ("disk_enabled" bool, "disk_acousticlevel" varchar(120), "disk_hddstandby" varchar(120), "disk_serial" varchar(30), "disk_multipath_name" varchar(30), "disk_identifier" varchar(42), "disk_togglesmart" bool, "disk_advpowermgmt" varchar(120), "disk_transfermode" varchar(120), "disk_multipath_member" varchar(30) NOT NULL DEFAULT '', "disk_description" varchar(120), "disk_smartoptions" varchar(120), "id" integer PRIMARY KEY, "disk_name" varchar(120));
INSERT INTO "storage_disk" VALUES(1,'Disabled','Always On','disk-serial','','{serial}disk-serial',1,'Disabled','Auto','','','',7,'da0');
INSERT INTO "storage_disk" VALUES(1,'Disabled','Always On','disk-serial','','{serial}disk-serial',1,'Disabled','Auto','','','',13,'da4');
INSERT INTO "storage_disk" VALUES(1,'Disabled','Always On','disk-serial','','{serial}disk-serial',1,'Disabled','Auto','','','',16,'da1');
INSERT INTO "storage_disk" VALUES(1,'Disabled','Always On','disk-serial','','{serial}disk-serial',1,'Disabled','Auto','','','',17,'da2');
INSERT INTO "storage_disk" VALUES(1,'Disabled','Always On','disk-serial','','{serial}disk-serial',1,'Disabled','Auto','','','',24,'da3');
INSERT INTO "storage_disk" VALUES(1,'Disabled','Always On','disk-serial','','{serial}disk-serial',1,'Disabled','Auto','','','',37,'ada2');
INSERT INTO "storage_disk" VALUES(1,'Disabled','Always On','disk-serial','','{serial}disk-serial',1,'Disabled','Auto','','','',38,'da6');
INSERT INTO "storage_disk" VALUES(1,'Disabled','Always On','disk-serial','','{serial}disk-serial',1,'Disabled','Auto','','','',42,'ada1');
INSERT INTO "storage_disk" VALUES(1,'Disabled','Always On','disk-serial','','{serial}disk-serial',1,'Disabled','Auto','','','',43,'ada0');
INSERT INTO "storage_disk" VALUES(1,'Disabled','Always On','disk-serial','','{serial}disk-serial',1,'Disabled','Auto','','','',45,'da5');
INSERT INTO "storage_disk" VALUES(0,'Disabled','Always On','disk-serial','','{serial}disk-serial',1,'Disabled','Auto','','','',46,'ada3');
COMMIT;

Code:
root@nas ~ # sqlite3 /data/freenas-v1.db '.dump system_smarttest'
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE "system_smarttest" ("smarttest_dayweek" varchar(100), "smarttest_daymonth" varchar(100), "smarttest_month" varchar(100), "smarttest_type" varchar(2), "id" integer PRIMARY KEY, "smarttest_hour" varchar(100), "smarttest_desc" varchar(120));
INSERT INTO "system_smarttest" VALUES('1','*','1,2,3,4,5,6,7,8,9,10,11,12','L',1,'02','storage-long-tests');
COMMIT;
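
The single row above appears to correspond to the `-s L/(01|...|12)/../(1)/(02)` directive in smartd.conf, which follows smartd's T/MM/DD/d/HH schedule pattern. A sketch of that mapping, inferred only from this one row and one config line; `smartd_schedule` is a hypothetical function, not the code ix-smartd actually runs, and it handles only a single weekday and hour.

```shell
#!/bin/sh
# Build a smartd -s schedule regex (T/MM/DD/d/HH).
# $1 test type, $2 comma-separated months, $3 day of month or "*",
# $4 day of week, $5 hour (already two digits).
smartd_schedule() {
    # Zero-pad each month and join with "|".
    months=$(printf '%s\n' "$2" | tr ',' '\n' |
        awk '{printf "%s%02d", (NR > 1 ? "|" : ""), $1}')
    # "*" means any day of the month, which is ".." in the regex.
    if [ "$3" = "*" ]; then
        dom=".."
    else
        dom=$(printf '%s\n' "$3" | awk '{printf "(%02d)", $1}')
    fi
    printf '%s/(%s)/%s/(%s)/(%s)\n' "$1" "$months" "$dom" "$4" "$5"
}

smartd_schedule L 1,2,3,4,5,6,7,8,9,10,11,12 '*' 1 02
# prints L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1)/(02)
```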

Code:
root@nas ~ # sqlite3 /data/freenas-v1.db '.dump system_smarttest_smarttest_disks'
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE "system_smarttest_smarttest_disks" ("id" integer NOT NULL PRIMARY KEY, "smarttest_id" integer NOT NULL, "disk_id" integer NOT NULL);
INSERT INTO "system_smarttest_smarttest_disks" VALUES(1,1,7);
INSERT INTO "system_smarttest_smarttest_disks" VALUES(2,1,13);
INSERT INTO "system_smarttest_smarttest_disks" VALUES(3,1,16);
INSERT INTO "system_smarttest_smarttest_disks" VALUES(4,1,17);
INSERT INTO "system_smarttest_smarttest_disks" VALUES(5,1,24);
INSERT INTO "system_smarttest_smarttest_disks" VALUES(6,1,37);
INSERT INTO "system_smarttest_smarttest_disks" VALUES(7,1,38);
INSERT INTO "system_smarttest_smarttest_disks" VALUES(8,1,42);
INSERT INTO "system_smarttest_smarttest_disks" VALUES(9,1,43);
INSERT INTO "system_smarttest_smarttest_disks" VALUES(11,1,45);
CREATE UNIQUE INDEX "system_smarttest_smarttest_disks_smarttest_id__disk_id" ON "system_smarttest_smarttest_disks"("smarttest_id", "disk_id");
CREATE INDEX "system_smarttest_smarttest_disks_fabc7dbe" ON "system_smarttest_smarttest_disks" ("smarttest_id");
CREATE INDEX "system_smarttest_smarttest_disks_9f8f1583" ON "system_smarttest_smarttest_disks" ("disk_id");
COMMIT;



I've redacted the serial numbers from the output. Otherwise it's unchanged.

I think I see the problem? ada3 has "disk_togglesmart" ENABLED, however, "disk_enabled" is DISABLED. I assume disk_enabled being disabled prevents it from showing up in "view disks"? And I assume "disk_togglesmart" is used to build smartd.conf? So when smartd.conf is initially generated, it doesn't take into account that the whole disk is 'disabled'? But on subsequent 'building' of smartd.conf, it skips the disk because it's 'disabled', even though it's enabled for smart?
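
A quick way to spot rows in this state is to scan the dump for disks with disk_enabled = 0 and disk_togglesmart = 1. The sketch below is an assumption-heavy helper (`hidden_but_smart_enabled` is an invented name): it relies on the column order in the CREATE TABLE statement above and on no value containing a comma.

```shell
#!/bin/sh
# Read '.dump storage_disk' output on stdin and print the disk_name of
# every row where disk_enabled is 0 but disk_togglesmart is 1.
# Column positions follow the CREATE TABLE above: disk_enabled is the
# 1st value, disk_togglesmart the 7th, disk_name the last.
hidden_but_smart_enabled() {
    awk -F, '/^INSERT INTO "storage_disk"/ {
        enabled = $1
        sub(/.*\(/, "", enabled)           # keep what follows "VALUES("
        name = $NF
        gsub(/[^A-Za-z0-9]/, "", name)     # strip quotes and ");"
        if (enabled == "0" && $7 == "1") print name
    }'
}
```

It would be fed with `sqlite3 /data/freenas-v1.db '.dump storage_disk' | hidden_but_smart_enabled`. Querying directly with `select disk_name from storage_disk where disk_enabled=0 and disk_togglesmart=1` should give the same answer without the text parsing.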

I ran the following command to disable smart on the already 'disabled' disk:

Code:
# ssd_serial is a placeholder for the SSD's actual (redacted) serial number
sqlite3 /data/freenas-v1.db "update storage_disk set disk_togglesmart=0 where disk_serial='ssd_serial'"


And rebooted. Smartd was successfully started on boot. smartd.conf only shows the correct spinning disks now.

So I guess the problem is that whatever code generates smartd.conf on initial boot doesn't care that "disk_enabled" is 0, whereas the code that regenerates smartd.conf on a manual service restart skips such disks.
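
The guess above can be written down as two filters over the same rows. This only illustrates the hypothesis; it is not the actual ix-smartd or middleware code, and both function names are invented.

```shell
#!/bin/sh
# Input rows on stdin: "<disk_enabled> <disk_togglesmart> <disk_name>"

# Hypothesized boot-time behaviour: only disk_togglesmart is checked.
boot_time_devices() {
    awk '$2 == 1 {print "/dev/" $3}'
}

# Hypothesized behaviour on a manual restart: both flags must be set.
restart_devices() {
    awk '$1 == 1 && $2 == 1 {print "/dev/" $3}'
}
```

With a row such as `0 1 ada3`, the first filter emits /dev/ada3 and the second does not, which matches the two smartd.conf listings earlier in the thread.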

Is it 'correct' for disk_enabled to be 0 for that disk in my case? It's not a storage disk, so I guess that's why it was disabled? So it doesn't show up as a disk freenas should be able to 'manage'?

I suppose you may want a disk 'hidden' from 'view disks', but still want it smart monitored. If someone is booting off of a small spinning disk, disk_enabled would be 0 to hide it from the gui, but you'd probably still want smartd to monitor it.

I'm not totally clear on what 'disk_enabled' is used for when set to 0.

Anyway, I'm not sure if this is a bug or not. I've solved my problem for now. It's weird that there seems to be different logic that builds smartd.conf depending on whether the system is first booting, or the service is manually restarted.

Thanks for pointing me in the right direction.

Any more info needed, let me know.