Bug in FreeNAS smartmontools

Status
Not open for further replies.

Stranded Camel

Explorer
Joined
May 25, 2017
Messages
79
I've got a five-disk RAID Z2 array made up of WD Gold 10 TB disks that I've had running for about half a year under FreeNAS-11.1-U5. I noticed a while ago that not one of these disks has ever successfully completed a single SMART test.

It doesn't matter if I run the tests under load or right after boot and without firing up any jails or my VM, nor if I run the tests normally or in captive mode, nor if I do short or long tests -- every single test always fails under FreeNAS! The failure consists of either an error (the two I get are "Aborted by host" and "Interrupted (host reset)") or of the test hanging at 90% completion, for weeks if I let it go that long.

So I was naturally worried about having bad disks, bad SATA cables, badly seated cables, bad SATA ports and so on -- we FreeNAS users are trained to blame hardware for everything, of course.

But before I started tearing my box apart and replacing components, on a hunch I booted it from a live Linux USB stick and ran the smartmontools tests from there. And guess what? Everything works perfectly under Linux! It took close to 20 hours, but all the SMART tests ran to completion, with no errors.

The version of smartmontools I used under Linux wasn't even particularly new -- it was smartctl 6.6 2016-05-31 r4324. The version in FreeNAS 11.1-U5 is actually newer than that -- 6.6 2017-11-05 r4594.

Clearly, the FreeNAS version of smartmontools has a serious bug. The question is, is this a regression, or are the Linux and FreeBSD versions of smartmontools, which look like the same program, actually quite different software? I know that might seem odd, but a great deal of software that seems to be identical between GNU and FreeBSD (same names, etc.) is actually radically different -- the FreeBSD versions of ls, find, grep, etc., for example, are quite different from, and extremely limited compared to, their GNU counterparts.

In any case, beware! Before doing anything based on your FreeNAS SMART info, confirm that there's actually a problem by booting into a live Linux distro and using its smartmontools!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,464
Stranded Camel said:
Clearly, the FreeNAS version of smartmontools has a serious bug.
Almost certainly not, actually. All it does is tell the disk to start the specified test, and (later) report what the disk says is the result of that task. The problem may not be hardware, but it almost certainly isn't smartmontools itself.
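A minimal sketch of what "report what the disk says" amounts to: the self-test log that `smartctl -l selftest` prints is the drive's own record, so it can be read and classified as plain text. The sample log, the column layout, and the function names below are assumptions for illustration. None of it is output from the system discussed in this thread.

```python
import re

# Hypothetical sample, in the layout smartctl typically prints for ATA drives.
SAMPLE_LOG = """\
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4321         -
# 2  Short offline       Aborted by host               90%      4300         -
# 3  Extended offline    Interrupted (host reset)      90%      4200         -
"""

# One log row: number, test name, status text, percent remaining, hours, LBA.
ROW_RE = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<test>.+?)\s{2,}"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in the log text."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            rows.append({
                "num": int(match["num"]),
                "test": match["test"],
                "status": match["status"],
                "remaining": int(match["remaining"]),
                "hours": int(match["hours"]),
                "lba": match["lba"],
            })
    return rows


def failed_tests(rows):
    """Keep only rows whose status is not a clean completion."""
    return [r for r in rows if not r["status"].startswith("Completed without error")]


if __name__ == "__main__":
    for row in failed_tests(parse_selftest_log(SAMPLE_LOG)):
        print(f'#{row["num"]} {row["test"]}: {row["status"]} ({row["remaining"]}% remaining)')
```

Both "Aborted by host" and "Interrupted (host reset)" are status strings recorded by the drive itself, which is why the same log can be compared across operating systems.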
 

Stranded Camel

Explorer
Joined
May 25, 2017
Messages
79
danb35 said:
Almost certainly not, actually. All it does is tell the disk to start the specified test, and (later) report what the disk says is the result of that task. The problem may not be hardware, but it almost certainly isn't smartmontools itself.

The two main errors, "Aborted by host" and "Interrupted (host reset)", clearly point to FreeNAS being the culprit, as it is the host. The question then becomes: what component of FreeNAS is causing the failure? You are right that in these cases it may not be smartmontools that's at fault, but rather some other part of FreeNAS. In any case, on my system I have two options due to this problem -- live without SMART data, or boot into a live Linux environment and run SMART tests from there. Both suck.

As to the third error, tests hanging at 90% completion: since it doesn't happen with Linux's smartmontools, I can't see any alternative to the culprit being FreeNAS's version of this software. SMART and drive commands are a dark, arcane art, and it wouldn't be surprising at all if GNU smartmontools (which is used by about 70x more machines on the internet than BSDs [source]) had far better support for much more hardware than the version in FreeNAS and other BSDs.
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
smartmontools is a third-party program. It's the same program on FreeBSD and Linux.

What is more likely is a difference in how FreeBSD and Linux use or configure the hardware. Looking at the hardware info in the original post, I'm skeptical about the PCIe SATA host adapter: http://ableconn.com/support_1.php?gid=61
That's an ASM1062 SATA controller, which might be fine, and a JMicron JMB575, which... might also be fine. Neither of those is generally recommended for FreeNAS. If possible, I would connect the drive to be tested to a motherboard SATA port, then test it. If it completes the tests, problem located.
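A rough sketch of that check, for anyone who would rather script it: start a short test, poll the drive's self-test execution status, and read the log once the drive stops reporting a test in progress. The device name `/dev/ada0`, the polling interval, and the helper names are assumptions, and the sketch assumes the "N% of test remaining" line appears only while a test is running. It uses the standard `smartctl` options `-t short`, `-c`, and `-l selftest`.

```python
import re
import subprocess
import time

REMAINING_RE = re.compile(r"(\d+)% of test remaining")


def smartctl(args):
    """Run smartctl and return its stdout (needs smartmontools and root)."""
    result = subprocess.run(["smartctl", *args], capture_output=True, text=True)
    return result.stdout


def remaining_percent(capabilities_text):
    """Return the percent remaining if a self-test is in progress, else None."""
    match = REMAINING_RE.search(capabilities_text)
    return int(match.group(1)) if match else None


def run_short_test(device, run=smartctl, sleep=time.sleep,
                   poll_seconds=30, max_polls=20):
    """Start a short self-test and return the self-test log text when done.

    Returns None if the drive still reports a test in progress after
    max_polls checks, which this sketch treats as a hang.
    """
    run(["-t", "short", device])
    for _ in range(max_polls):
        sleep(poll_seconds)
        if remaining_percent(run(["-c", device])) is None:
            return run(["-l", "selftest", device])
    return None
```

Calling `run_short_test("/dev/ada0")` once with the drive on the add-in card and once on a motherboard port gives two logs to compare. The `run` and `sleep` parameters exist only so the loop can be exercised without real hardware.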

Finally, beware of misleading numbers like those on that website, which claims to show how many machines use FreeBSD to host websites versus Linux. Those numbers might be accurate (although I wonder), but they do not show the full picture. Somewhere around 40-50% of internet traffic is from Netflix servers, although most don't serve websites. Guess which operating system their systems run?
 

Stranded Camel

Explorer
Joined
May 25, 2017
Messages
79
wblock said:
smartmontools is a third-party program. It's the same program on FreeBSD and Linux.

What is more likely is a difference in how FreeBSD and Linux use or configure the hardware. Looking at the hardware info in the original post, I'm skeptical about the PCIe SATA host adapter: http://ableconn.com/support_1.php?gid=61
That's an ASM1062 SATA controller, which might be fine, and a JMicron JMB575, which... might also be fine. Neither of those is generally recommended for FreeNAS. If possible, I would connect the drive to be tested to a motherboard SATA port, then test it. If it completes the tests, problem located.

Your suspicions are quite logical -- third-party hardware is always deserving of a skeptical glance.

However, my main zpool (5x10TB WD Golds) actually is connected to the motherboard (it's mentioned in my sig, but it's a small detail among much other info). So the card can't be the cause.

Ironically, my secondary zpool, made up of old, beaten-up drives that I really don't care too much about, is what's connected to the Ableconn card, and SMART tests work just fine on them. Go figure.

wblock said:
Finally, beware of misleading numbers like those on that website, which claims to show how many machines use FreeBSD to host websites versus Linux. Those numbers might be accurate (although I wonder), but they do not show the full picture. Somewhere around 40-50% of internet traffic is from Netflix servers, although most don't serve websites. Guess which operating system their systems run?

I only mentioned that because what concerns us here isn't traffic flow but number of installed machines, since the issue at hand is apparently compatibility with a wide assortment of hardware. Linux may be messy and chaotic, but it's been about a decade since I've even bothered to look at a Linux hardware compatibility list before buying machines -- 99.99% of hardware just works on Linux. And the rest requires a small bit of work.

FreeBSD obviously is a long way away from that point, unfortunately. And I say unfortunately, because while it's limited in a lot of ways, it's also probably the most stable OS ever made, except perhaps for the system that ran the Space Shuttle's computers. If it weren't for OS upgrades and power outages that outlast my UPS, I'd never have rebooted my FreeNAS install.
 