Issue with Badblocks running on FreeNAS?

Status
Not open for further replies.

SubnetMask

Contributor
Joined
Jul 27, 2017
Messages
129
I got eight used 2TB HGST SAS drives at the end of December and, per recommendations here, started running badblocks on them. Since FreeNAS is BSD based, I ran the command needed to let it run properly, and it's been running on all eight drives since 12/22 - just over three weeks now. Does that seem normal, or is something not right?

The other thing is that I ran badblocks from the physical console so I wouldn't have to mess with tmux and such over SSH. The problem is that at this point the console seems totally frozen - I can't switch between sessions, and the one it's on isn't updating. FreeNAS itself is still responsive, it's still serving up the volume to VMware, there is activity showing on all of the drives, and badblocks does seem to be running: one drive had some bad blocks, and the log file for that drive was updated on 1/5 with more bad blocks found. It also started logging more bad blocks when I pulled that drive to send back, so I had to kill the PID for that process.

Edit - I left out that I'm running badblocks using 'badblocks -b 512 -wsv -o "da11.bb" /dev/da11'. I'm not using a larger block size because, from what I've read, specifying a size other than the drive's native block size can result in bad blocks being missed.

Should I be concerned that it's taking so long, and that the console is frozen?
 
Last edited:

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Should I be concerned that it's taking so long, and that the console is frozen?
I'd say yes--neither of those sounds at all normal. When I've run badblocks on 6 TB disks, it completed within a week.
 

millst

Contributor
Joined
Feb 2, 2015
Messages
141
Yes. My 8TB drives took ~4 days.

Did you follow the guide closely? It mentions a kernel flag and how it shouldn't be run on a production system. Detaching from the terminal would have let you log in remotely and see what's going on (even with the frozen term).
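For example (just a sketch - the session name, window name, and device node are placeholders, and this assumes the tmux that ships with FreeNAS), something like this would let each run survive a dropped SSH session or a wedged console:

tmux new-session -d -s burnin
tmux new-window -t burnin -n da11 'badblocks -wsv -o /root/da11.bb /dev/da11'
tmux attach -t burnin

Detach with Ctrl+B then D, and the tests keep running in the background.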

-tm
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
The sysctl setting is a "please let me blow away my system drive" option and should not be used.

As far as why it is taking so long, 2TB drives probably use 4K blocks natively, so using a block size of 512 is going to cause some serious write amplification. If there are actually any bad blocks on the drives, the retries can add a lot of time.
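For reference (device name just carried over from the command quoted earlier), 'diskinfo -v /dev/da11' will show the sector size and stripe size FreeBSD reports for the drive, and if it really were a 4K drive, something along these lines would at least match the native block size:

diskinfo -v /dev/da11
badblocks -b 4096 -wsv -o da11.bb /dev/da11

That's a rough sketch only, not a recommendation for these particular drives.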

The "console not responding" part is not clear. If it's one of the consoles running a program, pressing Ctrl+T should show something.
 

SubnetMask

Contributor
Joined
Jul 27, 2017
Messages
129
I've since rebooted that FreeNAS box and moved four of the drives over to another machine that's running Ubuntu and is essentially dedicated to running badblocks. The one they were running on wasn't 'production', but it did have a VMware datastore on the other four drives in it.

As far as the guide, there isn't much to it (the smartctl tests don't seem to work right on these SAS drives), and yes, I did run 'sysctl kern.geom.debugflags=0x10' beforehand. If that's to allow raw disk I/O, how will badblocks run if you don't run that command?

As far as the console being frozen: you know how you can use Alt+F1, F2, F3, etc. to get to different console sessions? The screen was frozen and I was not able to switch between sessions. It was totally unresponsive.

As far as the drives and sector size, they are HUS723020ALS640 (7K3000 series) drives, which, according to the datasheet, come in sector sizes of 512, 520 or 528 bytes. These were formatted at 520, best I can tell because they came out of an EMC CLARiiON, but they have since been reformatted to 512 bytes. These drives don't have the 'AF' logo on them. I believe 4K sectors didn't show up until the 7K4000 drives, and even then they were optional - not every drive/model had 4K sectors.
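(For anyone finding this later: the 520-to-512 low-level reformat on drives like these is usually done with sg_format from sg3_utils on a Linux box, something like 'sg_format --format --size=512 /dev/sgX', where the sg device node is just a placeholder, and it takes several hours per drive.)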
 
Last edited:

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
I did run 'sysctl kern.geom.debugflags=0x10' beforehand. If that's to allow raw disk I/O, how will badblocks run if you don't run that command?

It really has nothing to do with raw I/O. Instead, it is a safety feature that prevents overwriting drives with active read/write mounts. In other words, drives that are actually being used. So setting that sysctl means "yes, make it possible to overwrite my actual, in-use data."
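If someone really does need that flag (say GEOM has the disk in use for some reason), a safer pattern is to set it only for the duration of the run and put it back afterwards - roughly, with the device name only as an example:

sysctl kern.geom.debugflags=0x10
badblocks -wsv -o da11.bb /dev/da11
sysctl kern.geom.debugflags=0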
 

SubnetMask

Contributor
Joined
Jul 27, 2017
Messages
129
It really has nothing to do with raw I/O. Instead, it is a safety feature that prevents overwriting drives with active read/write mounts. In other words, drives that are actually being used. So setting that sysctl means "yes, make it possible to overwrite my actual, in-use data."

Ah. So basically, if the drives are freshly inserted and not assigned to anything other than being a device listed in /dev/, badblocks won't have any issues doing what it needs to do? If so, that would be good info to add to the 'howto'.
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
The drive does not even have to be fresh. It really just needs to not be in use, which essentially means "does not have a filesystem mounted", although there are some GEOM things that could also do it. When that happens, the GEOM system blocks writing to it, so it gives an error message.
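A quick sanity check before pointing badblocks at a disk is just to confirm nothing has it open - for example (device name is only an example):

mount | grep da11
gpart show da11
zpool status | grep da11

If all of those come back empty (or gpart reports no such geom), nothing should be holding the disk and that sysctl shouldn't be needed.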

I sent a PM to the author of that document asking about fixing the description of that sysctl, but have had no response. I should probably go add some text to it.
 