Incorrect SMART reports on swapped drives

Status
Not open for further replies.

Brentnall

Dabbler
Joined
May 24, 2015
Messages
16
This issue has already been resolved by a simple reboot of FreeNAS, but I wanted to discuss whether it should have occurred in the first place. I'd also like to check whether there is a better procedure for pulling drives from a running system.

I had 4x1TB WD drives and 4x3TB Samsung drives in two separate pools, which had been in production for a while with regular successful SMART checks, pool scrubs, and no major issues.
I recently acquired 8x8TB Red drives and, as there are no more free bays on the server, intended to replace the existing drives and create one big pool with the new 8TB drives.

First I destroyed the two existing pools and selected the "Mark the disks as new" check box while doing so.
I then simply pulled the drives. Is there a correct method of telling FreeNAS that you're about to detach a drive? If so, I haven't managed to find it.
Finally, I slotted in the new 8TB drives and started the long process of burn-in testing (oh joy!).

The new drives flew through a short SMART test with no problems, so I then started a long SMART test on all 8 drives.
At this point I started to see reallocated sector SMART errors fairly frequently over the course of an hour, before I stopped the test manually to investigate.
Importantly, they were being reported ONLY on the 4 drives that were in slots previously occupied by the Samsung drives.
However, when I checked the SMART output for these drives, no reallocated sectors were being reported:

Jun 25 10:52:11 roflnas smartd[79064]: Device: /dev/da3 [SAT], Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.
Jun 25 10:52:11 roflnas smartd[79064]: Device: /dev/da4 [SAT], Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.
Jun 25 10:52:11 roflnas smartd[79064]: Device: /dev/da5 [SAT], Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.
Jun 25 10:52:11 roflnas smartd[79064]: Device: /dev/da6 [SAT], Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
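
To make the mismatch above concrete, here is a minimal Python sketch that parses one of the smartd log lines and the attribute row quoted in this post, then checks whether the attribute would actually count as failed. It assumes the usual smartmontools convention that an attribute has failed when its normalized VALUE is less than or equal to a non-zero THRESH. The function and class names are my own and are not part of FreeNAS or smartmontools.

```python
import re
from typing import NamedTuple, Optional, Tuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int       # normalized value
    worst: int
    thresh: int
    when_failed: str
    raw: str


def parse_attribute_row(row: str) -> SmartAttribute:
    """Parse one row of the 'smartctl -A' attribute table (10 columns)."""
    fields = row.split(None, 9)
    if len(fields) < 10:
        raise ValueError(f"not an attribute row: {row!r}")
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        when_failed=fields[8],
        raw=fields[9],
    )


def is_failing(attr: SmartAttribute) -> bool:
    """Assumed rule: failed when normalized value <= a non-zero threshold."""
    return attr.thresh > 0 and attr.value <= attr.thresh


# Matches the smartd syslog lines quoted in the post.
FAILURE_RE = re.compile(
    r"Device: (\S+)(?: \[[^\]]*\])?, Failed SMART usage Attribute: (\d+) ([\w-]+)"
)


def parse_failure_line(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (device, attribute id, attribute name), or None if no match."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)


# Both samples are copied from the post.
LOG_LINE = ("Jun 25 10:52:11 roflnas smartd[79064]: Device: /dev/da3 [SAT], "
            "Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.")
ROW = "5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0"

if __name__ == "__main__":
    reported = parse_failure_line(LOG_LINE)
    attr = parse_attribute_row(ROW)
    if reported and reported[1] == attr.attr_id and not is_failing(attr):
        print(f"{reported[0]}: smartd reported attribute {attr.attr_id} as "
              f"failed, but the drive shows VALUE={attr.value} "
              f"THRESH={attr.thresh} RAW={attr.raw}")
```

With the values quoted here, the row does not meet the failure condition, which is what makes the smartd messages look stale rather than real.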


I initially suspected a SAS cable issue, as all 4 drives were connected to the HBA via the same cable.
However, I started troubleshooting with a simple reboot of FreeNAS, and since then I haven't seen a single additional error.
The long SMART tests have all completed successfully and I will now move on to dd and badblocks testing.

So what exactly happened there? Did FreeNAS cache the SMART stats from the Samsung drives in memory and attempt to apply them to the new drives?
It's fairly likely that the Samsung drives had picked up a few reallocated sectors over the time they had been in use.
If so, did SMART not recognise that these were completely different drives?

Is it likely that simply restarting the SMART service, instead of a full system reboot, would also have cleared the spurious errors?
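
For anyone wanting to check by hand whether a device node such as /dev/da3 now points at a different physical drive, one rough approach is to compare the identity fields from `smartctl -i` taken before and after the swap. The sketch below only parses text that has already been captured. It assumes the output contains "Device Model:" and "Serial Number:" lines, as it normally does for ATA drives. The sample model and serial strings are made up for illustration, and this is not a description of how smartd itself tracks drives.

```python
from typing import Dict

WANTED_KEYS = ("Device Model", "Serial Number")


def parse_identity(info_text: str) -> Dict[str, str]:
    """Pull the model and serial number out of captured 'smartctl -i' text."""
    identity: Dict[str, str] = {}
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in WANTED_KEYS:
            identity[key] = value.strip()
    return identity


def same_drive(before: Dict[str, str], after: Dict[str, str]) -> bool:
    """True only if both captures carry a serial number and all fields match."""
    return before.get("Serial Number") is not None and before == after


# Hypothetical captures: these model and serial strings are invented.
OLD_INFO = """\
Device Model:     EXAMPLE-OLD-3TB
Serial Number:    OLDSERIAL0001
"""
NEW_INFO = """\
Device Model:     EXAMPLE-NEW-8TB
Serial Number:    NEWSERIAL0001
"""

if __name__ == "__main__":
    old, new = parse_identity(OLD_INFO), parse_identity(NEW_INFO)
    if not same_drive(old, new):
        print("Different physical drive behind the same device node")
```

If the serial number changes while the old error messages keep appearing, that would point at stale state in the monitoring daemon rather than at the new drive.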
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Also I want to check whether there is a better procedure for pulling drives from a running system.
It's always best to shut down the system before replacing a drive. Pulling a drive while the system is running can cause issues like the one you're seeing.
 