SOLVED Realloc sector count SMART problem - are my pools in danger and how to fix it?

Mr. Slumber

Contributor
Joined
Mar 10, 2019
Messages
182
Dear community,

thanks to the fantastic scripts of @Spearfoot (please find them here) and running a daily short SMART test and a weekly long SMART test I just found out that one of my HDDs changed from ReAlloc count = 0 to ReAlloc count = 80.
It's a pool consisting of 3 HDDs (Seagate EXOS 10TB) in 1 raidz1 vdev. There are currently no alarms logged for this pool by FreeNAS (for my setup, please see my signature).

So what are the next steps? Stop the machine immediately and rip the "prefail" HDD out, replace it with a new one and resilver? Or don't panic?

What would you do? Thanks for sharing your opinion! :smile:

[Attachment: prefail.jpg]
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
I'd enter a state of mind of 'fair panic'.
If the drive is still under warranty, I'd investigate options for an RMA.

Depending on how well the most critical data is backed up elsewhere, I'd adjust my playfulness...
If it were my main rig in that situation, I'd start checking all SMART statistics of that particular drive daily.
Potentially, nothing happens over the next week or whatever time frame you choose. Then I'd be "calming down slightly".

If the number continued to grow, and I weren't interested in scrambling through a backup-restore scenario, I'd shut down the machine and get another drive in there before doing much else.

To get the full SMART output:
smartctl -a /dev/da4

Good luck!
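
For anyone who wants to automate the daily check described above, here is a minimal sketch in POSIX shell. It assumes the usual ATA attribute table printed by `smartctl -A`, where the raw value is the tenth column. The function names and the state-file path are made up for illustration and are not part of @Spearfoot's scripts.

```shell
#!/bin/sh
# Extract the raw value of SMART attribute Reallocated_Sector_Ct from
# `smartctl -A` output read on stdin. Assumes the standard ATA table layout,
# in which RAW_VALUE is column 10. Exits non-zero if the attribute is missing.
realloc_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10; found = 1 }
         END { if (!found) exit 1 }'
}

# Compare the current count ($1) with the value saved in a state file ($2,
# a hypothetical path), save the new value, and warn if the count has grown.
check_realloc_growth() {
    current=$1
    state_file=$2
    previous=$(cat "$state_file" 2>/dev/null || true)
    previous=${previous:-0}
    echo "$current" > "$state_file"
    if [ "$current" -gt "$previous" ]; then
        echo "WARNING: reallocated sectors grew from $previous to $current"
        return 1
    fi
    echo "OK: reallocated sectors at $current (previously $previous)"
}

# Example (needs root and a real disk; da4 is the drive from this thread):
#   count=$(smartctl -A /dev/da4 | realloc_count) &&
#       check_realloc_growth "$count" /var/tmp/da4.realloc
```

Run once a day (for example from cron), this would flag the jump from 0 to 80 on the first run after it happened, and stay quiet as long as the count holds steady.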
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
I concur with @Dice that you should take steps to replace da4 as soon as possible.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I just found out that one of my HDDs changed from ReAlloc count = 0 to ReAlloc count = 80.
That is a big jump from zero to 80. Something major is wrong with the drive physically. I wouldn't be surprised if a SMART Long test failed.
 

Mr. Slumber

Contributor
Joined
Mar 10, 2019
Messages
182
Thanks so much for your posts, very much appreciated :smile:

The last long SMART test was on Monday (that's when I first noticed something wasn't right). I did another one today (Friday) and luckily the ReAlloc Sec count is still 80.
OK, but why take risks with my production system? I will replace the HDD.

Contacted Seagate on Wednesday (gave them all the data of the HDD, errors and my contact data) but hey, they didn't even bother to call or eMail me back and of course no RMA: just silence...

Thank you very much, Seagate. Of course that's why I bought enterprise-class drives from you: to be completely ignored with a faulty 21-month-old HDD... :mad:

What is the best way to replace the faulty drive? Shut down TrueNAS, change the HDD, boot and then resilver? Sorry for asking, but what's the best practice for replacing a drive? (I read the "Hard drive troubleshooting guide" in the signature of @joeschmuck, highly recommended!)

Thanks again, appreciate your time! :smile:
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
What is the best way to replace the faulty drive? Shut down TrueNAS, change the HDD, boot and then resilver? Sorry for asking, but what's the best practice for replacing a drive?
It's in the User Guide, very clear. Follow that and you will be fine.
 

Mr. Slumber

Contributor
Joined
Mar 10, 2019
Messages
182
It's in the User Guide, very clear. Follow that and you will be fine.

Thank you, will do it that way. In case anybody else has a similar problem, I'll leave a link here.

Thanks again all of you for your help!
 

Mr. Slumber

Contributor
Joined
Mar 10, 2019
Messages
182
So, Seagate got back to me finally and now they replaced the EXOS 10TB (ST10000NM0086) with a new EXOS 16TB (ST10000NM001G). Nice, so no hard feelings anymore.

But... since the pool consisted of 3x 10TB HDDs and now one will be replaced it will consist of 2x 10TB HDD and 1x 16TB HDD.
Question: do I need to "do something" with the 16TB HDD before installing it and then click "Replace" in the pool?

Of course there will be no extra benefit of the 16TB HDD because the pool will not "grow magically" but ok.

Thanks for an answer. :smile:
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
I recently received a DOA 4TB drive. It happens. Since you are at the point of failure, but not yet degraded, I would do some kind of short test of the new drive. Connect it to a host and run badblocks for a couple of hours at a minimum. I can understand not wanting to do a full burn-in, but at least smoke test it.
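
As a rough illustration of such a smoke test, the sketch below only prints a checklist of commands rather than running them, because `badblocks -w` destroys all data on the target and the SMART self-tests run inside the drive in the background, so each step needs time to finish before the next. The function name and the device `/dev/da5` are placeholders. The `-b 4096` block size is an assumption made here because, as far as I know, the default block size can overflow badblocks' block counter on disks this large.

```shell
#!/bin/sh
# Print a short, DESTRUCTIVE smoke-test checklist for a new, empty disk.
# Nothing is executed; copy the lines one at a time once you are sure the
# device name is right. $1 is the device node (placeholder in the example).
smoke_test_plan() {
    dev=$1
    echo "smartctl -t short $dev"       # quick self-test, runs inside the drive
    echo "badblocks -b 4096 -ws $dev"   # write-mode test: overwrites everything
    echo "smartctl -t long $dev"        # extended self-test, takes many hours
    echo "smartctl -a $dev"             # review attributes and self-test log
}

smoke_test_plan /dev/da5
```

A DOA drive like the one described later in this thread would typically fail at the very first step, long before it gets anywhere near the pool.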
 

Scharbag

Guru
Joined
Feb 1, 2012
Messages
620
I would do a proper burn-in of the new drive. Then, if you have the space in your case, I would replace the drive online. This way, all the drives remain part of the pool until the replacement is successful. I have done this with 6 drives at once in my system (moving from 4TB to 6TB drives in a pool) and it greatly reduces risk.

Cheers,
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Indeed... @Scharbag is correct. Consider my previous post valid only if you can do an online replacement. If you have to pull the failing drive to provision the replacement, you need to have full trust (a full burn-in) in the replacement. The faulty drive may fully fail upon removal, and an incoming DOA drive would result in your pool having no redundancy while you wait for another RMA.
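
For reference, the ZFS operation behind an online replacement looks roughly like the sketch below. On FreeNAS/TrueNAS the "Replace" action in the web UI is the documented route, so treat this only as an illustration of what happens underneath. The pool name `tank` and the device names are placeholders, and the helper function is hypothetical.

```shell
#!/bin/sh
# Build the ZFS command for an online replacement. With both disks attached,
# the old disk stays in the pool until the new one has finished resilvering.
# Arguments: pool name, old device, new device (all placeholders below).
replace_cmd() {
    pool=$1
    old=$2
    new=$3
    if [ "$old" = "$new" ]; then
        echo "old and new device must differ for an online replacement" >&2
        return 1
    fi
    echo "zpool replace $pool $old $new"
}

replace_cmd tank da4 da5     # prints: zpool replace tank da4 da5
echo "zpool status tank"     # run afterwards to watch the resilver progress
```

The point of keeping both disks attached is that the raidz1 vdev never loses its redundancy during the resilver, which is exactly the risk described above.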
 

Mr. Slumber

Contributor
Joined
Mar 10, 2019
Messages
182
Thanks for your feedback, appreciate it!

Today this "Seagate nightmare" continued: UPS delivered the promised HDD, or at least that's what I thought...
Inside the package wasn't the promised 16TB model but a 10TB model. OK, I sent a 10TB HDD in, so that seems fair.

Thanks to you guys I did not install it in my FreeNAS machine but put it in a test machine (that saved me a lot of trouble further down the road, I think... :smile:), fired it up and...

:mad::mad::mad:
I didn't hear the angels whisper but a loud mechanical screeching noise coming out of the HDD followed by a crescendo of not so funny beeping sounds...lasted about 5min and then... silence... :mad:
Tried it in another test system, on my Mac, via USB adapter, in a Linux machine but every time the same...

So happy that I listened to your advice before I installed this HDD in my server! :smile:

Thank you very much Seagate for sending a DOA HDD, let's see what you will do. Definitely lost me as a future customer but hey, they don't care... :wink:
 

Dan Tudora

Patron
Joined
Jul 6, 2017
Messages
276
Hello!
Lucky you.
Urgently call the courier to get the good HDD back (joking): an HDD with bad sectors is much better than one that just scratches its platters.
My advice is to buy another HDD, maybe WD, and it must be CMR, and replace the drive to resolve the situation.
Good luck.
 

Dan Tudora

Patron
Joined
Jul 6, 2017
Messages
276
Hello.
Maybe the courier just played football with your HDD, or the vendor/supplier did not pack it well.
I have had the same situation.
That HDD was returned and resolved via RMA,
but I am not happy about it.
Cheers
 

Mr. Slumber

Contributor
Joined
Mar 10, 2019
Messages
182
Contacted Seagate on Wednesday (gave them all the data of the HDD, errors and my contact data) but hey, they didn't even bother to call or eMail me back and of course no RMA: just silence...
This was on November 11th...

Update 2020.11.26: Seagate was very fast in sending me a return label for the faulty HDD they sent me. So I sent it back and wow, the next day it was received by Seagate, they said thank you via eMail (kind of ;-) and that was that. :oops:

Update 2020.12.04: No HDD was sent, no response to my eMails, just silence...
Just to recap: 23 days ago I first contacted Seagate and since then they have only managed to send me a faulty HDD, which I had to return again. Pretty good service for an enterprise HDD, don't you think? :wink:
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
This is exactly why I keep cold, burned-in spares on hand. It's one thing to deal with a drive failure, resilver, etc. It's another to be living on the edge as one drive is going offline and the coming resilver may result in additional drive failures.

ALSO: I'll simply contrast this experience with buying used at goharddrive.com, where drives just starting to SMART-out were either replaced without any questions or the $$$ were refunded. Biggest issue with goharddrive: US-centric operation. Shipping costs, etc. negate benefit to ROW customers.
 

Dan Tudora

Patron
Joined
Jul 6, 2017
Messages
276
I keep cold, burned-in spares on hand
Hello.
Please do not keep an HDD "on hand", because your hand is at 37.2 degrees Celsius and that temperature is not good for the HDD :wink:
Under 30 is much better.
But you do have one cold spare.
Let Mr. Slumber's situation be a warning to everyone else:
buy a cold spare.
Good luck.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
please do not keep HDD "on hand" because you have 37.2 celsius degrees and that temperature is not good for that HDD :wink: under 30 is much better
Haha. "Keeping something on hand" != holding it in a hand (it is simply an English expression for storing something nearby)
 

Mr. Slumber

Contributor
Joined
Mar 10, 2019
Messages
182
Wow, Seagate wrote me an eMail addressed to a totally different person... The next day there was another eMail, and hey, this one was for me, telling me that there were some logistical problems with the new HDD they wanted to send me and they didn't know what to do about that... Really? :mad:

In the evening UPS rang at my door and delivered a new HDD for me. Guess what: not the EXOS model I sent in but a Lenovo SATA 7200 rpm 10TB HDD (maybe this is also manufactured by Seagate, but I don't know exactly).
Did a burn in test (thanks for this advice again!), plugged it in the server, clicked "replace", resilvering was done after some hours and boom... FreeNAS was online again.

Took only nearly 1 month so hey, that's why I bought Seagate enterprise HDDs... :tongue: (never ever again my friend...)
 