NOS HGST 6TB He8 drive intermittently dropping from pool

CP Waite

Dabbler
Joined
Nov 4, 2016
Messages
19
Hey everyone, it's been a while since I've needed any assistance; my FreeNAS build has been humming along problem-free for quite a while. Recently I purchased two New* (I'll get to it in a minute) HPE/HGST HUH728060ALE604 drives from a seller on Newegg. These drives were "New" with 2017 manufacture dates, but factory sealed and with zero hours on the clock. I ran some burn-in tests and a full SMART test suite. One drive has a couple of errors logged but nothing that's alarming, neither has any reallocated sectors, and both have clean SMART reports. The issue I'm having is that randomly, usually at night when the scrub runs, one drive (the same drive each time) drops from the array and then gets re-added within 5-10 minutes. Errors are logged in FreeNAS but clear on their own. The only reason I even know about it is my email alerts for SMART status. Here is the content of the latest email. I can't really make heads or tails of it, but maybe you guys can:

Code:
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <MB6000GEQUT HPG7> s/n 2RG9YSHX detached
GEOM_MIRROR: Device swap1: provider ada3p1 disconnected.
(ada3:ahcich3:0:0:0): Periph destroyed
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <MB6000GEQUT HPG7> ACS-2 ATA SATA 3.x device
ada3: Serial Number 2RG9YSHX
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 5723166MB (11721045168 512 byte sectors)
GEOM_ELI: Device mirror/swap1.eli destroyed.
GEOM_MIRROR: Device swap1: provider destroyed.
GEOM_MIRROR: Device swap1 destroyed.
GEOM_MIRROR: Device mirror/swap1 launched (2/2).
GEOM_ELI: Device mirror/swap1.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <MB6000GEQUT HPG7> s/n 2RG9YSHX detached
GEOM_MIRROR: Device swap1: provider ada3p1 disconnected.
(ada3:ahcich3:0:0:0): Periph destroyed
GEOM_ELI: Device mirror/swap1.eli destroyed.
GEOM_MIRROR: Device swap1: provider destroyed.
GEOM_MIRROR: Device swap1 destroyed.
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <MB6000GEQUT HPG7> ACS-2 ATA SATA 3.x device
ada3: Serial Number 2RG9YSHX
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 5723166MB (11721045168 512 byte sectors)


And here is the SMARTCTL output:
Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0027   133   100   054    Pre-fail  Always       -       107
  3 Spin_Up_Time            0x0023   253   100   024    Pre-fail  Always       -       48 (Average 48)
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   128   100   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       910
 10 Spin_Retry_Count        0x0033   100   100   060    Pre-fail  Always       -       0
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
180 Unknown_HDD_Attribute   0x002b   100   100   098    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   181   176   000    Old_age   Always       -       33 (Min/Max 21/35)
196 Reallocated_Event_Count 0x0033   100   100   000    Pre-fail  Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       776         -
# 2  Short offline       Completed without error       00%       608         -
# 3  Short offline       Completed without error       00%       440         -
# 4  Extended offline    Completed without error       00%       396         -
# 5  Short offline       Completed without error       00%       273         -
# 6  Extended offline    Completed without error       00%        48         -
# 7  Short offline       Completed without error       00%         0         -
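
For anyone who wants to check a table like this mechanically, here is a minimal Python sketch. It assumes the attribute rows have the same column layout as the smartctl output above, and it uses the usual SMART convention that an attribute counts as failed when its normalized VALUE is at or below a nonzero THRESH. The names `parse_attributes` and `failing` are made up for illustration, and the sample rows are copied from the post.

```python
import re

# Three rows copied from the smartctl output above.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x002f   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   181   176   000    Old_age   Always       -       33 (Min/Max 21/35)
"""

# ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)


def parse_attributes(text):
    """Return one dict per attribute row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "type": m.group(6),
                "raw": m.group(9).strip(),
            })
    return rows


def failing(rows):
    """Attributes whose normalized value is at or below a nonzero threshold."""
    return [r for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


if __name__ == "__main__":
    for attr in failing(parse_attributes(SAMPLE)):
        print(attr["id"], attr["name"], attr["value"], attr["thresh"])
```

On the rows shown in this post, nothing is flagged, which matches the clean report. A check like this says nothing about link or cabling problems, since those would not necessarily show up in these attributes.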


As best I know, these drives are NOT SMR (I did extensive research before purchasing them). Is it a bad drive? It seems fine, with no issues aside from this odd behavior. These drives replaced older 2TB drives, using the same SATA ports and the same cables. Could it be something as stupid as a bad cable? Does anything worrisome jump out at anyone here?

Server specs:
P8Z77-V LK
i5-3450
16GB RAM
6x SATA drives
750W Corsair PSU
8GB Cruzer Fit boot drive

This setup has worked flawlessly for about 5 years, so while it's old, I don't think it's to blame here. I welcome everyone's thoughts on this. Cheers!
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Is it the drive or is it the slot? You may want to shuffle drives around to see if the issue persists with the drive, or arises only on that slot.
 

CP Waite

You mean port? I don't have a backplane. Should I try swapping cables between a working drive and this drive? And this is a noob question, but will that screw up my drive mapping in FreeNAS?
 

Samuel Tai

Are your drives configured as a RAIDZ set? Are you using encryption? Typically, you can shuffle drives around within a RAIDZ set, but if the pool is encrypted, you'll have to power down between shuffles.
 

CP Waite

Oh yeah, sorry, I should have provided more info on my pool. It's a pool of 3 sets of mirrored drives, so ZFS Mirror I believe is the correct term? A mirrored pair of 3TB disks, a mirrored pair of 4TB disks, and a mirrored pair of 6TB disks (the new ones) for a total pool size of 13TB. I don't use RAIDZ. And no encryption. I'm not entirely sure my consumer board actually supports hot-swap for SATA so I'd power down, but downtime isn't a huge issue; it's mainly a media (Plex) and backup/file-storage server.

Graphically mapped:

MainVolume
—Mirror
——ada0
——ada1
—Mirror
——ada2
——ada3
—Mirror
——ada4
——ada5

In this topology, ada2 and ada3 are the two new 6TB drives. ada3 is the drive in question.
 

Samuel Tai

OK, if you swap ada2 and ada3 in the mirror, you'll be able to see if the fault follows the disk, or the SATA port.
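
To make the "disk or port" comparison concrete, here is a rough Python sketch that tallies detach events both by drive serial number and by AHCI channel. It assumes the kernel messages look like the ones pasted earlier in this thread ("adaN at ahcichN ..." followed by "... s/n SERIAL detached"). The function name `count_detaches` is made up for illustration, and the sample text is trimmed from the email above. After a swap, the serial number identifies the disk and the ahcich channel identifies the port, so whichever count keeps growing points at the culprit.

```python
import re
from collections import Counter

# Lines trimmed from the alert email earlier in this thread.
SAMPLE = """\
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <MB6000GEQUT HPG7> s/n 2RG9YSHX detached
(ada3:ahcich3:0:0:0): Periph destroyed
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <MB6000GEQUT HPG7> ACS-2 ATA SATA 3.x device
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <MB6000GEQUT HPG7> s/n 2RG9YSHX detached
(ada3:ahcich3:0:0:0): Periph destroyed
"""

AT = re.compile(r"^(ada\d+) at (ahcich\d+) ")
DETACHED = re.compile(r"^(ada\d+): <[^>]*> s/n (\S+) detached")


def count_detaches(log_text):
    """Count detach events per drive serial and per AHCI channel."""
    channel_of = {}          # most recent channel seen for each adaN device
    by_serial = Counter()
    by_channel = Counter()
    for line in log_text.splitlines():
        m = AT.match(line)
        if m:
            channel_of[m.group(1)] = m.group(2)
            continue
        m = DETACHED.match(line)
        if m:
            dev, serial = m.groups()
            by_serial[serial] += 1
            by_channel[channel_of.get(dev, "unknown")] += 1
    return by_serial, by_channel


if __name__ == "__main__":
    serials, channels = count_detaches(SAMPLE)
    print("by serial:", dict(serials))
    print("by channel:", dict(channels))
```

Before the swap, both counts land on the same drive and port, so they only become informative once new events are logged after the drives have been moved.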
 

CP Waite

Awesome, thanks! I'll give that a whirl and post back. It might be a few weeks as the error happens fairly infrequently.
 

CP Waite

Flipped the cables and double-checked that everything was seated. Now the waiting game...
 

CP Waite

2 weeks in and no similar errors. Fingers crossed it was just a cable that wasn't seated well.
 

Samuel Tai

That's great to hear.
 

CP Waite

Yeah, thanks for the troubleshooting help so far. I also recently added a pair of 120mm fans in front of the hard drive cage in my case, which dropped the HDD temps ~10°C across the board. They were all under 35°C, so hardly hot, but more cooling is never a bad thing. I doubt that has had any effect, but you never know.
 