Random disk resets on ST1000DM010-2EP102

Z80user

Cadet
Joined
Jun 13, 2020
Messages
6
I recently tried to build a FreeNAS server with three 1TB disks (ST1000DM010-2EP102), but the disks reset at random: some disappear and reappear without my touching anything.

In the UI under Storage/Disks, a disk is sometimes missing; others show up, but only after some time.
When I try to wipe one (fill it with zeros), it reaches some percentage and then the wipe process stops as if it had finished.
When I replace a disk in another pool with one of these, it works. When I then try to add a second one, the resilvering percentage drops back and restarts from 0%, almost forever.

My systems:
core2quad with 8 GB of RAM https://www.msi.com/Motherboard/support/P965_Platinum (FreeNAS-11.3-U3.2 on 1 USB of 16GB)
Ryzen 3600G with 16GB of RAM https://www.gigabyte.com/Motherboard/B450M-S2H-rev-10 (FreeNAS-11.3-U3.2 on 2 USB of 32GB)
Xeon x5680 with 24GB of RAM https://www.asus.com/Motherboards/P6T_WS_Professional/ (Windows server 2008 R2 Datacenter)
Ryzen 3 3900X with 32GB of RAM https://www.asus.com/Motherboards/PRIME-X570-PRO/ (Linux Debian 10 & Linux Mint 20)

The same happens on both FreeNAS systems; they are almost impossible to use.
On both the Windows and the Linux machines the disks work perfectly.

NOTE: The "Hardware ECC recovered" value does not look right. Sometimes it resets itself, and it looks more like a count of sectors read/written on the disk. I got up to 62.100 M errors, and today the raw value is 0x06B9.
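
To watch whether that raw value really resets, it can help to record it at intervals. The following is a minimal sketch that assumes the attribute table layout printed by `smartctl -A`. The sample text is hypothetical: only the raw value 1721 (0x06B9 in decimal) comes from the post, and the other columns are placeholders.

```python
# Hypothetical sample in the layout printed by `smartctl -A`; only the raw
# value 1721 (0x06B9) comes from the post, the other numbers are made up.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
195 Hardware_ECC_Recovered  0x001a   100   064   000    Old_age   Always       -       1721
"""

def raw_value(smart_output, attr_id):
    """Return the RAW_VALUE column of one SMART attribute as an int, or None."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0] == str(attr_id):
            return int(fields[9])
    return None

value = raw_value(SAMPLE, 195)
print(value, hex(value))
```

Logging this number before and after a scrub would show whether it only grows (a real error counter) or jumps back down (a counter that wraps or is vendor-encoded).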

Someone asked me to test with a different power supply, but that is not the problem: I tested with 3 different power supplies (each with more than enough power) and 2 USB HDD cases, and it failed with all of them.

NOTE: The problem shows up on the screen attached to the server, but not in every case.
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
These may be SMR drives. The symptoms are typical when they are used in a NAS.
Also, there are particular BIOS settings required when using Ryzen CPUs. Please search the forum.
 

Z80user

Cadet
Joined
Jun 13, 2020
Messages
6
But as I said, this also happens on the old core2quad.
As a curiosity, Windows Server does not detect it fast enough: you can unplug the SATA connector from the disk and plug it back in, and the system will not tell you, even with the disk manager open. Doing the same on FreeNAS, you see it on the screen attached to the server (I don't know where to see it in the web interface).

This page says the disk is CMR: https://www.seagate.com/es/es/internal-hard-drives/cmr-smr-list/
It is also not on the list of SMR drives here: https://www.ixsystems.com/community/resources/list-of-known-smr-drives.141/

Searching, I can't find anything about this particular disk related to the "reset" issue.
There is one thread about the ECC value, but it concerns a different Seagate disk of 1.5TB, an older model (ST31500541AS) that is twice the height of this one, which is a newer design.

By the way, the load on the server is none or low: just the scrub or the resilvering, and at most 1 file transfer, but never during the resilvering process.
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
It would appear that the only common failing component is the hard drives. Is that correct?
Build a pool with a single drive of another type and see if it is stable.

From other posts regarding Ryzen stability :

  • Disable Cool 'n Quiet in the BIOS
  • Disable CPU C-states, especially C-6.
 

Z80user

Cadet
Joined
Jun 13, 2020
Messages
6
Yes, only the HDDs (3 of them; I sent one back for RMA and got a different one back).

I see that... multiple times, but that was just after clearing the screen (pressing Enter) and trying to create the vdev/pool from the laptop.

This is the same output, except that I replaced the serial number with Z80user, just in case.
Code:
g_access(952): provider ada2 has error 6 set
g_access(952): provider ada2 has error 6 set
ada2 at ahcich5 bus 0 scbus6 target 0 lun 0
ada2: <ST1000DM010-2EP102 CC43> s/n Z80user detached
(ada2:ahcich5:0:0:0): Periph destroyed
ada2 at ahcich5 bus 0 scbus6 target 0 lun 0
ada2: <ST1000DM010-2EP102 CC43> ATA8-ACS SATA 3.x device
ada2: Serial Number Z80user
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors)
ada2: quirks=0x1<4K>
ses0: ada2,pass2 in 'Slot 04', SATA Slot: scbus6 target 0
ada2 at ahcich5 bus 0 scbus6 target 0 lun 0
ada2: <ST1000DM010-2EP102 CC43> s/n Z80user detached
(ada2:ahcich5:0:0:0): Periph destroyed
ada2 at ahcich5 bus 0 scbus6 target 0 lun 0
ada2: <ST1000DM010-2EP102 CC43> ATA8-ACS SATA 3.x device
ada2: Serial Number Z80user
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors)
ada2: quirks=0x1<4K>
ses0: ada2,pass2 in 'Slot 04', SATA Slot: scbus6 target 0
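
To see how often a drive drops off the bus, the "detached" lines in a saved copy of this console output can be counted per device. This is a minimal sketch, not a FreeNAS tool: the regular expression is an assumption based only on the line format shown in this thread (it accepts both `adaN` and `daN` device names), and `LOG` is a shortened excerpt of the output above.

```python
import re
from collections import Counter

# Shortened excerpt of the console output quoted above.
LOG = """\
ada2 at ahcich5 bus 0 scbus6 target 0 lun 0
ada2: <ST1000DM010-2EP102 CC43> s/n Z80user detached
(ada2:ahcich5:0:0:0): Periph destroyed
ada2 at ahcich5 bus 0 scbus6 target 0 lun 0
ada2: <ST1000DM010-2EP102 CC43> ATA8-ACS SATA 3.x device
ada2 at ahcich5 bus 0 scbus6 target 0 lun 0
ada2: <ST1000DM010-2EP102 CC43> s/n Z80user detached
(ada2:ahcich5:0:0:0): Periph destroyed
"""

# Matches e.g. "ada2: <MODEL FW> s/n SERIAL detached" (format taken from this thread).
DETACH = re.compile(r"\b((?:ada|da)\d+): <([^>]+)> s/n \S+ detached")

def count_detaches(text):
    """Count detach events per (device, model/firmware string)."""
    counts = Counter()
    for match in DETACH.finditer(text):
        counts[(match.group(1), match.group(2))] += 1
    return counts

for (device, model), n in count_detaches(LOG).items():
    print(f"{device} ({model}): {n} detach event(s)")
```

Comparing the counts across ports and controllers would show whether the resets follow the drive or the port it is plugged into.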

I put a laptop next to the server, and when I try to create the pool I get this error in the UI:

[EFAULT] Failed to wipe disk ada2: [EFAULT] Command gpart create -s gpt /dev/ada2
failed (code 1): gpart: arg0 'ada2': invalid argument

NOTE: For now I am using the core2quad instead of the Ryzen, as it has more SATA ports, so I can test more things at the same time.

To fix the problem temporarily, I put the HDD in a USB case that lies about too many things (such as the name of the HDD).

I found that info by searching for "g_access(952)".
I'm not sure how to fix it properly, but now I, and other people too, can start thinking about how. The USB case is only a temporary fix; I just use it for testing.

The following lines are EXTRA and have nothing to do with the problem; they just clarify a bit more how I fixed it temporarily.
I moved the disk from the SATA port to the USB case, and put the disk that had been in the USB case into the system (I know it is "damaged", but not too much), and I get these messages:

(ada3:ahcich3:0:0:0): Retrying command
(ada3:ahcich3:0:0:0): CAM status: ATA Status Error
(ada3:ahcich3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
(ada3:ahcich3:0:0:0): RES: 51 40 ba 00 40 40 00 00 00 00 00
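
The status and error bytes in these messages are bit fields, so the names in parentheses can be reproduced by hand. This is a minimal sketch; the bit-name tables follow the ATA register layout as I understand it, using the names that appear in the log. UNC stands for an uncorrectable data error, which points at an unreadable sector on that disk rather than at the detach/reattach behaviour of the other drives.

```python
# Bit names for the ATA status and error registers, named as in the log above.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "SERV",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "ILI"}

def decode(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if value & bit]

# Values taken from "ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )".
status = int("51", 16)
error = int("40", 16)
print("status:", " ".join(decode(status, STATUS_BITS)))
print("error: ", " ".join(decode(error, ERROR_BITS)))
```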
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
It's funny, because I have these drives in an iSCSI array, and one of them has started reporting errors as well. Hmm, very fishy.

Code:
mps0: mpssas_prepare_remove: Sending reset for target ID 15
da10 at mps0 bus 0 scbus0 target 15 lun 0
mps0: da10: <ATA ST1000DM010-2EP1 CC43> s/n Z9AR1MFP detached
(da10:mps0:0:15:0): WRITE(10). CDB: 2a 00 02 2b 5a c0 00 00 60 00
Unfreezing devq for target ID 15
(da10:mps0:0:15:0): CAM status: CCB request aborted by the host
(da10:mps0:0:15:0): Error 5, Periph was invalidated
(da10:mps0:0:15:0): Periph destroyed
mps0: SAS Address for SATA device = 3c2f56516485484e
mps0: SAS Address from SATA device = 3c2f56516485484e
da10 at mps0 bus 0 scbus0 target 15 lun 0
da10: <ATA ST1000DM010-2EP1 CC43> Fixed Direct Access SPC-4 SCSI device
da10: Serial Number Z9AR1MFP
da10: 600.000MB/s transfers
da10: Command Queueing enabled
da10: 953869MB (1953525168 512 byte sectors)
da10: quirks=0x8<4K>
ses0: da10,pass11: Element descriptor: 'ArrayDevice04'
ses0: da10,pass11: SAS Device Slot Element: 1 Phys at Slot 4
ses0:  phy 0: SATA device
ses0:  phy 0: parent 500605b0000274bf addr 500605b0000274a4
mps0: mpssas_prepare_remove: Sending reset for target ID 15
da10 at mps0 bus 0 scbus0 target 15 lun 0
mps0: da10: Unfreezing devq for target ID 15
<ATA ST1000DM010-2EP1 CC43> s/n Z9AR1MFP detached
(da10:mps0:0:15:0): Periph destroyed
mps0: SAS Address for SATA device = 3c2f56516485484e
mps0: SAS Address from SATA device = 3c2f56516485484e
da10 at mps0 bus 0 scbus0 target 15 lun 0
da10: <ATA ST1000DM010-2EP1 CC43> Fixed Direct Access SPC-4 SCSI device
da10: Serial Number Z9AR1MFP
da10: 600.000MB/s transfers
da10: Command Queueing enabled
da10: 953869MB (1953525168 512 byte sectors)
da10: quirks=0x8<4K>
ses0: da10,pass11: Element descriptor: 'ArrayDevice04'
ses0: da10,pass11: SAS Device Slot Element: 1 Phys at Slot 4
ses0:  phy 0: SATA device
ses0:  phy 0: parent 500605b0000274bf addr 500605b0000274a4



Alecmascot said:
These may be SMR drives. The symptoms are typical when they are used in a NAS.
Also, there are particular BIOS settings required when using Ryzen CPUs. Please search the forum.

They are CMR drives: https://www.seagate.com/gb/en/internal-hard-drives/cmr-smr-list/
 