Known-good disks won't connect to the volume or get detected by FreeNAS

Status
Not open for further replies.

Vito Reiter

Wise in the Ways of Science
Joined
Jan 18, 2017
Messages
232
I ran a scrub overnight on my NAS box, and when I came back in the morning I noticed that my volume was degraded. It looks like a single disk is not being detected properly. I rebooted to see if it would come back, but it still hasn't, and the WebGUI doesn't even list it as connected. When I reconnected it, my log printed this:

Code:
NOP FLUSHQUEUE
CAM status: Command timeout
Error 5, retries exhausted


This appeared twice on my physical console screen. It has happened before, but that time I replaced the drive, assuming it was bad. Nothing gets this drive to connect, and I am 99% certain it's a good drive. I'm going to do further testing (swapping cables and ports) while I wait for any information. I looked up NOP FLUSHQUEUE and found almost nothing, either here or anywhere else on the web. Any ideas?

My current board is an Intel DX580G, since that isn't listed in my sig; the rest of the specs can be found there. Thanks all!

IMPORTANT UPDATE: I unplugged the drive in question along with a second drive that was running. When I plugged the undetected one back in first, it reconnected as normal, but now the drive that had been running is not detected when plugged back in. So the drives themselves aren't failing, and the cables are in the same places, yet random drives are not being detected. From my tests so far, whichever drive is plugged back into the array last is the one that isn't detected. Also, the drives are clearly spinning up and not making any weird noises.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Hardware specs, pool layout, and FreeNAS version need to be in your post, not your signature. People can't read your signature.

Sent from my Nexus 5X using Tapatalk
 

Vito Reiter

Wise in the Ways of Science
Joined
Jan 18, 2017
Messages
232
Sorry, I didn't know that. Here they are:

Build: FreeNAS-9.10.2-U3

Motherboard: Intel DX580G
Intel Xeon W3565 @ 3.2 GHz
2 x 4GB ECC RAM (8GB Total)
Boot: mirrored 2x 120GB SSDs
VDEV1: RAIDZ2 w/ 5x 1TB drives
VDEV2: RAIDZ2 w/ 5x 1TB drives

Note: RAM has been fully tested with several passes (even though it's ECC).
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Are all 10 drives connected to SATA ports on the motherboard? You need to tell more of the story if you want help. I tried explaining this once already.

 

Vito Reiter

Wise in the Ways of Science
Joined
Jan 18, 2017
Messages
232
8 are on the board (both drives mentioned so far are among these), and 4 are on a Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01) controller that feeds a hot-plug bay (that's all Dell proprietary stuff, but I've had zero issues with these disks, ever).
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Yep, throw that thing in the trash.


Vito Reiter

Wise in the Ways of Science
Joined
Jan 18, 2017
Messages
232
I mean, that's kind of irrelevant. I've been using that card in passthrough mode for a very long time, and the drives on it aren't having any issues whatsoever; only the drives on the board are. I'm gonna post the output of a few commands; maybe someone will spot something.

dmesg | grep -e ada
Code:
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD10EALS-00Z8A0 05.01D05> ATA8-ACS SATA 2.x device
ada0: Serial Number WD-WCATR0571584
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <ST9160314AS 0020LVM1> ATA8-ACS SATA 2.x device
ada1: Serial Number 6VC4A1YP
ada1: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 152627MB (312581808 512 byte sectors)
ada1: Previously was known as ad6
ada2 at ahcich8 bus 0 scbus11 target 0 lun 0
ada2: <ACSC2M064S25 1.095.06> ACS-2 ATA SATA 3.x device
ada2: Serial Number 986012580186
ada2: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 512bytes)
ada2: Command Queueing enabled
ada2: 61057MB (125045424 512 byte sectors)
ada2: Previously was known as ad20
ada3 at ahcich9 bus 0 scbus12 target 0 lun 0
ada3: <ST1000DM003-1ER162 CC61> ACS-2 ATA SATA 3.x device
ada3: Serial Number Z4YCX1T0
ada3: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 953869MB (1953525168 512 byte sectors)
ada3: quirks=0x1<4K>
ada3: Previously was known as ad22
ada4 at ahcich10 bus 0 scbus13 target 0 lun 0
ada4: <ST3000DM001-1CH166 CC27> ACS-2 ATA SATA 3.x device
ada4: Serial Number Z1F444PS
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 2861588MB (5860533168 512 byte sectors)
ada4: quirks=0x1<4K>
ada4: Previously was known as ad24
ada5 at ahcich11 bus 0 scbus14 target 0 lun 0
ada5: <ST1000DL002-9TT153 CC98> ATA8-ACS SATA 2.x device
ada5: Serial Number W1V0FWJM
ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada5: 953869MB (1953525168 512 byte sectors)
ada5: quirks=0x1<4K>
ada5: Previously was known as ad26
ada6 at ahcich12 bus 0 scbus15 target 0 lun 0
ada6: <ST1500DM003-9YN16G HP16> ATA8-ACS SATA 3.x device
ada6: Serial Number W240QC3W
ada6: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 1430799MB (2930277168 512 byte sectors)
ada6: quirks=0x1<4K>
ada6: Previously was known as ad28
GEOM_ELI: Device ada6p1.eli created.
GEOM_ELI: Device ada4p1.eli created.
GEOM_ELI: Device ada0p1.eli created.
GEOM_ELI: Device ada3p1.eli created.
GEOM_ELI: Device ada5p1.eli created.
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD10EALS-00Z8A0 05.01D05> s/n WD-WCATR0571584 detached
GEOM_ELI: Device ada0p1.eli destroyed.
GEOM_ELI: Detached ada0p1.eli on last close.
(ada0:ahcich0:0:0:0): Periph destroyed
ada0 at ahcich13 bus 0 scbus16 target 0 lun 0
ada0: <ST1000DL002-9TT153 CC3C> ATA8-ACS SATA 3.x device
ada0: Serial Number W1V1BM12
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors)
ada0: quirks=0x1<4K>
ada0: Previously was known as ad30
ada0 at ahcich13 bus 0 scbus16 target 0 lun 0
ada0: <ST1000DL002-9TT153 CC3C> s/n W1V1BM12 detached
(ada0:ahcich13:0:0:0): Periph destroyed
ada0 at ahcich13 bus 0 scbus16 target 0 lun 0
ada0: <ST1000DL002-9TT153 CC3C> ATA8-ACS SATA 3.x device
ada0: Serial Number W1V1BM12
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors)
ada0: quirks=0x1<4K>
ada0: Previously was known as ad30
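To test the idea that drops cluster on particular ports, the detach events in a dmesg dump can be tallied by AHCI channel. This is a minimal Python sketch, assuming the line formats shown above ("adaN at ahcichM bus ..." followed by "adaN: <model> s/n SERIAL detached"); the function name is hypothetical, and which controller (Intel or Marvell) each ahcich channel belongs to is not shown in this log, so that mapping would still need checking separately.

```python
import re
from collections import defaultdict

# e.g. "ada0 at ahcich13 bus 0 scbus16 target 0 lun 0"
ATTACH_RE = re.compile(r"^(ada\d+) at (\w+) bus")
# e.g. "ada0: <ST1000DL002-9TT153 CC3C> s/n W1V1BM12 detached"
DETACH_RE = re.compile(r"^(ada\d+): <[^>]+> s/n (\S+) detached")


def detaches_by_channel(dmesg_text):
    """Map each AHCI channel to the serial numbers that detached from it.

    The channel for a detach is taken from the most recent
    "adaN at <channel>" line seen for that device name.
    """
    last_channel = {}              # adaN -> channel it last appeared on
    detached = defaultdict(list)   # channel -> [serials that detached]
    for line in dmesg_text.splitlines():
        line = line.strip()
        attach = ATTACH_RE.match(line)
        if attach:
            last_channel[attach.group(1)] = attach.group(2)
            continue
        detach = DETACH_RE.match(line)
        if detach:
            channel = last_channel.get(detach.group(1), "unknown")
            detached[channel].append(detach.group(2))
    return dict(detached)
```

Fed the dmesg output above, it reports one detach on ahcich0 (the WD, serial WD-WCATR0571584) and one on ahcich13 (the Seagate, serial W1V1BM12). Collecting several of these dumps over time would show whether the same channels keep recurring.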


ls /dev/
Code:
./		  ad28p1.eli@ ada3p2	  console	 da2		 ggctl	   pass11	  sysmouse	ugen2.1@
../		 ad28p2@	 ada4		consolectl  da2p1	   gptid/	  pass2	   ttyv0	   ugen2.2@
acpi		ad30@	   ada4p1	  cpuctl0	 da2p1.eli   hpet0	   pass3	   ttyv1	   ugen3.1@
ad20@	   ad30p1@	 ada4p1.eli  cpuctl1	 da2p2	   io		  pass4	   ttyv2	   ugen4.1@
ad20p1@	 ad30p2@	 ada4p2	  cpuctl2	 da3		 iscsi	   pass5	   ttyv3	   ugen5.1@
ad20p2@	 ad6@		ada5		cpuctl3	 da3p1	   kbd0@	   pass6	   ttyv4	   ugen6.1@
ad22@	   ad6p1@	  ada5p1	  cpuctl4	 da3p1.eli   kbd1@	   pass7	   ttyv5	   ugen7.1@
ad22p1@	 ad6p2@	  ada5p1.eli  cpuctl5	 da3p2	   kbd2@	   pass8	   ttyv6	   ugen8.1@
ad22p1.eli@ ada0		ada5p2	  cpuctl6	 devctl	  kbdmux0	 pass9	   ttyv7	   ukbd0
ad22p2@	 ada0p1	  ada6		cpuctl7	 devctl2	 klog		pci		 ttyv8	   ums0
ad24@	   ada0p2	  ada6p1	  crypto	  devstat	 kmem		ptmx		ttyv9	   urandom@
ad24p1@	 ada1		ada6p1.eli  ctty		dtrace/	 led/		pts/		ttyva	   usb/
ad24p1.eli@ ada1p1	  ada6p2	  da0		 dumpdev@	mdctl	   random	  ttyvb	   usbctl
ad24p2@	 ada1p2	  apm		 da0p1	   fd/		 mem		 rdma_cm*	ttyvc	   vboxdrv
ad26@	   ada2		apmctl	  da0p1.eli   fido		mpt0		reroot/	 ttyvd	   vboxdrvu
ad26p1@	 ada2p1	  atkbd0	  da0p2	   fw0@		nfslock	 ses0		ttyve	   vboxnetctl
ad26p1.eli@ ada2p2	  audit	   da1		 fw0.0	   null		snp		 ttyvf	   xpt0
ad26p2@	 ada3		bpf		 da1p1	   fwmem0@	 pass0	   stderr@	 ufssuspend  zero
ad28@	   ada3p1	  bpf0@	   da1p1.eli   fwmem0.0	pass1	   stdin@	  ugen0.1@	zfs
ad28p1@	 ada3p1.eli  cam/		da1p2	   geom.ctl	pass10	  stdout@	 ugen1.1@


camcontrol devlist
Code:
<ST9160314AS 0020LVM1>			 at scbus1 target 0 lun 0 (pass1,ada1)
<ATA WDC WD10EALX-759 1H15>		at scbus8 target 0 lun 0 (pass2,da0)
<ATA WDC WD10EZEX-60Z 0A80>		at scbus8 target 1 lun 0 (pass3,da1)
<ATA WDC WD10EARX-00N AB51>		at scbus8 target 2 lun 0 (pass4,da2)
<ATA ST1000DM003-1ER1 CC43>		at scbus8 target 3 lun 0 (pass5,da3)
<DP BACKPLANE 1.05>				at scbus8 target 8 lun 0 (pass6,ses0)
<ACSC2M064S25 1.095.06>			at scbus11 target 0 lun 0 (pass7,ada2)
<ST1000DM003-1ER162 CC61>		  at scbus12 target 0 lun 0 (pass8,ada3)
<ST3000DM001-1CH166 CC27>		  at scbus13 target 0 lun 0 (pass9,ada4)
<ST1000DL002-9TT153 CC98>		  at scbus14 target 0 lun 0 (pass10,ada5)
<ST1500DM003-9YN16G HP16>		  at scbus15 target 0 lun 0 (pass11,ada6)
<ST1000DL002-9TT153 CC3C>		  at scbus16 target 0 lun 0 (ada0,pass0)


zpool status
Code:
  pool: StoBro
 state: DEGRADED
status: One or more devices has been removed by the administrator.
		Sufficient replicas exist for the pool to continue functioning in a
		degraded state.
action: Online the device using 'zpool online' or replace the device with
		'zpool replace'.
  scan: resilvered 1.18M in 0h0m with 0 errors on Wed May  3 11:17:22 2017
config:

		NAME											STATE	 READ WRITE CKSUM
		StoBro										  DEGRADED	 0	 0	 0
		  raidz2-0									  DEGRADED	 0	 0	 0
			gptid/44117792-10be-11e7-827c-bc5ff4f70096  ONLINE	   0	 0	 0
			gptid/8db89af1-f92d-11e6-859b-78e7d18072ab  ONLINE	   0	 0	 0
			5293962910809783256						 REMOVED	  0	 0	 0  was /dev/gptid/e268f907-2eb5-11e7-8691-7071bce911f3
			gptid/94737d25-f77f-11e6-bd72-78e7d18072ab  ONLINE	   0	 0	 0
			gptid/9550d831-f77f-11e6-bd72-78e7d18072ab  ONLINE	   0	 0	 0
		  raidz2-1									  ONLINE	   0	 0	 0
			gptid/0755e406-10e0-11e7-827c-bc5ff4f70096  ONLINE	   0	 0	 0
			gptid/205b43d8-10a9-11e7-af46-bc5ff4f70096  ONLINE	   0	 0	 0
			gptid/6c70d834-0dbe-11e7-af46-bc5ff4f70096  ONLINE	   0	 0	 0
			gptid/6d82276e-0dbe-11e7-af46-bc5ff4f70096  ONLINE	   0	 0	 0
			gptid/6e7e3ac4-0dbe-11e7-af46-bc5ff4f70096  ONLINE	   0	 0	 0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Mar 29 11:31:00 2017
config:

		NAME		STATE	 READ WRITE CKSUM
		freenas-boot  ONLINE	   0	 0	 0
		  mirror-0  ONLINE	   0	 0	 0
			ada1p2  ONLINE	   0	 0	 0
			ada2p2  ONLINE	   0	 0	 0

errors: No known data errors
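The status output above can also be scanned mechanically for members that are not ONLINE, which helps on larger pools. This is a minimal Python sketch, assuming the column layout shown here (NAME, STATE, READ, WRITE, CKSUM, optionally followed by a note such as "was /dev/gptid/..."); the function name is hypothetical. It also reports the pool and vdev rows, because their DEGRADED state comes from the removed member.

```python
# Common states zpool status prints in the STATE column (not exhaustive).
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "REMOVED", "UNAVAIL"}


def not_online(zpool_status_text):
    """Return (name, state, note) for every config row that is not ONLINE."""
    rows = []
    in_config = False
    for line in zpool_status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME" and "STATE" in fields:
            in_config = True   # header of a pool's config table
            continue
        if fields[0] == "errors:":
            in_config = False  # end of that pool's config table
            continue
        if in_config and len(fields) >= 2 and fields[1] in STATES:
            if fields[1] != "ONLINE":
                # Anything after READ/WRITE/CKSUM is a free-form note.
                rows.append((fields[0], fields[1], " ".join(fields[5:])))
    return rows
```

For the StoBro pool above this yields the pool, raidz2-0, and the removed member 5293962910809783256 with its old gptid path, which is the device the status message suggests bringing back with zpool online or zpool replace.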


[Screenshot: the WebGUI 'Disks' view]


Hope this helps move my case along. Thanks again.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
I'm convinced it's a hardware problem, most likely a controller/cable kind of deal.
Do any of your drives show a SMART ID #199 count > 0?
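Attribute 199 (UDMA_CRC_Error_Count) counts transfer errors on the SATA link between drive and controller, which is why a nonzero value usually points at cables or ports rather than the platters. This is a minimal Python sketch for pulling its raw value out of saved smartctl -a output; it assumes the standard attribute table, where RAW_VALUE is the tenth column, and the function names are hypothetical.

```python
def attribute_raw(smart_text, attr_id):
    """Return the leading integer of an attribute's RAW_VALUE, or None if absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows: ID, name, flag, value, worst, thresh, type,
        # updated, when_failed, raw value (possibly followed by extra text).
        if len(fields) >= 10 and fields[0] == str(attr_id):
            return int(fields[9])
    return None


def crc_suspects(reports):
    """Given {device: smartctl text}, list devices whose attribute 199 is > 0."""
    suspects = []
    for device, text in sorted(reports.items()):
        raw = attribute_raw(text, 199)
        if raw is not None and raw > 0:
            suspects.append(device)
    return suspects
```

Against the two reports posted later in this thread, the WD drive's 199 raw value is 1 and the Seagate's is 0.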
 

Vito Reiter

Wise in the Ways of Science
Joined
Jan 18, 2017
Messages
232
I've swapped all the cables with no luck. I'm testing port by port now, but I'm in the middle of a reboot. Once it's back up I'll check each drive with smartctl, though since the one drive can't connect, I can't really test it. I'll reply here in a few minutes.

Quick update: all drives are detected in the BIOS just fine.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
You should post the smartctl -a output for the two drives.

Could it also be the power supply becoming unable to support the load over time?
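One rough way to sanity-check the power supply theory is to add up the worst-case load when every drive spins up at once. The sketch below is only arithmetic under placeholder assumptions: 2.0 A of spin-up current per drive on the 12 V rail and 60 W for board, RAM and fans are guesses, 130 W is the Xeon W3565's rated TDP, only the 10 data drives are counted, and 450 W is the Antec unit the poster mentions later in the thread. Real figures should come from the drive datasheets and the PSU label; the function names are hypothetical.

```python
def spinup_load_watts(n_drives, spinup_amps_12v, cpu_watts, other_watts):
    """Worst-case DC load if all drives spin up at the same moment.

    Every argument is a caller-supplied estimate; per-drive spin-up
    current should come from the drive datasheets (12 V rail).
    """
    drive_watts = n_drives * spinup_amps_12v * 12.0
    return drive_watts + cpu_watts + other_watts


def headroom_watts(psu_rating_watts, load_watts):
    """Positive means spare capacity; negative means the PSU is overcommitted."""
    return psu_rating_watts - load_watts


# Placeholder inputs: 10 data drives at an assumed 2.0 A each, the W3565's
# 130 W TDP, and an assumed 60 W for board, RAM and fans.
load = spinup_load_watts(10, 2.0, 130, 60)
spare = headroom_watts(450, load)
```

With those placeholder numbers the estimate comes to 430 W against a 450 W rating, which would leave little margin. The 12 V rail rating on the PSU label is usually the tighter limit, since the drive motors and the CPU both draw from it.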
 

Vito Reiter

Wise in the Ways of Science
Joined
Jan 18, 2017
Messages
232
After a full hard reboot (drained the capacitors and such), every disk is now detected. I'm going to post the output for the two drives that had the issue, because I'm not positive this won't happen again.

smartctl -a /dev/ada0
Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Caviar Blue (SATA)
Device Model:	 WDC WD10EALS-00Z8A0
Serial Number:	WD-WCATR0571584
LU WWN Device Id: 5 0014ee 204400692
Firmware Version: 05.01D05
User Capacity:	1,000,204,886,016 bytes [1.00 TB]
Sector Size:	  512 bytes logical/physical
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:	Wed May  3 12:26:11 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
										Auto Offline Data Collection: Enabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(15060) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 175) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x3037) SCT Status supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       12
  3 Spin_Up_Time            0x0027   190   172   021    Pre-fail  Always       -       3500
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2647
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       12347
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2618
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       138
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2508
194 Temperature_Celsius     0x0022   112   094   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


smartctl -a /dev/ada7
Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Seagate Barracuda Green (AF)
Device Model:	 ST1000DL002-9TT153
Serial Number:	W1V1BM12
LU WWN Device Id: 5 000c50 049ee90ec
Firmware Version: CC3C
User Capacity:	1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5900 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:	Wed May  3 12:27:13 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
										was completed without error.
										Auto Offline Data Collection: Enabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(  623) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   1) minutes.
Extended self-test routine
recommended polling time:		( 167) minutes.
Conveyance self-test routine
recommended polling time:		(   2) minutes.
SCT capabilities:			  (0x30b7) SCT Status supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x000f   104   099   006	Pre-fail  Always	   -	   1300472
  3 Spin_Up_Time			0x0003   093   092   000	Pre-fail  Always	   -	   0
  4 Start_Stop_Count		0x0032   091   091   020	Old_age   Always	   -	   9835
  5 Reallocated_Sector_Ct   0x0033   100   100   036	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x000f   066   053   030	Pre-fail  Always	   -	   38689181186
  9 Power_On_Hours		  0x0032   012   012   000	Old_age   Always	   -	   77141
10 Spin_Retry_Count		0x0013   100   100   097	Pre-fail  Always	   -	   0
12 Power_Cycle_Count	   0x0032   100   100   020	Old_age   Always	   -	   624
183 Runtime_Bad_Block	   0x0032   100   100   000	Old_age   Always	   -	   0
184 End-to-End_Error		0x0032   100   100   099	Old_age   Always	   -	   0
187 Reported_Uncorrect	  0x0032   100   100   000	Old_age   Always	   -	   0
188 Command_Timeout		 0x0032   100   098   000	Old_age   Always	   -	   65556
189 High_Fly_Writes		 0x003a   100   100   000	Old_age   Always	   -	   0
190 Airflow_Temperature_Cel 0x0022   056   052   045	Old_age   Always	   -	   44 (Min/Max 44/45 #1)
191 G-Sense_Error_Rate	  0x0032   100   100   000	Old_age   Always	   -	   0
192 Power-Off_Retract_Count 0x0032   100   100   000	Old_age   Always	   -	   326
193 Load_Cycle_Count		0x0032   096   096   000	Old_age   Always	   -	   9940
194 Temperature_Celsius	 0x0022   044   048   000	Old_age   Always	   -	   44 (128 0 0 0 0)
195 Hardware_ECC_Recovered  0x001a   032   004   000	Old_age   Always	   -	   1300472
197 Current_Pending_Sector  0x0012   100   100   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0010   100   100   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x003e   200   200   000	Old_age   Always	   -	   0
240 Head_Flying_Hours	   0x0000   100   253   000	Old_age   Offline	  -	   19188 (197 63 0)
241 Total_LBAs_Written	  0x0000   100   253   000	Old_age   Offline	  -	   1896963471
242 Total_LBAs_Read		 0x0000   100   253   000	Old_age   Offline	  -	   2284499047

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
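Several Seagate raw values, such as Seek_Error_Rate and Command_Timeout above, are widely reported to be packed counters rather than single numbers, so a huge raw value is not necessarily alarming. This is a minimal Python sketch that unpacks them, assuming the commonly cited layout for Seek_Error_Rate (upper 16 bits = seek errors, lower 32 bits = total seeks). For Command_Timeout it only splits the value into three 16-bit words; what each word means is an assumption that may vary by firmware. The function names are hypothetical.

```python
def seagate_seek_errors(raw):
    """Split a Seagate Seek_Error_Rate raw value (assumed 48-bit layout).

    Upper 16 bits: seek errors; lower 32 bits: total seeks.
    """
    return (raw >> 32) & 0xFFFF, raw & 0xFFFFFFFF


def words16(raw):
    """Split a 48-bit raw value into three 16-bit words, high to low."""
    return (raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF


errors, seeks = seagate_seek_errors(38689181186)  # attribute 7 above
timeout_words = words16(65556)                    # attribute 188 above
```

For this drive that gives 9 seek errors out of 34,475,522 seeks, and the Command_Timeout value 65556 splits into the words 0, 1 and 20.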


I think it might be related to the Marvell RAID controller on the board: although it isn't hosting a RAID array, the drives in question always seem to be attached to its two ports.
 
Joined
Jul 10, 2016
Messages
521
Your motherboard has a mix of Intel and Marvell SATA ports. Have you tried moving drives between SATA ports to see if that makes a difference?

Also, you seem to be running a mix of drives, most of them "desktop"-class. Did you set TLER/ERC? What's the type/model of the particular HDD that has issues?

What's your power supply?
 

Vito Reiter

Wise in the Ways of Science
Joined
Jan 18, 2017
Messages
232
PSU is an Antec 450W Bronze. The problem with testing the ports is that this issue is 100% intermittent: I can't force it to recur, and sometimes the ports work and sometimes they don't. It just SEEMS like the drives I've had issues with were always connected to those ports. I don't mess with TLER/ERC settings; I just choose not to.

Edit: Checked just for you: only some of my drives support TLER/ERC, and of those, only one currently has it turned on. Should it be all or none, or is it fine to keep just a few on and the unsupported ones off?
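For reference on the TLER/ERC question: SCT Error Recovery Control caps how long a drive retries a bad sector before reporting an error, and smartctl reads and sets it with -l scterc, in tenths of a second. This is a minimal Python sketch that builds the set command and parses the read-back, assuming the "Read:     70 (7.0 seconds)" / "Read: Disabled" output layout; the 7.0-second default is a commonly used value rather than a requirement, and the function names are hypothetical. On many drives the setting does not survive a power cycle, so it would need reapplying at boot.

```python
import re

# Lines such as "           Read:     70 (7.0 seconds)" or "           Read: Disabled"
ERC_LINE = re.compile(r"^\s*(Read|Write):\s+(\d+|Disabled)")


def scterc_command(device, read_ds=70, write_ds=70):
    """Build the smartctl argv that sets SCT ERC; values are tenths of a second."""
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


def parse_scterc(output):
    """Return {'Read': seconds or None, 'Write': seconds or None}.

    None means the timer is reported as Disabled.
    """
    settings = {}
    for line in output.splitlines():
        match = ERC_LINE.match(line)
        if match:
            value = match.group(2)
            settings[match.group(1)] = None if value == "Disabled" else int(value) / 10
    return settings
```

Passing scterc_command("/dev/ada0") to subprocess.run(..., check=True) would apply the setting to one drive; drives without SCT ERC support typically just report that the command is not supported.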
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Code:
Model Family:	Seagate Barracuda Green (AF)
Device Model:	ST1000DL002-9TT153

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		 FLAG	VALUE WORST THRESH TYPE	 UPDATED WHEN_FAILED RAW_VALUE
 9 Power_On_Hours		 0x0032 012 012 000	Old_age Always	 -	 77141
194 Temperature_Celsius	0x0022 044 048 000	Old_age Always	 -	 44 (128 0 0 0 0)

Your old Seagate Barracuda is running a little 'hot' (44 degrees) and it's old! 77141 hours! You've gotten your money's worth out of that rascal!

It might help your system's stability if you can improve the HDD cooling; 44 degrees really is undesirably hot.
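The same two checks can be applied to every drive in the box: take the leading number from the Temperature_Celsius raw value and convert Power_On_Hours to years. This is a minimal Python sketch; the 40 C limit is an assumed comfort threshold, not a manufacturer spec, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours


def leading_int(raw_value):
    """RAW_VALUE fields like '44 (128 0 0 0 0)' start with the useful number."""
    return int(raw_value.split()[0])


def drive_notes(power_on_hours, temperature_raw, max_temp_c=40):
    """Return short human-readable notes about temperature and age."""
    notes = []
    temp_c = leading_int(temperature_raw)
    if temp_c > max_temp_c:  # assumed threshold, not a vendor limit
        notes.append(f"running at {temp_c} C, above the assumed {max_temp_c} C limit")
    years = power_on_hours / HOURS_PER_YEAR
    notes.append(f"about {years:.1f} years powered on")
    return notes
```

For the Seagate above, drive_notes(77141, "44 (128 0 0 0 0)") flags the temperature and reports about 8.8 years powered on, taking the Power_On_Hours value at face value.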
 

Vito Reiter

Wise in the Ways of Science
Joined
Jan 18, 2017
Messages
232

I noticed that too when I first read through it. Honestly, though, without an unlimited budget I'm gonna run it until it kicks the bucket. I've been trying to get the drives cooler, but there's only so much I can do; there are already quite a few fans in the case.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Marvell SATA ports are not super good on FreeBSD; that's pretty common knowledge.
 

Vito Reiter

Wise in the Ways of Science
Joined
Jan 18, 2017
Messages
232
Yeah, I've already been researching those; that's why I said I thought it might be related to the Marvell RAID controller on the board.

Anyway, I'm gonna keep testing even though the issue is no longer occurring, because I don't want this to keep happening. If anyone comes up with something else, let me know :)

Thanks guys!
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410

Vito Reiter

Wise in the Ways of Science
Joined
Jan 18, 2017
Messages
232
Dice wrote: "Get a reliable HBA and pop it in instead of the error-prone Marvell RAID junk."

Yes, that has always been the plan once the money starts rolling in. I know which HBAs work really well with FreeNAS and can handle a lot of drives; that's the dream (or a 60-drive Storinator :rolleyes:). Thanks!
 