SSD dropped from mirror, tested OK, FLUSHCACHE errors in dmesg

Status
Not open for further replies.

Beer

Dabbler
Joined
May 21, 2016
Messages
38
The other day one of my SSDs dropped from a mirrored volume. They are both Samsung 850 Pro drives.

I powered down and tested the SSD using smartctl on another Linux machine with a known-good cable. It passed a smartctl short test with flying colors, with no suspicious items in the SMART readout. The SSDs are very new: both were installed brand new about 8 months ago and have seen very little use.

I also tested the SATA power cable AND its port on the (modular) PSU.

If I run dmesg, I get a bunch of these errors, which also show up during startup:

Code:
GEOM_RAID5: Module loaded, version 1.3.20140711.62 (rev f91e28e40bf7)
ahcich31: Timeout on slot 25 port 0
ahcich31: is 00000000 cs 02000000 ss 00000000 rs 02000000 tfd c0 serr 00000000 cmd 0004d917
(ada3:ahcich31:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada3:ahcich31:0:0:0): CAM status: Command timeout
(ada3:ahcich31:0:0:0): Retrying command
ahcich31: Timeout on slot 0 port 0
ahcich31: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd d0 serr 00000000 cmd 0004c017
(ada3:ahcich31:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada3:ahcich31:0:0:0): CAM status: Command timeout
(ada3:ahcich31:0:0:0): Error 5, Retries exhausted
ahcich31: Timeout on slot 11 port 0
ahcich31: is 00000000 cs 00000800 ss 00000000 rs 00000800 tfd c0 serr 00000000 cmd 0004cb17
(ada3:ahcich31:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada3:ahcich31:0:0:0): CAM status: Command timeout
(ada3:ahcich31:0:0:0): Retrying command
ahcich31: Timeout on slot 18 port 0
ahcich31: is 00000000 cs 00040000 ss 00000000 rs 00040000 tfd d0 serr 00000000 cmd 0004d217
(ada3:ahcich31:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada3:ahcich31:0:0:0): CAM status: Command timeout
(ada3:ahcich31:0:0:0): Error 5, Retries exhausted
ahcich31: Timeout on slot 25 port 0
ahcich31: is 00000000 cs 06000000 ss 00000000 rs 06000000 tfd d0 serr 00000000 cmd 0004d917
(ada3:ahcich31:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
(ada3:ahcich31:0:0:0): CAM status: Command timeout
(ada3:ahcich31:0:0:0): Retrying command
ahcich31: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich31: Timeout on slot 27 port 0
ahcich31: is 00000000 cs 08000000 ss 00000000 rs 08000000 tfd 80 serr 00000000 cmd 0004db17
(aprobe0:ahcich31:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich31:0:0:0): CAM status: Command timeout
(aprobe0:ahcich31:0:0:0): Retrying command
ahcich31: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich31: Timeout on slot 28 port 0
ahcich31: is 00000000 cs 10000000 ss 00000000 rs 10000000 tfd 80 serr 00000000 cmd 0004dc17
(aprobe0:ahcich31:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich31:0:0:0): CAM status: Command timeout
(aprobe0:ahcich31:0:0:0): Error 5, Retries exhausted
ahcich31: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich31: Timeout on slot 29 port 0
ahcich31: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd 80 serr 00000000 cmd 0004dd17
(aprobe0:ahcich31:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich31:0:0:0): CAM status: Command timeout
(aprobe0:ahcich31:0:0:0): Error 5, Retry was blocked
ada3 at ahcich31 bus 0 scbus33 target 0 lun 0
ada3: <Samsung SSD 850 PRO 256GB EXM03B6Q> s/n S39KNX0HA63503R detached
ahcich31: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich31: Timeout on slot 30 port 0
ahcich31: is 00000000 cs 40000000 ss 00000000 rs 40000000 tfd 80 serr 00000000 cmd 0004de17
(ada3:ahcich31:0:0:0): SETFEATURES ENABLE RCACHE. ACB: ef aa 00 00 00 40 00 00 00 00 00 00
(ada3:ahcich31:0:0:0): CAM status: Command timeout
(ada3:ahcich31:0:0:0): Error 5, Periph was invalidated
ahcich31: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich31: Poll timeout on slot 0 port 0
ahcich31: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd 80 serr 00000000 cmd 0004c017
(aprobe0:ahcich31:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich31:0:0:0): CAM status: Command timeout
(aprobe0:ahcich31:0:0:0): Error 5, Retries exhausted
ahcich31: Timeout on slot 1 port 0
ahcich31: is 00000000 cs 00000006 ss 00000000 rs 00000006 tfd 80 serr 00000000 cmd 0004c117
(ada3:ahcich31:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
(ada3:ahcich31:0:0:0): CAM status: Command timeout
(ada3:ahcich31:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich31:0:0:0): WRITE_DMA. ACB: ca 00 f8 21 80 40 00 00 00 00 10 00
(ada3:ahcich31:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich31:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich31:0:0:0): Periph destroyed
ahcich31: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich31: Poll timeout on slot 4 port 0
ahcich31: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 00000000 cmd 0004c417
(aprobe0:ahcich31:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich31:0:0:0): CAM status: Command timeout
(aprobe0:ahcich31:0:0:0): Error 5, Retries exhausted
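To make the register dumps above easier to read, here is a minimal sketch that decodes one ahcich31 register line and the first byte of a CAM ACB dump. It assumes, as in FreeBSD's ahci(4) driver, that cs is the port's command-issue register (PxCI), so each set bit is an outstanding command slot, and that the low byte of tfd is the ATA status register. The opcode table only covers the commands that appear in this log, and the function names are made up for this example.

```python
import re

# Low byte of the AHCI "tfd" field, read as the ATA status register.
# Only the bits relevant here are named; bit 4 (legacy DSC) is ignored.
STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x08: "DRQ",   # data request
    0x01: "ERR",   # error
}

# First byte of a CAM "ACB:" dump is the ATA command opcode.
# Only the opcodes that appear in this log are listed.
ATA_OPCODES = {
    0x00: "NOP",
    0x06: "DATA SET MANAGEMENT (TRIM)",
    0xCA: "WRITE DMA",
    0xEA: "FLUSH CACHE EXT",
    0xEC: "IDENTIFY DEVICE",
    0xEF: "SET FEATURES",
}

REG_RE = re.compile(r"\bcs ([0-9a-f]{8}) .*\btfd ([0-9a-f]+)\b")
ACB_RE = re.compile(r"ACB: ([0-9a-f]{2})")


def slots(mask: int) -> list[int]:
    """Command-slot numbers whose bits are set in a 32-bit slot mask."""
    return [bit for bit in range(32) if (mask >> bit) & 1]


def decode_registers(line: str) -> dict:
    """Decode the 'cs' slot mask and the 'tfd' status byte of a timeout dump."""
    m = REG_RE.search(line)
    if m is None:
        raise ValueError("no cs/tfd fields in line")
    status = int(m.group(2), 16) & 0xFF
    return {
        "outstanding_slots": slots(int(m.group(1), 16)),
        "status": [name for bit, name in STATUS_BITS.items() if status & bit],
    }


def decode_acb(line: str) -> str:
    """Name the ATA command whose opcode starts the ACB dump."""
    m = ACB_RE.search(line)
    if m is None:
        raise ValueError("no ACB dump in line")
    opcode = int(m.group(1), 16)
    return ATA_OPCODES.get(opcode, f"unknown opcode 0x{opcode:02x}")


if __name__ == "__main__":
    print(decode_registers("ahcich31: is 00000000 cs 02000000 ss 00000000 "
                           "rs 02000000 tfd c0 serr 00000000 cmd 0004d917"))
    print(decode_acb("(ada3:ahcich31:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00"))
```

Applied to the log above, every tfd value (c0, d0 and 80) has BSY set and none has ERR set. That would fit a drive that simply stopped answering rather than one reporting errors, but it is a hedged reading, not a diagnosis.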


The missing drive is ada1; I'm not sure why the errors show ada3:ahcich31. There should be 2 SSDs at the top.

[Screenshot: EC1fcFw.png]
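Which device the errors refer to can be checked mechanically. Below is a minimal sketch, assuming the CAM message prefix has the form (periph:channel:bus:target:lun) seen in the log. It collects which peripheral/channel pairs appear and pulls the model and serial from the "detached" line. On this log it shows that only ada3 (plus the probe periph aprobe0) on ahcich31 is involved, and that the detached drive's serial is S39KNX0HA63503R rather than the S39KNX0HA63496N that smartctl reports for ada2. The function name summarize_cam_log is hypothetical.

```python
import re

# CAM prefixes each message with "(periph:channel:bus:target:lun):".
PREFIX_RE = re.compile(r"^\((\w+):(\w+):\d+:\d+:\d+\):")
# Detach notice: "ada3: <model firmware> s/n SERIAL detached".
DETACH_RE = re.compile(r"^(ada\d+): <([^>]+)> s/n (\S+) detached$")


def summarize_cam_log(log: str) -> dict:
    """Collect (periph, channel) pairs and detached-drive identities."""
    pairs: set[tuple[str, str]] = set()
    detached: list[tuple[str, ...]] = []
    for raw in log.splitlines():
        line = raw.strip()
        prefix = PREFIX_RE.match(line)
        if prefix:
            pairs.add((prefix.group(1), prefix.group(2)))
            continue
        gone = DETACH_RE.match(line)
        if gone:
            detached.append(gone.groups())  # (device, model+firmware, serial)
    return {"periph_channels": sorted(pairs), "detached": detached}


if __name__ == "__main__":
    sample = """\
(ada3:ahcich31:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich31:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
ada3: <Samsung SSD 850 PRO 256GB EXM03B6Q> s/n S39KNX0HA63503R detached
"""
    print(summarize_cam_log(sample))
```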


smartctl --scan

Code:
/dev/ada0 -d atacam # /dev/ada0, ATA device
/dev/ada1 -d atacam # /dev/ada1, ATA device
/dev/ada2 -d atacam # /dev/ada2, ATA device
/dev/ada3 -d atacam # /dev/ada3, ATA device
/dev/ada4 -d atacam # /dev/ada4, ATA device
/dev/ada5 -d atacam # /dev/ada5, ATA device
/dev/ada6 -d atacam # /dev/ada6, ATA device
/dev/ada7 -d atacam # /dev/ada7, ATA device
/dev/ada8 -d atacam # /dev/ada8, ATA device


Code:
zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h2m with 0 errors on Mon Apr  2 03:47:37 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            ada0p2    ONLINE       0     0     0
            ada1p2    ONLINE       0     0     0

errors: No known data errors

  pool: lightning
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
   the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0h2m with 0 errors on Fri Apr 20 17:16:15 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        lightning                                       DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            gptid/3314a75a-e0e5-11e6-94a9-000c293a029e  ONLINE       0     0     0
            17822343706091218419                        UNAVAIL      0     0     0  was /dev/gptid/334432cb-e0e5-11e6-94a9-000c293a029e

errors: No known data errors

  pool: thunder
 state: ONLINE
  scan: scrub repaired 0 in 3h11m with 0 errors on Wed Mar 14 06:11:40 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        thunder                                         ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/241dc692-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/24e8bc15-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/25b9f266-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/26894821-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/275eb7c5-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/28270aa2-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0

errors: No known data errors
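The degraded mirror can also be picked out of zpool status text programmatically. Here is a rough sketch, assuming the column layout shown above and treating names that start with mirror- or raidz as interior vdevs rather than disks. For the lightning pool it reports the member 17822343706091218419 as UNAVAIL, along with the gptid path where it was last seen. The function name unhealthy_leaves is hypothetical.

```python
import re

# A config row: NAME STATE READ WRITE CKSUM, optionally followed by "was <path>".
ROW_RE = re.compile(
    r"^\s+(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+was\s+(\S+))?\s*$"
)
# Interior vdevs (not disks) in the pools shown here.
VDEV_PREFIXES = ("mirror-", "raidz")


def unhealthy_leaves(status_text: str) -> list[dict]:
    """Leaf devices whose STATE is not ONLINE, tagged with their pool."""
    found = []
    pool = None
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            pool = stripped.split(":", 1)[1].strip()
            continue
        row = ROW_RE.match(line)
        if row is None:  # header, status text, blank lines, etc.
            continue
        name, state = row.group(1), row.group(2)
        if name == pool or name.startswith(VDEV_PREFIXES) or state == "ONLINE":
            continue
        found.append({"pool": pool, "device": name, "state": state,
                      "was": row.group(6)})
    return found
```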


Code:
[root@freenas] /dev# zpool online lightning 17822343706091218419
warning: device '17822343706091218419' onlined, but remains in faulted state



Code:
[root@freenas] /dev# smartctl -a /dev/ada1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:	 VMware Virtual SATA Hard Drive
Serial Number:	01000000000000000001
LU WWN Device Id: 5 000c29 bfddddcd0
Firmware Version: 00000001
User Capacity:	17,179,869,184 bytes [17.1 GB]
Sector Size:	  512 bytes logical/physical
Device is:		Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA/ATAPI-6 T13/1410D revision 0
Local Time is:	Mon Apr 23 20:20:50 2018 CDT
SMART support is: Unavailable - device lacks SMART capability.

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
[root@freenas] /dev# smartctl -a -T permissive /dev/ada1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:	 VMware Virtual SATA Hard Drive
Serial Number:	01000000000000000001
LU WWN Device Id: 5 000c29 bfddddcd0
Firmware Version: 00000001
User Capacity:	17,179,869,184 bytes [17.1 GB]
Sector Size:	  512 bytes logical/physical
Device is:		Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA/ATAPI-6 T13/1410D revision 0
Local Time is:	Mon Apr 23 20:20:55 2018 CDT
SMART support is: Unavailable - device lacks SMART capability.

SMART Disabled. Use option -s with argument 'on' to enable it.
(override with '-T permissive' option)
[root@freenas] /dev# smartctl
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

ERROR: smartctl requires a device name as the final command-line argument.


Use smartctl -h to get a usage summary

[root@freenas] /dev# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h2m with 0 errors on Mon Apr  2 03:47:37 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            ada0p2    ONLINE       0     0     0
            ada1p2    ONLINE       0     0     0

errors: No known data errors

  pool: lightning
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
   the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0h2m with 0 errors on Fri Apr 20 17:16:15 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        lightning                                       DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            gptid/3314a75a-e0e5-11e6-94a9-000c293a029e  ONLINE       0     0     0
            17822343706091218419                        UNAVAIL      0     0     0  was /dev/gptid/334432cb-e0e5-11e6-94a9-000c293a029e

errors: No known data errors

  pool: thunder
 state: ONLINE
  scan: scrub repaired 0 in 3h11m with 0 errors on Wed Mar 14 06:11:40 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        thunder                                         ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/241dc692-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/24e8bc15-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/25b9f266-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/26894821-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/275eb7c5-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/28270aa2-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0

errors: No known data errors
[root@freenas] /dev# zpool online /dev/ada1
missing device name
usage:
   online [-e] <pool> <device> ...
[root@freenas] /dev# zpool online lightning lightning
cannot online lightning: no such device in pool
[root@freenas] /dev# zpool online lightning 17822343706091218419
warning: device '17822343706091218419' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
[root@freenas] /dev# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h2m with 0 errors on Mon Apr  2 03:47:37 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            ada0p2    ONLINE       0     0     0
            ada1p2    ONLINE       0     0     0

errors: No known data errors

  pool: lightning
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
   the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0h2m with 0 errors on Fri Apr 20 17:16:15 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        lightning                                       DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            gptid/3314a75a-e0e5-11e6-94a9-000c293a029e  ONLINE       0     0     0
            17822343706091218419                        UNAVAIL      0     0     0  was /dev/gptid/334432cb-e0e5-11e6-94a9-000c293a029e

errors: No known data errors

  pool: thunder
 state: ONLINE
  scan: scrub repaired 0 in 3h11m with 0 errors on Wed Mar 14 06:11:40 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        thunder                                         ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/241dc692-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/24e8bc15-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/25b9f266-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/26894821-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/275eb7c5-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0
            gptid/28270aa2-e0f1-11e6-94a9-000c293a029e  ONLINE       0     0     0

errors: No known data errors
[root@freenas] /dev# smartctl -a /dev/ada1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:	 VMware Virtual SATA Hard Drive
Serial Number:	01000000000000000001
LU WWN Device Id: 5 000c29 bfddddcd0
Firmware Version: 00000001
User Capacity:	17,179,869,184 bytes [17.1 GB]
Sector Size:	  512 bytes logical/physical
Device is:		Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA/ATAPI-6 T13/1410D revision 0
Local Time is:	Mon Apr 23 20:49:28 2018 CDT
SMART support is: Unavailable - device lacks SMART capability.

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
[root@freenas] /dev# smartctl -a /dev/ada2
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Samsung based SSDs
Device Model:	 Samsung SSD 850 PRO 256GB
Serial Number:	S39KNX0HA63496N
LU WWN Device Id: 5 002538 d4162c297
Firmware Version: EXM03B6Q
User Capacity:	256,060,514,304 bytes [256 GB]
Sector Size:	  512 bytes logical/physical
Rotation Rate:	Solid State Device
Form Factor:	  2.5 inches
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Mon Apr 23 20:49:32 2018 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
				   was never started.
				   Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0)   The previous self-test routine completed
				   without error or no self-test has ever
				   been run.
Total time to complete Offline
data collection:		(	0) seconds.
Offline data collection
capabilities:			 (0x53) SMART execute Offline immediate.
				   Auto Offline data collection on/off support.
				   Suspend Offline collection upon new
				   command.
				   No Offline surface scan supported.
				   Self-test supported.
				   No Conveyance Self-test supported.
				   Selective Self-test supported.
SMART capabilities:			(0x0003)   Saves SMART data before entering
				   power-saving mode.
				   Supports SMART auto save timer.
Error logging capability:		(0x01)   Error logging supported.
				   General Purpose Logging supported.
Short self-test routine
recommended polling time:	 (   2) minutes.
Extended self-test routine
recommended polling time:	 ( 133) minutes.
SCT capabilities:		   (0x003d)   SCT Status supported.
				   SCT Error Recovery Control supported.
				   SCT Feature Control supported.
				   SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       10298
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       42
177 Wear_Leveling_Count     0x0013   098   098   000    Pre-fail  Always       -       67
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   066   053   000    Old_age   Always       -       34
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       24
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       15400954249

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
  255		0	65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
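The SMART table above is for ada2, the Samsung that is still attached (serial S39KNX0HA63496N), not for the drive that dropped. Below is a sketch of how such output can be screened, assuming the column layout smartctl prints here. It parses the attribute rows and flags any attribute that has failed, has fallen to its threshold, or has a nonzero raw value on a small watch-list. The watch-list covers reallocated sectors, reserved-block use, program/erase failures, runtime bad blocks, uncorrectable errors and interface CRC errors. It is an illustrative choice for this example, not a Samsung or smartmontools recommendation. On the ada2 output it flags nothing.

```python
import re

# One row of smartctl's "Vendor Specific SMART Attributes" table:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ATTR_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+(\S+)\s+(\d+)"
)
# Hypothetical watch-list for this example: raw values expected to stay at 0.
WATCH_RAW_ZERO = {5, 179, 181, 182, 183, 187, 199}


def parse_attributes(text: str) -> dict[int, dict]:
    """Map attribute ID to its parsed columns; non-table lines are skipped."""
    attrs = {}
    for line in text.splitlines():
        m = ATTR_RE.match(line)
        if m:
            attrs[int(m.group(1))] = {
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "when_failed": m.group(6),
                "raw": int(m.group(7)),
            }
    return attrs


def concerns(attrs: dict[int, dict]) -> list[str]:
    """Human-readable warnings for suspicious attributes, in ID order."""
    out = []
    for aid, a in sorted(attrs.items()):
        if a["when_failed"] != "-":
            out.append(f"{aid} {a['name']}: WHEN_FAILED {a['when_failed']}")
        elif a["thresh"] and a["value"] <= a["thresh"]:
            out.append(f"{aid} {a['name']}: value {a['value']} <= thresh {a['thresh']}")
        elif aid in WATCH_RAW_ZERO and a["raw"] != 0:
            out.append(f"{aid} {a['name']}: raw value {a['raw']}")
    return out
```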


I'm not finding much on Google about this, or about other tests to perform. Any ideas? Thanks

This is a virtualized environment: all drives are on a passed-through Intel Wellsburg AHCI SATA controller. I know, not ideal, but it has been running fine for a year. I haven't really touched the server in months; this just happened out of nowhere.

I found a similar report from another user very recently:
https://forums.freenas.org/index.php?threads/what-does-a-nop-flushqueue-message-mean.54736/
And:
https://forums.freenas.org/index.ph...onnect-to-the-volume-detect-by-freenas.54188/


FreeNAS 9.10.2-U2 (updating to U6 completely breaks my FreeNAS install, so I can't upgrade)

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Well, it looks like your ada1 is one of your boot drives and is a VMDK. This is confirmed by your zpool status: freenas-boot contains ada0 and ada1. Keep in mind that the adaX identifiers shift around, and FreeNAS should generally be using a disk's gptid, e.g. 241dc692-e0f1-11e6-94a9-000c293a029e.

It looks like the disk with gptid 334432cb-e0e5-11e6-94a9-000c293a029e (the Samsung in this case) is missing and is not showing up. I would be willing to bet it won't show up in camcontrol devlist either. This means you will not be able to online the disk; it's just not there to "online".

Back up your data and try swapping the SSD ports. This will eliminate the possibility of a bad port/cable/connector.
Code:
[root@freenas] /dev# smartctl -a /dev/ada1
...
=== START OF INFORMATION SECTION ===
Device Model:	 VMware Virtual SATA Hard Drive
...
User Capacity:	17,179,869,184 bytes [17.1 GB]
...

[root@freenas] /dev# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h2m with 0 errors on Mon Apr  2 03:47:37 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            ada0p2    ONLINE       0     0     0
            ada1p2    ONLINE       0     0     0

errors: No known data errors

  pool: lightning
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
   the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0h2m with 0 errors on Fri Apr 20 17:16:15 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        lightning                                       DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            gptid/3314a75a-e0e5-11e6-94a9-000c293a029e  ONLINE       0     0     0
            17822343706091218419                        UNAVAIL      0     0     0  was /dev/gptid/334432cb-e0e5-11e6-94a9-000c293a029e

errors: No known data errors

[root@freenas] /dev# zpool online lightning lightning
cannot online lightning: no such device in pool
[root@freenas] /dev# zpool online lightning 17822343706091218419
warning: device '17822343706091218419' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
...
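The point that adaX names shift can be illustrated with data from this thread. At the boot that produced the errors, the failing Samsung attached as ada3 at scbus33 ("ada3 at ahcich31 bus 0 scbus33"). In the later camcontrol devlist output, ada3 is an HGST disk at scbus36. The deliberately simplified model below reproduces that shift. It assumes unit numbers are handed out in ascending scbus order with no device hints; FreeBSD's real assignment follows probe/attach order and can be pinned with hints. The function name assign_ada_units is hypothetical.

```python
def assign_ada_units(disks: list[tuple[int, str]]) -> dict[int, str]:
    """Simplified model: number disks ada0, ada1, ... in ascending scbus order.

    disks is a list of (scbus, model) pairs for ada(4) disks only; the CD
    drive (cd0) and other peripherals take different names and are left out.
    """
    return {bus: f"ada{unit}" for unit, (bus, _model) in enumerate(sorted(disks))}


# Disks still present, taken from the camcontrol devlist output in this thread.
PRESENT = (
    [(2, "VMware Virtual SATA Hard Drive"),
     (3, "VMware Virtual SATA Hard Drive"),
     (32, "Samsung SSD 850 PRO 256GB")]
    + [(bus, "HGST HDN724040ALE640") for bus in range(36, 42)]
)
# The Samsung that dropped, which the dmesg log places at scbus33.
MISSING = (33, "Samsung SSD 850 PRO 256GB")

if __name__ == "__main__":
    before = assign_ada_units(PRESENT + [MISSING])
    after = assign_ada_units(PRESENT)
    for bus in sorted(before):
        print(f"scbus{bus}: {before[bus]} -> {after.get(bus, 'gone')}")
```

In the "after" case the model yields nine units, ada0 through ada8, which matches the smartctl --scan list earlier in the thread.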
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Have you tried updating the SSD firmware?

Beer

Dabbler
Joined
May 21, 2016
Messages
38
kdragon75 said:
Well, it looks like your ada1 is one of your boot drives and is a VMDK. This is confirmed by your zpool status: freenas-boot contains ada0 and ada1. Keep in mind that the adaX identifiers shift around, and FreeNAS should generally be using a disk's gptid, e.g. 241dc692-e0f1-11e6-94a9-000c293a029e.

It looks like the disk with gptid 334432cb-e0e5-11e6-94a9-000c293a029e (the Samsung in this case) is missing and is not showing up. I would be willing to bet it won't show up in camcontrol devlist either. This means you will not be able to online the disk; it's just not there to "online".

Back up your data and try swapping the SSD ports. This will eliminate the possibility of a bad port/cable/connector.

Correct, it doesn't show up in camcontrol devlist:

Code:
[root@freenas] ~# camcontrol devlist
<VMware Virtual SATA Hard Drive 00000001>  at scbus2 target 0 lun 0 (pass0,ada0)
<VMware Virtual SATA Hard Drive 00000001>  at scbus3 target 0 lun 0 (pass1,ada1)
<NECVMWar VMware SATA CD02 1.00>   at scbus4 target 0 lun 0 (pass2,cd0)
<Samsung SSD 850 PRO 256GB EXM03B6Q>  at scbus32 target 0 lun 0 (pass3,ada2)
<HGST HDN724040ALE640 MJAOA5E0>    at scbus36 target 0 lun 0 (pass4,ada3)
<HGST HDN724040ALE640 MJAOA5E0>    at scbus37 target 0 lun 0 (pass5,ada4)
<HGST HDN724040ALE640 MJAOA5E0>    at scbus38 target 0 lun 0 (pass6,ada5)
<HGST HDN724040ALE640 MJAOA5E0>    at scbus39 target 0 lun 0 (pass7,ada6)
<HGST HDN724040ALE640 MJAOA5E0>    at scbus40 target 0 lun 0 (pass8,ada7)
<HGST HDN724040ALE640 MJAOA5E0>    at scbus41 target 0 lun 0 (pass9,ada8)
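The devlist output above can be checked the same way. Here is a small sketch, assuming the "<ident>  at scbusN target T lun L (names)" line format shown here. It parses each entry and counts how many Samsung 850 PRO SSDs the system currently sees. On this output the count is one (pass3/ada2 at scbus32), consistent with the second SSD being absent. The helper names parse_devlist and count_model are hypothetical.

```python
import re

# "<vendor model firmware>  at scbusN target T lun L (pass3,ada2)"
DEVLIST_RE = re.compile(
    r"^<([^>]*)>\s+at scbus(\d+) target (\d+) lun (\d+) \(([^)]*)\)$"
)


def parse_devlist(text: str) -> list[dict]:
    """Parse camcontrol devlist lines into ident, scbus and periph names."""
    devices = []
    for line in text.splitlines():
        m = DEVLIST_RE.match(line.strip())
        if m:  # the shell prompt and blank lines do not match
            devices.append({
                "ident": m.group(1),
                "scbus": int(m.group(2)),
                "periphs": m.group(5).split(","),
            })
    return devices


def count_model(devices: list[dict], model_prefix: str) -> int:
    """Count entries whose identification string starts with model_prefix."""
    return sum(d["ident"].startswith(model_prefix) for d in devices)
```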


I did update the firmware, still no go.

I think I've made one discovery regarding all of those timeout errors. After updating the disk firmware and booting the system back up, I got about 5 minutes' worth of timeout errors, then FreeNAS started up; nothing new there. I then shut FreeNAS down and started the VM again, and those errors went away. So I think something in VMware is just in some sort of unready state by the time the FreeNAS VM starts. Not sure.

I can deal with that for now if that's the case.

However, there's still no disk. I have indeed tested a new cable, and everything is fine there. The only untested hardware is the SATA port. I can try swapping the SSDs; there's nothing important on them right now.

I may also boot a Linux ISO from USB on the VMware box and see whether it can find the SSD. If it can't, then I'm 99% sure it's the port, because the disk, cable and SATA power cable have all been tested.

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Beer said:
The only untested hardware is the SATA port. I can try swapping the SSDs; there's nothing important on them right now.

I may also boot a Linux ISO from USB on the VMware box and see whether it can find the SSD. If it can't, then I'm 99% sure it's the port, because the disk, cable and SATA power cable have all been tested.
Sounds like a good plan. Looking forward to your findings.