FreeNAS suddenly detaches a drive in pool

blckhm · Sep 24, 2018

Hi,

Our freenas has been configured as 20x 8TB seagate archive hdd in single pool with 2way mirror setup.

We bring that setup online 4 weeks ago without any problem but after that, our drives generates some error and freenas looks detaches these drives suddenly from mirror.

So we need to manually disconnect drive from enclosure and reattach drive. pool goes to resilvering and everythings fine.

This was happened 3 times in 2 weeks and now we are worried.

I'll attach some logs about error.
We could not find any specific report as like that.

Code:

Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.1-STABLE #2 r321665+366f54a78b2(freenas/11.1-stable): Wed Mar 21 23:04:13 UTC 2018
	root@gauntlet:/freenas-11-releng/freenas/_BE/objs/freenas-11-releng/freenas/_BE/os/sys/FreeNAS.amd64 amd64
FreeBSD clang version 5.0.0 (tags/RELEASE_500/final 312559) (based on LLVM 5.0.0svn)
CPU: Intel(R) Xeon(R) CPU E5-2630L v3 @ 1.80GHz (1800.00-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306f2  Family=0x6  Model=0x3f  Stepping=2
  Features=0x1f83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS,HTT>
  Features2=0xfffa3203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x21<LAHF,ABM>
  Structured Extended Features=0x27ab<FSGSBASE,TSCADJ,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,NFPUSG>
  XSAVE Features=0x1<XSAVEOPT>
  TSC: P-state invariant
Hypervisor: Origin = "VMwareVMware"
real memory  = 56908316672 (54272 MB)
avail memory = 54198685696 (51687 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <PTLTD	  APIC  >
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 8 core(s)
WARNING: VIMAGE (virtualized network stack) is a highly experimental feature.
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 2.0> irqs 0-23 on motherboard
..
..
..
..
..
	(da4:mpr0:0:11:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 890 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 0
	(da4:mpr0:0:11:0): READ(16). CDB: 88 00 00 00 00 02 68 92 b9 20 00 00 00 08 00 00 length 4096 SMID 981 terminated ioc 804b lo(da4:mpr0:0:11:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
ginfo 31110e03 scsi 0 state c xfer 0
(da4:mpr0:0:11:0): CAM status: CCB request completed with an error
(da4:mpr0:0:11:0): Retrying command
(da4:mpr0:0:11:0): READ(16). CDB: 88 00 00 00 00 02 68 92 b9 20 00 00 00 08 00 00
(da4:mpr0:0:11:0): CAM status: CCB request completed with an error
(da4:mpr0:0:11:0): Retrying command
(da4:mpr0:0:11:0): READ(16). CDB: 88 00 00 00 00 02 68 92 b9 20 00 00 00 08 00 00
(da4:mpr0:0:11:0): CAM status: SCSI Status Error
(da4:mpr0:0:11:0): SCSI status: Check Condition
(da4:mpr0:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da4:mpr0:0:11:0): Retrying command (per sense data)
(da4:mpr0:0:11:0): READ(16). CDB: 88 00 00 00 00 02 68 92 b9 20 00 00 00 08 00 00
(da4:mpr0:0:11:0): CAM status: SCSI Status Error
(da4:mpr0:0:11:0): SCSI status: Check Condition
(da4:mpr0:0:11:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da4:mpr0:0:11:0): Info: 0x26892b920
(da4:mpr0:0:11:0): Error 5, Unretryable error
mpr0: mprsas_prepare_remove: Sending reset for target ID 11
da4 at mpr0 bus 0 scbus3 target 11 lun 0
da4: <ATA ST8000AS0002-1NA AR13> s/n Z8403XVT detached
(da4:mpr0:0:11:0): Periph destroyed
mpr0: clearing target 11 handle 0x000d
mpr0: At enclosure level 0, slot 3, connector name (	)
mpr0: Unfreezing devq for target ID 11
mpr0: SAS Address for SATA device = 371a444567948a47
mpr0: SAS Address from SAS device page0 = 5003048001ce5083
mpr0: SAS Address from SATA device = 371a444567948a47
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x000d> enclosureHandle<0x0002> slot 3
mpr0: At enclosure level 0 and connector name (	)
ses0: da4,pass5: Element descriptor: 'Slot03'
da4 at mpr0 bus 0 scbus3 target 11 lun 0
ses0: da4,pass5: SAS Device Slot Element: 1 Phys at Slot 3
ses0:  phy 0: SATA device
ses0:  phy 0: parent 5003048001ce50bf addr 5003048001ce5083
da4: <ATA ST8000AS0002-1NA AR13> Fixed Direct Access SPC-4 SCSI device
da4: Serial Number Z8403XGD
da4: 1200.000MB/s transfers
da4: Command Queueing enabled
da4: 7630885MB (15628053168 512 byte sectors)
da4: quirks=0x80<SMR_DM>
Local NSM refuses to monitor worker2
Limiting closed port RST response from 280 to 200 packets/sec
Limiting closed port RST response from 273 to 200 packets/sec
	(da13:mpr0:0:20:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 329 Aborting command 0xfffffe000105d8f0
mpr0: Sending reset from mprsas_send_abort for target ID 20
mpr0: mprsas_prepare_remove: Sending reset for target ID 20
da13 at mpr0 bus 0 scbus3 target 20 lun 0
da13: <ATA ST8000AS0002-1NA AR17> s/n Z840WDAJ detached
	(da13:mpr0:0:20:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 854 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
	(da13:mpr0:0:20:0): READ(10). CDB: 28 00 44 cd c8 28 00 01 00 00 length 131072 SMID 1142 terminated ioc 804b loginfo 31130000(da13:mpr0:0:20:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
 scsi 0 state c xfer 0
	(da13:mpr0:0:20:0): READ(10). CDB: 28 00 44 cd c7 28 00 01 00 00 length 131072 SMID 1015 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
	(da13:mpr0:0:20:0): READ(10). CDB: 28 00 44 cd c9 28 00 01 00 00 length 131072 SMID 748 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
mpr0: clearing target 20 handle 0x0017
mpr0: At enclosure level 0, slot 12, connector name (	)
mpr0: Unfreezing devq for target ID 20
mpr0: Unfreezing devq for target ID 20
(da13:mpr0:0:20:0): CAM status: CCB request completed with an error
(da13:mpr0:0:20:0): Error 5, Periph was invalidated
(da13:mpr0:0:20:0): READ(10). CDB: 28 00 44 cd c8 28 00 01 00 00
(da13:mpr0:0:20:0): CAM status: CCB request completed with an error
(da13:mpr0:0:20:0): Error 5, Periph was invalidated
(da13:mpr0:0:20:0): READ(10). CDB: 28 00 44 cd c7 28 00 01 00 00
(da13:mpr0:0:20:0): CAM status: CCB request completed with an error
(da13:mpr0:0:20:0): Error 5, Periph was invalidated
(da13:mpr0:0:20:0): READ(10). CDB: 28 00 44 cd c9 28 00 01 00 00
(da13:mpr0:0:20:0): CAM status: CCB request completed with an error
(da13:mpr0:0:20:0): Error 5, Periph was invalidated
(da13:mpr0:0:20:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da13:mpr0:0:20:0): CAM status: Command timeout
(da13:mpr0:0:20:0): Error 5, Periph was invalidated
GEOM_MIRROR: Device swap3: provider da13p1 disconnected.
(da13:mpr0:0:20:0): Periph destroyed
mpr0: SAS Address for SATA device = 371a44458b80844d
mpr0: SAS Address from SAS device page0 = 5003048001ce509c
mpr0: SAS Address from SATA device = 371a44458b80844d
mpr0: Found device <81<SataDev>,End Device> <12.0Gbps> handle<0x0017> enclosureHandle<0x0002> slot 12
mpr0: At enclosure level 0 and connector name (	)
ses0: da13,pass14: Element descriptor: 'Slot12'
da13 at mpr0 bus 0 scbus3 target 20 lun 0
ses0: da13,pass14: SAS Device Slot Element: 1 Phys at Slot 12
ses0:  phy 0: SATA device
ses0:  phy 0: parent 5003048001ce50bf addr 5003048001ce509c
da13: <ATA ST8000AS0002-1NA AR17> Fixed Direct Access SPC-4 SCSI device
da13: Serial Number Z840WDAJ
da13: 1200.000MB/s transfers
da13: Command Queueing enabled
da13: 7630885MB (15628053168 512 byte sectors)
da13: quirks=0x80<SMR_DM>
	(da18:mpr0:0:25:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 722 Aborting command 0xfffffe0001080de0
mpr0: Sending reset from mprsas_send_abort for target ID 25
mpr0: mprsas_prepare_remove: Sending reset for target ID 25
da18 at mpr0 bus 0 scbus3 target 25 lun 0
da18: <ATA ST8000AS0002-1NA AR17> s/n Z840WDWG detached
	(da18:mpr0:0:25:0): WRITE(10). CDB: 2a 00 45 c9 81 b0 00 00 08 00 length 4096 SMID 353 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
mpr0: clearing target 25 handle 0x001c
(da18:mpr0:0:25:0): WRITE(10). CDB: 2a 00 45 c9 81 b0 00 00 08 00
mpr0: At enclosure level 0, slot 17, connector name (	)
mpr0: Unfreezing devq for target ID 25
mpr0: Unfreezing devq for target ID 25
(da18:mpr0:0:25:0): CAM status: CCB request completed with an error
(da18:mpr0:0:25:0): Error 5, Periph was invalidated
(da18:mpr0:0:25:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da18:mpr0:0:25:0): CAM status: Command timeout
(da18:mpr0:0:25:0): Error 5, Periph was invalidated
GEOM_MIRROR: Device swap1: provider da18p1 disconnected.
(da18:mpr0:0:25:0): Periph destroyed

And here is the sample smart log of a disk which is detached from pool

Code:

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Seagate Archive HDD
Device Model:	 ST8000AS0002-1NA17Z
LU WWN Device Id: 5 000c50 0936b1c4e
Firmware Version: AR17
User Capacity:	8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5980 rpm
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Mon Sep 24 18:20:17 2018 +03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:	  (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection:		 (	0) seconds.
Offline data collection
capabilities:			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:			(0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:		(0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time:	 (   1) minutes.
Extended self-test routine
recommended polling time:	 ( 936) minutes.
Conveyance self-test routine
recommended polling time:	 (   2) minutes.
SCT capabilities:			(0x30b5)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x000f   108   099   006	Pre-fail  Always	   -	   19827984
  3 Spin_Up_Time			0x0003   092   090   000	Pre-fail  Always	   -	   0
  4 Start_Stop_Count		0x0032   100   100   020	Old_age   Always	   -	   47
  5 Reallocated_Sector_Ct   0x0033   100   100   010	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x000f   088   060   030	Pre-fail  Always	   -	   4970814419
  9 Power_On_Hours		  0x0032   089   089   000	Old_age   Always	   -	   9676
 10 Spin_Retry_Count		0x0013   100   100   097	Pre-fail  Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   020	Old_age   Always	   -	   45
183 Runtime_Bad_Block	   0x0032   100   100   000	Old_age   Always	   -	   0
184 End-to-End_Error		0x0032   100   100   099	Old_age   Always	   -	   0
187 Reported_Uncorrect	  0x0032   100   100   000	Old_age   Always	   -	   0
188 Command_Timeout		 0x0032   100   098   000	Old_age   Always	   -	   8590065668
189 High_Fly_Writes		 0x003a   100   100   000	Old_age   Always	   -	   0
190 Airflow_Temperature_Cel 0x0022   072   061   045	Old_age   Always	   -	   28 (Min/Max 27/29)
191 G-Sense_Error_Rate	  0x0032   100   100   000	Old_age   Always	   -	   0
192 Power-Off_Retract_Count 0x0032   100   100   000	Old_age   Always	   -	   454
193 Load_Cycle_Count		0x0032   100   100   000	Old_age   Always	   -	   498
194 Temperature_Celsius	 0x0022   028   040   000	Old_age   Always	   -	   28 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   108   099   000	Old_age   Always	   -	   19827984
197 Current_Pending_Sector  0x0012   100   100   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0010   100   100   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x003e   200   200   000	Old_age   Always	   -	   0
240 Head_Flying_Hours	   0x0000   100   253   000	Old_age   Offline	  -	   9603 (13 255 0)
241 Total_LBAs_Written	  0x0000   100   253   000	Old_age   Offline	  -	   11219455640
242 Total_LBAs_Read		 0x0000   100   253   000	Old_age   Offline	  -	   294052578821

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline	Completed without error	   00%	  9630		 -
# 2  Short offline	   Completed without error	   00%	  9541		 -
# 3  Short offline	   Completed without error	   00%	  9373		 -
# 4  Extended offline	Completed without error	   00%	  9296		 -
# 5  Short offline	   Completed without error	   00%	  9205		 -
# 6  Short offline	   Completed without error	   00%	  8965		 -
# 7  Extended offline	Completed without error	   00%	  8886		 -
# 8  Short offline	   Completed without error	   00%	  8797		 -
# 9  Short offline	   Completed without error	   00%	  8621		 -
#10  Extended offline	Completed without error	   00%	  8546		 -
#11  Short offline	   Completed without error	   00%	  8453		 -
#12  Short offline	   Completed without error	   00%	  8213		 -
#13  Extended offline	Completed without error	   00%	  8134		 -
#14  Short offline	   Completed without error	   00%	  8045		 -
#15  Short offline	   Completed without error	   00%	  7877		 -
#16  Extended offline	Completed without error	   00%	  7798		 -
#17  Extended offline	Interrupted (host reset)	  00%	  7735		 -
#18  Short offline	   Completed without error	   00%	  7659		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

HBA

Code:

Avago Technologies SAS3 Flash Utility										   
Version 16.00.00.00 (2017.05.02)												
Copyright 2008-2017 Avago Technologies. All rights reserved.					
																			   
	   Adapter Selected is a Avago SAS: SAS3008(C0)							
																			   
Num   Ctlr			FW Ver		NVDATA		x86-BIOS		 PCI Addr	 
----------------------------------------------------------------------------	
																			   
0  SAS3008(C0)  12.00.00.00	0b.02.00.07	08.29.00.00	 00:03:00:00

Johnny Fartpants · Sep 24, 2018

blckhm said:
seagate archive hdd

This says it all I think. I've played with these SMR drives before and have experienced the same sort of behaviour. Long story short I came to the conclusion that they just aren't suited to being used in a NAS.

PS: Full system specs would be helpful but in this case I feel the writings on the wall.

PPS: Strange setup 10 x mirrors and SLOW big ass archive drives. Did you want performance or capacity?

Johnny Fartpants · Sep 24, 2018

Im sure I've read that @Arwen has used these before so perhaps she can give a more experienced response?

blckhm · Sep 24, 2018

Johnny Fartpants said:
This says it all I think. I've played with these SMR drives before and have experienced the same sort of behaviour. Long story short I came to the conclusion that they just aren't suited to being used in a NAS.

PS: Full system specs would be helpful but in this case I feel the writings on the wall.

PPS: Strange setup 10 x mirrors and SLOW big ass archive drives. Did you want performance or capacity?

We did not need write performance at the beginning. Usually these drives works under read intensive works (large video files).
But now, we try to move our images to the pool which runs under write intensive works.

HBA info

Code:

Avago Technologies SAS3 Flash Utility										   
Version 16.00.00.00 (2017.05.02)												
Copyright 2008-2017 Avago Technologies. All rights reserved.					
																			   
	   Adapter Selected is a Avago SAS: SAS3008(C0)							
																			   
Num   Ctlr			FW Ver		NVDATA		x86-BIOS		 PCI Addr	 
----------------------------------------------------------------------------	
																			   
0  SAS3008(C0)  12.00.00.00	0b.02.00.07	08.29.00.00	 00:03:00:00

BTW, Is there any page to list full system specs on web ui?

HoneyBadger · Sep 24, 2018

Hypervisor: Origin = "VMwareVMware"

Is this a virtual FreeNAS install, and were you sure to use VT-d/IOMMU passthrough on your HBA? If this is a virtual install, you've also set up 8 vCPUs on an 8 physical-core machine, which could cause other issues.

Seagate Archive HDD

I suggest testing these drives with 7.62x39mm rounds, applied at high velocity, perpendicular to the platters. Replace any failed units with conventional PMR 8TB drives.

Short answer: SMR "archive" drives are very bad for RAID health and performance.

Arwen · Sep 24, 2018

I own a Seagate 8TB SMR, and while it works fine, I would not buy it today unless the price was really better than other 8TB drives. (This was the ONLY consumer 8TB drive available at the time I bought it.)

That said, if anyone, (Seagate or other), came out with a drive managed SMR drive that was 14TB to 22TB, (using current 8TB to 12TB normal disk platters), I'd consider it.

One thing to consider is that SMR drives can have periods when they are not responding. For my use as a single drive ZFS pool for backups, it worked. But, during testing I found that at times data being written paused.

Investigation into the design of Seagate's SMR drives found that they have a set of normal non-SMR tracks used for write buffering. If this write buffer fills up, it needs to be flushed to SMR tracks before more data can be accepted from the SATA port. That can take time if the SMR tracks need to have the next track read and stored. This both delays further writes and can give the appearance that the drive is hung. These delays can be many seconds long, even up to 5 to 10 seconds. It did not seem to cause any issue with my backups, so I ignored it. (My backup windows are not time constrained.)

I found indications that my 8TB SMR drive has 20GBs of non-SMR data tracks for this write buffer. Thus, any burst of less than this buffer size may not notice the slow down. But, continuous writing, (like my backups, or your case of loading data), would likely experience it several times a minute, (per drive!).

After thinking about this technology for a long while, (sometimes I get paid to think about other people's problems, wish I got paid for this one!), I think SMR drives are an IDEAL case for hybrid spinning platters and flash. For example, if we take a 12TB drive which could be doubled if full SMR, to about 24TBs, I'd guess a 32GB flash drive for write buffer, (and the silly SMR read next track before write), would be good. In fact, since the 32GB flash is not critical, (meaning it does not impact size of the storage), if the flash started wearing down, no problem. Just keep reducing the used blocks of the flash until it's a bit unusable, like 4GBs.

So, after all that, I would never recommend SMR drives for any type of RAID except simple striping.

blckhm · Sep 24, 2018

HoneyBadger said:
Is this a virtual FreeNAS install, and were you sure to use VT-d/IOMMU passthrough on your HBA? If this is a virtual install, you've also set up 8 vCPUs on an 8 physical-core machine, which could cause other issues.

I suggest testing these drives with 7.62x39mm rounds, applied at high velocity, perpendicular to the platters. Replace any failed units with conventional PMR 8TB drives.

Short answer: SMR "archive" drives are very bad for RAID health and performance.

Haha, Thanks for great advice. I think these drives are not too bad for backup only plans. But in production environments, it’s sucks

And yes it’s virtualized. I don’t know what is the vtd iommu but I am sure that, hba connected to guest os via passthrough. So all drives connected as dedicated to freenas and also hba.

HoneyBadger · Sep 24, 2018

SMR is fine for attaching as a USB drive to a Windows machine, where people can just write write write and never delete.

blckhm said:
And yes it’s virtualized. I don’t know what is the vtd iommu but I am sure that, hba connected to guest os via passthrough. So all drives connected as dedicated to freenas and also hba.

PCI passthrough is what I was getting at; it looked that way but I wanted to be sure.

You also seem to have 8 vCPUs assigned to that machine, which is a lot and could be causing some issues with %RDY/%CSTP as your scheduler is trying to find eight simultaneously available threads on a 16-thread CPU. Check your esxtop for those columns for your FreeNAS VM.

Once you've "solved the SMR problem" I'd wager you'll get just as good overall performance, possibly better, with 4 vCPUs.

pro lamer · Sep 26, 2018

a bit off-topic:

Arwen said:
32GB flash drive for write buffer, (

As a SLOG? Or you mean an SSHD?

Sent from my mobile phone

Arwen · Sep 27, 2018

Arwen said:
32GB flash drive for write buffer, (

pro lamer said:
a bit off-topic:
As a SLOG? Or you mean an SSHD?

As a hybrid spinning drive which includes builtin flash for use as a write buffer. Not separate, and not available to the user for any other use, unlike some hybrid drives. (Some hybrid drives use their flash for read caching...) This would have to be included inside the drive in order for it to be useful for drive managed SMR.

HoneyBadger · Sep 27, 2018

Arwen said:
This would have to be included inside the drive in order for it to be useful for drive managed SMR.

The newer Seagate Barracuda Mobile SMR drives include this (just not in the same capacity) branded as "Multi Tier Caching" - they have around 1GB of NAND and a few bands of PMR tracks (around 20GB or so, similar to yours) that are used to capture the writes quickly.

"End-user" workloads are fine, since the likelihood of John/Jane Q. Public writing more than ~20GB of data without giving the drive a chance to spool out to the SMR tracks is unlikely, outside of an application restore.

Once host-aware SMR drives become widely available they may have some merit for a filesystem using very large blocks - imagine if you could be assured that the shingle you're about to invalidate doesn't need to be re-written because it only contains dirty data - but for now, SMR drives need to be avoided like the plague for ZFS.

Arwen · Sep 27, 2018

HoneyBadger said:
...
Once host-aware SMR drives become widely available they may have some merit for a filesystem using very large blocks - imagine if you could be assured that the shingle you're about to invalidate doesn't need to be re-written because it only contains dirty data - but for now, SMR drives need to be avoided like the plague for ZFS.

But, host aware can already be implemented, easily: TRIM. If the host says TRIM some blocks, which happen to be the whole next track over, (which will be over-written), then the SMR drive knows it does not have to backup that next track.

Now all we need is SMR drives with TRIM. If the OS disk drive sees the TRIM feature and does not discrimate against spinning disks with TRIM, then it's done.

HoneyBadger · Sep 27, 2018

Arwen said:
If the host says TRIM some blocks, which happen to be the whole next track over, (which will be over-written), then the SMR drive knows it does not have to backup that next track.

In my opinion, the TRIM/UNMAP command shouldn't be reused because the intention there is that upon receiving that command, the drive should actually erase/clean the LBA/address in question. And the last thing we need is drives and controllers lying about what they're actually doing - if we wanted that, we'd all be using hardware RAID. ;)

But there is one thing that's similar - both SSDs and SMR drives need garbage collection. SSDs need to actually zap the blocks that were TRIMmed so there isn't an erase penalty on the next write. And in a copy/redirect-on-write environment, SMR drives would need to proactively be re-shingling their data towards the "lower layers" so that all of the "low cost write tracks" are exposed at the "top of the stack." The problem with drive-managed is that it does this either completely on its own without informing the controller, or only on-demand, resulting in inconsistent performance.

Once the drives, filesystem, OS, etc, can all communicate, you could see a conversation between ZFS and the drives where both are aware of free vs "free without reshingling" - the tracks could even be sliced into "meta-shingles" and the drive would be able to provide a better estimate of how long it would take to perform the operation and reshingle the data underneath, similar to metaslab LBA weighting now. "Cache" usage, whether it was NAND on the device or the PMR tracks, could also be reported and taken into consideration for whether or not an artificial delay would need to be inserted vis-a-vis the current write throttle.

(Or we could just use PMR drives.)

Important Announcement for the TrueNAS Community.

FreeNAS suddenly detaches a drive in pool

blckhm

Dabbler

Johnny Fartpants

Guru

Johnny Fartpants

Guru

blckhm

Dabbler

HoneyBadger

actually does care

Arwen

MVP

blckhm

Dabbler

HoneyBadger

actually does care

pro lamer

Guru

Arwen

MVP

HoneyBadger

actually does care

Arwen

MVP

HoneyBadger

actually does care

Similar threads

Important Announcement for the TrueNAS Community.

FreeNAS suddenly detaches a drive in pool

Dabbler

Guru

Guru

Dabbler

actually does care

MVP

Dabbler

actually does care

Guru

MVP

actually does care

MVP

actually does care

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "FreeNAS suddenly detaches a drive in pool"

Similar threads