Samsung 850 EVO: "disk busy" weirdness?

Status
Not open for further replies.

chuegen

Cadet
Joined
Sep 11, 2017
Messages
6
Hi all,

I'm seeing some odd behavior and I'm not sure whether it's behind the performance problems I'm having.

Problem: two of the SSDs (da0, da1) report high "disk busy" numbers compared to the other six (da2 through da7) under the same load, and I can't seem to get the performance I expect out of the configuration. On a "dd if=/dev/zero of=/mnt/DATASSD/NAS-VMD/freenas/outfile bs=1M count=262144", gstat shows da0 and da1 at 80%+ busy while the rest of the drives sit around 30%.
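
For reference, roughly how that load is generated and watched (dataset path from my setup; gstat flags per gstat(8) on FreeBSD 11; note a stream of zeroes will largely compress away if lz4 is enabled on the dataset):
Code:
# write a 256 GiB stream into the pool in the background
dd if=/dev/zero of=/mnt/DATASSD/NAS-VMD/freenas/outfile bs=1M count=262144 &

# watch per-disk %busy once per second, physical providers only, da0-da7
gstat -p -I 1s -f 'da[0-7]$'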

Dell T320, running FreeNAS 11.0-U3
32 GB RAM, 8x Samsung 850 EVO 1TB SSDs on a Dell PERC controller flashed to "IT" (passthrough) mode
ZFS pool set up as 4 mirrored pairs
10 GbE twinax to the switch, 10 GbE twinax to the compute nodes

NFS shares mounted as:
a.b.c.d:/mnt/DATASSD/NAS-VMD on /mnt/pve/NAS-VMD type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=a.b.c.d,mountvers=3,mountport=x,mountproto=udp,local_lock=none,addr=a.b.c.d)
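
For reference, the dataset settings behind that export can be checked with something like the following (I'm assuming the export path maps to a dataset named DATASSD/NAS-VMD; the sync setting matters because NFS commits can force writes through the ZIL):
Code:
zfs get sync,compression,recordsize DATASSD/NAS-VMD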

Boot time drive info for da0 and da2:
Code:
da0 at mps0 bus 0 scbus0 target 4 lun 0
da0: <ATA Samsung SSD 850 1B6Q> Fixed Direct Access SPC-3 SCSI device
da0: Serial Number S246NXAG602332H
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 953869MB (1953525168 512 byte sectors)
da0: quirks=0x8<4K>
GEOM_ELI: Device da0p1.eli created.
da2 at mps0 bus 0 scbus0 target 6 lun 0
da2: <ATA Samsung SSD 850 1B6Q> Fixed Direct Access SPC-3 SCSI device
da2: Serial Number S33FNCAH502074V
da2: 600.000MB/s transfers
da2: Command Queueing enabled
da2: 953869MB (1953525168 512 byte sectors)
da2: quirks=0x8<4K>
GEOM_ELI: Device da2p1.eli created.


volumes.PNG


The only difference seems to be that da0 and da1 may come from a different production lot, despite being the same model and firmware:

disks.PNG


smartinfo.PNG


During a data migration via NFS from a couple of nodes, here's what the performance metrics look like -- first, "disk busy":

gstat.PNG

diskbusy.PNG


But as you can see, the disks are handling the same number of operations and the same amount of transfer:

diskops.PNG

diskio.PNG



Any thoughts? Would this higher "disk busy" percentage be holding me up? And if so, any ideas as to why da0 and da1 would be the only two drives showing double or triple the busy% for the same transactions?

Thanks,
-c
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
The drives are part of a separate mirror. They may be seeing more activity, or have in the past.

Or maybe they're warmer and throttling?

FWIW, you may benefit from a high-performance NVMe SLOG.
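
If you go that route, the rough shape of it from the CLI is below (nvd0p1 is just a placeholder device name; on FreeNAS you would normally add the log device through the GUI so it gets partitioned and labeled properly):
Code:
# attach a fast, power-loss-protected NVMe partition as a separate log (SLOG) vdev
zpool add DATASSD log nvd0p1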
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Please post the output of zpool list -v for the affected pool.

ZFS will attempt to balance the writes so each mirror pair has about the same amount of blocks used.
That said, I don't know why it would only affect half of a mirror pair. (Or 2 mirror pairs.)
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Please post the output of zpool list -v for the affected pool.

ZFS will attempt to balance the writes so each mirror pair has about the same amount of blocks used.
That said, I don't know why it would only affect half of a mirror pair. (Or 2 mirror pairs.)

These being EVOs, I thought it might have to do with the first mirror having received more writes in the past, and thus being closer to steady state (i.e. slower) than the rest of the drives.

Alternatively, is it possible the ZIL is only being written to the first vdev?
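
One way to sanity-check that would be to watch the per-vdev write distribution while the load is running, e.g.:
Code:
# per-vdev ops and bandwidth, refreshed every second
zpool iostat -v DATASSD 1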
 

chuegen

Cadet
Joined
Sep 11, 2017
Messages
6
Here is the output of zpool list -v:

Code:
NAME                                             SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
DATASSD                                         3.62T   884G  2.76T         -    15%    23%  1.00x  ONLINE  /mnt
  mirror                                         928G   221G   707G         -    15%    23%
    gptid/848c802e-9733-11e7-bd11-74867ae04eaa      -      -      -         -      -      -
    gptid/84c9bea5-9733-11e7-bd11-74867ae04eaa      -      -      -         -      -      -
  mirror                                         928G   221G   707G         -    15%    23%
    gptid/a474f984-9733-11e7-bd11-74867ae04eaa      -      -      -         -      -      -
    gptid/a4b1774d-9733-11e7-bd11-74867ae04eaa      -      -      -         -      -      -
  mirror                                         928G   221G   707G         -    15%    23%
    gptid/ba97e9eb-9733-11e7-bd11-74867ae04eaa      -      -      -         -      -      -
    gptid/bad8b254-9733-11e7-bd11-74867ae04eaa      -      -      -         -      -      -
  mirror                                         928G   221G   707G         -    15%    23%
    gptid/cd0b31ef-9733-11e7-bd11-74867ae04eaa      -      -      -         -      -      -
    gptid/cd4b50cf-9733-11e7-bd11-74867ae04eaa      -      -      -         -      -      -


All drives were new and installed at the same time, although I tried a different configuration first -- two 4-drive RAIDZ1 vdevs, da0-da3 and da4-da7. da0 and da1 still exhibited the same higher latency / disk busy percentage.

Delete latency for da0 and da1 seems to be 2-3x higher than for the rest of the drives (da2-da7 average 1.2 ms, da0-da1 average over 3 ms).
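
For what it's worth, the live per-disk delete (TRIM) latency can be watched with gstat's -d flag, which adds BIO_DELETE statistics:
Code:
gstat -dp -I 5s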

I find it odd that the two slow drives have a serial number format quite different from the other six SSDs.

I have to admit I'm not familiar with what an NVMe SLOG might buy me; I'll go do some reading.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Well, that output removes a potential cause.

I think I agree that the two affected drives have something different internally. My own Samsung 840 EVO 1TB started getting slower because it's a multi-level cell type with a firmware bug. Once the firmware was updated, most (but not all) of the speed came back.

PS: Note that triple-level (and higher) flash devices basically use analog bit detection: a single cell is not just on or off, but can hold in-between voltage levels that encode additional bits. Constant reading (without over-writing) causes these cells to lose a little charge, so the firmware has to account for that and eventually re-write or relocate the block to maintain reliability.
 

chuegen

Cadet
Joined
Sep 11, 2017
Messages
6
Thank you for the guidance; it's what I was thinking too, and it was frustrating me. I originally suspected firmware, but they're all running the same level -- and there appear to be no firmware upgrades available for the EVO series.
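
For what it's worth, one way to compare firmware levels from the FreeNAS shell would be something like this (sh syntax; smartctl ships with FreeNAS):
Code:
# print model, serial and firmware for each of the eight SSDs
for d in da0 da1 da2 da3 da4 da5 da6 da7; do
  smartctl -i /dev/$d | egrep 'Device Model|Serial Number|Firmware Version'
done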

As for MLC, I just couldn't pass up the price for new 1TB EVOs ($300), and this isn't a "money is no object" kind of application.
 

chuegen

Cadet
Joined
Sep 11, 2017
Messages
6
Update:
It does, indeed, appear to be the specific drives. I found a spare drive with the "S33" serial number prefix, so I offlined da0, replaced the drive, resilvered, and started another data migration job. As you might expect, da0's "busy" percentage came into line with the rest of them; note that da1 is now the only drive showing a high "disk busy" percentage.

new-diskbusy1.PNG

new-diskbusy2.PNG
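
For the record, the command-line shape of that swap would be roughly the following (the gptid values are placeholders for da0's old and new partition labels; on FreeNAS the GUI normally drives the replace/resilver):
Code:
# take the suspect disk offline, swap the hardware, then replace and resilver
zpool offline DATASSD gptid/<old-da0-partition>
zpool replace DATASSD gptid/<old-da0-partition> gptid/<new-da0-partition>
zpool status DATASSD   # watch resilver progress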
 

chuegen

Cadet
Joined
Sep 11, 2017
Messages
6
Code:
kern.cam.da.7.delete_method: ATA_TRIM
kern.cam.da.5.delete_method: ATA_TRIM
kern.cam.da.0.delete_method: ATA_TRIM
kern.cam.da.4.delete_method: ATA_TRIM
kern.cam.da.2.delete_method: ATA_TRIM
kern.cam.da.1.delete_method: ATA_TRIM
kern.cam.da.6.delete_method: ATA_TRIM
kern.cam.da.3.delete_method: ATA_TRIM
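
(That's each disk's delete method, pulled with something along the lines of the command below; ATA_TRIM on all eight shows TRIM is in use across the pool.)
Code:
sysctl -a | grep delete_method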
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I originally thought "firmware", but they're all running the same level - and there appears to be no firmware upgrades available for the EVO series.

The best way to check with Samsung drives is to connect one of them to a Windows machine and run Samsung Magician over it to check the firmware status.

Sounds like batch variation.... or a silent hardware revision :(
 

chuegen

Cadet
Joined
Sep 11, 2017
Messages
6
That was fun - didn't expect it to be such a PITA to get this thing connected to a Windows machine. Nonetheless, Magician says it's running the latest firmware.

I suspect silent hardware revision, myself. Thanks for the help!
 