Gut feeling something is wrong

Colit

Dabbler
Joined
Oct 2, 2023
Messages
37
Hi all,

My system is:
Supermicro 2U server running TrueNAS Core 13.0-U4.
Twin Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz.
256 GB ECC RAM
2 x SATA Samsung MZ7L3240 SSDs for the OS
12 x Seagate EXOS ST6000NM021A-2R7 6 TB SATA disks in 5 x VDEVs, plus 2 hot spares
2 x Radian RMS-200s in a SLOG mirror
4 x 10-gig NICs, but only 2 of them connected, to two subnets on a 10-gig switch with 9000-byte jumbo frames.

I have one 17.8 TiB storage pool, and one Zvol on that with ZFS dedupe off, LZ4 compression, sync=always (because of the RMS-200s)

The TrueNAS is sat in a datacentre with 6 x Dell servers connecting to it and using it as shared storage for VMware. Each of those servers has 2 x 10-gig NICs and enough other resources to be well specced (e.g. twin Intel Xeon Gold CPUs, 256-320 GB RAM each), and I'm using it for about 80 VMs (most of which just sit there all day and don't consume disk).

I benchmarked the system before I installed it with CrystalDiskMark and got 1012.29 MB/s read and 1172.90 MB/s write, which I thought was decent given that's about the speed of a single 10-gig NIC, and I'd not yet set up iSCSI multipathing at that point.

It's now been about a year, and I'm wondering whether I'm overloading the system with connections or doing something wrong. I'm sat here watching a Veeam replication task for a single SQL server, from the TrueNAS to a separate Synology (RAID10 with a 10-gig card), run at 16 MB/s. I know the Synology is fine, because I've just watched another Veeam backup job, from local disk on one of the servers, start after this one and run at 244 MB/s to the same Synology (the 16 MB/s didn't change during that time, so it's not as though this other job swamped the Synology).

I guess what I'm asking is: what's the best route to troubleshoot this, i.e. which steps should I try first?

The reports from TrueNAS say the disks are on average 15% busy each, and the RMS-200s are on average 1% busy. They also say disk reads average 6 ms, with writes averaging 1 ms.

Is there something wrong?
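For what it's worth, a first pass while the slow job is running could be live I/O and latency stats from the TrueNAS shell. This is just a sketch of the usual diagnostic commands ("tank" is a placeholder pool name, substitute your own):

```shell
# Per-vdev bandwidth and IOPS, refreshed every second -
# shows whether one vdev or disk is doing all the work
zpool iostat -v tank 1

# FreeBSD per-disk latency and %busy view
# (TrueNAS Core is FreeBSD-based)
gstat -p

# Network side: per-interface traffic at 1-second intervals,
# to see whether the bottleneck is the wire rather than the disks
systat -ifstat 1
```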
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
How full is the pool?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
10 disks in 5 vdevs in the pool = 26 or so TiB
1 zvol of 17.8TB = 68% or thereabouts. In a perfect world that would be 50% but I doubt it makes that much difference

Where does your 35% come from?

How full is the zvol?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Any encryption? How is the fragmentation of the pool?
Actually, please post the output of zpool list between [CODE] and [/CODE].
 

Colit

Dabbler
Joined
Oct 2, 2023
Messages
37
10 disks in 5 vdevs in the pool = 26 or so TiB
1 zvol of 17.8TB = 68% or thereabouts. In a perfect world that would be 50% but I doubt it makes that much difference

Where does your 35% come from?

How full is the zvol?
It actually says 34% on the dashboard.
There's about 9.5 TB of data on the whole system.
 

Colit

Dabbler
Joined
Oct 2, 2023
Messages
37
Any encryption? How is the fragmentation of the pool?
Actually, please post the output of zpool list between [CODE] and [/CODE].
Code:

root@truenas[~]# zpool list
NAME              SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
Colit Disk Pool  27.3T  9.15T  18.1T        -         -    46%    33%  1.00x    ONLINE  /mnt
boot-pool         206G  7.56G   198G        -         -     0%     3%  1.00x    ONLINE  -
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
That fragmentation value is higher than normal.
Also, in what kind of layout are the VDEVs organized? For this type of workload the use of mirrors is usually suggested (see resource below).

What we are seeing from the zpool list output doesn't seem to match your understanding of the system... there is no 17.8 TiB storage pool.
 
Last edited:

Colit

Dabbler
Joined
Oct 2, 2023
Messages
37
That fragmentation value is higher than normal.
Also, in what kind of layout are the VDEVs organized? For this kind of workload the use of mirrors is usually suggested (see resource below).

What we are seeing from the zpool list output doesn't seem to match your understanding of the system... there is no 17.8 TiB storage pool.

The VDEVs are in 5 mirrors.
 

Colit

Dabbler
Joined
Oct 2, 2023
Messages
37
...and apologies: 17.97 TiB *FREE* in the pool, a total of ~26 TiB it seems.
 

Colit

Dabbler
Joined
Oct 2, 2023
Messages
37
I'm now wondering whether I'm a candidate for a big L2ARC served by an add-in PCIe card with some NVMe Optanes?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Do you have any (what I call) "swing" storage? Anywhere else you can put the VMs whilst you erase the pool, recreate the zvol and then move the VMs back?
Are the VMs thick or thin provisioned (or a mix, just to make things complex)?
Is the zvol sparse or thick provisioned?
What block size did you use?

Is there anything else on the pool other than the zvol?

My thoughts -
The VM hosts each have 2 x 10Gb NICs, each presumably connected to its own subnet. If this is multipath, you could technically have 20Gb of traffic coming into the storage server. In that case you don't have enough SLOG space (20Gb/s for 5 seconds = 100Gb = 12.5GB), as you have only 8GB in your RMS-200 mirror, I think. Whether this is an actual issue is another matter.

No other ideas at this stage
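The sizing above is the usual rule of thumb: peak ingest bandwidth times the number of seconds of dirty data ZFS can hold before flushing (the 5-second transaction-group window assumed here is the classic OpenZFS default). As quick arithmetic:

```shell
# SLOG rule of thumb: bandwidth (Gbit/s) x txg window (s) / 8 bits-per-byte
awk 'BEGIN {
    gbits_per_sec = 20   # two 10GbE paths, worst case
    txg_seconds   = 5    # assumed txg flush interval
    printf "%.1f GB\n", gbits_per_sec * txg_seconds / 8
}'
# prints: 12.5 GB
```

So a pair of 8 GB RMS-200s is a little under that worst case, though only sustained 20Gb/s of sync writes would actually hit the limit.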
 
Last edited:

Colit

Dabbler
Joined
Oct 2, 2023
Messages
37
Do you have any (what I call) "swing" storage? Anywhere else you can put the VMs whilst you erase the pool, recreate the zvol and then move the VMs back?
Are the VMs thick or thin provisioned (or a mix, just to make things complex)?
Is the zvol sparse or thick provisioned?
What block size did you use?

Is there anything else on the pool other than the zvol?

My thoughts -
The VM hosts each have 2 x 10Gb NICs, each presumably connected to its own subnet. If this is multipath, you could technically have 20Gb of traffic coming into the storage server. In that case you don't have enough SLOG space (20Gb/s for 5 seconds = 100Gb = 12.5GB), as you have only 8GB in your RMS-200 mirror, I think. Whether this is an actual issue is another matter.

No other ideas at this stage
I do have a second one of these boxes, almost identical but with 12 TB disks, that's less used and has double the free space, so I could move them, but that'd be a very long migration task.

The VMs are almost exclusively thin provisioned (unless I missed that setting at VM creation time).

How do I check whether the zvol is sparse? Deduplication is definitely off.

Record Size is set to 128K

There's nothing else on the pool other than the zvol.

The iSCSI is multipath, and yes, I could technically have 20Gbit/s of traffic coming in. I know the RMS-200s are only 8 GB in size, but they were cheap and fast.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
"How do I check whether the zvol is sparse? Deduplication is definitely off."
I don't actually know - and my google fu has failed to find out.
 

Colit

Dabbler
Joined
Oct 2, 2023
Messages
37
"How do I check whether the zvol is sparse? Deduplication is definitely off."
I don't actually know - and my google fu has failed to find out.
From my googling, I think you can't have a sparse zvol.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Code:
zfs get -H -o value volblocksize "BigPool/iSCSI/truenas.hdd"
 

Colit

Dabbler
Joined
Oct 2, 2023
Messages
37
Code:
zfs get -H -o value volblocksize "BigPool/iSCSI/truenas.hdd"
Where do I get my version of what should go in the quotes there? I tried "/mnt/Colit Disk Pool" and it didn't barf, but just returned a -
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I think you can't have a sparse zvol from my googling.

Run zfs get refreservation - if the value is none then it is likely a sparse zvol.

Where am I to get my version of what should go in the quotes there? as I tried "/mnt/Colit Disk Pool" and it didn't barf but just returned a -

You can just run zfs get volblocksize and it will dump a table of the results, but I would assume that they will mostly be 16K.
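Putting the two checks together, something along these lines should show both properties for every zvol in the pool at once (pool name taken from the earlier zpool list output; adjust as needed):

```shell
# List sparseness and block size for all zvols under the pool.
# refreservation = none   -> likely a sparse (thin) zvol
# refreservation = <size> -> thick-provisioned zvol
zfs get -r -t volume refreservation,volblocksize "Colit Disk Pool"
```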
 