Space allocation across vdevs

richardm1

Dabbler
Joined
Oct 31, 2013
Messages
19
Code:
root@freenas[~]# zpool list -v rustpool1
NAME                                     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rustpool1                               4.71T  2.27T  2.44T        -         -     4%    48%  1.00x  ONLINE  /mnt
  mirror                                3.62T  1.19T  2.43T        -         -     3%    32%
    gptid/74276415-d265-11ea-97fd-0050569be033      -      -      -        -         -      -      -
    gptid/743d9644-d265-11ea-97fd-0050569be033      -      -      -        -         -      -      -
  mirror                                1.09T  1.08T  10.9G        -         -    11%    99%
    gptid/2f43b371-daab-11ea-9716-000c29a8255f      -      -      -        -         -      -      -
    gptid/2f4c080a-daab-11ea-9716-000c29a8255f      -      -      -        -         -      -      -


The larger vdev is the original from when the pool was created; the smaller vdev was added later. Pretty much everyone everywhere for all eternity has been hammering home the concept that writes are allocated in proportion to each vdev's free space (e.g. in a pool where vdev "a" has 4TB free and vdev "b" has 1TB free, vdev "a" will see 4x the writes).
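
(For anyone who wants to watch this live: the per-vdev split of new writes should be visible with something like the command below - zpool iostat with -v for the vdev breakdown and a 5-second interval.)
Code:
zpool iostat -v rustpool1 5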

Well, that didn't happen here. The good news is writes to the smaller vdev have stopped and the system is working fine. What knob did I twiddle that broke the space allocator?
 

Alex_K

Explorer
Joined
Sep 4, 2016
Messages
64
Well, it's good to hear that the system is still working after the smaller disks were filled to the brim.

I have pretty much the same situation coming my way, only worse.

My configuration is as follows:
Code:
ESXi: HP DL380p Gen8 12LFF / 2xE5-2630v2 / 96G RAM / 2xPSU / 2x10G 530FLR-SFP+ (Broadcom) / 4x1G NC365T (Intel)
Boot + datastore1:  SATA SSD Intel DC S3110 512GB
FreeNAS VM: passthrough Dell H310 IT P20 2xSeagate_4T_ST4000NM005A + 2xWD_0.5T_WD5003ABYX RAID-10 / 20G RAM VMXNET 3

and zpool:
Code:
root@IT-H3-FreeNAS:/ # zpool list -v h3-p0
NAME                                     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
h3-p0                                   4.07T   709G  3.38T        -         -    24%    17%  1.00x  ONLINE  /mnt
  mirror                                3.62T   314G  3.32T        -         -    19%     8%
    gptid/8f323fd9-8517-11eb-8576-000c29a145be      -      -      -        -         -      -      -
    gptid/90431fa0-8517-11eb-8576-000c29a145be      -      -      -        -         -      -      -
  mirror                                 460G   395G  65.0G        -         -    73%    85%
    gptid/916179fc-8517-11eb-8576-000c29a145be      -      -      -        -         -      -      -
    gptid/928fcce3-8517-11eb-8576-000c29a145be      -      -      -        -         -      -      -

Only I think in our system the small vdev was added at the same time as the bigger one. Creation was performed using the GUI (the legacy GUI, because I can't stand the new one, but that's a different topic):
Code:
root@IT-H3-FreeNAS:/ # zpool history h3-p0 | grep -v "auto-" | grep -v "scrub" | grep -v zv0
History for 'h3-p0':
2021-03-15.00:49:46 zpool create -o cachefile=/data/zfs/zpool.cache -o failmode=continue -o autoexpand=on -O compression=lz4 -O aclmode=passthrough -O aclinherit=passthrough -f -m /h3-p0 -o altroot=/mnt h3-p0 mirror /dev/gptid/8f323fd9-8517-11eb-8576-000c29a145be /dev/gptid/90431fa0-8517-11eb-8576-000c29a145be mirror /dev/gptid/916179fc-8517-11eb-8576-000c29a145be /dev/gptid/928fcce3-8517-11eb-8576-000c29a145be
2021-03-15.00:49:51 zfs inherit mountpoint h3-p0
2021-03-15.00:49:51 zpool set cachefile=/data/zfs/zpool.cache h3-p0
2021-03-15.00:51:18 zfs set quota=none h3-p0
2021-03-15.00:51:19 zfs set refquota=none h3-p0
2021-03-15.00:51:19 zfs set reservation=none h3-p0
2021-03-15.00:51:19 zfs set refreservation=none h3-p0
2021-03-15.00:51:19 zfs set org.freenas:description= h3-p0
2021-03-15.00:51:20 zfs inherit sync h3-p0
2021-03-15.00:51:20 zfs set compression=lz4 h3-p0
2021-03-15.00:51:20 zfs set atime=off h3-p0
2021-03-15.00:51:20 zfs inherit dedup h3-p0
2021-03-15.00:51:20 zfs inherit recordsize h3-p0
2021-03-15.00:51:21 zfs inherit readonly h3-p0
2021-03-15.00:51:21 zfs inherit exec h3-p0
2021-03-15.00:51:26 zfs set aclmode=passthrough h3-p0
2021-03-15.20:05:11 <iocage> zfs set org.freebsd.ioc:active=yes h3-p0
2021-03-15.20:15:53 zpool import -c /data/zfs/zpool.cache.saved -o cachefile=none -R /mnt -f 14523266727040353718
2021-03-15.20:15:53 zpool set cachefile=/data/zfs/zpool.cache h3-p0
2021-03-18.18:25:09 zpool import -c /data/zfs/zpool.cache.saved -o cachefile=none -R /mnt -f 14523266727040353718
2021-03-18.18:25:09 zpool set cachefile=/data/zfs/zpool.cache h3-p0
2021-04-21.19:35:26 zpool import -c /data/zfs/zpool.cache.saved -o cachefile=none -R /mnt -f 14523266727040353718
2021-04-21.19:35:26 zpool set cachefile=/data/zfs/zpool.cache h3-p0
2021-06-10.23:38:00 zpool import -c /data/zfs/zpool.cache.saved -o cachefile=none -R /mnt -f 14523266727040353718
2021-06-10.23:38:00 zpool set cachefile=/data/zfs/zpool.cache h3-p0

And the small vdev received substantially more writes than the bigger one.

In the first few months the 500G disks had more than twice the "allocated" GB - and writes - compared to the 4T disks. Worth noting that by now the numbers are closer to equal (though only in GB, not in %).

The configuration was made out of curiosity and then kept, because it showed considerably better performance than a pure 2x4T-disk zpool.
Now, however... that 73% fragmentation on the smaller vdev looks very unhealthy to me. And we host VMs there.
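
If I understand zdb correctly, the per-metaslab picture behind that FRAG number can be dumped read-only with something like the command below (pointing it at the cachefile that shows up in the history above; -m repeated for more detail):
Code:
zdb -U /data/zfs/zpool.cache -mm h3-p0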

So the question is: how do we make it actually true that writes are allocated based on the vdev's free space, in a configuration like the topic starter's and mine?
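
(Assuming the FreeBSD-based builds still expose the metaslab-group allocation thresholds as sysctls - I have not verified the exact names - a first step might be to check whether any of them were changed from their defaults, e.g.:)
Code:
sysctl vfs.zfs | grep -E 'noalloc|fragmentation'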
 