lopr · Explorer · Joined Mar 19, 2015 · Messages: 71
Hello, I am a bit lost on what's wrong with my pool, a mirror consisting of two 250GB SSDs with replication on.
This pool started as a single 120GB SSD for testing purposes. I then added a 250GB disk to turn it into a mirror (with the help of this post by Duran), checked that autoexpand=on, and replaced the old disk with another 250GB SSD to make the pool bigger; roughly the sequence below. Deduplication was on for this pool long before I expanded it. So far so good.
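From memory, this is more or less what I ran back then (the device names are placeholders, not my actual gptids):
Code:
~# zpool attach SSD-jails <old-120G-disk> <new-250G-disk-1>    # single disk -> mirror
~# zpool set autoexpand=on SSD-jails
~# zpool replace SSD-jails <old-120G-disk> <new-250G-disk-2>   # swap out the small disk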
In the past I never experienced any problems with dedup; I was happy about the extra space gained.
Now I wanted to see how the pool is doing in terms of deduplication, and that's when I ran into some problems.
This is how I usually check the deduplication ratio and how much data is stored there:
Code:
~# zfs list SSD-jails
NAME        USED  AVAIL  REFER  MOUNTPOINT
SSD-jails   129G   124G  21.3M  /mnt/SSD-jails
~# zpool list SSD-jails
NAME        SIZE  ALLOC  FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
SSD-jails   236G  98.6G  137G         -   57%  41%  1.46x  ONLINE  /mnt
Now I wanted to have a closer look at the dedup table:
Code:
~# zpool status -D SSD-jails
  pool: SSD-jails
 state: ONLINE
  scan: scrub repaired 0 in 0h14m with 0 errors on Fri Aug 11 14:33:25 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        SSD-jails                                       ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/22e08b06-7d1d-11e7-a6cf-d050995176b8  ONLINE       0     0     0
            gptid/acb41f25-7d12-11e7-96fb-d050995176b8  ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 3788677, size 536 on disk, 173 in core

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    3.03M    143G   43.4G   48.2G    3.03M    143G   43.4G   48.2G
     2     301K   7.52G   4.30G   4.88G     662K   16.1G   9.18G   10.5G
     4     120K   2.21G   1.06G   1.33G     632K   11.0G   5.23G   6.69G
     8     131K    981M    471M    805M    1.33M   9.75G   4.66G   8.11G
    16    47.0K    132M   59.1M    216M     807K   2.45G   1.02G   3.64G
    32    1.41K   51.2M   15.4M   18.7M    56.8K   2.18G    687M    813M
    64      300   17.9M   5.17M   5.68M    25.8K   1.59G    378M    421M
   128       51   2.70M    302K    396K    8.52K    447M   42.2M   58.3M
   256       37    915K   46.5K    148K    12.0K    382M   17.4M   47.9M
   512       19   1.63M     56K     76K    11.5K    982M   33.3M   46.0M
    1K        3    130K      5K     12K    4.43K    244M   8.83M   17.7M
    2K        6    259K   90.5K    104K    17.9K    774M    270M    311M
    4K        1      512     512      4K   6.46K   3.23M   3.23M   25.9M
    8K        3    1.50K   1.50K     12K   34.9K   17.4M   17.4M    140M
   16K        2       1K      1K      8K   47.2K   23.6M   23.6M    189M
   32K        3    1.50K   1.50K     12K    152K   76.2M   76.2M    609M
   64K        4       2K      2K     16K    409K    205M    205M   1.60G
 Total    3.61M    154G   49.3G   55.4G    7.18M    189G   65.2G   81.3G
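If I am reading the histogram right (my own interpretation, happy to be corrected), a row like refcnt=2 means that 301K distinct blocks are stored once each but referenced by 662K block pointers, i.e. each of them is used a bit more than twice on average:
Code:
~# echo "scale=2; 662 / 301" | bc
2.19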
Well, I have no idea what exactly LSIZE, PSIZE, and DSIZE are (my guesses: LSIZE = size after decompression, PSIZE = physical size on disk, DSIZE = ?), but I expected one column in the Total line to correspond to the allocated space (ALLOC) of zpool list and one value to match the used space (USED) of zfs list. Well, that's not the case. If I calculate Total referenced DSIZE / Total allocated DSIZE = 81.3G / 55.4G ≈ 1.46, voilà, I get the dedup ratio that zpool list is giving me. Maybe I'm not getting it right, so I checked with zdb:
Code:
~# zdb -U /data/zfs/zpool.cache -D SSD-jails
DDT-sha256-zap-duplicate: 615627 entries, size 579 on disk, 187 in core
DDT-sha256-zap-unique: 3172603 entries, size 528 on disk, 170 in core

dedup = 1.47, compress = 2.90, copies = 1.25, dedup * compress / copies = 3.41
which gives me the same values (why copies is > 1 is another riddle to me).
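At least the combined ratio on the last line checks out; it is just the product of the other three values (my own back-of-the-envelope arithmetic, nothing from the man page):
Code:
~# echo "scale=4; 1.47 * 2.90 / 1.25" | bc
3.4104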
While reading man zdb I thought I might try zdb -b, which displays statistics regarding the number, size (logical, physical, and allocated) and deduplication of blocks:
Code:
~# zdb -U /data/zfs/zpool.cache -b SSD-jails

Traversing all blocks to verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 228 of 236 ...
leaked space: vdev 0, offset 0x3906662000, size 16384
leaked space: vdev 0, offset 0x3906395000, size 4096
[... many more of these ...]
leaked space: vdev 0, offset 0x3905f00000, size 8192
leaked space: vdev 0, offset 0x3905978000, size 4096
leaked space: vdev 0, offset 0x3903f63000, size 4096
block traversal size 18446744069241384960 != alloc 105870340096 (leaked 110338506752)

        bp count:          2339875
        ganged count:            0
        bp logical:    40618324480      avg:  17359
        bp physical:   14964867072      avg:   6395     compression:  2.71
        bp allocated:  23318990848      avg:   9965     compression:  1.74
        bp deduped:    27787157504    ref>1: 615625   deduplication:  2.19
        SPA allocated: 105870340096     used: 41.78%

        additional, non-pointer bps of type 0:     221297
        Dittoed blocks on same vdev: 558221
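One thing I noticed about that impossibly large block traversal size: interpreted as a signed 64-bit integer it would be negative, and the difference works out exactly to the reported leaked figure, so maybe a counter wrapped around rather than 110GB genuinely leaking? Pure speculation on my part, but the arithmetic fits:
Code:
~# echo "2^64 - 18446744069241384960" | bc
4468166656
~# echo "105870340096 + 4468166656" | bc
110338506752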
That's not comforting... but what does it mean?
The compression and deduplication values are quite different here; shouldn't they be the same as with the -D option?
I read that leaked space is normal on active pools, so I also tried zdb -c to verify the checksums of all metadata blocks:
Code:
~# zdb -U /data/zfs/zpool.cache -c SSD-jails

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 228 of 236 ...
 57.7M completed ( 57MB/s) estimated time remaining: 0hr 29min 17sec
zdb_blkptr_cb: Got error 122 reading <0, 0, 0, 2f>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1534, 0, 1>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1535, 3, 0>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1535, 2, 0>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1535, 1, 0>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1535, 0, 0>  -- skipping
  101M completed ( 25MB/s) estimated time remaining: 1hr 06min 33sec
zdb_blkptr_cb: Got error 122 reading <0, 1535, 1, 81>  -- skipping
  131M completed ( 21MB/s) estimated time remaining: 1hr 18min 26sec
zdb_blkptr_cb: Got error 122 reading <0, 1535, 1, be>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1535, 0, 17d0>  -- skipping
  145M completed ( 20MB/s) estimated time remaining: 1hr 22min 17sec
zdb_blkptr_cb: Got error 122 reading <0, 1535, 1, e7>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1535, 1, e8>  -- skipping
^C
I did a scrub: no errors.
Data-wise the pool seems OK, all jails are running without any hiccups, but I am a tad uneasy...
So my questions:
1. Can I zfs send the pool to my data-pool, destroy and rebuild the SSD-jails pool, and send it back (roughly the sequence sketched below), or will this copy the corrupt metadata along?
2. Is the DDT correct and am I just not reading it right, or is it borked because of this leaked space?
3. Any other suggestions to fix this?
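For question 1, this is roughly what I have in mind (just a sketch; the snapshot and target dataset names are made up):
Code:
~# zfs snapshot -r SSD-jails@migrate
~# zfs send -R SSD-jails@migrate | zfs recv -F data-pool/SSD-jails-copy
# destroy and recreate the SSD-jails pool, then:
~# zfs send -R data-pool/SSD-jails-copy@migrate | zfs recv -F SSD-jails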
Right now I am running a SMART test on both disks and will then export the pool to see if there's any difference.
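(Concretely, something like the following; the device names are just examples, not necessarily what my disks are called:)
Code:
~# smartctl -t long /dev/ada0
~# smartctl -t long /dev/ada1
# and once the tests are done:
~# smartctl -a /dev/ada0
~# smartctl -a /dev/ada1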
Edit: you can see my configuration in my signature.