Migrating from FreeNAS 11.3-U5 to TrueNAS SCALE 22.12 - I lost all compression on videos with SCALE

ohboi

Dabbler
Joined
Mar 23, 2019
Messages
26
I am migrating my tomb of stuff from my current FreeNAS 11.3-U5 to my new NAS running the latest TrueNAS SCALE 22.12. I got quite excited seeing information like in this thread showing how good the ZSTD compression method is and how I can squeeze a little more out of my data vs. the LZ4 I have used on FreeNAS. So excited...

However, I soon noticed that folders with videos are hardly compressed, if at all. Something didn't seem right, as I could swear the same data on my FreeNAS had decent compression (some 5%), and I was right. Here's an example of two identical folders:

[Screenshot: Windows Explorer properties comparing the same folder on both servers]


The result on the right is from SCALE, and it is about the same for both LZ4 and ZSTD. The one on the left is my LZ4-compressed FreeNAS storage. No deduplication in either case. Additionally, this compression ratio of about 5% is pretty consistent across video files; e.g., the parent folder of the above is 20 TB yet reports only 18.9 TB of size on disk.

Is there something I am missing? Could that be just a reporting error in Windows? A different LZ4 compression level?
 
Joined
Oct 22, 2019
Messages
3,641
I wouldn't use Windows Explorer to determine compression efficiency.

It's more accurate to run the following commands on the server itself:

Code:
zfs list -t filesystem -o name,compress,compressratio,recordsize oldpool/dataset newpool/dataset

Code:
du -hs /mnt/oldpool/dataset /mnt/newpool/dataset

Code:
du -hs --apparent-size /mnt/oldpool/dataset /mnt/newpool/dataset


You may have to use sudo / root to get accurate numbers from the "du" commands.
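
For reference, you'd be looking at output shaped like this (the pool/dataset names and numbers here are made up, just to show the columns):

Code:
NAME              COMPRESS  RATIO  RECSIZE
oldpool/dataset   lz4       1.05x  1M
newpool/dataset   zstd      1.00x  128K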
 
Last edited:

ohboi

Dabbler
Joined
Mar 23, 2019
Messages
26
I checked the size of the folders, and they match Windows Explorer's data:

FreeNAS - 96GB
TrueNAS - 101GB

This doesn't seem like a reporting error. As for the `zfs list` command, both datasets on both machines report a 1.00x compression ratio.
 
Joined
Oct 22, 2019
Messages
3,641
What were the outputs of all the above commands on the server itself?

EDIT: Are these on different pools, or the same pool on different datasets? Same underlying vdev structure?
 

ohboi

Dabbler
Joined
Mar 23, 2019
Messages
26
There is one difference, and that is the record size: the old FreeNAS uses 1M, the new TrueNAS SCALE 128K.
Otherwise, as for the setup:
- the old FreeNAS is a single RaidZ2 Vdev made of 8x4TB drives - lz4 compression, no deduplication
- the new TrueNAS scale is a single RaidZ1 Vdev made of 5x18TB drives - lz4 compression, no deduplication

As for the outputs for each (FreeNAS doesn't recognize --apparent-size, so I used -A instead):

FreeNAS (compressing videos):
Code:
du -hs /mnt/.../.../...
96G    /mnt/.../.../...
du -s /mnt/.../.../...
100274181    /mnt/.../.../...
du -As /mnt/.../.../...
105565518    /mnt/.../.../...
zfs list -t filesystem -o name,compress,compressratio,recordsize .../...
NAME    COMPRESS   RATIO  RECSIZE
.../... lz4        1.00x  1M

TrueNAS Scale:
Code:
du -hs /mnt/.../.../...
101G    /mnt/.../.../...
du -s /mnt/.../.../...
105473655 /mnt/.../.../...
du -As /mnt/.../.../...
105565507    /mnt/.../.../...
zfs list -t filesystem -o name,compress,compressratio,recordsize .../...
NAME    COMPRESS  RATIO  RECSIZE
.../... lz4       1.00x  128K

EDIT: Added outputs from du with -A argument (FreeBSD didn't recognize --apparent-size)
 
Last edited:
Joined
Oct 22, 2019
Messages
3,641
The differences in recordsize and vdev layout will yield differences in disk usage and overhead.

You can clearly see that neither dataset is effectively compressing the records, since compressed video streams barely yield to additional lossless compression. Neither ZSTD nor LZ4 is going to make much of a difference for this type of data. (It doesn't hurt to leave compression enabled, since there's almost zero overhead, plus it allows compression for future "compressible" files.)

So the differences you're seeing in disk usage are likely due to a combination of differences in recordsize + vdev.
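
To isolate one variable, you could match the recordsize on the new dataset and re-copy a test folder so the files are rewritten under the new setting, then compare again. A rough sketch (pool/dataset names are placeholders):

Code:
zfs set recordsize=1M newpool/dataset
# re-copy a test folder so its files are rewritten with 1M records, then:
du -hs /mnt/newpool/dataset/testfolder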
 
Last edited:
Joined
Oct 22, 2019
Messages
3,641
EDIT: Added outputs from du with -A argument (FreeBSD didn't recognize --apparent-size)
FreeBSD? I thought both these pools were on your SCALE server? As in, you imported your older pool (from Core) into your SCALE server?
 

ohboi

Dabbler
Joined
Mar 23, 2019
Messages
26
Is the compression ratio value indicative of anything? I ran that against the whole dataset, and there are tons of other files, too, which are compressed effectively.
Also, considering that my original NAS is running its dataset at a 1M record size, that would lead me to believe its results should be even worse, as any small file will consume at least 1 MB? I am comparing other folders (usually mixed content), and the LZ4 on the old NAS is always doing better (even compared to using ZSTD on the new TrueNAS SCALE).

FreeBSD? I thought both these pools were on your SCALE server? As in, you imported your older pool (from Core) into your SCALE server?
Not at all, both servers are separate and active. I have datasets from both mapped to my PC, and I am comparing files directly, moving stuff around, and comparing the folder sizes.
 
Joined
Oct 22, 2019
Messages
3,641
Also, considering that my original NAS is running its dataset at a 1M record size, that would lead me to believe its results should be even worse, as any small file will consume at least 1 MB?
It's the opposite.

Larger recordsize means less overhead. Fewer records are needed to comprise a file. Fewer checksums and fewer pointers. (This is especially true for multimedia files.)
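
For a sense of scale, some simple arithmetic (not measured numbers):

Code:
# Records needed to store a 10 GiB file:
#   recordsize=1M:    10 GiB / 1 MiB   = 10,240 records
#   recordsize=128K:  10 GiB / 128 KiB = 81,920 records  (8x the block pointers and checksums)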

A single file smaller than the 1 MiB recordsize (say, only 64 KiB) will be stored as a single smaller record.

As for the "end" of a larger file, where the last section is smaller than the recordsize? Compression makes this a non-issue, since any dataset with any form of compression enabled will treat this "slack" space as an empty hole. An incompressible file of 1.1 MiB on a dataset using a 1M recordsize will be stored as two records, yet only take up about 1.1 MiB of disk space: 1 MiB + 0.1 MiB. (The latter record won't cost an additional full 1 MiB just because the recordsize is 1 MiB, as long as any compression is enabled on the dataset. Such "slack" at the end is basically "nothing" and won't be stored physically.)
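
A quick way to see this for yourself (paths are placeholders; on RAIDZ the du figure will also include some parity overhead, and you may need to wait a few seconds for ZFS to flush before du reflects it):

Code:
# write ~1.1 MiB of incompressible data to a dataset with recordsize=1M
dd if=/dev/random of=/mnt/pool/dataset/slack-test.bin bs=1k count=1126
du -h /mnt/pool/dataset/slack-test.bin   # expect roughly 1.1M, not 2.0M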

Where smaller recordsizes shine is when you do a lot of "in-place" editing of existing files directly on the server itself: you minimize the impact of "write amplification". This also has implications for keeping many snapshots under that kind of usage (lots of "in-place" editing). However, for storage and archiving, it's a moot point.

Another case for smaller recordsizes is to match database software writes and modifications.


EDIT: Larger recordsizes also yield better throughput performance for sequential reads, such as streaming videos and opening up large photos.
 
Last edited:

ohboi

Dabbler
Joined
Mar 23, 2019
Messages
26
I see, clever. In any case, I continue moving over other folders of various sizes (mostly mixed types of files) and comparing sizes between Windows Explorer, du -hs, and du -hA (or -h --apparent-size), and the estimates are basically the same, including the differences between uncompressed size and compressed value.

And as for your earlier question, apologies: I noticed I didn't specifically say that I am comparing compressed/uncompressed sizes reported from both servers' separate storage. I switched SCALE back to ZSTD (copying folders onto SCALE after switching the compression), and as I continue looking over this, here's what I get:

Folder                               Uncompressed size   FreeNAS 11.3-U5 w/ LZ4   TrueNAS SCALE 22.12 w/ ZSTD (3)
Folder 1                             160 MB              104 MB                   111 MB
Folder 2                             104 MB              90 MB                    94.1 MB
Folder 3                             270 MB              257 MB                   270 MB
Folder 4                             346 MB              322 MB                   338 MB
Folder 5                             1.42 GB             1.34 GB                  1.38 GB
Folder 6 (16 MB Lorem Ipsum text)    15.8 MB             290 KB                   825 KB

The last folder was literally just me creating a dummy text file made of Lorem Ipsum to demonstrate the best-case compression scenario. Even in this scenario, where ZSTD should have excelled, it performed considerably worse. I confirmed the above values using du on both servers, and everything checks out.
 
Joined
Oct 22, 2019
Messages
3,641
Both using the same recordsize? One using LZ4, the other ZSTD?

Remember that existing files remain as-is, even if you later change the dataset properties (such as recordsize and compression).
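
If the properties were changed after the data landed, the files need to be rewritten for the new settings to apply. A crude sketch (paths are placeholders; a zfs send/recv into a fresh dataset also works):

Code:
cp -a /mnt/pool/data /mnt/pool/data.new
rm -r /mnt/pool/data
mv /mnt/pool/data.new /mnt/pool/data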
 

ohboi

Dabbler
Joined
Mar 23, 2019
Messages
26
That is correct. I would first change the compression of the dataset (on the new server) and then start copying folders onto it.

The record size is still as previously mentioned: 1 MB on the old server (FN), 128K on the new server (TS). You mentioned the different overhead and the vdev configuration; do you reckon that is responsible? If so, is there any information I could read to get some sort of understanding? Maybe to see what to do to get results like on my old server - those savings are huge. As most of the files are video files, I could imagine the savings scaling up proportionally, so 1 TB saved out of the 20 TB total on the old FreeNAS dataset could become 3 TB out of a 60 TB dataset on TS (if I were able to achieve that compression).
 
Joined
Oct 22, 2019
Messages
3,641
Here's a dataset I made on TrueNAS Core 13.0-U3.1, in which the recordsize remains 1 MiB, but the only change is the compression (LZ4 vs ZSTD) when I saved the 16MB "lip.txt".

They are both the exact same size (since they are the exact same file, just with a different name.)

"lip-lz4.txt" I saved to the dataset when the compression was set to LZ4.

"lip-zst.txt" I saved to the dataset when the compression was set to ZSTD.

The real/logical size of the file is 16MB, but look at the physical space used:
Code:
% du -hs lip-*
273K    lip-lz4.txt
 77K    lip-zst.txt

% du -hsA lip-*
 16M    lip-lz4.txt
 16M    lip-zst.txt
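
If you'd like to reproduce this, something along these lines should work (paths are placeholders; the file just needs to be highly repetitive text):

Code:
# build a ~16 MiB file of repeated text
yes "Lorem ipsum dolor sit amet, consectetur adipiscing elit." | head -c 16777216 > lip.txt
# save a copy under each compression setting
zfs set compression=lz4 pool/test && cp lip.txt /mnt/pool/test/lip-lz4.txt
zfs set compression=zstd pool/test && cp lip.txt /mnt/pool/test/lip-zst.txt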
 
Joined
Oct 22, 2019
Messages
3,641
The record size is still as previously mentioned: 1 MB on the old server (FN), 128K on the new server (TS).
Not only does a recordsize of 1MiB use less overhead, but it also allows better efficiency for the inline compression operations. ZFS inline compression works against an entire "record" of pre-compression data. In this case, it's 1 MiB vs 128 KiB.

Before writing to disk, shrinking a single compressible record of 1 MiB is more efficient than shrinking eight separate records of 128 KiB.

Take a look at this:
Code:
% du -hs lip-*
273K    lip-lz4.txt
 77K    lip-zst.txt
533K    lip-zst-128k.txt

Code:
% du -hsA lip-*
 16M    lip-lz4.txt
 16M    lip-zst.txt
 16M    lip-zst-128k.txt

The same 16 MiB file, the same ZSTD compression, but I saved this third file (lip-zst-128k.txt) when I set my dataset's recordsize to 128 KiB. (Previously the recordsize was set to 1 MiB.)



If so, is there any information I could read to get some sort of understanding?
Unfortunately, ZFS documentation and discussion tend to be highly entrenched in esoteric "engineering" territory.

The FreeBSD Journal has some nice write-ups about ZFS.

There are some discussions on Reddit that you might find helpful.

For example:

("Part B" of his comment is what you might find most useful in your case.)



EDIT: Somewhat off-topic, but my personal rule of thumb is that for general use, mixed files, storing, archiving, multimedia, etc, a 1 MiB recordsize is the best place to start. Once you get into frequent "in-place editing" and database software, that's when you start fine-tuning your datasets' recordsize with more nuance and granularity.
 
Last edited:

ohboi

Dabbler
Joined
Mar 23, 2019
Messages
26
I did some more testing, too:

Text file compression:
15.8 MB (uncompressed)
290 KB (FreeNAS w/ LZ4, 1M record size)
825 KB (TrueNAS SCALE w/ ZSTD, 128K record size)
825 KB (TrueNAS SCALE w/ LZ4, 128K record size)
115 KB (TrueNAS SCALE w/ ZSTD, 1M record size)
218 KB (TrueNAS SCALE w/ LZ4, 1M record size)

Here it actually works, and it works great with the 1M record size, but...

Video file compression:
2.86 GB (uncompressed)
2.72 GB (FreeNAS w/ LZ4, 1M record size)
2.86 GB (TrueNAS SCALE w/ ZSTD, 128K record size)
2.86 GB (TrueNAS SCALE w/ LZ4, 128K record size)
2.86 GB (TrueNAS SCALE w/ ZSTD, 1M record size)
2.86 GB (TrueNAS SCALE w/ LZ4, 1M record size)

I wonder how to explain this and what to do to replicate it on TrueNAS SCALE. Additionally, I went ahead and got a little off the rails:

2.86 GB (TrueNAS SCALE w/ ZSTD-19, 1M record size) - damn, 75% CPU usage for almost a minute
2.86 GB (TrueNAS SCALE w/ ZSTD-FAST-1000, 1M record size)
2.86 GB (TrueNAS SCALE w/ GZIP-9, 1M record size)

It almost looks like this must be a bug, or maybe ZFS doing something differently? Also, I went back to testing folders with mixed content, and a 1M record size is definitely performing better than a 128K record size, but the result still lands somewhere between that and my old FreeNAS. Likely it's content that won't compress on TrueNAS but does on FreeNAS, versus files that compress well on TrueNAS but aren't large enough on their own to close the gap with FreeNAS.
 
Joined
Oct 22, 2019
Messages
3,641
Because you have different vdev structures between your two systems, it's not comparing apples to apples.

Your large video file is not being compressed on either system. The "LZ4 on FreeNAS" is a red herring: it's a different pool, a different vdev type, a different number of member disks.

You've already seen that when you take compression out of the picture, a 1 MiB recordsize is already more efficient, and when you introduce compression, it continues to yield better results. (Performance-wise and less overhead.)

So whether or not the types of files being stored are compressible, you're seeing better results with 1 MiB recordsize. (You just see less additional benefit when the files are not compressible.)


2.86 GB (TrueNAS SCALE w/ ZSTD-19, 1M record size) - damn, 75% CPU usage for almost a minute
Why would you enable ZSTD-19 for a dataset that deals with (uncompressible) multimedia files? It should be left at either LZ4 or ZSTD-1. In fact, if you know ahead of time that a dataset will be dealing with mostly uncompressible data, then opt for LZ4 instead of any level of ZSTD. LZ4 has an "early abort" feature that I don't believe has been implemented for ZSTD yet. (I say "yet" because, even back in 2020, there were plans to support this with ZSTD.)


Try not to get caught digging yourself into a rabbit hole. :wink: Keep it simple.
  • Mostly multimedia files?
    • LZ4 + 1 MiB recordsize
  • General purpose, mixed files, at least some decent "compressibility"?
    • ZSTD-3 (or LZ4) + 1 MiB recordsize
  • VMs, database software, etc?
    • Tune it to write/modification patterns
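
Applied as dataset properties, the first two might look like this (pool/dataset names are placeholders):

Code:
zfs set recordsize=1M compression=lz4 tank/media
zfs set recordsize=1M compression=zstd-3 tank/general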
 
Last edited:

ohboi

Dabbler
Joined
Mar 23, 2019
Messages
26
Thank you for all the information, and I don't want to come off blunt, but how can having 8x4TB drives vs. 5x18TB drives make data which shouldn't be compressible, compressible?

Also, as for the ZSTD uncompressible-data detection: if I observed it all correctly, then yes, it doesn't currently do that. When I was testing ZSTD-19, considering how long the drives were seeking and how high the CPU usage was, I believe it didn't skip anything (and this was just me copying a single video file over).
 
Joined
Oct 22, 2019
Messages
3,641
how can having 8x4TB drives vs. 5x18TB drives make data which shouldn't be compressible, compressible?
There's no "compression" occurring in this case. Compression is a red herring.

I'd wager the difference comes from a combination of "reporting" and how a file's data is laid out across a vdev on each of your two servers (parity and width).

Or it could even be that FreeNAS 11.3-U5 is not accurately reporting the true file size. On your client (Windows? Linux?), how many bytes does it report for the file size? (Not the disk usage of the file, but the file size itself.)
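
To get the exact byte count on the servers themselves (the stat syntax differs between FreeBSD and Linux; paths are placeholders):

Code:
stat -f %z /mnt/pool/dataset/video.mkv    # FreeBSD / Core
stat -c %s /mnt/pool/dataset/video.mkv    # Linux / SCALE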
 

ohboi

Dabbler
Joined
Mar 23, 2019
Messages
26
I see. Here's an example of one selected video file. The first value is the uncompressed (apparent) size, the second is the compressed size (size on disk):

FreeNAS:
3 188 519 (du -As)
3 028 228 (du -s)
these should be in KiB (the -k argument produces identical numbers)

Windows:
3 265 043 376 bytes ~ 3 188 519 KiB (Size)
3 100 904 960 bytes ~ 3 028 228 KiB (Size on disk)

so as far as I can tell, identical.

Could this be related to this topic? u/dlavigne mentioned that zfs list should be used to check the size, so I tried that as well:
zfs list reports that the whole dataset uses 19.2 TiB (EDIT: using -p argument I got 21081213744864 bytes ~ 19.17325 TiB),
du -s against the whole dataset reports 20587083035 KiB ~ 19.173 TiB,
Windows reports 21080998749184 bytes ~ 19.173 TiB.

It looks like it might be correct. Windows and du both report the total (uncompressed) size at some 20.203 TiB, which stays true to that ~5% compression estimate (video files make up some 19.7 TiB (apparent) / 18.7 TiB (real) of that number).
 
Last edited:
Joined
Oct 22, 2019
Messages
3,641
2.86 GB (uncompressed)
2.72 GB (FreeNAS w/ LZ4, 1M record size)
2.86 GB (TrueNAS SCALE w/ ZSTD, 128K record size)
2.86 GB (TrueNAS SCALE w/ LZ4, 128K record size)
2.86 GB (TrueNAS SCALE w/ ZSTD, 1M record size)
2.86 GB (TrueNAS SCALE w/ LZ4, 1M record size)

I think you're being led astray by red herrings. (Black = Windows? Red = FreeNAS. Green = SCALE.) That right there tells you there's an outlier. The compression algorithm makes no difference; the pool/system is what makes the difference.

Start off with the assumption that compressed video files will not further shrink from lossless compression (regardless of LZ4 or ZSTD).

There's no way your compressed video file is shrinking an additional 5% in size (real or disk usage) due to lossless compression.

You have to start from there; otherwise, you'll just keep going in circles. There are other reasons why the different tools (ls, df, du, zfs, explorer, etc.) report different disk usages and apparent sizes. Plus, you're comparing two different pools with different vdev layouts.
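
On the ZFS side, these properties give you the logical-vs-physical picture without involving any client-side tools (dataset name is a placeholder):

Code:
zfs get used,logicalused,compressratio pool/dataset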

With our compressible text file tests, we clearly saw that "when all else is equal, on the same pool, the largest recordsize yields the least disk usage when combined with ZSTD inline compression, even more markedly than with LZ4 compression. However, smaller recordsizes lose much of this compression efficiency, to the point that LZ4 outperforms ZSTD if the latter is using a smaller recordsize."

This is expected when dealing with highly compressible files. But it all goes out the window once you start dealing with videos and photos, and then recordsize + vdev members/width play a more important role in disk usage.
 
Last edited: