Using 100% of the disk space for iSCSI

fabioteixei

Dabbler
Joined
Feb 10, 2022
Messages
11
Hi there.

I searched, but I still don't understand why I can't use 100% of the available space on my disks to create vdevs and zvols.

I have five 480 GB SSDs and I'd like to use them as a kind of JBOD for storing Hyper-V VHDs over iSCSI.

I'm not worried about parity right now. I use an external backup for safety, and this is a home lab environment. If I lose any data, I can rebuild the whole lab and all of the VMs with no problem.

Can anyone please help me understand why I can't use all the available space on the disks?

What's the recommended configuration for this kind of setup?

Thanks.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Your questions and the inevitable ones you didn't ask are answered in the linked resource.

 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It's worth pointing out that filling up a filesystem to 100% is not a recipe for success on any filesystem, with any workload.
 

fabioteixei

Dabbler
Joined
Feb 10, 2022
Messages
11
Your questions and the inevitable ones you didn't ask are answered in the linked resource.

I have read that, but the doubt remains.

That article says it's ideal to create a zvol using only 50% of the pool. Does that mean I have to throw away 50% of my disk space?

I don't think it's feasible to leave 50% of my total disk space inaccessible. It's like throwing half the money I paid for the disks in the trash.

I intend to use my TrueNAS only for iSCSI. I know I won't use all of the space all the time, but blocking the possibility of using that space in an emergency (moving data from one server to another, for instance) doesn't seem like a good use of my hardware.

Can't I at least create two zvols, each with 50%?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
“Throwing money away” is a rather simplistic and inaccurate analysis. Context matters, and fundamentally this is a matter of trading storage capacity for performance, through the magic of computer science. Fill it up more and performance will degrade substantially. Fill it to 95% and it’ll be like watching paint dry. Fill it to 100% and you’ll have a mess on your hands.

And who said anything about blocking storage space?
 

fabioteixei

Dabbler
Joined
Feb 10, 2022
Messages
11
And who said anything about blocking storage space?
What I mean by blocking is that 50%, or even 20%, of all my disk space will be unavailable if I follow the recommendations.

I know I can force the size to 100% if I want to, and in the end I don't really have to use TrueNAS or even iSCSI. I can create an SMB 3.0 share and map it to my Hyper-V server, since that's a supported storage location for Hyper-V.

Well, anyway, thanks for your help. I will evaluate my options.

But you say that getting to 95% or 100% can get me in trouble. Can you expand on that?

Remember, it's a home lab with no really important data (maybe my Plex VM movie library) and with an external backup, so resilience is not a concern for this use case.

Again, thanks very much.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Regardless of the protocol used to share it (SMB/NFS/iSCSI) filling a ZFS pool to 100% will do Very Bad Things.

This is because of the nature of ZFS as a "transactional" and "copy-on-write" filesystem. What this boils down to is that there's no such thing as a "partial write" - similar to how a SQL or other transactional database won't perform an operation unless the entire transaction can be committed to the DB tables, ZFS won't write a new record, change an existing one, or mark one for deletion unless there's enough space to write a new copy of the necessary data and/or metadata indicating as such - and then commit that metadata/pool state change all the way up the tree, until finally changing the uberblock to say "the new pool state is valid."

So if you fill the filesystem to a true 100% full, there's no space for ZFS to indicate "hey, I'd like to delete this 128K record" because it doesn't have a way to keep the pool's "current state" valid/immutable for the past transaction (it can't overwrite or delete in-place) while writing the metadata to say "delete record XYZ" for the "future state."
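If it helps to see the idea in miniature, here's a toy copy-on-write allocator in Python - purely illustrative, nothing to do with actual ZFS internals - showing why even a delete needs a little free space to land:

Code:
# Toy copy-on-write store: every change, including a delete, must first
# allocate space for the new state before the old state can be freed.
# Purely illustrative - not how ZFS is actually implemented.

class CowStore:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.used = 0

    def write(self, blocks):
        if self.used + blocks > self.capacity:
            raise RuntimeError("no space for new data blocks")
        self.used += blocks

    def delete(self, blocks):
        # Even a delete needs one free block to record the "future state"
        # metadata before the superseded blocks can actually be released.
        if self.used + 1 > self.capacity:
            raise RuntimeError("pool 100% full: no room to record the delete")
        self.used += 1            # new metadata written and committed
        self.used -= blocks + 1   # old data plus superseded metadata freed afterwards

store = CowStore(capacity_blocks=100)
store.write(100)   # fill the pool to a true 100%
store.delete(10)   # raises: there's no room left to record the deletion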

In regards to the other fill levels - with block storage (or SMB being treated as block-equivalent by serving VHD(X) files) the challenge is that you'll end up with fragmentation. Using all NAND lets you avoid the latency penalty of physical disk seeks, but you'll still likely see some degradation in write performance if you manage to outrun the garbage-collection routines on your SSDs and have to write to dirty/partially used blocks. Leaving some free space lets the SSD write into unmapped space, which is faster.

50% was Ye Olde Thumbrule for when you'd start to see noticeable pain on spinning disks. NAND you can usually push higher than that, but as mentioned before, watch out for the GC routines. Better SSDs tend to be able to push closer to the wall; it depends on their firmware, amount of internal overprovisioning, etc. If your SSDs are Intel DC/HGST/etc. you may have no problems until 80%+ - if they're SuperHappyFunBee from the Amazon bargain bin, less so.

But here's what you can do.

ZFS does inline compression very well using LZ4 or ZSTD (former tends to be faster, latter tends to have better compression - test with your dataset!) so you can certainly create a sparse ZVOL that's around half the size of your pool. You're striping 5x480G so you'll get roughly 2.3T usable in the pool, make a 1T sparse ZVOL, and then start loading data on it. Compare the logical size of data that you put on it (VHD allocated sizes) and see what kind of compression numbers you get. If you're getting a relatively conservative 1.33:1 compression ratio, that lets you make another 1T ZVOL and only use a grand total of about 1.5T of actual NAND to hold 2T of VHDs. Well under the margin of error.
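To put rough numbers on that sizing (back-of-the-envelope only - measure your own compression ratio, the 1.33:1 figure here is just an assumed example):

Code:
# Back-of-the-envelope sizing for a 5 x 480 GB stripe with sparse ZVOLs.
# Illustrative numbers only; the compression ratio is an assumed example.

usable_tb = 2.3                # approximate usable pool size from 5 x 480 GB striped

zvol_count = 2                 # two 1 TB sparse ZVOLs presented to Hyper-V
zvol_tb = 1.0
compression_ratio = 1.33       # conservative LZ4/ZSTD result, for example

logical_tb = zvol_count * zvol_tb
physical_tb = logical_tb / compression_ratio

print(f"logical VHD data: {logical_tb:.1f} TB")
print(f"physical NAND used: {physical_tb:.2f} TB of {usable_tb} TB usable")
print(f"pool fill level: {physical_tb / usable_tb:.0%}")   # roughly 65%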

Warning, here be potential dragons.

If you get better compression, and/or you're absolutely confident that you won't mess something up you can decide to overcommit storage by adding a third 1T ZVOL (3T logical) and make the necessary blood sacrifice to the compression gods to fit that into 2.25T, just squeaking into that 2.3T physical space. But if you're running a 5-drive stripe you're probably okay with some risk anyways. ;)
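For that overcommitted case, the required ratio is easy to sanity-check (just arithmetic, not a guarantee your data will compress that well):

Code:
# What ratio is needed to fit 3 TB of logical ZVOLs into ~2.25 TB physical?
logical_tb = 3 * 1.0          # three 1 TB sparse ZVOLs
target_physical_tb = 2.25     # just under the ~2.3 TB usable pool

required_ratio = logical_tb / target_physical_tb
print(f"required compression ratio: {required_ratio:.2f}:1")   # ~1.33:1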

Cheers
 

fabioteixei

Dabbler
Joined
Feb 10, 2022
Messages
11
Thanks, great info.

And how about deduplication?

My system is a 2x Xeon with 48 GB of RAM. Because I will be using it to host VHDs, almost all of them Windows on the exact same version, I think deduplication might give me even more available space, but I'm not sure about the possibility of data loss or the performance impact.

Do you have any thoughts on using dedup in this scenario?

Again, thanks for the help. Because it's a lab environment, it's the best place to test the possibilities.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
And how about deduplication?

Short answer is "don't" - deduplication is generally not recommended unless you have a lot of resources to throw at it (not just RAM, but low-latency storage such as Optane for the metadata/deduplication tables) - @Stilez has written a couple of excellent resources chronicling their adventures:



I would stick with just seeing what results you get from LZ4 or maybe ZSTD compression (bias towards space-savings vs performance) - if you do choose to experiment with deduplication (it's a homelab, experimentation is what they're for, right?) I'd suggest splitting your virtual image into separate disks for OS/Application/Data and only storing the OS disk on the dedup-enabled ZVOL. You'll still be subject to the bathtub curve of performance on non-Optane SSDs but there will be less data to index.
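If you want to gauge the cost before flipping dedup on, a commonly cited rule of thumb is very roughly 320 bytes of RAM (or fast metadata storage) per unique block in the dedup table - treat the exact figure as an assumption, since it varies with pool layout and record/volblock size:

Code:
# Rough dedup table (DDT) footprint estimate for a dedup-enabled ZVOL.
# The ~320 bytes per unique block is a commonly cited rule of thumb, not exact.

TIB = 1024**4

dedup_data_bytes = 1 * TIB      # size of the dedup-enabled ZVOL (example)
volblocksize = 16 * 1024        # 16K volblocksize, for example
bytes_per_entry = 320           # assumed rule-of-thumb value

unique_blocks = dedup_data_bytes // volblocksize
ddt_gib = unique_blocks * bytes_per_entry / 1024**3

print(f"unique blocks: {unique_blocks:,}")
print(f"estimated DDT size: {ddt_gib:.0f} GiB")   # ~20 GiB for this example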
 