Striped pool using SAN-backed disks and usage discrepancy

seankndy

Cadet
Joined
Aug 16, 2023
Messages
2
I have a PureNAS Core (12.2) system that was running a ZFS pool with a single 4TB vdev. The vdev is an iSCSI LUN served by a PureStorage SAN.

Usage was starting to creep up towards 70%, so I created an additional 4TB LUN in the SAN and then exposed it to the PureNAS box. I then added the LUN to the pool. The result:

# zpool list -v
NAME                                           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
TANK                                          7.97T  2.85T  5.12T        -         -    34%    35%  1.00x  ONLINE  /mnt
  gptid/811314ce-7324-11ec-857e-00505687c9e9  3.98T  2.75T  1.23T        -         -    61%  69.0%      -  ONLINE
  gptid/4e66d155-385b-11ee-a94f-00505687c9e9  3.98T   102G  3.88T        -         -     7%  2.49%      -  ONLINE


(the new vdev is the second gptid line above)
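
For reference, the new LUN went in as a second top-level vdev; the rough CLI equivalent of what was done (it may have been the GUI, but the effect is the same) is:

zpool add TANK gptid/4e66d155-385b-11ee-a94f-00505687c9e9

i.e. a stripe across the two LUNs rather than a mirror.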

This ended up looking as it should. However, the new PureStorage LUN's usage has rapidly increased from 0GB to 800GB in a matter of 5 days, even though the ZFS output above shows only 102G allocated on it. The original LUN in the SAN (811314ce-7324-11ec-857e-00505687c9e9) shows no increase OR decrease in usage.

Can anyone explain what ZFS is doing here? Why am I suddenly using 800GB *more* on my SAN just by adding the disk to the pool? I would have thought that if 800GB is now used on the new LUN, I would see an equal drop on the other LUN, but that hasn't happened.

In hindsight I probably should have just expanded the 4TB LUN to 8TB, but for now I'm just trying to wrap my head around what is going on here.

Thank you!
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I have a PureNAS Core (12.2) system that was running a ZFS pool with a single 4TB vdev. The vdev is an iSCSI LUN served by a PureStorage SAN.
...
I am not sure what you mean by PureNAS Core (12.2). This is the TrueNAS forums... If you meant TrueNAS backed by PureStorage SAN, then okay.

The output of the following command might be helpful:
zfs list -t all -r
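
It might also help to see how the space breaks down per dataset (assuming the pool is named TANK, as in your zpool output):

zfs list -o space -r TANK   # splits USED into data, snapshots, refreservation and children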


Next, did you thin provision your new 4TB SAN LUN?

If so, then the 800GB might be miscellaneous metadata updates causing new SAN storage to be allocated. Every write, or every read with access-time recording enabled, causes metadata updates. Even if the data writes are tiny, the metadata is copied on write, so it does not overwrite the existing metadata.
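
If access-time updates turn out to be a contributor, one thing worth checking (a sketch only; I'm assuming the pool/dataset is named TANK from the earlier output):

zfs get atime TANK        # see whether access-time updates are enabled
zfs set atime=off TANK    # stops read-triggered metadata writes, if you can live without atime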

And we are not talking about simple metadata. ZFS's metadata is complex, causing multiple writes, because any data write requires its directory metadata, higher-level metadata and uberblocks to be updated.

Further, ZFS keeps TWO / 2 copies of regular metadata by default, thus doubling its storage. And ZFS keeps THREE / 3 copies of critical metadata. And if I recall, there are 3 uberblock tables PER DISK / LUN.

Since all that metadata gets written to new locations (aka Copy On Write), that would leave old, currently unused metadata blocks still allocated in the SAN LUN. But those blocks would be considered & listed as free space by ZFS.
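
A rough way to look at this from the ZFS side, assuming the pool is named TANK and the PureStorage LUNs actually advertise UNMAP (I can't speak for the SAN end):

zfs get redundant_metadata,copies TANK   # confirm the metadata-copies defaults
zpool get autotrim TANK                  # whether freed blocks are handed back to the LUN automatically
zpool trim TANK                          # one-off pass to release space ZFS considers free

Whether the SAN-side usage actually drops depends on the thin-provisioning / UNMAP behaviour of the PureStorage side.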



You have probably heard the warnings, but TrueNAS is not designed to use external SAN disks, whether RAIDed or not. If the PureStorage SAN does not do "the proper thing", data loss can occur during crashes or power outages, potentially even total loss of the pool.

The "proper thing" is actually several items. Like follow write barriers, which force earlier writes to finish before newer ones. Thus, preserving ZFS' copy on write and order of write. Their are other "proper things", but I don't know them all.
 

Levi G.

Cadet
Joined
Nov 11, 2016
Messages
2
I'm going to chime in here with Sean as we're working on the same project.

We're currently running TrueNAS Core 12.0-U8. Our Pure SAN does not have the native capability to serve up NFS shares, hence the decision to stand up TrueNAS Core in a VM. Originally we created a single 4TB LUN and presented it to the ESXi nodes within the vSphere cluster at this facility. The LUN was NOT consumed within vSphere; instead it was set up as an RDM disk and presented directly to the TrueNAS VM. It has been running for nearly 2 years without any issues.

Over time the volume has been consumed and has steadily filled up. The course of action we took was to add a second LUN, pass it through via RDM to the TrueNAS VM, add it to the pool as a second vdev, and grow the pool that way.

FYI, Arwen, thank you for your input on the matter and the clarification as to how the metadata is written across the 'disks'. That explains why we're seeing the huge increase in consumption on the back end.

Moving forward, we're looking at increasing the size of the initial LUN; we've labbed this up here with another set of LUNs and a TrueNAS VM appliance. Once the initial LUN has been grown, say to 8TB, we'll expand the pool, and once that process has completed, remove the second LUN from the pool altogether. We have nightly snapshot tasks and snapshot replications that are dumped over to two physical TrueNAS servers.
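
As a rough sketch of the expansion step (the TrueNAS GUI "Expand" should amount to the same thing, and this assumes the LUN and its GPT partition have already been grown):

zpool online -e TANK gptid/811314ce-7324-11ec-857e-00505687c9e9   # tell ZFS to use the new capacity of the first vdev
zpool list -v TANK                                                # confirm the larger SIZE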

Regarding power outages, we recently went through one at this facility. The way this TrueNAS VM and the Pure LUNs are set up, the Pure unit came up long before the TrueNAS VM was booted. No data was lost as a result of that outage.
 

seankndy

Cadet
Joined
Aug 16, 2023
Messages
2
Typo; I meant TrueNAS.

I surmised that the increasing raw SAN usage was some sort of pre-allocation and metadata, since ZFS itself shows only ~100GB of space used on that vdev and the original vdev shows no change.

I have considered just backing out of this and removing the additional vdev from the tank. The zpool-remove man page states that this is supported, apparently since version 11.2, and that it will copy data back over to the other vdev(s) assuming there is space (there is). The problem is that I'm on pace to *double* my SAN usage just because I added this vdev to the ZFS tank.
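
For reference, the removal I'm contemplating would be roughly this, using the vdev name from the zpool list output above:

zpool remove TANK gptid/4e66d155-385b-11ee-a94f-00505687c9e9
zpool status -v TANK   # shows the evacuation/remap progress while data is copied back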

I created a new "test" tank with 2 new LUNs / vdevs in a stripe and attempted the removal and it did work and data was preserved.

Any feedback, warnings, or thoughts on whether this would be a good idea? My concern: if I now remove the additional vdev, will ZFS copy this 800GB of extra data/metadata over to the other vdev, leaving me 800GB above where I started?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
...
Regarding power outages, we recently went through one at this facility. The way this TrueNAS VM and the Pure LUNs are set up, the Pure unit came up long before the TrueNAS VM was booted. No data was lost as a result of that outage.
Just because a single outage did not cause data loss does not mean the design of this SAN -> LUN -> VMware -> VM NAS will survive the next crash or power loss without data loss.

What matters is the design of the PureStorage and whether it does the "right thing". I can't say, except that ZFS was designed for direct access to disks (SATA, SAS, NVMe, etc.), not iSCSI LUNs through a SAN.

That said, I routinely use FC SAN LUNs on my Solaris 11 servers at work. Solaris 11 uses ZFS exclusively. And the Solaris 11 servers are virtualized through Sun LDOMs (Logical Domains), a method of dividing up a SPARC T-series processor into multiple client servers. That all works, and has for many years...


As for the plan to remove the second 4TB SAN LUN and then grow the first 4TB SAN LUN to 8TB, I don't know. Basically, you are off the beaten track, to the point that I can't say whether your design will remain stable.

With the low cost of 8TB disks, it might be simpler to buy a physical server, and use 2 x 8TB in a Mirror vDev. Or just use some other NAS software in your SAN / VMWare / VM NAS configuration.
 