How 'unsafe' will this be?

ThisTruenasUser · Apr 20, 2023

Hello all
I used to have truenas on bare metal, now virtualised.
The hypervisor s proxmox.
I have some important information on the NAS, as well a less important, being media files.
I plan to move about 2TB of my data to a share setup through proxmox, as simple samba/windows shares.

The machine has 64GB non ecc memory.
So the plan is:

2 x 2TB drives as a mirror zfs setup within proxmox with windows shares (not truenas). This is for most important data. Non ECC memory much less of an issue in that situation as I understand, compared to storing it on truenas.

The more unsafe stuff is as follows.

5 x 4TB drives in raid Z1 for media files, accessed through samba/windows share. Also nfs for temporary files. This particular pool has already been created and runs as virtual machine.. All the files on there are backed up using multiple external drives. Maybe once a week. It does not change often. Using proxmox and and appropriate turnkey container could do this also, but would mean recopying all data to it again.

The maybe 'dangerous' pool:

A 12 - 16TB drive with no redundancy to store games. I still need to but this. A single 16GB intel optane for deduplication.
There is a lot of confusion it seems about deduplication in truenas.
According to craft computing, it is about 1GB deduplication data needed for 1TB actual data.
Can anyone confirm this?

The plan is to setup two sparse drives, both the same size and using deduplication.
Also another virtual nic ( in proxmox) to be attached to truenas.
So 2 iscsi shares to different computers. One virtual (already running) & one actual gaming machine.
The first for updates games only.
The second playing games. I use primocache software that caches block devices with NVMe as cache.
Also a lancache server to speed thing up, specifically games updates.

So with games, they can be re-downloaded in case of disaster.

All appropriate drives are direct attached to ht truenas virtual machine.
Using IOMMU is a problem, so direct attached directly only. smart monitoring will be done through proxmox.

The machine itself: 3600X, MSI B450 tomahawk & 64GB non ecc memory.
Hardware passthrough apparently works only with devices directly attached to the CPU.
The 16GB intel optane should be able to pass through to the truenas virtual machine.
If not, will there be massive performance issues. is directly attached through proxmox?
There is only one NVMe slot on the motherboard.

How doable is this?
How 'unsafe' is this?
I am not spending any money an any server equipment, and using what I have.
The only hardware I may buy is the 12 - 16TB hard drive.
Thanks

sretalla · Apr 20, 2023

ThisTruenasUser said:
All appropriate drives are direct attached to ht truenas virtual machine.
Using IOMMU is a problem, so direct attached directly only. smart monitoring will be done through proxmox.

You're preparing to lose the data by doing this.

You must have the disk controller (not the "disk") available directly (via PCI passthrough) to TrueNAS, or your data will be at risk and likely to be lost at some point.

"Absolutely must virtualize TrueNAS!" ... a guide to not completely losing your data.

[---- 2018/02/27: This is still as relevant as ever. As PCIe-Passthru has matured, fewer problems are reported. I've updated some specific things known to be problematic ----] [---- 2014/12/24: Note, there is another post discussing how to deploy a small FreeNAS VM instance for basic file...

www.truenas.com

ThisTruenasUser said:
How doable is this?

You can "do it".

ThisTruenasUser said:
How 'unsafe' is this?

Very.
I can think of at least 10 threads where I saw people coming here for help to recover their pools which were set up like this and lost to corruption.

ThisTruenasUser · Apr 20, 2023

Thank you for the reply.

As stated IOMMU through the chipset = not working. This is where the sata cards and onboardrd controller are.
Only direct connected devices to the CPU do. They are in this case a quad port nic for a router & the 16GB optane drive.
I have read through the document. If IOMMU worked for sata controller, would use it.

When trying it, the machine just freezes.

How am I going to lose data exactly?

Truenas will be used for rarely changing media on one pool, which is backed up with multiple external drives
Also the pool for games, all of which which can be downloaded again.

The most important data I have will not be stored through truenas. Also it will be backed up in other machines & external drives.

Also can you tell me if the ratio of 1GB deuplication data to 1TB of storage is about right?

As for the pool with games & deduplication, is there documentation with something like - It has xyz% cahnce of failing within a year. That would be useful.

Thanks

sretalla · Apr 20, 2023

ThisTruenasUser said:
2 x 2TB drives as a mirror zfs setup within proxmox with windows shares (not truenas). This is for most important data. Non ECC memory much less of an issue in that situation as I understand, compared to storing it on truenas.

Using ZFS is what prompts most people to "require" ECC memory, not TrueNAS (although TrueNAS exclusively uses ZFS, so you can argue it's the case for TrueNAS just as much).

If you care about the integrity of your data, you'll care to use ECC memory, no matter the filesystem.

ThisTruenasUser said:
Also can you tell me if the ratio of 1GB deuplication data to 1TB of storage is about right?

I can't, but there are threads in the forum that discuss it.

What I can tell you is that I see plenty of cases where folks without enough RAM try to use dedup and come here complaining about how terrible the performance of their NAS is when dedup uses all their RAM and other things start to go slow or fail.

ThisTruenasUser said:
How am I going to lose data exactly?

One of two ways (or maybe both):

You will see pool corruption, which will be related to events where either the disk is suddenly (perhaps temporarily) removed from the VM by a quirk of the hypervisor or where the caching that the hypervisor is doing somehow fails to pass all data to the disk.

When enough events of disk (temporary) removal happen with data not making it to disk, the pool will become inoperable and you will potentially have lost all data in it.

sretalla · Apr 20, 2023

ThisTruenasUser said:
As stated IOMMU through the chipset = not working.

I understand that you may have limited resources/budget and your available hardware isn't up to the task.

What I can't do for you is change how that will work in terms of the outcome you will get if you don't do it right.

If you want to virtualize TrueNAS and keep your data, you need to pass through the SATA controller at PCI level and use a reliable hypervisor (ESXi is the only one rated as such, Proxmox may be on its way there, but they even rate their own PCI passthrough as experimental at this time).

I really recommend against doing what you propose unless you're OK with losing all the data at some point.

Arwen · Apr 20, 2023

Some information about ZFS De-Dup;

Resource - ZFS de-Duplication - Or why you shouldn't use de-dup

The TrueNAS forums occasionally have people who come across ZFS de-duplicate, and want to investigate its use. Or think it is a good idea, and want to implement it. Here are some suggested configuration details: Understand that you need CPU...

www.truenas.com

In some ways, people think they can do lots of things on a server, besides simple NAS. That can be true, but if poorly implemented, the storage behind the NAS is at risk. Thus, have good backups and know you might loose data between last backup and pool loss. Your call.

danb35 · Apr 20, 2023

ThisTruenasUser said:
Also can you tell me if the ratio of 1GB deuplication data to 1TB of storage is about right?

The number I've heard says to expect 5 GB of RAM required per TB of storage, but that the requirement could go far higher.

Patrick M. Hausen · Apr 20, 2023

You could check if TrueNAS SCALE with its allegedly better hypervisor support might fit your needs and simply not run proxmox but only a single system.

Adding to the remarks about ECC:

ECC is always safer than no ECC
ZFS is almost always safer than any other filesystem - ECC or not!

The "ZFS will destroy your data without ECC" myth is exactly that - a myth.

WI_Hedgehog · Apr 20, 2023

ThisTruenasUser said:
I plan to move about 2TB of my data to a share setup through proxmox, as simple samba/windows shares.

Adding a Samba share to Proxmox as Storage – Virtualize Everything

virtualizeeverything.com

jgreco · Apr 20, 2023

ThisTruenasUser said:
How am I going to lose data exactly?

ZFS expects to be in control of the I/O operations to the disk, including ordering and sync. ZFS has some quirks that prevent it from scoring 100% on predictive fault detection, and we generally compensate for this with the use of SMART. Placing your disks behind another subsystem that may reorder, obscure, or otherwise mishandle the intense I/O loads that ZFS tends to generate can cause pool corruption. This is one of the reasons why ZFS projects advise you not to use hardware RAID controllers or other devices that might clever your data to death; hypervisors can also screw this up especially if they do anything but strict sequencing of I/O, or lose connectivity to their I/O due to (for example) a slow I/O subsystem. In general, ZFS needs direct access to the directly attached raw disk to be safe. People who have tried other things have run into ... issues.

sretalla said:
I can think of at least 10 threads where I saw people coming here for help to recover their pools which were set up like this and lost to corruption.

That's just the Proxmox threads. I can probably dig up dozens more where people tried to use ESXi RDM to explosive results, which is where the TrueNAS warnings against RDM originate from

danb35 said:
The number I've heard says to expect 5 GB of RAM required per TB of storage, but that the requirement could go far higher.

We should say "5GB of ARC required per TB of storage", given that Linux is stupid about ARC management. ARC ~== RAM is only true for CORE.

5GB of ARC per TB of storage assumes a conventional mix of block sizes; if your data is heavily skewed towards smaller block sizes, this number grows (possibly significantly). Dedup places intense pressure on the ARC subsystem. For reasons similar to the advice that you should not use an L2ARC until you have at least 64GB of ARC, you should have a considerable amount of ARC (definitely north of 64GB, probably 128GB++) before engaging in dedup. Stilez has an excellent example of a workable dedup setup over in the Resources section and is mandatory reading for anyone contemplating dedup.

ThisTruenasUser · Apr 20, 2023

The problem is that would like to try it out, but would mean purchasing the hard drive '12 - 16TB'
I have various old machine about to maybe try it out.

As for deduplicaion, i thought it only used a lot of extra RAM if a deduplication device was not being used, specifically an optane drive I already have.
There is so much conflicting information out there.

With what I have could use to test with an old machine:
An old business machine 32Gb ram with i7-3770
Also put in a old boot drive, old 1TB sata HD, 2.5Gb NIC
Most important a PCIE adapter, + 16GB intel optane.

Install truenas setup a pool of 1TB + 16GB optane as dedupication.

It is intended to store games.
The larger the block sizes the better.
So it it possible to tweak the block size for a pool.
Also what about iscsi? how to change block sizes in that?
Any pointers there?

Maybe buy a bigger drive if all goes well & use in proxmox server as a virtual machine.

Thanks

WI_Hedgehog · Apr 20, 2023

Is there a resource or good link on why not to enable dedup? It seems (in general) people think this is a magic bullet for gaining 5x the drive space. (I'd like to add the "Don't do it!" it to the resources list.)

Arwen · Apr 20, 2023

Yes, the one I posted above:

Resource - ZFS de-Duplication - Or why you shouldn't use de-dup

The TrueNAS forums occasionally have people who come across ZFS de-duplicate, and want to investigate its use. Or think it is a good idea, and want to implement it. Here are some suggested configuration details: Understand that you need CPU...

www.truenas.com

Etorix · Apr 21, 2023

jgreco said:
We should say "5GB of ARC required per TB of storage", given that Linux is stupid about ARC management. ARC ~== RAM is only true for CORE.

By the same token, should we revise the rule of thumb "L2ARC ≤ 5 to 10 * RAM" (CORE) to be "RAM ≤ 2.5 to 5 * RAM" for SCALE?
I.e. make 5*RAM the absolute maximum for L2ARC on Linux rather than an initial guideline.

jgreco said:
For reasons similar to the advice that you should not use an L2ARC until you have at least 64GB of ARC, you should have a considerable amount of ARC (definitely north of 64GB, probably 128GB++) before engaging in dedup.

So "no L2ARC until you have 128 GB RAM" on Linux?

sretalla · Apr 21, 2023

Etorix said:
So "no L2ARC until you have 128 GB RAM" on Linux?

I guess that would be the broad (albeit extremely unpopular) advice.

I'm sure we'll get buckets of feedback about how it works great in plenty of cases where people have less RAM than that... but that's already the case with the 64GB recommendation. (even if it is based on specific cases or where the folks don't really understand what's going on).

jgreco · Apr 21, 2023

sretalla said:
I guess that would be the broad (albeit extremely unpopular) advice.

I'm sure we'll get buckets of feedback about how it works great in plenty of cases where people have less RAM than that... but that's already the case with the 64GB recommendation. (even if it is based on specific cases or where the folks don't really understand what's going on).

This basically breaks down to two subcategories:

1) People who have done the homework with arc_summary etc to understand their workloads, in which case yes indeed it MIGHT be just fine, or

2) People who just glom on an L2ARC to their 16GB or 32GB RAM system and just assume it must be doing SOMETHING useful, not bothering to understand that the L2ARC isn't making good choices.

The thing that tends to be difficult to convey to people is that what you want is for the ARC records to have enough time resident in ARC to collect a significant amount of multiple hits. You then want to be able to evict to L2ARC the stuff that is seeing more than a single hit but not a large number of hits. This is the stuff that will benefit from being cached in L2ARC. If your ARC is so small that most entries never accrue more than two hits, for example, you basically end up picking random blocks to evict to the L2ARC, which is inefficient. Your system may be lucky enough to thrash around and eventually evict a useful set of blocks, but it is a poor substitute for having a sufficient amount of ARC to begin with.

jgreco · Apr 21, 2023

Etorix said:
By the same token, should we revise the rule of thumb "L2ARC ≤ 5 to 10 * RAM" (CORE) to be "RAM ≤ 2.5 to 5 * RAM" for SCALE?
I.e. make 5*RAM the absolute maximum for L2ARC on Linux rather than an initial guideline.

I think we have to defocus on the abbreviation "RAM". The rule has always referred to ARC, which on CORE is intimately linked to RAM. If you have a 128GB RAM Linux system and you tune it to 96GB or 112GB of ARC, which is an entirely reasonable thing to do if you're not running containers or VM's, referring to RAM as you suggest actually gets us a bad result.

Etorix said:
So "no L2ARC until you have 128 GB RAM" on Linux?

Probably more like "No L2ARC until you have 64GB ARC" and then also "Linux users can increase their ARC size to a larger percent of RAM because they have a dumb OS"...? Heh.

Patrick M. Hausen · Apr 21, 2023

Poul-Henning Kamp wrote about the architecture of Varnish: there is only one type of memory - disk. Everything else is one of various stages of cacheing. There is some truth to that. Varnish uses memory mapped files and lets the operating system decide what is kept in RAM and what isn't.

jgreco · Apr 21, 2023

Patrick M. Hausen said:
Varnish uses memory mapped files and lets the operating system decide what is kept in RAM and what isn't.

This could be considered lazy and may not work well in many cases, because you've introduced a dependency on a memory management system of unknown quality. It could work well if you had a memory management system that handled MFU/MRU, could handle fragmented memory well, efficiently handled faults in parallel, etc. That's not universally true. I know that I generally don't try to make my problem someone else's problem to solve when coding, but then again there's value to not reinventing the wheel too. Hm.

Patrick M. Hausen · Apr 21, 2023

An operating system with a unified VM and buffer cache like FreeBSD does perform better than any algorithm you and I can come up with. And phk has already proven that:

https://dl.acm.org/doi/pdf/10.1145/1810226.1814327

http://phk.freebsd.dk/pubs/varnish_perf.pdf

Important Announcement for the TrueNAS Community.

How 'unsafe' will this be?

Dabbler

Powered by Neutrality

Dabbler

Powered by Neutrality

Powered by Neutrality

MVP

Hall of Famer

Hall of Famer

Guru

Resident Grinch

Dabbler

Guru

MVP

Wizard

Powered by Neutrality

Resident Grinch

Resident Grinch

Hall of Famer

Resident Grinch

Hall of Famer

Similar threads