How 'unsafe' will this be?

ThisTruenasUser

Dabbler
Joined
Apr 19, 2023
Messages
44
Hello all
I used to run TrueNAS on bare metal; it is now virtualised.
The hypervisor is Proxmox.
I have some important data on the NAS, as well as less important data, being media files.
I plan to move about 2TB of my data to a share set up through Proxmox, as simple Samba/Windows shares.

The machine has 64GB of non-ECC memory.
So the plan is:

2 x 2TB drives as a ZFS mirror set up within Proxmox with Windows shares (not TrueNAS). This is for the most important data. As I understand it, non-ECC memory is much less of an issue in that situation, compared to storing it on TrueNAS.
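For concreteness, the Proxmox-side mirror would look roughly like this in zpool terms (disk IDs below are placeholders, and the pool could just as well be created through the Proxmox GUI):

    # Create a simple ZFS mirror on the Proxmox host for the important data
    zpool create -o ashift=12 safe mirror \
        /dev/disk/by-id/ata-DRIVE_A /dev/disk/by-id/ata-DRIVE_B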

The less safe stuff is as follows.

5 x 4TB drives in RAIDZ1 for media files, accessed through a Samba/Windows share, plus NFS for temporary files. This particular pool has already been created and runs in the virtual machine. All the files on there are backed up to multiple external drives, maybe once a week; it does not change often. Proxmox and an appropriate TurnKey container could do this as well, but that would mean recopying all the data to it again.

The maybe 'dangerous' pool:

A 12-16TB drive with no redundancy to store games. I still need to buy this. A single 16GB Intel Optane for deduplication.
There seems to be a lot of confusion about deduplication in TrueNAS.
According to Craft Computing, about 1GB of deduplication data is needed per 1TB of actual data.
Can anyone confirm this?
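For what it's worth, once there is data on a pool, ZFS can estimate this itself; a rough sketch of how I'd check (the pool name "tank" is just a placeholder):

    # Simulate dedup on existing data and print the DDT histogram
    # plus the dedup ratio the pool would get; no changes are made.
    zdb -S tank

From what I've read, the last line of that output gives the expected dedup ratio, and the histogram gives an idea of how many table entries (and therefore how much RAM or dedup vdev space) would be needed.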


The plan is to set up two sparse volumes, both the same size and using deduplication.
Also another virtual NIC (in Proxmox) to be attached to TrueNAS.
So two iSCSI shares to different computers: one virtual (already running) & one actual gaming machine.
The first is for updating games only.
The second is for playing games. I use PrimoCache, software that caches block devices with NVMe as the cache.
There is also a LanCache server to speed things up, specifically game updates.
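A rough sketch of the two sparse zvols I have in mind, assuming this is done from the shell rather than the TrueNAS UI (pool name, zvol names and sizes are placeholders):

    # -s makes the zvol sparse (thin provisioned); volblocksize is the
    # block size the iSCSI extent will present
    zfs create -s -V 8T -o volblocksize=64K games/updates-vm
    zfs create -s -V 8T -o volblocksize=64K games/gaming-pc

Each zvol would then be exported as its own iSCSI extent/target to the respective machine.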

So with games, they can be re-downloaded in case of disaster.

All the relevant drives are directly attached to the TrueNAS virtual machine.
Using IOMMU is a problem, so the drives are passed to the VM directly as individual disks only (no controller passthrough). SMART monitoring will be done through Proxmox.

The machine itself: a 3600X, an MSI B450 Tomahawk & 64GB of non-ECC memory.
Hardware passthrough apparently only works with devices attached directly to the CPU.
The 16GB Intel Optane should be able to be passed through to the TrueNAS virtual machine.
If not, will there be massive performance issues if it is attached directly through Proxmox?
There is only one NVMe slot on the motherboard.
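In case it helps, this is roughly how I've been checking how the IOMMU groups fall out on the Proxmox host (standard sysfs/lspci, nothing board-specific assumed); anything sharing a group with the chipset can't be split off for passthrough:

    # List every PCI device together with its IOMMU group
    for d in /sys/kernel/iommu_groups/*/devices/*; do
        g=${d#*/iommu_groups/}; g=${g%%/*}
        printf 'IOMMU group %s: ' "$g"
        lspci -nns "${d##*/}"
    done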

How doable is this?
How 'unsafe' is this?
I am not spending any money on any server equipment, and am using what I have.
The only hardware I may buy is the 12 - 16TB hard drive.
Thanks
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
All the relevant drives are directly attached to the TrueNAS virtual machine.
Using IOMMU is a problem, so the drives are passed to the VM directly as individual disks only (no controller passthrough). SMART monitoring will be done through Proxmox.
You're preparing to lose the data by doing this.

You must have the disk controller (not the "disk") available directly (via PCI passthrough) to TrueNAS, or your data will be at risk and likely to be lost at some point.
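For reference, controller passthrough on Proxmox comes down to something like this (the VM ID 100 and the PCI address 0000:02:00.1 are placeholders; the real address of your SATA controller/HBA comes from lspci):

    # Hand the whole SATA controller (and every disk behind it) to the TrueNAS VM
    qm set 100 -hostpci0 0000:02:00.1

After that, TrueNAS sees the controller and the raw disks, including SMART, without Proxmox sitting in the data path.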

How doable is this?
You can "do it".

How 'unsafe' is this?
Very.
I can think of at least 10 threads where I saw people coming here for help to recover their pools which were set up like this and lost to corruption.
 

ThisTruenasUser

Dabbler
Joined
Apr 19, 2023
Messages
44
Thank you for the reply.

As stated, IOMMU passthrough through the chipset is not working. That is where the SATA cards and the onboard controller are attached.
Only devices connected directly to the CPU work. In this case those are a quad-port NIC for a router & the 16GB Optane drive.
I have read through the documentation. If IOMMU worked for the SATA controller, I would use it.

When trying it, the machine just freezes.

How am I going to lose data exactly?

TrueNAS will be used for rarely changing media on one pool, which is backed up with multiple external drives.
Also the pool for games, all of which can be downloaded again.

The most important data I have will not be stored through TrueNAS. It will also be backed up on other machines & external drives.

Also, can you tell me if the ratio of 1GB of deduplication data to 1TB of storage is about right?

As for the pool with games & deduplication, is there documentation with something like "it has an xyz% chance of failing within a year"? That would be useful.

Thanks
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
2 x 2TB drives as a ZFS mirror set up within Proxmox with Windows shares (not TrueNAS). This is for the most important data. As I understand it, non-ECC memory is much less of an issue in that situation, compared to storing it on TrueNAS.
Using ZFS is what prompts most people to "require" ECC memory, not TrueNAS (although TrueNAS exclusively uses ZFS, so you can argue it's the case for TrueNAS just as much).

If you care about the integrity of your data, you'll care to use ECC memory, no matter the filesystem.

Also, can you tell me if the ratio of 1GB of deduplication data to 1TB of storage is about right?
I can't, but there are threads in the forum that discuss it.

What I can tell you is that I see plenty of cases where folks without enough RAM try to use dedup and come here complaining about how terrible the performance of their NAS is when dedup uses all their RAM and other things start to go slow or fail.
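If you do go ahead with it anyway, it's worth keeping an eye on how big the table actually gets; a minimal check, assuming a pool called "tank" with dedup enabled:

    # Show dedup table (DDT) statistics: number of entries and their
    # on-disk / in-core sizes -- the in-core part competes with ARC
    zpool status -D tank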

How am I going to lose data exactly?
One of two ways (or maybe both):

You will see pool corruption, which will be related to events where either the disk is suddenly (perhaps temporarily) removed from the VM by a quirk of the hypervisor or where the caching that the hypervisor is doing somehow fails to pass all data to the disk.

When enough events of disk (temporary) removal happen with data not making it to disk, the pool will become inoperable and you will potentially have lost all data in it.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
As stated, IOMMU passthrough through the chipset is not working.
I understand that you may have limited resources/budget and your available hardware isn't up to the task.

What I can't do for you is change how that will work in terms of the outcome you will get if you don't do it right.

If you want to virtualize TrueNAS and keep your data, you need to pass through the SATA controller at PCI level and use a reliable hypervisor (ESXi is the only one rated as such, Proxmox may be on its way there, but they even rate their own PCI passthrough as experimental at this time).

I really recommend against doing what you propose unless you're OK with losing all the data at some point.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Some information about ZFS De-Dup;


In some ways, people think they can do lots of things on a server, besides simple NAS duty. That can be true, but if poorly implemented, the storage behind the NAS is at risk. Thus, have good backups and know you might lose data between the last backup and a pool loss. Your call.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Also, can you tell me if the ratio of 1GB of deduplication data to 1TB of storage is about right?
The number I've heard says to expect 5 GB of RAM required per TB of storage, but that the requirement could go far higher.
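The usual back-of-the-envelope math behind that number (the 320 bytes per DDT entry and the 64K average block size here are commonly quoted assumptions, not measurements) goes something like:

    # One DDT entry per block, roughly 320 bytes each in core.
    # 1 TiB of data at an average 64 KiB block size:
    echo $(( (1024 * 1024 * 1024 / 64) * 320 / 1024 / 1024 ))   # 5120 MiB, i.e. ~5 GiB per TiB

The same arithmetic at a 16 KiB average block size lands around 20 GiB per TiB, which is why the requirement can go far higher.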
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
You could check whether TrueNAS SCALE, with its allegedly better hypervisor support, might fit your needs, and simply not run Proxmox but only a single system.

Adding to the remarks about ECC:

ECC is always safer than no ECC
ZFS is almost always safer than any other filesystem - ECC or not!

The "ZFS will destroy your data without ECC" myth is exactly that - a myth.
 
Joined
Jun 15, 2022
Messages
674

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
How am I going to lose data exactly?

ZFS expects to be in control of the I/O operations to the disk, including ordering and sync. ZFS has some quirks that prevent it from scoring 100% on predictive fault detection, and we generally compensate for this with the use of SMART. Placing your disks behind another subsystem that may reorder, obscure, or otherwise mishandle the intense I/O loads that ZFS tends to generate can cause pool corruption. This is one of the reasons why ZFS projects advise you not to use hardware RAID controllers or other devices that might clever your data to death; hypervisors can also screw this up especially if they do anything but strict sequencing of I/O, or lose connectivity to their I/O due to (for example) a slow I/O subsystem. In general, ZFS needs direct access to the directly attached raw disk to be safe. People who have tried other things have run into ... issues.

I can think of at least 10 threads where I saw people coming here for help to recover their pools which were set up like this and lost to corruption.

That's just the Proxmox threads. I can probably dig up dozens more where people tried to use ESXi RDM to explosive results, which is where the TrueNAS warnings against RDM originate from.

The number I've heard says to expect 5 GB of RAM required per TB of storage, but that the requirement could go far higher.

We should say "5GB of ARC required per TB of storage", given that Linux is stupid about ARC management. ARC ~== RAM is only true for CORE.

5GB of ARC per TB of storage assumes a conventional mix of block sizes; if your data is heavily skewed towards smaller block sizes, this number grows (possibly significantly). Dedup places intense pressure on the ARC subsystem. For reasons similar to the advice that you should not use an L2ARC until you have at least 64GB of ARC, you should have a considerable amount of ARC (definitely north of 64GB, probably 128GB++) before engaging in dedup. Stilez has an excellent example of a workable dedup setup over in the Resources section, and it is mandatory reading for anyone contemplating dedup.
 

ThisTruenasUser

Dabbler
Joined
Apr 19, 2023
Messages
44
The problem is that I would like to try it out, but that would mean purchasing the 12-16TB hard drive.
I have various old machines about that I could maybe use to try it out.

As for deduplication, I thought it only used a lot of extra RAM if a dedicated deduplication device was not being used, specifically the Optane drive I already have.
There is so much conflicting information out there.

With what I have, I could test with an old machine:
An old business machine with 32GB RAM and an i7-3770.
Also put in an old boot drive, an old 1TB SATA HDD, and a 2.5Gb NIC.
Most importantly, a PCIe adapter plus the 16GB Intel Optane.

Install TrueNAS and set up a pool of the 1TB drive + the 16GB Optane as a deduplication vdev.
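In TrueNAS this would be done from the UI, but in zpool terms the test pool I'm picturing comes down to roughly this (device paths are placeholders):

    # 1TB data disk plus the Optane as a dedicated dedup vdev to hold the DDT
    zpool create testpool /dev/disk/by-id/ata-OLD_1TB_DRIVE \
        dedup /dev/disk/by-id/nvme-INTEL_OPTANE_16GB
    zfs set dedup=on testpool

    # As I understand it, the dedup vdev is a normal top-level vdev,
    # so losing that single Optane would mean losing the pool.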


It is intended to store games.
The larger the block size, the better.
So is it possible to tweak the block size for a pool?
Also, what about iSCSI? How do you change block sizes there?
Any pointers there?
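From what I've gathered so far (so treat this as an assumption to be confirmed), the knobs are recordsize for normal datasets and volblocksize for the zvols behind iSCSI, and the latter is fixed at creation time:

    # Dataset holding game files shared over SMB: recordsize can be changed later
    zfs set recordsize=1M testpool/games

    # Zvol for iSCSI: volblocksize has to be chosen when the zvol is created
    zfs create -s -V 2T -o volblocksize=128K testpool/iscsi-games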


Maybe buy a bigger drive if all goes well & use it in the Proxmox server as a virtual machine.



Thanks
 
Joined
Jun 15, 2022
Messages
674
Is there a resource or good link on why not to enable dedup? It seems (in general) people think it is a magic bullet for gaining 5x the drive space. (I'd like to add the "Don't do it!" to the resources list.)
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Yes, the one I posted above:
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
We should say "5GB of ARC required per TB of storage", given that Linux is stupid about ARC management. ARC ~== RAM is only true for CORE.
By the same token, should we revise the rule of thumb "L2ARC ≤ 5 to 10 * RAM" (CORE) to be "L2ARC ≤ 2.5 to 5 * RAM" for SCALE?
I.e. make 5*RAM the absolute maximum for L2ARC on Linux rather than an initial guideline.

For reasons similar to the advice that you should not use an L2ARC until you have at least 64GB of ARC, you should have a considerable amount of ARC (definitely north of 64GB, probably 128GB++) before engaging in dedup.
So "no L2ARC until you have 128 GB RAM" on Linux?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
So "no L2ARC until you have 128 GB RAM" on Linux?
I guess that would be the broad (albeit extremely unpopular) advice.

I'm sure we'll get buckets of feedback about how it works great in plenty of cases where people have less RAM than that... but that's already the case with the 64GB recommendation. (even if it is based on specific cases or where the folks don't really understand what's going on).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I guess that would be the broad (albeit extremely unpopular) advice.

I'm sure we'll get buckets of feedback about how it works great in plenty of cases where people have less RAM than that... but that's already the case with the 64GB recommendation. (even if it is based on specific cases or where the folks don't really understand what's going on).

This basically breaks down to two subcategories:

1) People who have done the homework with arc_summary etc to understand their workloads, in which case yes indeed it MIGHT be just fine, or

2) People who just glom on an L2ARC to their 16GB or 32GB RAM system and just assume it must be doing SOMETHING useful, not bothering to understand that the L2ARC isn't making good choices.

The thing that tends to be difficult to convey to people is that what you want is for the ARC records to have enough time resident in ARC to collect a significant amount of multiple hits. You then want to be able to evict to L2ARC the stuff that is seeing more than a single hit but not a large number of hits. This is the stuff that will benefit from being cached in L2ARC. If your ARC is so small that most entries never accrue more than two hits, for example, you basically end up picking random blocks to evict to the L2ARC, which is inefficient. Your system may be lucky enough to thrash around and eventually evict a useful set of blocks, but it is a poor substitute for having a sufficient amount of ARC to begin with.
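The "homework" here isn't complicated, it's just rarely done; roughly:

    # Summarize ARC/L2ARC behaviour on the running system
    arc_summary | less

The interesting bits are the ARC hit ratio, the MFU/MRU breakdown, and (if an L2ARC is present) its own hit rate, which on small-RAM systems often turns out to be disappointingly low.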
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
By the same token, should we revise the rule of thumb "L2ARC ≤ 5 to 10 * RAM" (CORE) to be "L2ARC ≤ 2.5 to 5 * RAM" for SCALE?
I.e. make 5*RAM the absolute maximum for L2ARC on Linux rather than an initial guideline.

I think we have to defocus on the abbreviation "RAM". The rule has always referred to ARC, which on CORE is intimately linked to RAM. If you have a 128GB RAM Linux system and you tune it to 96GB or 112GB of ARC, which is an entirely reasonable thing to do if you're not running containers or VM's, referring to RAM as you suggest actually gets us a bad result.

So "no L2ARC until you have 128 GB RAM" on Linux?

Probably more like "No L2ARC until you have 64GB ARC" and then also "Linux users can increase their ARC size to a larger percent of RAM because they have a dumb OS"...? Heh.
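On the Linux side that tuning is a one-liner, more or less (the 96GB figure is just an example value, and on SCALE you'd normally persist it via something like a post-init script rather than typing it by hand):

    # Raise the ARC ceiling above the Linux default of half of RAM
    echo $(( 96 * 1024 * 1024 * 1024 )) > /sys/module/zfs/parameters/zfs_arc_max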
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Poul-Henning Kamp wrote about the architecture of Varnish: there is only one type of memory - disk. Everything else is one of various stages of caching. There is some truth to that. Varnish uses memory mapped files and lets the operating system decide what is kept in RAM and what isn't.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Varnish uses memory mapped files and lets the operating system decide what is kept in RAM and what isn't.

This could be considered lazy and may not work well in many cases, because you've introduced a dependency on a memory management system of unknown quality. It could work well if you had a memory management system that handled MFU/MRU, could handle fragmented memory well, efficiently handled faults in parallel, etc. That's not universally true. I know that I generally don't try to make my problem someone else's problem to solve when coding, but then again there's value to not reinventing the wheel too. Hm.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776