Homelab build: Proxmox/K8s, NFS storage, write-heavy

z-lf

Cadet
Joined
Jan 20, 2023
Messages
6
Hi lovely people,

I am moving my homelab storage to TrueNAS CORE. Not SCALE, because I want to force myself not to use Docker directly on TrueNAS.
The setup is as follows:
  1. 3-node Proxmox cluster.
  2. NFS as the storage backend for VMs and backups.
  3. HA is enabled.
  4. TrueNAS is on an 8-bay machine.
  5. Everything is on a 1GbE network, but TrueNAS has LACP aggregation, so 2x1GbE.
  6. All my services in k8s will use an NFS storage class (see the sketch after this list).
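For reference, one common way to wire that up (not necessarily what I'll end up with) is the kubernetes-sigs nfs-subdir-external-provisioner Helm chart; the server IP and export path below are placeholders, not my actual values:
[CODE]
# Sketch only -- 192.168.1.50 and /mnt/tank/k8s are placeholder values.
helm repo add nfs-subdir-external-provisioner \
    https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-provisioner \
    nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
    --set nfs.server=192.168.1.50 \
    --set nfs.path=/mnt/tank/k8s \
    --set storageClass.defaultClass=true
[/CODE]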
I am into monitoring a lot (learning for work), so many of the services will be things like InfluxDB, Prometheus, Graylog, Security Onion, maybe Splunk or ELK, etc.
None of the data will be critical. It would suck to redo everything, but that's not really a top priority (especially when considering cost).

So here is what I was thinking:
  1. 1 vdev with 3x4TB and a hot spare. This would hold VMs, backups and long-term aggregated data from the monitoring.
  2. 1 vdev of 4x256GB (Samsung EVO 870). This would be for data ingest, with a retention policy adequate to the space available.
I already have the 4x4TB HDDs, but the rest is completely open. A rough sketch of what I mean is below.
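Roughly this (pool names, device names and the raidz1 choice for the 3-disk vdev are placeholders/assumptions on my part, nothing is final):
[CODE]
# Sketch only -- names and layout are placeholders, not decided.
# HDD pool: 3x4TB (raidz1 assumed) plus a hot spare, for VMs/backups/aggregated data.
zpool create tank raidz1 da0 da1 da2 spare da3
# SSD pool: 4x 256GB EVO 870 for ingest (vdev layout still open; shown striped here).
zpool create ingest da4 da5 da6 da7
[/CODE]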

what do you guys think?

thanks a lot.
(and if you are in berlin, beer’s on me :)
 

z-lf

Cadet
Joined
Jan 20, 2023
Messages
6
Oh, and there is an NVMe drive for the OS, so that's not part of the 8 bays.
 

z-lf

Cadet
Joined
Jan 20, 2023
Messages
6
I also forgot the following:
  1. 32GB of RAM.
  2. The Ethernet links will be upgraded to 2x2.5GbE.
  3. There are between 15 and 20 devices that will send metrics/logs (Home Assistant, router, switches, computers, laptops, another NAS).
  4. And around 50 services, but I can tweak the retention/scrape timings to fit the capacity (rough example below).
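Something like this is what I mean by tweaking the timings (values are placeholders, nothing is decided):
[CODE]
# Sketch only -- retention values are placeholders.
# Prometheus: cap how long/how much TSDB data is kept locally.
prometheus --storage.tsdb.retention.time=15d --storage.tsdb.retention.size=50GB
# InfluxDB 1.x: size the retention policy to the ingest pool ("telegraf" db is an example).
influx -execute "CREATE RETENTION POLICY one_month ON telegraf DURATION 30d REPLICATION 1 DEFAULT"
[/CODE]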
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
1. You should be using mirrors for VM storage, preferably SSDs. Mirrors = IOPS.
2. NFS writes are sync writes, which means your performance will be poor. Either set sync=disabled (at a data-loss risk) or get a proper SLOG (quick example below).
3. Be prepared to want more memory.
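For reference, sync behaviour is a per-dataset ZFS property; "tank/vms" below is just a placeholder name:
[CODE]
# Placeholder dataset name -- adjust to your pool layout.
zfs set sync=disabled tank/vms    # fast, but you can lose the last few seconds of writes on power loss
zfs set sync=standard tank/vms    # default; with a proper SLOG this stays both fast and safe
zfs get sync tank/vms             # check the current setting
[/CODE]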
 

z-lf

Cadet
Joined
Jan 20, 2023
Messages
6
@NugentS thanks a lot for the feedback.
Would this setup make more sense?
  1. 4 SSDs in RAID10 (striped mirrors) with the NVMe as SLOG.
  2. A USB-attached SSD for boot.
  3. 4 HDDs in raidz1 + a hot spare.
The RAID10 pool would be for VMs and ingest, the raidz1 pool for backup and archive; roughly the sketch below.
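Something like this (device and pool names are placeholders):
[CODE]
# Sketch only -- device/pool names are placeholders.
# "RAID10" in ZFS terms = striped mirror vdevs, with the NVMe as SLOG.
zpool create fast mirror ada0 ada1 mirror ada2 ada3 log nvd0
# HDD pool for backup/archive (spare handling still open).
zpool create slow raidz1 da0 da1 da2 da3
[/CODE]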

What do you think?
My board is limited to 32GB of RAM; I have maxed it out already. (TerraMaster U8-423)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You want to support TWO pools, one doing VM block storage, on 32GB? Yeesh.

 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Yeah - he is pushing it a bit

Which NVMe? A SLOG has specific requirements that an EVO 870 (if that's what you are thinking of) will not meet. Think Optane 900p or better (mirrored if this is a commercial device, single if it's just for play).

4 SSDs in mirrors, striped across two vdevs, will be a lot better for block storage than Z1. You will really want a proper SLOG, or run sync=disabled, which is not data-safe.

32GB is likely to cause somewhat non-optimal performance - although the good news is that you are running on 1Gb(ish) networking rather than 10 or 40. Reliability of the 2.5GbE gear - well, you will find that out for yourself (i.e. YMMV).

I personally would prefer 64GB+ - but if 32 is your limit then so be it.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
32GB is likely to cause somewhat non-optimal performance

I don't even know how you can trivialize it like that. With two pools, there will be contention for ARC, and ARC is incredibly important for block storage. You could think of it as "only 16GB of ARC for the block storage" and probably not be that far off base.
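If you want to see what you are actually working with, ARC usage is visible from the CLI on CORE (a quick sketch, assuming a stock FreeBSD-based install):
[CODE]
# TrueNAS CORE is FreeBSD-based, so ARC stats live in sysctl.
sysctl kstat.zfs.misc.arcstats.size   # current ARC size in bytes
sysctl vfs.zfs.arc_max                # configured ARC ceiling (0 = auto-sized)
[/CODE]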

 

z-lf

Cadet
Joined
Jan 20, 2023
Messages
6
@NugentS after your first message I was thinking:
  1. SLOG -> M.2 Optane (maybe not the 905P as it is too expensive, but maybe the 4801X?). I can't mirror it because I only have one NVMe slot.
  2. 4x EVO 870 for the RAID10 (this pool is where the VMs would run).
  3. 4x4TB HDDs in raidz1 (this pool is strictly for backup).
@jgreco I am new to ZFS, so I might have understood this wrong; I thought the rule of thumb for RAM was 1GB of RAM per 1TB of data.
I don't intend this machine to grow past 32TB. (I don't think it will grow above 20.)
Would you say maybe I should stick with the HDDs and make:
6x4TB HDDs in RAID10 (striped mirrors), add a SLOG and an SSD cache (L2ARC), keep utilization between 10% and 50%, and call it a day?

Also, this is for a homelab; it doesn't have to be perfect, I just want to avoid a stupid setup.

Thanks again for the feedback. It's much appreciated!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I thought the rule of thumb for RAM was 1GB of RAM per 1TB of data.

Sure, for basic filesharing duties. However, as explained in the resources I've linked for you, iSCSI or NFS block storage is not "basic filesharing".
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
1. The 4801X would make a fine SLOG - actually better than the 905P. Less capacity, but for a SLOG that doesn't matter, as you won't need much.

2. Samsung make good drives, better than most, and these are not QVO drives, which are basically landfill. They are, however, consumer-grade SSDs, which as a general rule do not perform as well as advertised, because the advertised specs generally leave out random I/O. There is a reason businesses buy enterprise SSDs, and it's not because they love spending money. Having said that, these are for home use, and you won't (for the money) do better. Just don't expect miracles.

3. Backup - err, OK. It's not really a backup if it's on the same machine, but other than that, OK.

Memory. Our resident Grinch is not wrong, but remember where he comes from and what he does for a living (other than moan at us about bad decisions, which he does so much you might think it's his day job). That said, he is right (please, Mr Grinch, don't hit me). Ideally you would be running the working set from RAM (ARC), with the rest read in where required. In your case that won't happen, and you will put a lot of load onto the disks, which are slow in comparison. When you run the "backup" you will tend to evict useful data from ARC (depending on how the ARC actually works - magic), which will then have to be read back in. What you might try, as the speed of backups is rarely important, is "zfs set primarycache=metadata backuppool", which should limit how much ARC the backup pool uses, prioritising the SSD pool. YMMV, and I would benchmark and test for acceptable performance first, before trying that or the even more severe "zfs set primarycache=none backuppool", which I think would make the backup pool run like a dead three-legged dog that's been buried for a year or so.
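Roughly like this ("backuppool" is just a placeholder name for your backup/archive pool):
[CODE]
# Placeholder pool name -- apply to the backup/archive pool only.
zfs set primarycache=metadata backuppool   # keep only this pool's metadata in ARC
zfs get primarycache backuppool            # verify the setting
zfs set primarycache=all backuppool        # revert to the default if it hurts too much
[/CODE]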

Memory is likely to be your major issue here - and, as you say, you can't do anything about it.
 
Last edited:

z-lf

Cadet
Joined
Jan 20, 2023
Messages
6
Thanks a lot, guys. Truly appreciated, and I understand the Grinch-ness, since you get these questions a lot.
I did read some white papers, and I thought I understood how things worked :D. But I totally did not see the RAM issue.

I am going back to the drawing board. The setup I want will be too expensive, unfortunately.
And after this conversation, maybe I don't need a cluster for my homelab:
  1. If I have one machine with all my services and it fails: everything fails.
  2. If TrueNAS fails in my cluster: everything fails.
@NugentS on your Supermicro server, I see you have two pools, one with HDDs and one with SSDs.
Where would you store the data for Graylog/Prometheus/InfluxDB etc.? Not the VM storage, just the data served over NFS?
Would everything go onto the SSDs, and then you archive to the HDDs?
I think maybe I need to go in that direction.

Thank you !
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
As a general principle I store as little as possible on VMs. If a VM needs storage, it stores it not in the VM but on a shared dataset on the NAS. If something really needs low-latency, high-IOPS storage, then I put it on SSDs. (I also have a single-SSD pool for scratch, low-latency data.)
My pools are:
AppPool - storage for Docker data and related data such as config folders, plus server VM zvols. This is a pair of mixed-mode enterprise SSDs.
BigPool - bulk storage, but mirrors for increased IOPS, with a special vdev for small files on certain datasets, L2ARC (kinda redundant) and a SLOG. All bulk data goes here.
ScratchSSD - used for transcoding folders and similar. Single SSD, no redundancy. Who cares if it breaks.
SSDPool - mixed-mode enterprise SSDs for ESXi VMs. Multiple mirrors for IOPS. Exists solely to supply ESXi with storage. Added a SLOG.
[Note: all the SSDs, with the exception of the SLOGs, are second-hand from eBay.] I have had one fail so far - a total failure, the disk is no longer recognised.

My SSDPool is separate from my AppPool because they are different sizes and I didn't think of adding them together as an unbalanced pool. If I had to rebuild, I would probably combine SSDPool and AppPool, stay with mirrors and just have slightly unbalanced data vdevs. More vdevs, more IOPS.

The answer to your question depends on how you use the data, and how much of it there is. A database (which I believe is what you are talking about), if used efficiently, is (or should be) reasonably small, so I would run it on the SSDs together with the container or VM as appropriate. If the DB were large and inefficient, I would store it on the HDDs, in its own dataset via NFS or SMB, tuned for small record sizes, with access from whatever app is using the data. The actual logging data, which is then ingested into the database, would probably go in a dataset on the ScratchSSD to avoid write amplification on the SSDs (a rough sketch of such a dataset is below).
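Something along these lines; the pool/dataset names and the 16K record size are placeholders, and on TrueNAS you would normally create the dataset and the NFS share in the web UI rather than at the shell:
[CODE]
# Placeholder names/values -- TrueNAS usually manages this via the UI.
zfs create -o recordsize=16K -o compression=lz4 tank/metrics
zfs set sharenfs=on tank/metrics       # or create the NFS share in the TrueNAS UI
zfs get recordsize,compression tank/metrics
[/CODE]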
 