Drive Layout - HPE ProLiant Microserver Gen8

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
Hi all,
first of all: thank you for your time reading.
I just bought a HPE ProLiant Microserver Gen8 and I am planning to use it as my main NAS.
I'm planning to run the following storage pools:
  1. Boot (duh) - preferably redundant
  2. Hard Drives for low cost storage - mirrored (2 drives) or RaidZ (3 drives)
  3. SSD-only storage for low latency storage - mirrored (2 drives)

I only have the following ports available:
  • 4x SATA (up to 3.5")
  • X16 PCIe 3.0 - bifurcation not supported
  • 1x internal SATA - not bootable
  • 1x internal USB 2
I couldn't find an ideal layout without any OBVIOUS bottlenecks.
Using an HBA is not an option due to lack of space in the case (at most I could fit one 2.5" drive internally).

Any ideas or suggestions?
Maybe even with future expansion or ZIL/L2ARC in mind?

TIA and best,
Paul
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Boot: USB-to-SATA and a 2.5" SSD, or USB-to-M.2 - if 2.5" SSD, use double-sided tape or Velcro and stick it wherever it fits.
HDD: RAIDZ1 is somewhat discouraged for larger drives due to potential stress on rebuild, but might be fine here if you only plan to store somewhat-replaceable files ("Linux ISOs" I believe the kids call them) - if you put more valuable data there, perhaps replicate it to a cloud provider like Backblaze?
SSD: Mirrors are fine.

PCIe slot isn't really necessary for your use case, but I suppose you could use one NVMe and one SATA SSD in a mirrored setup and do a 4-drive RAIDZ2 for more capacity.
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
First of all: love your name! :)
Boot: USB-to-SATA and a 2.5" SSD, or USB-to-M.2 - if 2.5" SSD, use double-sided tape or Velcro and stick it wherever it fits.
Yeah, about that: wouldn't both of those options hold back my SSD?

HDD: RAIDZ1 is somewhat discouraged for larger drives due to potential stress on rebuild
If I understand that correctly: You recommend getting either more small drives (just out of curiosity: which layout would you recommend for 4x 8TB NAS drives?) or mirroring 2 drives?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
First of all: love your name! :)

I'm slightly less vicious than the name suggests, but I like the directness it implies.

Yeah, about that: wouldn't both of that options hold back my SSD?

This is for your boot pool only, you won't be serving actual I/O off of them. The 40MB/s of USB 2.0 will be more than enough - I'm just suggesting an inexpensive SSD (look for a used Intel 320 or DC S3500 perhaps?) as it will be way more robust than a USB thumbdrive.

If I understand that correctly: You recommend getting either more small drives (just out of curiosity: which layout would you recommend for 4x 8TB NAS drives?) or mirroring 2 drives?

The risk with RAIDZ1 on large drives is that in a failure scenario, a second read failure during rebuild causes data loss, because it's only single-parity. You face the same risk with mirrors to a degree, but a mirror is reliant only on one other drive rebuilding the data without failure.

I'm suggesting you either use four drives in a Z2 (if you're risk-averse) or use three drives in a Z1 and copy your data elsewhere (which you should anyway, because "RAID is not a backup" and it doesn't prevent accidentally deleting a file).

So in summary:

4x8TB in RAIDZ2 for the bulk data on the main SATA ports in the 3.5" bays
1x SATA SSD in the one 2.5" bay
1x NVMe SSD on a PCIe adaptor card (it can be a slow-ish SSD, it only has to be as fast as the SATA)
1x USB-to-SATA adaptor with a cheap Intel DC SSD for booting

What are you intending to use the system for? Mostly "what are the SSDs for?"
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
a mirror is reliant only on one other drive rebuilding the data without failure
Ah, never thought of it like that!

This is for your boot pool only, you won't be serving actual I/O off of them.
Fair point. I can wait a second longer for my logs to load :D

what are the SSDs for?
A lot of small files requested by (for now) 3-5 K8s workers.
So in summary [...]
Cool, thank you very much for your time!
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
PS: I asked the same question over on reddit and got a comment about ZFS caching.
Do you think that would be an alternative to using a dedicated pool for that?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
PS: I asked the same question over on reddit and got a comment about ZFS caching.
Do you think that would be an alternative to using a dedicated pool for that?
ZFS will always be caching things just based on how it operates. If your K8s workers are frequently hitting the small files, they will wind up with a very high MFU (most frequently used) score and might be kept there; it really depends on whether the amount you expect to be actively reading will exceed the 16GB of RAM (which isn't all dedicated to cache; you lose some to system services and other metadata).

How big is the dataset that you intend to serve via NFS to your K8s workers? This could potentially be a good match for a small L2ARC setup, with only the "low latency" dataset allowed to use it.

Writes, though, will depend on whether your workers are sending COMMIT operations over NFS (or mount the NFS export synchronously, turning every write into a sync write, as ESXi does) - if they do, you'll probably want some manner of SLOG device, or to use a pair of SSDs as pool devices.
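If you want to check whether the hot files are actually staying resident in ARC, the OpenZFS tools that ship with TrueNAS can report it; a quick sketch (exact output formatting varies by release):

```shell
# Summarize ARC size, MFU/MRU balance, and hit ratio.
arc_summary | grep -i -A 2 "hit ratio"

# Watch ARC hits/misses once per second for five samples
# while the K8s workers are hammering the share.
arcstat 1 5
```

A consistently high hit ratio while the workers are busy means the small files are being served straight from RAM.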
 

blanchet

Guru
Joined
Apr 17, 2018
Messages
511
If it may help

- I have had an HP MicroServer Gen8 since 2017.
- I boot from a SATA Kingston SSD + a USB3-to-SATA adapter plugged on the internal USB port
- The SSD boot disk is located in the DVD drive slot
- I use 2 independent pools. Each of them is a mirror of spinning disks.
- For VMs I use a tiny silent PC (Shuttle DS10u7) with a single SATA SSD and I automatically backup the data to the HP Microserver Gen8.

This solution is very flexible: I can change the hard disks, the storage server or the hypervisor without touching the other devices.
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
How big is the dataset that you intend to serve via NFS to your K8s workers?
I can't really say yet, but I'd like to future-proof anyway ;)

This could potentially be a good match for a small L2ARC setup
Yes, sorry. That was what I meant but didn't write.
My issue is that I can't really find anything suitable for that (new or used) at a decent (<100€) price tag. Is there a go-to in the community?

with only the "low latency" dataset allowed to use it.
Why would I want to limit my cache to a specific dataset?

Writes though will depend on if your workers are sending COMMIT operations over NFS (or mount the NFS export synchronously, turning every write into a sync write
Honestly: I have no clue.
This is my first time playing around with K8s. How can I test which is true?

or to use a pair of SSDs as pool devices
What difference is there exactly (for my use case) between using 2 SSDs as a pool vs. using a cache drive?
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
If it may help

- I have had an HP MicroServer Gen8 since 2017.
- I boot from a SATA Kingston SSD + a USB3-to-SATA adapter plugged on the internal USB port
- The SSD boot disk is located in the DVD drive slot
- I use 2 independent pools. Each of them is a mirror of spinning disks.
- For VMs I use a tiny silent PC (Shuttle DS10u7) with a single SATA SSD and I automatically backup the data to the HP Microserver Gen8.

This solution is very flexible: I can change the hard disks, the storage server or the hypervisor without touching the other devices.
Thank you for sharing!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
I can't really say yet, but I'd like to future-proof anyway ;)

An estimate is fine, but the general idea is "would it be small enough to fit into the L2ARC device?"

(There's more to it than this, see the final callout re: record headers stealing from primary RAM.)

Yes, sorry. That was what I meant but didn't write.
My issue is that I can't really find anything suitable for that (new or used) at a decent (<100€) price tag. Is there a go-to in the community?

What country are you shopping from? Any consumer SSD will function as L2ARC, and for 100 Euros you should definitely be able to find something to fit the bill.

Why would I want to limit my cache to a specific dataset?

Limiting the secondary cache to that dataset will prevent the "large file" data from pushing the smaller files out of L2ARC. L2ARC is a very "dumb" level of cache with a simple first-in-first-out ring buffer, so it's best to make the job as easy as possible for it.

Honestly: I have no clue.
This is my first time playing around with K8s. How can I test which is true?

I suppose it will depend on the platform you're using for the K8s host, but I would suspect that if you're using separate machines for this it will want to write synchronously to the shared NFS export in order to have it be consistent. Might mean a bit of slowness for updating configs.

What difference is there exactly (for my use case) between using 2 SSDs as a pool vs. using a cache drive?

A mirror of 2 SSDs (or a single, see below) means no need to worry about what will/won't be cached for speedy reads, you're going to guarantee at least SSD-speed for anything living on it for both reads and writes. With a cache drive (L2ARC) it is only a read cache - writes are subject to the question of "are they being requested as synchronous" meaning that the client system is waiting for the NFS server to confirm "yes, I wrote this to a permanent location on disk" rather than just holding it in RAM. Any new writes also immediately invalidate the old data in L2ARC - it doesn't "write back" to the SSD cache, so it would need to be staged into it again.

L2ARC also requires headers in main memory to store the location of your data on the SSD. For 100G of large records (files 64KB or larger) it's relatively trivial at 125MB (100G device / 64K record * 80 bytes per header) but if those 100G are of smaller 8K files, that same 100G costs you 1000MB (100G device / 8K record * 80 bytes per header) of your main RAM to index it. That may be a worthwhile tradeoff if you have that much "small file" data that's valuable to be read at SSD speed, but ultimately it's about understanding the workload. And again, L2ARC doesn't help for writes.
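That header arithmetic is easy to reproduce; a minimal sketch using shell arithmetic, assuming the 80-bytes-per-record header size quoted above:

```shell
# RAM (MiB) needed to index an L2ARC device:
# (device size / record size) * 80 bytes per header
l2arc_headers_mib() {
    # $1 = device size in GiB, $2 = record size in KiB
    echo $(( ($1 * 1024 * 1024 * 1024 / ($2 * 1024)) * 80 / (1024 * 1024) ))
}

l2arc_headers_mib 100 64   # 100G of 64K records -> 125 MiB
l2arc_headers_mib 100 8    # 100G of 8K records  -> 1000 MiB
```

The record size dominates: shrinking records by 8x inflates the RAM cost by the same 8x.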

But there's another option:

- For VMs I use a tiny silent PC (Shuttle DS10u7) with a single SATA SSD and I automatically backup the data to the HP Microserver Gen8.

This can work within the system itself - a single SATA (or NVMe) SSD can hold the K8s data, and then you have a job running as frequently as you like (and is technically viable) to snapshot that dataset and replicate it to your main spinning-disk pool. Depends on your tolerance for data loss - if you're snapping once an hour, you lose an hour at most.
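As a rough sketch of that snapshot-and-replicate job (pool and dataset names here are hypothetical; the TrueNAS UI can do the same thing with a Periodic Snapshot Task plus a local Replication Task):

```shell
# Snapshot the fast SSD dataset every hour (e.g. from cron).
zfs snapshot ssd/k8s@hourly-$(date +%Y%m%d%H)

# First run: full send to the spinning-disk pool.
zfs send ssd/k8s@hourly-2022060100 | zfs recv tank/k8s-backup

# Later runs: incremental send between the previous and newest snapshots.
zfs send -i ssd/k8s@hourly-2022060100 ssd/k8s@hourly-2022060101 \
  | zfs recv tank/k8s-backup
```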
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
would it be small enough to fit into the L2ARC device?
Yes. I am pretty certain that the desired files will not exceed 200GB.

Any consumer SSD will function
Well technically yes, but won't it burn through it like crazy?

if you're using separate machines for this it will want to write synchronously to the shared NFS export
Is that a setting I have to toggle or how does that work?

Might mean a bit of slowness for updating configs.
That's fine for me. There are only like 2 services I want to run as HA anyways :)

it will prevent the primary "large file" cache from trying to push the smaller files out of L2ARC
Okay, so it was the most obvious answer :D
That shouldn't be too much of a problem but I will have it in mind, thanks!

you're going to guarantee at least SSD-speed for anything living on it for both reads and writes
Well yes, also kinda obvious, now that you said it.

Any new writes also immediately invalidate the old data in L2ARC
Yes, that was something I was thinking about. Does that mean my cache will serve "old" data or does it actually invalidate it? If so, how does it do that? Does it check if there's a copy in the cache after the write is done?

And again, L2ARC doesn't help for writes
Worst case I'll just get a ZIL drive, no?

But there's another option:
I've also thought about that, but I don't think that's gonna be my way to go, just for ease of use.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Yes. I am pretty certain that the desired files will not exceed 200GB.

Indexing 200G of 8K records will cost you about 2G of your RAM. Or you can use a pair of SSDs and never have to index anything.

Well technically yes, but won't (L2ARC use) burn through (a consumer SSD) like crazy?

It all depends on how quickly you are "churning" data through the L2ARC, i.e. it being pushed out by newer data or writes that invalidate the old data. But you can easily get a 240GB Intel DC S3500/S3510/S3520 (used) for well under that cost threshold.

Is that a setting I have to toggle or how does that work?

It would be on each of the K8s hosts, somewhere in the config. NFSv3 by default might be mounting things asynchronously. I don't really have a lot of experience with containers.
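One way to check from the worker side (availability depends on the distro; `nfsstat` is part of the standard Linux nfs-utils package):

```shell
# Show the options the NFS export was actually mounted with -
# look for the NFS version and sync/async in the option list.
mount | grep -i nfs

# Client-side NFS operation counters; a "commit" count that climbs
# during a write-heavy test means the client is forcing sync writes.
nfsstat -c
```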

That's fine for me. There are only like 2 services I want to run as HA anyways :)

You could also flip the dataset to sync=disabled and it will ignore any sync requests from the client for the sake of speed, but at the cost of potentially losing the last few seconds of writes if the system crashes.
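For reference, that's a per-dataset property (the dataset name here is hypothetical):

```shell
# Valid values are standard | always | disabled.
zfs set sync=disabled tank/k8s

# Confirm the current setting.
zfs get sync tank/k8s
```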

Okay, so it was the most obvious answer :D
That shouldn't be too much of a problem but I will have it in mind, thanks!

If you do go with an L2ARC I'd suggest that you set the K8s dataset as having secondarycache=all and your "bulk" data as secondarycache=metadata which will let it cache the filesystem metadata for speed but not the actual bulk of it.
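In commands, with hypothetical dataset names (`tank/k8s` for the small hot files, `tank/bulk` for the large media):

```shell
# Let the K8s dataset use the L2ARC for both data and metadata...
zfs set secondarycache=all tank/k8s

# ...but keep only metadata from the bulk dataset in L2ARC.
zfs set secondarycache=metadata tank/bulk

# Verify both settings.
zfs get secondarycache tank/k8s tank/bulk
```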

Well yes, also kinda obvious, now that you said it.

:)

Yes, that was something I was thinking about. Does that mean my cache will serve "old" data or does it actually invalidate it? If so, how does it do that? Does it check if there's a copy in the cache after the write is done?

If a file exists in the L2ARC and it's overwritten with a new one, the associated records are marked as "stale" by the ZFS filesystem. You'll never be served an "old copy."

Worst case I'll just get a ZIL drive, no?

Yes, you could use a small Optane device (16G/32G) in the NVMe slot as your SLOG, and a larger SATA SSD for L2ARC.
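Attaching both afterwards is straightforward; device names below are hypothetical, and both vdev types can be removed again later with `zpool remove` if you change your mind:

```shell
# Add the small Optane as a SLOG (log vdev)...
zpool add tank log nvd0

# ...and the larger SATA SSD as L2ARC (cache vdev).
zpool add tank cache ada4

# Both devices should now appear under "logs" and "cache".
zpool status tank
```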

I've also thought about that, but I don't think that's gonna be my way to go, just for ease of use.

Just thought I'd mention that, it's been employed by a few users here.
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
Or you can use a pair of SSDs and never have to index anything.
Understood, fair point.

But you can easily get a 240GB Intel DC S3500/S3510/S3520 (used) for well under that cost threshold.
You are right, I was just looking for the wrong things ;)

It would be on each of the K8s hosts
Alright, thank you!

You could also flip the dataset to sync=disabled
I think I'd rather not do that for data consistency/integrity reasons, but thanks for pointing it out.

I'd suggest that you set the K8s dataset as having secondarycache=all and your "bulk" data as secondarycache=metadata
If I understand correctly, this metadata is mainly just a pointer to where the file is on my drives, correct?
If so, how would that improve performance?
Also: can I just give the (L2)ARC a file-size limit?
E.g. it won't cache any files bigger than 10 MB?

the associated records are marked as "stale" by the ZFS filesystem.
I couldn't imagine ZFS handing out old copies, but I was unsure how it works.
Thanks for explaining :)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
If I understand correctly, this metadata is mainly just a pointer to where the file is on my drives, correct?
If so, how would that improve performance?
Also: can I just give the (L2)ARC a file-size limit?
E.g. it won't cache any files bigger than 10 MB?

There isn't a way to tell L2ARC not to cache a file based on size, but the earlier secondarycache=metadata setting has the effect of excluding data at the dataset level (e.g. if you put all of your videos and large media files into the Plex dataset, you can tell ZFS never to mark them as eligible for placement in L2ARC).

By default, ZFS will consider everything as a valid record for L2ARC (secondarycache=all) so you're just downgrading the recommendation on your large media fileset. It's not worth going all the way to secondarycache=none though, as metadata is both very small and very valuable for making things like directory listings and folder browsing feel responsive.
 