Drive Layout - HPE ProLiant Microserver Gen8

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
Hi all,
first of all: thank you for your time reading.
I just bought a HPE ProLiant Microserver Gen8 and I am planning to use it as my main NAS.
I'm planning to run the following storage pools:
  1. Boot (duh) - preferably redundant
  2. Hard Drives for low cost storage - mirrored (2 drives) or RaidZ (3 drives)
  3. SSD-only storage for low latency storage - mirrored (2 drives)

I only have the following ports available:
  • 4x SATA (up to 3.5")
  • X16 PCIe 3.0 - bifurcation not supported
  • 1x internal SATA - not bootable
  • 1x internal USB 2
I couldn't find an ideal layout without any OBVIOUS bottlenecks.
Using an HBA is not an option due to lack of space in the case (at most I could fit one 2.5" drive internally).

Any ideas or suggestions?
Maybe even with future expansion or ZIL/L2ARC in mind?

TIA and best,
Paul
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Boot: USB-to-SATA and a 2.5" SSD, or USB-to-M.2 - if 2.5" SSD, use double-sided tape or Velcro and stick it wherever it fits.
HDD: RAIDZ1 is somewhat discouraged for larger drives due to potential stress on rebuild, but might be fine here if you only plan to store somewhat-replaceable files ("Linux ISOs" I believe the kids call them) - if you put more valuable data there, perhaps replicate it to a cloud provider like Backblaze?
SSD: Mirrors are fine.

PCIe slot isn't really necessary for your use case, but I suppose you could use one NVMe and one SATA SSD in a mirrored setup and do a 4-drive RAIDZ2 for more capacity.
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
First of all: love your name! :)
Boot: USB-to-SATA and a 2.5" SSD, or USB-to-M.2 - if 2.5" SSD, use double-sided tape or Velcro and stick it wherever it fits.
Yeah, about that: wouldn't both of those options hold back my SSD?

HDD: RAIDZ1 is somewhat discouraged for larger drives due to potential stress on rebuild
If I understand that correctly: You recommend getting either more small drives (just out of curiosity: which layout would you recommend for 4x 8TB NAS drives?) or mirroring 2 drives?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
First of all: love your name! :)

I'm slightly less vicious than the name suggests, but I like the directness it implies.

Yeah, about that: wouldn't both of that options hold back my SSD?

This is for your boot pool only, you won't be serving actual I/O off of them. The 40MB/s of USB 2.0 will be more than enough - I'm just suggesting an inexpensive SSD (look for a used Intel 320 or DC S3500 perhaps?) as it will be way more robust than a USB thumbdrive.

If I understand that correctly: You recommend getting either more small drives (just out of curiosity: which layout would you recommend for 4x 8TB NAS drives?) or mirroring 2 drives?

The risk with RAIDZ1 on large drives is that in a failure scenario, a second read failure during rebuild causes data loss, because it's only single-parity. You face the same risk with mirrors to a degree, but a mirror is reliant only on one other drive rebuilding the data without failure.

I'm suggesting you either use four drives in a Z2 (if you're risk-averse) or use three drives in a Z1 and copy your data elsewhere (which you should anyway, because "RAID is not a backup" and it doesn't prevent accidentally deleting a file).

So in summary:

4x8TB in RAIDZ2 for the bulk data on the main SATA ports in the 3.5" bays
1x SATA SSD in the one 2.5" bay
1x NVMe SSD on a PCIe adaptor card (it can be a slow-ish SSD, it only has to be as fast as the SATA)
1x USB-to-SATA adaptor with a cheap Intel DC SSD for booting

What are you intending to use the system for? Mostly "what are the SSDs for?"
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
a mirror is reliant only on one other drive rebuilding the data without failure
Ah, never thought of it like that!

This is for your boot pool only, you won't be serving actual I/O off of them.
Fair point. I can wait a second longer for my logs to load :D

what are the SSDs for?
A lot of small files requested by (for now) 3-5 K8s workers.
So in summary [...]
Cool, thank you very much for your time!
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
PS: I asked the same question over on reddit and got a comment about ZFS caching.
Do you think that would be an alternative to using a dedicated pool for that?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
PS: I asked the same question over on reddit and got a comment about ZFS caching.
Do you think that would be an alternative to using a dedicated pool for that?
ZFS will always be caching things just based on how it operates. If your K8s workers are frequently hitting the small files, they will wind up with a very high MFU (most frequently used) score and might be kept there; it really depends on whether the amount you expect to be actively reading will exceed the 16GB of RAM (which isn't all dedicated to cache; you lose some to system services and other metadata).

How big is the dataset that you intend to serve via NFS to your K8s workers? This could potentially be a good match for a small L2ARC setup, with only the "low latency" dataset allowed to use it.

Writes, though, will depend on whether your workers are sending COMMIT operations over NFS (or mount the NFS export synchronously, turning every write into a sync write, as ESXi does) - if they do, you'll probably want some manner of SLOG device, or to use a pair of SSDs as pool devices.
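If you want to check whether the hot files are actually staying resident in ARC, the OpenZFS tools that ship with TrueNAS can report it; a quick sketch (exact output formatting varies by release):

```shell
# Summarize ARC size, MFU/MRU balance, and hit ratio.
arc_summary | grep -i -A 2 "hit ratio"

# Watch ARC hits/misses once per second for five samples
# while the K8s workers are hammering the share.
arcstat 1 5
```

A consistently high hit ratio while the workers are busy means the small files are being served straight from RAM.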
 

blanchet

Guru
Joined
Apr 17, 2018
Messages
511
If it may help

- I have had an HP MicroServer Gen8 since 2017.
- I boot from a SATA Kingston SSD + a USB3-to-SATA adapter plugged on the internal USB port
- The SSD boot disk is located in the DVD drive slot
- I use 2 independent pools. Each of them is a mirror of spinning disks.
- For VMs I use a tiny silent PC (Shuttle DS10u7) with a single SATA SSD and I automatically backup the data to the HP Microserver Gen8.

This solution is very flexible: I can change the hard disks, the storage server or the hypervisor without touching the other devices.
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
How big is the dataset that you intend to serve via NFS to your K8s workers?
I can't really say yet, but I'd like to future-proof anyway ;)

This could potentially be a good match for a small L2ARC setup
Yes, sorry. That was what I meant but didn't write.
My issue is that I can't really find anything suitable for that (new or used) at a decent (<100€) price tag. Is there a go-to in the community?

with only the "low latency" dataset allowed to use it.
Why would I want to limit my cache to a specific dataset?

Writes though will depend on if your workers are sending COMMIT operations over NFS (or mount the NFS export synchronously, turning every write into a sync write
Honestly: I have no clue.
This is my first time playing around with K8s. How can I test which is true?

or to use a pair of SSDs as pool devices
What difference is there exactly (for my use case) between using 2 SSDs as a pool vs. using a cache drive?
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
If it may help

- I have had an HP MicroServer Gen8 since 2017.
- I boot from a SATA Kingston SSD + a USB3-to-SATA adapter plugged on the internal USB port
- The SSD boot disk is located in the DVD drive slot
- I use 2 independent pools. Each of them is a mirror of spinning disks.
- For VMs I use a tiny silent PC (Shuttle DS10u7) with a single SATA SSD and I automatically backup the data to the HP Microserver Gen8.

This solution is very flexible: I can change the hard disks, the storage server or the hypervisor without touching the other devices.
Thank you for sharing!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
I can't really say yet, but I'd like to future-proof anyway ;)

An estimate is fine, but the general idea is "would it be small enough to fit into the L2ARC device?"

(There's more to it than this, see the final callout re: record headers stealing from primary RAM.)

Yes, sorry. That was what I meant but didn't write.
My issue is that I can't really find anything suitable for that (new or used) at a decent (<100€) price tag. Is there a go-to in the community?

What country are you shopping from? Any consumer SSD will function as L2ARC, and for 100 Euros you should definitely be able to find something to fit the bill.

Why would I want to limit my cache to a specific dataset?

Limiting the secondary cache to that dataset will prevent the "large file" data from pushing the smaller files out of L2ARC. L2ARC is a very "dumb" level of cache with a simple first-in-first-out ring buffer, so it's best to make the job as easy as possible for it.

Honestly: I have no clue.
This is my first time playing around with K8s. How can I test which is true?

I suppose it will depend on the platform you're using for the K8s host, but I would suspect that if you're using separate machines for this it will want to write synchronously to the shared NFS export in order to have it be consistent. Might mean a bit of slowness for updating configs.

What difference is there exactly (for my use case) between using 2 SSDs as a pool vs. using a cache drive?

A mirror of 2 SSDs (or a single, see below) means no need to worry about what will/won't be cached for speedy reads, you're going to guarantee at least SSD-speed for anything living on it for both reads and writes. With a cache drive (L2ARC) it is only a read cache - writes are subject to the question of "are they being requested as synchronous" meaning that the client system is waiting for the NFS server to confirm "yes, I wrote this to a permanent location on disk" rather than just holding it in RAM. Any new writes also immediately invalidate the old data in L2ARC - it doesn't "write back" to the SSD cache, so it would need to be staged into it again.

L2ARC also requires headers in main memory to store the location of your data on the SSD. For 100G of large records (files 64KB or larger) it's relatively trivial at 125MB (100G device / 64K record * 80 bytes per header) but if those 100G are of smaller 8K files, that same 100G costs you 1000MB (100G device / 8K record * 80 bytes per header) of your main RAM to index it. That may be a worthwhile tradeoff if you have that much "small file" data that's valuable to be read at SSD speed, but ultimately it's about understanding the workload. And again, L2ARC doesn't help for writes.
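That header arithmetic is easy to reproduce; a minimal sketch using shell arithmetic, assuming the 80-bytes-per-record header size quoted above:

```shell
# RAM (MiB) needed to index an L2ARC device:
# (device size / record size) * 80 bytes per header
l2arc_headers_mib() {
    # $1 = device size in GiB, $2 = record size in KiB
    echo $(( ($1 * 1024 * 1024 * 1024 / ($2 * 1024)) * 80 / (1024 * 1024) ))
}

l2arc_headers_mib 100 64   # 100G of 64K records -> 125 MiB
l2arc_headers_mib 100 8    # 100G of 8K records  -> 1000 MiB
```

The record size dominates: shrinking records by 8x inflates the RAM cost by the same 8x.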

But there's another option:

- For VMs I use a tiny silent PC (Shuttle DS10u7) with a single SATA SSD and I automatically backup the data to the HP Microserver Gen8.

This can work within the system itself - a single SATA (or NVMe) SSD can hold the K8s data, and then you have a job running as frequently as you like (and is technically viable) to snapshot that dataset and replicate it to your main spinning-disk pool. Depends on your tolerance for data loss - if you're snapping once an hour, you lose an hour at most.
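As a rough sketch of that snapshot-and-replicate job (pool and dataset names here are hypothetical; the TrueNAS UI can do the same thing with a Periodic Snapshot Task plus a local Replication Task):

```shell
# Snapshot the fast SSD dataset every hour (e.g. from cron).
zfs snapshot ssd/k8s@hourly-$(date +%Y%m%d%H)

# First run: full send to the spinning-disk pool.
zfs send ssd/k8s@hourly-2022060100 | zfs recv tank/k8s-backup

# Later runs: incremental send between the previous and newest snapshots.
zfs send -i ssd/k8s@hourly-2022060100 ssd/k8s@hourly-2022060101 \
  | zfs recv tank/k8s-backup
```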
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
would it be small enough to fit into the L2ARC device?
Yes. I am pretty certain that the desired files will not exceed 200GB.

Any consumer SSD will function
Well technically yes, but won't it burn through it like crazy?

if you're using separate machines for this it will want to write synchronously to the shared NFS export
Is that a setting I have to toggle or how does that work?

Might mean a bit of slowness for updating configs.
That's fine for me. There are only like 2 services I want to run as HA anyways :)

it will prevent the primary "large file" cache from trying to push the smaller files out of L2ARC
Okay, so it was the most obvious answer :D
That shouldn't be too much of a problem but I will have it in mind, thanks!

you're going to guarantee at least SSD-speed for anything living on it for both reads and writes
Well yes, also kinda obvious, now that you said it.

Any new writes also immediately invalidate the old data in L2ARC
Yes, that was something I was thinking about. Does that mean my cache will serve "old" data or does it actually invalidate it? If so, how does it do that? Does it check if there's a copy in the cache after the write is done?

And again, L2ARC doesn't help for writes
Worst case I'll just get a ZIL drive, no?

But there's another option:
I've also thought about that, but I don't think that's gonna be my way to go, just for ease of use.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Yes. I am pretty certain that the desired files will not exceed 200GB.

Indexing 200G of 8K records will cost you about 2G of your RAM. Or you can use a pair of SSDs and never have to index anything.

Well technically yes, but won't (L2ARC use) burn through (a consumer SSD) like crazy?

It all depends on how quickly you are "churning" data through the L2ARC, i.e. it being pushed out by newer data or writes that invalidate the old data. But you can easily get a 240GB Intel DC S3500/S3510/S3520 (used) for well under that cost threshold.

Is that a setting I have to toggle or how does that work?

It would be on each of the K8s hosts, somewhere in the config. NFSv3 by default might be mounting things asynchronously. I don't really have a lot of experience with containers.
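One way to check from the worker side (availability depends on the distro; `nfsstat` is part of the standard Linux nfs-utils package):

```shell
# Show the options the NFS export was actually mounted with -
# look for the NFS version and sync/async in the option list.
mount | grep -i nfs

# Client-side NFS operation counters; a "commit" count that climbs
# during a write-heavy test means the client is forcing sync writes.
nfsstat -c
```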

That's fine for me. There are only like 2 services I want to run as HA anyways :)

You could also flip the dataset to sync=disabled and it will ignore any sync requests from the client for the sake of speed, but at the cost of potentially losing the last few seconds of writes if the system crashes.
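For reference, that's a per-dataset property (the dataset name here is hypothetical):

```shell
# Valid values are standard | always | disabled.
zfs set sync=disabled tank/k8s

# Confirm the current setting.
zfs get sync tank/k8s
```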

Okay, so it was the most obvious answer :D
That shouldn't be too much of a problem but I will have it in mind, thanks!

If you do go with an L2ARC I'd suggest that you set the K8s dataset as having secondarycache=all and your "bulk" data as secondarycache=metadata which will let it cache the filesystem metadata for speed but not the actual bulk of it.
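In commands, with hypothetical dataset names (`tank/k8s` for the small hot files, `tank/bulk` for the large media):

```shell
# Let the K8s dataset use the L2ARC for both data and metadata...
zfs set secondarycache=all tank/k8s

# ...but keep only metadata from the bulk dataset in L2ARC.
zfs set secondarycache=metadata tank/bulk

# Verify both settings.
zfs get secondarycache tank/k8s tank/bulk
```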

Well yes, also kinda obvious, now that you said it.

:)

Yes, that was something I was thinking about. Does that mean my cache will serve "old" data or does it actually invalidate it? If so, how does it do that? Does it check if there's a copy in the cache after the write is done?

If a file exists in the L2ARC and it's overwritten with a new one, the associated records are marked as "stale" by the ZFS filesystem. You'll never be served an "old copy."

Worst case I'll just get a ZIL drive, no?

Yes, you could use a small Optane device (16G/32G) in the NVMe slot as your SLOG, and a larger SATA SSD for L2ARC.
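Attaching both afterwards is straightforward; device names below are hypothetical, and both vdev types can be removed again later with `zpool remove` if you change your mind:

```shell
# Add the small Optane as a SLOG (log vdev)...
zpool add tank log nvd0

# ...and the larger SATA SSD as L2ARC (cache vdev).
zpool add tank cache ada4

# Both devices should now appear under "logs" and "cache".
zpool status tank
```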

I've also thought about that, but I don't think that's gonna be my way to go, just for ease of use.

Just thought I'd mention that, it's been employed by a few users here.
 

Pavlinchen

Dabbler
Joined
May 22, 2022
Messages
14
Or you can use a pair of SSDs and never have to index anything.
Understood, fair point.

But you can easily get a 240GB Intel DC S3500/S3510/S3520 (used) for well under that cost threshold.
You are right, I was just looking for the wrong things ;)

It would be on each of the K8s hosts
Alright, thank you!

You could also flip the dataset to sync=disabled
I think I'd rather not do that for data consistency/integrity reasons, but thanks for pointing it out.

I'd suggest that you set the K8s dataset as having secondarycache=all and your "bulk" data as secondarycache=metadata
If I understand correctly, this metadata is mainly just a pointer to where the file is on my drives, correct?
If so, how would that improve performance?
Also: can I just give the (L2)ARC a file-size limit?
E.g. it won't cache any files bigger than 10 MB?

the associated records are marked as "stale" by the ZFS filesystem.
I couldn't imagine ZFS handing out old copies, but I was unsure how it works.
Thanks for explaining :)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
If I understand correctly, this metadata is mainly just a pointer to where the file is on my drives, correct?
If so, how would that improve performance?
Also: can I just give the (L2)ARC a file-size limit?
E.g. it won't cache any files bigger than 10 MB?

There isn't a way to tell L2ARC not to cache a file based on size, but the earlier secondarycache=metadata setting has the effect of excluding data at the dataset level (e.g. if you put all of your videos and large media files into the Plex dataset, you can tell ZFS never to mark them as eligible for placement in L2ARC).

By default, ZFS will consider everything as a valid record for L2ARC (secondarycache=all) so you're just downgrading the recommendation on your large media fileset. It's not worth going all the way to secondarycache=none though, as metadata is both very small and very valuable for making things like directory listings and folder browsing feel responsive.
 