HPE Apollo 4510 Gen10

HellKern

Cadet
Joined
Sep 24, 2021
Messages
7
Hello community,
I would like to proceed with a TrueNAS-powered NAS build based on an HPE Apollo 4510 Gen10.
The vendor will send me a system with the following config:
  • Xeon-Silver 4214R
  • 4x16GB Single Rank x4 DDR4-2933 CAS-21-21-21 Registered RAM sticks
  • HPE Ethernet 10Gb 2-port FLR-T BCM57416 network adapter
  • HPE Smart Array P408i-a SR Gen10 drive controller
  • 4x800W power supply
  • 2x240GB SATA SSD installed in server
  • 60xLFF(3.5) drive bays in chassis
Then I plan to partially fill this system with 18TB WD HC550 SAS drives and create 2-4 RAID-Z2 vdevs of 6-12 drives each.
I am also thinking of adding an LSI 9305-16i or 9300-8i controller and 2-4 SFF-8643 to SFF-8087 cables so the drive bays connect to the LSI HBA instead of the HPE controller, to avoid any issues with the HPE controller not being a pure HBA and with using non-HPE drives on it.
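As a rough sketch of the intended pool layout (pool and device names are placeholders; shown here with two 6-wide RAID-Z2 vdevs to start):
  # illustrative only: pool/device names are placeholders, not the final layout
  zpool create tank \
    raidz2 da0 da1 da2 da3 da4 da5 \
    raidz2 da6 da7 da8 da9 da10 da11
  zpool add tank spare da12    # optional hot spare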

The NAS will be used as storage for a CCTV system with around 250 4-8MP cameras. It will be connected to the CCTV server via iSCSI, with an ext4 filesystem on top of the ZFS array, and 20Gbit/s connectivity between the NAS and the CCTV server.

So I have several questions for the community:
  • Is 64GB of RAM enough for this purpose? I know about the rule of thumb of 1GB of RAM per 1TB of storage for ARC, but in this application I will have a 94% write load, 4% reads from random places, and 1% reads of a fairly small (~3GB) database, so there is nothing useful to keep in ARC. My current FreeNAS build for the same purpose has 8GB RAM and 20TB of storage and works very well (a quick ARC sanity check for it is sketched right after this list).
  • Is the BCM57416 network adapter going to surprise me with any bugs/glitches? I have heard that the FreeBSD driver for this adapter is not perfect, but mostly in terms of maximum performance, and I don't expect peak loads above 3-5Gbit/s on this system in the next decade.
  • If I understand correctly, the 2x240GB SSDs in the server are controlled by the HPE P408i-a. Are there any likely issues with installing FreeNAS on drives connected to this controller? The good scenario for me would be to create a RAID1 at the controller level and install TrueNAS on that array.
  • Are there any other issues I could run into?
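For the RAM question, one way to sanity-check how much ARC actually helps on the existing FreeNAS box (standard FreeBSD arcstats sysctls):
  # ARC sanity check on FreeBSD/FreeNAS
  sysctl kstat.zfs.misc.arcstats.size     # current ARC size in bytes
  sysctl kstat.zfs.misc.arcstats.hits     # ARC hits since boot
  sysctl kstat.zfs.misc.arcstats.misses   # ARC misses since boot
  # hit ratio = hits / (hits + misses); a low ratio on a write-heavy
  # workload suggests extra RAM would not buy much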
Thanks for your time =)
 

HellKern

Cadet
Joined
Sep 24, 2021
Messages
7
Well, the hardware is ordered. I will try not to forget to post an update here when it arrives.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,599
...
Then I plan to partially fill this system with 18TB WD HC550 SAS drives and create 2-4 RAID-Z2 vdevs of 6-12 drives each.
...
The NAS will be used as storage for a CCTV system with around 250 4-8MP cameras. It will be connected to the CCTV server via iSCSI, with an ext4 filesystem on top of the ZFS array, and 20Gbit/s connectivity between the NAS and the CCTV server.
...
I am not sure you will get what you are expecting using an EXT4 file system on an iSCSI LUN stored on ZFS RAID-Z2 vDevs.

There are design considerations with block storage on ZFS, especially RAID-Zx. I don't have all the details, but they involve wasted space and lower performance.

From the implication, your CCTV server is running Linux (since it is able to use EXT4 file systems).

If the performance or space issues hit you, you might look to see if NFS would work better, meaning ZFS RAID-Z2 to NFS to the CCTV server. However, I don't know if you can get both 10 Gigabit Ethernet interfaces working between the two in anything other than fail-over mode. Using iSCSI, you can set the network interfaces on two different subnets and iSCSI will automatically load-share across them. (If I understand that correctly...)
 

HellKern

Cadet
Joined
Sep 24, 2021
Messages
7
I am not sure you will get what you are expecting using an EXT4 file system on an iSCSI LUN stored on ZFS RAID-Z2 vDevs.

There are design considerations with block storage on ZFS, especially RAID-Zx. I don't have all the details, but they involve wasted space and lower performance.

From the implication, your CCTV server is running Linux (since it is able to use EXT4 file systems).

If the performance or space issues hit you, you might look to see if NFS would work better, meaning ZFS RAID-Z2 to NFS to the CCTV server. However, I don't know if you can get both 10 Gigabit Ethernet interfaces working between the two in anything other than fail-over mode. Using iSCSI, you can set the network interfaces on two different subnets and iSCSI will automatically load-share across them. (If I understand that correctly...)
Thank you for your thoughts!

About ext4 on top of ZFS RAID-Z2 via iSCSI: I totally agree that it's not a perfect solution, but when I tested this particular CCTV solution with storage connected via NFS it performed really poorly in comparison with ext4 over thick-provisioned iSCSI. I believe it's somehow related to how the CCTV software manages archive I/O, but if there is some spare time I will play with NFS again and compare performance.

I am not sure I understand the part about wasted space: is it wasted because of ext4 or because of RAID-Z?
I think I am OK with space. I calculated the number of drives (4x8x18TB + 1x18TB hot spare) taking into account GB vs GiB, padding, parity, 20% free space so copy-on-write can work fast, and +30% just in case.
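Roughly, the back-of-the-envelope math (illustrative; it ignores RAID-Z padding and metadata overhead, which shave off a few more percent):
  # 4 vdevs x 8 x 18TB in RAID-Z2 -> 6 data disks per vdev
  echo "4 * 6 * 18 * 10^12 / 2^40" | bc   # ~392 TiB of raw data space
  echo "392 * 0.8" | bc                   # ~313 TiB after keeping 20% free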

10Gig is actually good enough for now; currently I only have 2x1Gig links from the server room to the cameras and most of the users. But FreeNAS at least offers several link aggregation modes, including LACP and round-robin, and iSCSI allows multipath (I am using it on the current NAS). In my case it works as a kind of aggregation since I have four LUNs, and it doesn't require different subnets.
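On the Linux CCTV server side, that multipath-style spread is just a matter of discovering and logging in to the LUNs on each portal, roughly like this (portal addresses are placeholders):
  # open-iscsi on the CCTV server; portal IPs are placeholders
  iscsiadm -m discovery -t sendtargets -p 192.168.10.10
  iscsiadm -m discovery -t sendtargets -p 192.168.10.11
  iscsiadm -m node --login    # log in to all discovered targets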
[Attachment: Screenshot 2021-11-18 212959.png]
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
  • HPE Ethernet 10Gb 2-port FLR-T BCM57416 network adapter
  • HPE Smart Array P408i-a SR Gen10 drive controller

Well, the RAID card is definitely a no-no.


The Broadcom 10G Ethernet ports may or may not be a no-no; they'll either work or they won't.

 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,943
@HellKern With NFS, what happens if you try sync=disabled? If that speeds NFS performance up, then I suggest a SLOG and setting sync=always.
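Something along these lines (pool/dataset name is a placeholder):
  # quick check of whether sync writes are the NFS bottleneck
  zfs get sync tank/cctv              # see the current setting
  zfs set sync=disabled tank/cctv     # testing only - not safe for real data
  # ...re-run the NFS test...
  zfs set sync=standard tank/cctv     # put it back
  # if disabled was much faster, add a SLOG device and set sync=always:
  #   zpool add tank log <fast-ssd>
  #   zfs set sync=always tank/cctv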
 

HellKern

Cadet
Joined
Sep 24, 2021
Messages
7
@jgreco Thanks for the reply! The RAID card will be configured as close to HBA mode as it can get, and it will be used only for the two mirrored system drives, not the data ones. I believe I have no other options here, as the blade node installed in the Apollo system can only use an HPE proprietary-format disk controller (it simply doesn't have any PCIe slots inside the blade). For the data drives I will use one or two LSI 9300-8i controllers.
About the 10GBase-T Broadcom NIC: it's supported in FreeBSD by the bnxt driver, so hopefully it will work fine. In any case it's already ordered and I can't change the order. I have some spare budget on this system, so if the Broadcom doesn't work, or doesn't work as expected, I will be able to replace it.

@NugentS Thank you! I will try this if I get the chance. A SLOG is definitely out of scope here, as a couple of SSD/persistent-memory drives that could handle the predicted write load is way out of budget for this system.
 

HellKern

Cadet
Joined
Sep 24, 2021
Messages
7
Well, everything turned around and I had to use different hardware for this project. I ended up with an AIC SB406-PV barebone; technically it's pretty much the same as the HPE Apollo (but not support-wise). Everything seems to work well on this system: I installed TrueNAS Core 13.0-U2 on 2x256GB Samsung 860 Pro SSDs, created four 8-wide RAID-Z2 pools of 18TB WD HC550 SAS drives, created 86%-sized zvols, and exported them via iSCSI.
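For reference, each pool's zvol was created roughly like this (pool/zvol names and size are illustrative):
  zfs create -V 84T pool1/cctv-lun1   # ~86% of one 8x18TB RAID-Z2 pool
  # then shared as an iSCSI extent/target from the TrueNAS web UI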
Initial tests show that it's working well beyond what's needed: I managed to saturate the 10Gig uplink with iSCSI write load during fio tests (128k block size, sync=default, 4 threads, one per pool). I am currently filling up the space to see how bad it gets when only 14% free space remains on each pool.
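The fio run was along these lines, one sequential-write job per LUN mount point (paths and sizes are illustrative):
  # one 128k sequential-write job per LUN, run concurrently; paths are placeholders
  for n in 1 2 3 4; do
    fio --name=seq$n --directory=/mnt/lun$n --rw=write --bs=128k \
        --size=100G --ioengine=libaio --direct=1 --output=fio-lun$n.log &
  done
  wait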
For some reason I am able to saturate the 10Gig uplink only while the fio test is running. While fio is allocating space for the test I get around 6.5Gbit/s RX on TrueNAS, while the physical disk metrics don't show any bottleneck (busy ~20%, latency <2ms, pending IO <2, IOPS <300 (I saw around 1k here while testing 4k random read/write), IO ~20MB/s), and CPU load is ~30%.
Does anyone have any ideas why fio is able to saturate the network connection only while performing the test and not while allocating the test files?
CPU/network resources on the host side are OK.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Wow, it's like a recipe of what not to do.

Get rid of the RAIDZ2.
Maintain something significantly less than 86% space utilization, I recommend 30-50%.
Add more RAM.
Go back and review


and by that I also mean the articles that it links to which describe subtopics in more detail. The SSD's will save you from some -- but not all -- of the impacts of HDD seeking, but SSD's can be slow in their own ways, and will turn tragically slow when you cannot maintain the free page pool properly.
 

HellKern

Cadet
Joined
Sep 24, 2021
Messages
7
Wow, it's like a recipe of what not to do.

Get rid of the RAIDZ2.
Maintain something significantly less than 86% space utilization, I recommend 30-50%.
Add more RAM.
Go back and review


and by that I also mean the articles that it links to which describe subtopics in more detail. The SSD's will save you from some -- but not all -- of the impacts of HDD seeking, but SSD's can be slow in their own ways, and will turn tragically slow when you cannot maintain the free page pool properly.
I had read this article several times before starting the build, thanks!

So now I will run to the server room, throw away all that HDD mess, toast ~2 petabytes of SSDs from my wonder-SSD-making box (to get the same amount of usable space, of course: 4x80TB usable x 2 for mirroring x 3 for free space), move an additional TB or two of RAM from the hypervisors, and all of that just to store a CCTV archive (no VMs, no fancy databases, only video streams plus metadata, combined and packed into 2GB files before landing on the NAS). Come on =) In production this storage will fill really slowly relative to its size: literally around a year for a full rewrite cycle. I assume that by the time it develops any significant fragmentation this NAS will be replaced with a new one.

And now without any sarcasm: let me wait several days until it fills up with test data and I will do some testing while the pools are 86% occupied. I understand that 14% free space is generally a bad idea, but it is still ~13TB of free space for doing CoW magic, with a real-world write speed of <100Mbit/s on each pool.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
I assume that by the time it develops any significant fragmentation this NAS will be replaced with a new one.

If you can really say this with confidence, then perhaps you won't actually have fragmentation issues. These start showing up after write-erase-rewrite cycles, especially with smaller blocks. If you never get to the point where you are erasing and then overwriting, theoretically your speeds should remain good.

let me wait several days until it fills up with test data and I will do some testing while the pools are 86% occupied.

The problem with this is just that I want you to be aware that this isn't going to be representative. The first time you fill a pool, it fills to capacity at a very high speed, because ZFS has no problems finding free space to allocate. However, once you start erasing data and fragmentation begins, it becomes harder and harder to find contiguous regions of space to allocate. For HDD's, this means seeks. For SSD's, this means erasure cycles. In both cases, it means reading more metadata from the pool and doing more computational work to analyze for free space. This mostly just gets worse with the age of the pool and additional rewrite cycles, unless you do something dramatic like empty the pool of data.

And now without any sarcasm

Well, just understand that my goal is for you to be successful, and that usually requires at least an understanding of the long term problem. If you are certain about the long term prospects of the pool, then of course you can play out different solutions that rely on the characteristics of the pool that you're working on. The important thing is this:

I understand that 14% free space is generally a bad idea, but it is still ~13TB of free space for doing CoW magic, with a real-world write speed of <100Mbit/s on each pool.

I realize that 13TB might seem like an unconscionable amount of free space to need, but it is important to understand that ZFS is using compsci trickery to do its thing. You are trading space for efficiency and speed, and if you are sloppy or unaware, then you can box yourself into an ugly corner if you don't give it the space. The 13TB is a minor tax in the big picture.
 

HellKern

Cadet
Joined
Sep 24, 2021
Messages
7
I've managed to fill one of the zvols to 100% with data (86% of the pool) and ran ~30 hours of random writes on the filled data, so pool fragmentation reached 20%. I am still getting sequential write performance of ~1000Mbit/s (120MB/s) with 4k blocks, which is 10 times more than the real workload will produce in my use case. Random 4k writes are really messed up now, though: I am getting around 2MB/s.

I also checked the stats for my old/current FreeNAS server, the one that will be replaced by the box we are talking about. It has 4 pools, each made from 2 mirrored 8TB HDDs; 90% of each pool is a zvol that has been filled to 100% and rewritten 24/7 for the last 5 years (approx. 45 full rewrites). The pools have 73% fragmentation and still perform well for the real workload: ~40MB/s sequential 4k writes.

I tried putting the entire real workload on one of the pools in the new TrueNAS box (in production the workload will be spread evenly across the 4 pools) and it performed without any issues.

I will try to degrade the performance of one of the pools in the new box further by doing 4k random writes for another 24 hours or so and run the checks again, but for now I am really pleased with the performance I've got. And because I plan to replace this box after 5 years (~4-7 full rewrites) I don't see any potential trouble with the current config in the future. If a drive fails, I can make the workload read-only on the affected pool(s) to avoid critical performance degradation during the resilvering process.

So @jgreco, could you forgive my sin of creating such a no-go config for iSCSI? =)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
pool fragmentation reached 20%. I am still getting sequential write performance of ~1000Mbit/s (120MB/s) with 4k blocks, which is 10 times more than the real workload will produce in my use case. Random 4k writes are really messed up now, though: I am getting around 2MB/s.

Be aware that the ZFS-reported "fragmentation" is really a weird statistic: it's actually reporting how hard ZFS had to work to find free space (a measure of free-space fragmentation), not what is classically understood as fragmentation (a measure of how well the stored data is laid out).

The 20% suggests that it isn't having a hard time allocating new space, which explains why you are still getting 120MB/sec in some cases: ZFS tends to keep large contiguous runs of free space available if possible. The real problem you have to be wary of is that you're eventually going to head toward that 2MB/sec rate once there are enough write-erase-rewrite cycles and that free space pool has been used up, leaving you with scattershot free blocks all over hell and back.
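Both numbers are easy to keep an eye on as the pool ages, e.g.:
  # FRAG is the free-space metric described above, CAP is the fill level
  zpool list -o name,size,allocated,free,fragmentation,capacity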

So @jgreco, could you forgive my sin of creating such a no-go config for iSCSI? =)

I ain't absolving you of your sins. All I really want is for you to understand what you're potentially signing up for. ZFS is weird, and if you go in with your eyes open and informed and educated of the potential issues, I assume you're willing to own the result. What I don't want is to hear in five years "but I benchmarked it when it was new, it was great, and now it totally sucks." Because that's a totally possible outcome. But by that same measure, if you went in understanding that today's performance would not be representative, or if you planned on replacing your 8TB drives today with 16TB drives down the road, then that doesn't bother me, because at the end of the day it's your server and your problem and I have nothing against people selecting well-informed options that I would not personally select but which serve their needs.
 