Help needed finalising new office fileserver build

scott2500uk

Dabbler
Joined
Nov 17, 2014
Messages
37
What I'm currently looking at buying:

Chassis: Supermicro 3U SC836BE1C
Motherboard: Supermicro X11SCL-F
CPU: Intel Xeon E-2286G 6-Core 4.0GHz
RAM: 128GB - 4x32GB DDR4 PC4-21300 ECC
HDD Controller: Broadcom 9300-4i 12Gb/s 4-port HBA (JBOD) + chassis SAS expander
Network: Intel X520-DA2 10Gbit DA SFP+ 2 port PCI-e
HDDs: 12x16TB Seagate Ironwolf Pro - possibly 18TB instead
HDD configuration: 1 pool - 2 vdevs of 6-wide RAIDZ3, or 3 vdevs of 4-wide RAIDZ2

The server will be used for storage and serving up files to the design, photography and production teams. The working data set is about 50TB and unlikely to grow, as old client data is archived off or deleted.

Devices connecting are predominantly Apple Macs, mostly over 1GbE NICs, with a couple connecting via 10GbE.

Our current FreeNAS file server, which this new server is replacing, is still running FreeNAS 9.3 and currently shares files over AFP. We plan to switch to CIFS/SMB and the latest TrueNAS with the new server.

I know TrueNAS/ZFS has come a long way since 9.3, so there are a few areas where I'm not sure which hardware works best with the newer features.

So here are the things I want to improve over the old server:

1) Boot speed. The old server boots from a couple of USB sticks in a ZFS mirror. As the old server only reads from them once in a blue moon, I've had no issues with this setup other than that they are quite slow. Is the recommendation still a mirrored set of USB sticks plus regular backups of the config? Should I switch to a mirrored pair of small SSDs instead? If so, what is the recommendation?

2) Directory traversal speed. I've always thought it was quite slow at times, especially when a directory hadn't been visited yet. Navigating through the network share would sometimes take a second or two to load the folder contents. Our current server also has 128GB RAM, no L2ARC, and a similar 2-vdev RAIDZ2 disk array. We only use about 33GB of ARC with a 66% hit ratio. Maybe it's a configuration issue, but would you recommend any particular hardware for the new server? I know ZFS can now use an SSD for storing metadata. Would a fast PCIe or M.2 SSD help here? What size and drive would you recommend?

3) Improve read/write speeds in general. Our current file server quite happily deals with our designers on 1GbE connections. We have no more than about 20 designers working off the server at any one time. In reality they aren't all opening and saving files at the same time, so we have never seen the server's 10GbE link get saturated; the limitation has been the 1GbE connections. Also, with remote working now due to the pandemic, we have more people connecting over slow VPN connections. There isn't a lot we can do about that, but it has resulted in staff using remote desktop onto machines with 10GbE connections to the fileserver, so it's as if they were working locally. I would like to give the best possible loading experience to those on 10GbE. Are there hardware alterations or additions I should make to saturate a 10GbE connection, if that's possible at all?

Thank you for taking the time to read, and if there is any other info I have missed then let me know and I will add it to the original post.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Certainly.

1) You'll definitely want to switch to mirrored SSDs or SATADOMs instead. USB sticks have been discouraged as boot devices for a while now, as their endurance and wear-leveling are lacking. Newer versions of Free/TrueNAS lean more heavily on the boot device, and users with less robust solutions have had some failures; in a business environment that's obviously a no-go.

2) For specifically browsing the data, I'd recommend an L2ARC device with secondarycache=metadata set on the dataset you want to accelerate the metadata for. This has the advantage of being able to use most consumer/read-intensive SSDs (although still get a decent one, not a cheap Kingston) since redundancy for L2ARC is unneeded. Several users have reported significant improvements in directory browsing/thumbnail indexing from this setup, and it doesn't have the same potential pitfall of losing the metadata vdev ("your pool is toast").
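For reference, the ZFS side of this is only a couple of commands. A rough sketch from the CLI, where the pool name tank, the cache device nvd0, and the dataset tank/projects are all placeholders (on TrueNAS you'd normally add the cache device through the GUI rather than by hand):

```
# Add a single SSD as an L2ARC (cache) device -- no redundancy needed:
zpool add tank cache nvd0

# Keep only metadata (not file data) for this dataset in L2ARC:
zfs set secondarycache=metadata tank/projects

# Confirm the property took effect:
zfs get secondarycache tank/projects
```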

3) You may have some challenges moving to SMB from AFP, as Apple/macOS SMB sends a lot of its file actions as synchronous requests (wait for commit to disk before acknowledgement sent) which can be a severe performance drag. If you've been okay from a stability/safety perspective (use a UPS, take snapshots/backups) with AFP on 9.3 then disabling SMB strict sync on your new setup will likely be the ticket forward, as opposed to employing a fast SLOG device (RMS-200, NVDIMM, Optane DC series) which adds another layer of expense.
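If it helps, the knob in question is a one-line Samba setting. A hedged sketch (exact UI placement varies by TrueNAS version, and you'd set it as an auxiliary parameter rather than editing smb4.conf by hand):

```
# Add this as an SMB auxiliary parameter (per-share or global):
#
#   strict sync = no
#
# Then confirm the effective value from the shell:
testparm -s 2>/dev/null | grep -i "strict sync"
```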

Pool design may also be of interest - two RAIDZ3 vdevs is certainly a very robust level of redundancy but you're losing the same amount of space as you would with a 6x2-way mirror vdev setup. Is 2x 6-RAIDZ2 an option, or are you specifically looking for the ability to lose a lot of drives (really only needed if you anticipate a longer time to replace a failed device, such as a remote office) without an impact?
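To put rough numbers on that, here's a back-of-the-envelope comparison for 12x 16TB drives (usable space before ZFS overhead and padding, plain /bin/sh arithmetic, so treat it as a sketch only):

```
# Rough usable capacity from 12 x 16 TB drives, ignoring ZFS overhead
echo "2 x 6-wide RAIDZ3: $(( 2 * (6 - 3) * 16 )) TB"   # 96 TB
echo "6 x 2-way mirrors: $(( 6 * 16 )) TB"             # 96 TB
echo "2 x 6-wide RAIDZ2: $(( 2 * (6 - 2) * 16 )) TB"   # 128 TB
echo "3 x 4-wide RAIDZ2: $(( 3 * (4 - 2) * 16 )) TB"   # 96 TB
```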
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
In terms of hard disks, I would also look at Seagate Exos and Toshiba MG08. Both are data-center class and often cheaper than NAS drives. My personal experience with Seagate Exos 16TB has been mixed: great RMA handling, but 3 dead drives in 3 months, all bought about a year ago.
 

scott2500uk

Dabbler
Joined
Nov 17, 2014
Messages
37
1) You'll definitely want to switch to mirrored SSDs or SATADOMs instead. USB sticks have been discouraged as boot devices for a bit now as their endurance and wear-leveling is lacking. Newer versions of Free/TrueNAS are leaning more heavily on the boot device and users with less robust solutions have had some failures, and in a business environment that's obviously a no-go.

What wear levelling is needed if they are only ever written to during config changes? In 9.3 I could set one of the data pools to store all of the logging and graphing, so the USB sticks wouldn't get any use. Have things changed in TrueNAS so that boot drives are read and written more often? What SSDs would be recommended?

2) For specifically browsing the data, I'd recommend an L2ARC device with secondarycache=metadata set on the dataset you want to accelerate the metadata for. This has the advantage of being able to use most consumer/read-intensive SSDs (although still get a decent one, not a cheap Kingston) since redundancy for L2ARC is unneeded. Several users have reported significant improvements in directory browsing/thumbnail indexing from this setup, and it doesn't have the same potential pitfalls of losing the metadata vdev ("your pool is toast")

Ah, I was under the impression that the separate ZFS metadata vdev was a duplicate of already-existing metadata, so if it was lost, no worries. If that's not the case then yes, an L2ARC set to cache only metadata seems like the way to go. What is the rule of thumb for sizing an L2ARC for metadata? Would something like a 250GB Samsung 980 Pro NVMe M.2 SSD be suitable for this job?

3) You may have some challenges moving to SMB from AFP, as Apple/macOS SMB sends a lot of its file actions as synchronous requests (wait for commit to disk before acknowledgement sent) which can be a severe performance drag. If you've been okay from a stability/safety perspective (use a UPS, take snapshots/backups) with AFP on 9.3 then disabling SMB strict sync on your new setup will likely be the ticket forward, as opposed to employing a fast SLOG device (RMS-200, NVDIMM, Optane DC series) which adds another layer of expense.

I am contemplating sticking with AFP now. I recently discovered that SMB only works with LDAP auth if the directory has SMB attributes. As I will be using Google Secure LDAP for user auth going forward, I have no idea how I can get the two systems working together, if at all. I have no worries disabling sync in SMB since, as you suggested, we run a UPS, snapshots, and onsite and offsite backups. I am really torn now: should we stick with AFP or switch to SMB? We have one Windows machine that runs some accountancy software, and it would be far easier to admin if all the shares used the same protocol. I have also had weird AFP behaviour in the past with extended attributes, where all files were locked on the share and I had to run a script on a Mac to lock and unlock every single file to solve the problem. I never did get to the bottom of what caused it.

Pool design may also be of interest - two RAIDZ3 vdevs is certainly a very robust level of redundancy but you're losing the same amount of space as you would with a 6x2-way mirror vdev setup. Is 2x 6-RAIDZ2 an option, or are you specifically looking for the ability to lose a lot of drives (really only needed if you anticipate a longer time to replace a failed device, such as a remote office) without an impact?

Yes, you are absolutely right. After posting I did think 3 vdevs of 4x 16TB would be better for performance and give the same amount of storage. I currently run vdevs in Z2 but changed to Z3 given that I'm going from 4TB drives to 16TB. The office is now largely remote due to the pandemic, which means replacements have to be scheduled. I run regular scrubs and short/long SMART tests, and at any sign of a failed test or error I replace the disk and RMA it. Given that I replace our drives at the slightest sign of failure, touch wood, I think the 3-vdev layout is the better way to go.
 

scott2500uk

Dabbler
Joined
Nov 17, 2014
Messages
37
In terms of hard disks, I would also look at Seagate Exos and Toshiba MG08. Both are data center class and often cheaper than NAS drives. My personal experience with Segate Exos 16 TB has been mixed. Great RMA handling, but 3 dead drives in 3 months that were all bought about a year ago.

I did see that the Exos drives are slightly cheaper than the IronWolf Pros, but I haven't had any experience with them, so I passed on them. When buying 12 drives, the Exos only gave a £220 saving over the IronWolf Pros. Not much when you are buying nearly £5K of disks.

I have used both Seagate's and WD's RMA processes and both have been satisfactory. I no longer buy WD Red Pro drives, as nearly every one I have had developed an issue before making it past its 5-year warranty. I've done so many WD Red Pro drive replacements I have lost count. With Seagate, I've only had one IronWolf Pro drive go bad so far.

The cheapest drives I can see are the WD Ultrastar DC drives, but they are on back-order at the moment. Those would offer a £550 saving over the IronWolf Pros.

I have 4 of those drives in my personal NAS/server and after 2 years, so far, they haven't produced any issues. I can trust WD's RMA, but I don't know if I can trust 12 of them in a company server. Anyone got any good or bad reviews of these drives?
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
1) Boot drives now get a lot of log activity. Any small SSD should do the trick.

2) As said, special vdevs are part of the pool and need the same level of redundancy and resilience as any "data" vdev. A (preferably persistent) metadata L2ARC can be lost with no adverse effect other than performance. Size it according to the metadata in your pool (anything that doesn't fit would have to be fetched from the pool).
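On the "preferably persistent" part: depending on the TrueNAS release this may need a tunable. A hedged sketch for TrueNAS CORE 12.x (verify the sysctl name exists on your version before relying on it):

```
# Check whether persistent L2ARC rebuild is available/enabled:
sysctl vfs.zfs.l2arc.rebuild_enabled
# Enable it (set as a System -> Tunables entry so it survives a reboot):
sysctl vfs.zfs.l2arc.rebuild_enabled=1
```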

Personally, I would never operate 10+ TB HDDs in mirrors less than 3-wide… so 6-wide raidz3 is still better than mirrors.
More efficient: 2 * 6-wide raidz2
Safest: 2 * 6-wide raidz3
Highest performance: 3 * 4-wide raidz2
Since the chassis has 16 bays, could 2 * 6-wide Z2 plus one or two hot spares be a reasonable middle point between performance, efficiency and safety if maintenance takes some time?

WD Gold/Ultrastar are from HGST. As far as I know, the engineering teams are still separate from WD (and knew that putting SMR in NAS drives was a Bad Idea…), so if anything WD Gold/Ultrastar DC should be better than WD Red Plus/Pro. But I do not have enough of them, and have not logged enough working hours, to have any significant data in support of this prejudice.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
What wear levelling is needed if they are only ever written to during config changes? In 9.3 I could set one of the data pools to store all of the logging and graphing so the USB sticks wouldn't get any use. Have things changed in TrueNAS where boot drives are read and written to more often? What SSD's would be recommended?

There's increased logging activity to the boot device, as @Etorix pointed out, but IMO another factor has been the "race to the bottom" in the quality of USB sticks these days. Back in the day I could have confidence that I could buy a SanDisk 16GB USB 2.0 stick and it would merrily go along for years in a home setting (nine and counting), but nowadays it seems like nothing but the absolute worst NAND makes its way into a cheap SATA chassis from some knockoff brand (KingDian, Superfast, whatever).

Any decent-brand SSD should be fine. I like old Intel DC-series drives or 320s - the 320 40GB is USD $12.25/ea at the moment on eBay USA.

Ah, I was under the impression that the zfs metadata separate vdev was a duplicate of already existing metadata so if it was lost, no worries. If that's not the case then yes a l2arc set to cache only metadata seems like the way to go. What is the rule of thumb here for sizing an l2arc for meta data? Would something like a 250GB Samsung 980 Pro NVMe M.2 SSD be suitable for this job?

Metadata for file workloads is typically around 0.1-0.3% of total data - my home fileserver runs at about 0.16% and 0.07% on its pools, and it mostly holds multi-MB photos and multi-GB videos. The smaller the records, the more metadata - tiny 4K/8K records could be as much as 1% of data size. A special/metadata vdev needs to be sized to hold all of it, since you want it all there for speed; if an L2ARC runs out of room, it simply doesn't cache it all. The 250GB size is more than enough, and a 980 Pro is definitely overkill for speed.
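Against the ~50TB working set from your original post (my assumption, so adjust for your real numbers), the rule of thumb works out roughly like this:

```
# Rule-of-thumb metadata footprint for a 50 TB data set (50,000 GB)
echo "0.1% of 50 TB = $(( 50000 / 1000 )) GB"       # ~50 GB
echo "0.3% of 50 TB = $(( 3 * 50000 / 1000 )) GB"   # ~150 GB
# Either way, a 250 GB L2ARC dedicated to metadata has headroom to spare.
```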

I am contemplating sticking with AFP now. I recently discovered that SMB only works with LDAP auth if it has SMB Attributes. As I will be using Google Secure LDAP to do user auth going forward I have no idea how I can get the two systems working if at all. I have no worries disabling sync in SMB since, as you suggested, we run a UPS, snapshots and onsite and offsite backups. I am really torn now, should we stick with AFP or switch to SMB...? We have 1 windows machine that runs some accountancy software and it would be far easier to admin if all the shares used the same protocol. Also, I have had weird AFP behaviour in the past with extended attributes where all files were locked on the share and had to run a script on a mac computer to lock and unlock every single file to solve the problem. Never did get to the bottom of what caused it.

Afraid I'm not well-versed in the AFP world, so I can't offer a lot of advice here.

Yes, you are absolutely right. After posting I did think a 3 x VDEV of 4 x 16TB would be better for performance and give the same amount of storage. I run VDEVS currently in z2 but changed to z3 given that I was going from 4TB drives to 16TB. The office is kinda now remote due to the pandemic which means replacements have to be scheduled. I do run regular scrubs and short/long SMART tests and any sign of a failed test or error I will replace the disk and RMA. I think given that I replace our drives at the slightest sign of failure, touch wood, I think the 3xVDEV is the better way to go.

The question about vdev structure is really "how quickly can you react to a fully failed disk?" If it's inside of 4 hours then you should be safe with Z2; beyond that, adding a single hot spare to the pool might be a good intermediate step as opposed to going up to Z3. With 16 bays you can look at 4x 4-wide Z2 as an expansion path.
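For what it's worth, adding a spare later is trivial. A rough sketch from the CLI, with pool and disk names as placeholders (the pool extend flow in the GUI is the usual route on TrueNAS):

```
# Attach a hot spare to the pool -- it sits idle until a disk faults:
zpool add tank spare da12

# Verify it appears under the "spares" section:
zpool status tank
```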
 