Large 36 x 12TB Build

Status
Not open for further replies.

Rob_SAN

Dabbler
Joined
Apr 26, 2018
Messages
13
Hey,

I have a build for which I need to ensure rebuilds do not impact the array. I assume I would be best off using mirrored vdevs rather than RAIDZ2 or RAIDZ3 vdevs to reduce the impact of rebuilds?

The purpose of the array is a backup target. It's worth noting my backup software performs consistency checks on the data, hence my worry about rebuilds impacting read/write speeds!

Also, is it worth having an NVMe PCIe SLOG? I need to ensure data is committed to the array without fail, so I want to turn on "sync=always" to ensure this.
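For clarity, forcing sync writes is just a per-dataset ZFS property; a minimal sketch of what I mean (the pool/dataset name is only a placeholder):

```
# Force every write to the backup dataset to be committed synchronously
# ("tank/backups" is a placeholder name, not my actual pool)
zfs set sync=always tank/backups

# Verify the property took effect
zfs get sync tank/backups
```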

My Build is below.

Processor - 2 x Intel® Xeon® Gold 5122 3.60GHz 4 core (worth considering 2 x 3.4GHz 6 core?)
Memory - 256GB 2666MHz DDR4 ECC
HBA - LSI SAS HBA 9300-4i4e SGL
SSD - 2 x Intel® SSD DC S4500 Series (FreeNAS OS drives in mirror).
HDD - 36 x 12TB 7200 RPM 3.5" 12Gb/s SAS
Network Card - Supermicro Quad Port 10Gb/s SFP+
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Using mirrors for a backup target is excessively wasteful of drive space unless you have some unusually high IO that normally doesn't happen with backup targets.
What makes you think that rebuilds of a RAIDZ2 pool are going to impact array functionality?

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

Rob_SAN

Dabbler
Joined
Apr 26, 2018
Messages
13
Just my experience of other arrays; I'm aware a rebuild based on parity disks can take some time and place load on the whole pack of disks.

Other than this, does the build look ample? Would you suggest a SLOG?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Just a point on rebuilds... I noticed when doing one recently that a rebuild (resilver) doesn't necessarily confine itself to the vdev involved. I raised a thread here to discuss it, and the advice was that a resilver for a drive replacement also does a scrub, so all disks in the pool get hammered anyway, even those not in the vdev being rebuilt.

I'm not sure that opting to give up more disks to parity will help you escape any performance impact, so you're probably better off running with RAIDZ2 unless performance is somehow driving the need for more vdevs in total.

https://forums.freenas.org/index.ph...ing-all-disks-in-multi-vdev-mirror-set.61954/
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Just a point on rebuilds... I noticed when doing one recently that a rebuild (resilver) doesn't necessarily confine itself to the vdev involved. I raised a thread here to discuss it, and the advice was that a resilver for a drive replacement also does a scrub, so all disks in the pool get hammered anyway, even those not in the vdev being rebuilt.

I'm not sure that opting to give up more disks to parity will help you escape any performance impact, so you're probably better off running with RAIDZ2 unless performance is somehow driving the need for more vdevs in total.

https://forums.freenas.org/index.ph...ing-all-disks-in-multi-vdev-mirror-set.61954/
Exactly, but if you want fast rebuilds, having more vdevs with fewer drives each does improve the speed. I have a pool at work that takes 4 days to resilver a drive, but the resilver does not significantly impact operation of the pool.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Just my experience of other arrays; I'm aware a rebuild based on parity disks can take some time and place load on the whole pack of disks.

Other than this, does the build look ample? Would you suggest a SLOG?
Otherwise, it does look good. If you want to set sync=always, you would likely benefit from a SLOG. I usually wouldn't suggest that on a backup target. Why do you want to force sync writes?
 

Rob_SAN

Dabbler
Joined
Apr 26, 2018
Messages
13
Thanks for the responses!

Regarding the SLOG, I understand this will improve write speeds with sync on? I need to ensure data is not lost in a power outage.

I think the final hardware build will be:
Processor - 2 x Intel® Xeon® Gold 5122 3.60GHz 4 core
Memory - 256GB 2666MHz DDR4 ECC
HBA - LSI SAS HBA 9300-4i4e SGL
SSD - 2 x Intel® SSD DC S4500 Series (FreeNAS OS drives in mirror)
SLOG - 2 x Samsung PM1725a 1.6TB HHHL | 5 DWPD | PCIe (Large I know, smallest I can buy for a decent price)
HDD - 36 x 12TB 7200 RPM 3.5" 12Gb/s SAS
Network Card - Supermicro Quad Port 10Gb/s SFP+

FreeNAS setup:
SLOG

2 x Samsung PM1725a 1.6TB HHHL | 5 DWPD | PCIe
L2ARC
Not sure there is any point?
RAID-Z2
5 vdevs of 6 x 12TB = 240TB
or
3 vdevs of 10 x 12TB = 288TB

Either way that means I have 6 spare disks, which is frustrating. :( A rough layout sketch is below.
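For illustration only (FreeNAS builds the pool through the GUI, and the device names below are placeholders), the 5 x 6-wide RAIDZ2 option would look roughly like this:

```
# 5 RAIDZ2 vdevs of 6 disks each; every vdev is 4 data + 2 parity disks,
# so raw usable space is 5 x 4 x 12TB = 240TB (da0-da29 are placeholder names)
zpool create tank \
  raidz2 da0  da1  da2  da3  da4  da5  \
  raidz2 da6  da7  da8  da9  da10 da11 \
  raidz2 da12 da13 da14 da15 da16 da17 \
  raidz2 da18 da19 da20 da21 da22 da23 \
  raidz2 da24 da25 da26 da27 da28 da29
```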
 
Last edited:

adrianwi

Guru
Joined
Oct 15, 2013
Messages
1,231

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Ah but then I have no hot spares. :(
Unless you are not monitoring the system, or the system is at a remote location, there's no need for hot spare drives. Keeping a couple pre-tested cold spares is plenty.

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
An alternative is 5 x 7-way RAIDZ2

And a single spare. Basically leaves a bay open for replacement.

300TB.
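Quick arithmetic check on that figure, assuming 2 parity disks per RAIDZ2 vdev:

```
# 5 vdevs x (7 disks - 2 parity) x 12TB = 300TB raw usable, from 35 of the 36 bays
echo $(( 5 * (7 - 2) * 12 ))   # prints 300
```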
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
SLOG
2 x Samsung PM1725a 1.6TB HHHL | 5 DWPD | PCIe
Am I the only one that thinks this is crazy? Are there no better options?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Am I the only one that thinks this is crazy? Are there no better options?
The maximum that ZFS will use is 16GB per SLOG device, so 1.6TB is a stupendous waste. There are many better, less wasteful, options.
The best option for SLOG is the Intel Optane P4800X 375GB, but they are running around $1600.
We just ordered a new server at work with one of those for a SLOG.
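As a rough back-of-envelope sketch (the numbers are assumptions, not a rule): a SLOG only has to absorb the sync writes still in flight, roughly the incoming throughput multiplied by a few seconds of transaction group time.

```
# Assume all 4 x 10Gb/s links are saturated with sync writes (~5GB/s) and
# roughly 5 seconds of data is in flight before it is committed to the pool.
echo $(( 4 * 10 / 8 * 5 ))   # ~25 (GB) - a few tens of GB at most, nowhere near 1.6TB
```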
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
Just a point on rebuilds... I noticed when doing one recently that a rebuild (resilver) doesn't necessarily confine itself to the vdev involved. I raised a thread here to discuss it, and the advice was that a resilver for a drive replacement also does a scrub, so all disks in the pool get hammered anyway, even those not in the vdev being rebuilt.


I have no evidence to the contrary, nor any real knowledge, but based on disk load and the time taken, I would agree with your thoughts. Pretty sure all of my rebuilds have been tremendously long and stressful.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Found this post from @joeschmuck. He talks about tuning the scrub/resilver tunables, which could be adjusted to prevent an unresponsive/slow system during rebuilds.
It might be good to describe what is being changed.

The Default Values for the above parameters are:
vfs.zfs.scrub_delay=4
vfs.zfs.top_maxinflight=32
vfs.zfs.resilver_min_time_ms=3000
vfs.zfs.resilver_delay=2

A description of each tunable is listed below: (from the FreeBSD Handbook)

vfs.zfs.scrub_delay - Number of ticks to delay between each I/O during a scrub. To ensure that a scrub does not interfere with the normal operation of the pool, if any other I/O is happening the scrub will delay between each command. This value controls the limit on the total IOPS (I/Os Per Second) generated by the scrub. The granularity of the setting is determined by the value of kern.hz which defaults to 1000 ticks per second. This setting may be changed, resulting in a different effective IOPS limit. The default value is 4, resulting in a limit of: 1000 ticks/sec / 4 = 250 IOPS. Using a value of 20 would give a limit of: 1000 ticks/sec / 20 = 50 IOPS. The speed of scrub is only limited when there has been recent activity on the pool, as determined by vfs.zfs.scan_idle. This value can be adjusted at any time with sysctl(8).

vfs.zfs.top_maxinflight - Maximum number of outstanding I/Os per top-level vdev. Limits the depth of the command queue to prevent high latency. The limit is per top-level vdev, meaning the limit applies to each mirror, RAID-Z, or other vdev independently. This value can be adjusted at any time with sysctl(8).

vfs.zfs.resilver_min_time_ms - Minimum time allocated to the resilver process.

vfs.zfs.resilver_delay - Number of milliseconds of delay inserted between each I/O during a resilver. To ensure that a resilver does not interfere with the normal operation of the pool, if any other I/O is happening the resilver will delay between each command. This value controls the limit of total IOPS (I/Os Per Second) generated by the resilver. The granularity of the setting is determined by the value of kern.hz which defaults to 1000 ticks per second. This setting may be changed, resulting in a different effective IOPS limit. The default value is 2, resulting in a limit of: 1000 ticks/sec / 2 = 500 IOPS. Returning the pool to an Online state may be more important if another device failing could Fault the pool, causing data loss. A value of 0 will give the resilver operation the same priority as other operations, speeding the healing process. The speed of resilver is only limited when there has been other recent activity on the pool, as determined by vfs.zfs.scan_idle. This value can be adjusted at any time with sysctl(8).

So by making these changes as indicated, you will be reducing the responsiveness of the FreeNAS server to outside requests during a scrub or resilver operation. That is not a bad trade-off if you have to resilver a large hard drive back into your system. You could also adjust those settings to half the difference and see how that affects things.
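As a sketch of how these could be applied at the command line (the values are illustrative only, not the exact numbers from that post):

```
# Query the current values; all four can be changed live with sysctl(8)
sysctl vfs.zfs.scrub_delay vfs.zfs.resilver_delay \
       vfs.zfs.top_maxinflight vfs.zfs.resilver_min_time_ms

# Illustrative direction only: lowering the delays speeds up a scrub/resilver at
# the cost of responsiveness (resilver_delay=0 gives it the same priority as
# other I/O, per the handbook text above); raising them does the opposite.
sysctl vfs.zfs.resilver_delay=0
sysctl vfs.zfs.scrub_delay=0
sysctl vfs.zfs.resilver_min_time_ms=5000
```

On FreeNAS you would normally persist these as sysctl-type Tunables under System so they survive a reboot.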
 
Joined
Feb 2, 2016
Messages
574
HBA - LSi SAS HBA 9300-4i4e SGL

Are you going to have external drives? If not, the SAS 9300-8i gives you twice as many internal ports at a lower power draw. Depending on which backplane the server has, you may be able to get double the bandwidth or multiple paths by using the 8i instead of the 4i4e.

Also, I don't fully understand your use case. That's a beastly amount of CPU for a backup server. If you're not running your backup software on the FreeNAS server itself, you probably don't need that much horsepower. The Xeon 5122 is a high-dollar chip that is going to require a high-dollar motherboard. CPU isn't going to be your bottleneck even with a much less powerful, and less expensive, chip.

Otherwise, sweet machine, Rob.

Cheers,
Matt
 

Rob_SAN

Dabbler
Joined
Apr 26, 2018
Messages
13
The maximum that ZFS will use is 16GB per SLOG device, so 1.6TB is a stupendous waste. There are many better, less wasteful, options.

The best option for SLOG is the Intel Optane P4800X 375GB but they are running around $1600.
We just ordered a new server at work with one of those for a SLOG.

I know, I can get the 1.6TB at a better price than the P4800 375GB. Odd hey?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I know, I can get the 1.6TB at a better price than the P4800 375GB. Odd hey?
How does the performance compare?

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

Rob_SAN

Dabbler
Joined
Apr 26, 2018
Messages
13
Are you going to have external drives? If not, the SAS 9300-8i gives you twice as many internal ports at a lower power draw. Depending on which backplane the server has, you may be able to get double the bandwidth or multiple paths by using the 8i instead of the 4i4e.

Also, I don't fully understand your use case. That's a beastly amount of CPU for a backup server. If you're not running your backup software on the FreeNAS server itself, you probably don't need that much horsepower. The Xeon 5122 is a high-dollar chip that is going to require a high-dollar motherboard. CPU isn't going to be your bottleneck even with a much less powerful, and less expensive, chip.

Otherwise, sweet machine, Rob.

Cheers,
Matt

The plan is to use the external connector with additional shelving. I may end up going with the SAS 9300-8i internally and buying a SAS 9300-8e for the shelving. I've also been toying with the idea of running dual HBA cards for multipathing to the disks.

The backup software in use constantly checks the data for consistency. I also intend to add more shelving as time goes on and wanted to ensure we have enough CPU for the next 3-5 years.
 