Large 36 x 12TB Build

Status
Not open for further replies.

Rob_SAN

Dabbler
Joined
Apr 26, 2018
Messages
13
Hey,

I have a build for which I need to ensure rebuilds do not impact the array. I assume I would be best off using mirrored vdevs rather than RAIDZ2 or RAIDZ3 vdevs to reduce the impact of rebuilds?

The purpose of the array is a backup target. It's worth noting my backup software performs consistency checks on the data, hence my worry about rebuilds impacting read/write speeds!

Also, is it worth having an NVMe PCIe SLOG? I need to ensure data is committed to the array without fail, so I want to turn on "sync=always" to ensure this.
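For clarity, forcing sync writes is just a per-dataset ZFS property; a minimal sketch of what I mean (the pool/dataset name is only a placeholder):

```
# Force every write to the backup dataset to be committed synchronously
# ("tank/backups" is a placeholder name, not my actual pool)
zfs set sync=always tank/backups

# Verify the property took effect
zfs get sync tank/backups
```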

My Build is below.

Processor - 2 x Intel® Xeon® Gold 5122 3.60GHz 4 core (worth considering 2 x 3.4GHz 6 core?)
Memory - 256GB 2666MHz DDR4 ECC
HBA - LSI SAS HBA 9300-4i4e SGL
SSD - 2 x Intel® SSD DC S4500 Series (FreeNAS OS drives in mirror).
HDD - 36 x 12TB 7200 RPM 3.5" 12Gb/s SAS
Network Card - Supermicro Quad Port 10Gb/s SFP+
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Using mirrors for a backup target is excessively wasteful of drive space unless you have some unusually high IO that normally doesn't happen with backup targets.
What makes you think that rebuilds of a RAIDZ2 pool are going to impact array functionality?

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

Rob_SAN

Dabbler
Joined
Apr 26, 2018
Messages
13
Just my experience of other arrays; I'm aware a rebuild based on parity disks can take some time and place load on the whole pack of disks.

Other than this, does the build look ample? Would you suggest a SLOG?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Just a point on rebuilds... I noticed when doing one recently that a rebuild (resilver) doesn't necessarily confine itself to the vdev involved. I raised a thread here to discuss it, and the advice was that a resilver for a drive replacement also does a scrub, so all disks in the pool get hammered anyway, even those not in the vdev being rebuilt.

I'm not sure that opting to give up more disks to parity will help you escape any performance impact, so you're probably better off running with RAIDZ2 unless performance is somehow driving the need for more vdevs in total.

https://forums.freenas.org/index.ph...ing-all-disks-in-multi-vdev-mirror-set.61954/
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Just a point on rebuilds... I noticed when doing one recently that a rebuild (resilver) doesn't necessarily confine itself to the vdev involved. I raised a thread here to discuss it, and the advice was that a resilver for a drive replacement also does a scrub, so all disks in the pool get hammered anyway, even those not in the vdev being rebuilt.

I'm not sure that opting to give up more disks to parity will help you escape any performance impact, so you're probably better off running with RAIDZ2 unless performance is somehow driving the need for more vdevs in total.

https://forums.freenas.org/index.ph...ing-all-disks-in-multi-vdev-mirror-set.61954/
Exactly, but if you want fast rebuilds, having more vdevs with fewer drives each does improve the speed. I have a pool at work that takes 4 days to resilver a drive, but the resilver does not significantly impact operation of the pool.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Just my experience of other arrays; I'm aware a rebuild based on parity disks can take some time and place load on the whole pack of disks.

Other than this, does the build look ample? Would you suggest a SLOG?
Otherwise, it does look good. If you want to set sync=always, you would likely benefit from a SLOG. I usually wouldn't suggest that on a backup target. Why do you want to force sync writes?
 

Rob_SAN

Dabbler
Joined
Apr 26, 2018
Messages
13
Thanks for the responses!

Regarding the SLOG, I understand this will improve write speeds with sync on? I need to ensure data is not lost in a power outage.

I think the final hardware build will be:
Processor - 2 x Intel® Xeon® Gold 5122 3.60GHz 4 core
Memory - 256GB 2666MHz DDR4 ECC
HBA - LSI SAS HBA 9300-4i4e SGL
SSD - 2 x Intel® SSD DC S4500 Series (FreeNAS OS drives in mirror)
SLOG - 2 x Samsung PM1725a 1.6TB HHHL | 5 DWPD | PCIe (Large I know, smallest I can buy for a decent price)
HDD - 36 x 12TB 7200 RPM 3.5" 12Gb/s SAS
Network Card - Supermicro Quad Port 10Gb/s SFP+

FreeNAS setup:
SLOG

2 x Samsung PM1725a 1.6TB HHHL | 5 DWPD | PCIe
L2ARC
Not sure there is any point?
RAID-Z2
5 vdevs of 6 x 12TB = 240TB
or
3 vdevs of 10 x 12TB = 288TB

Either way that means I have 6 spare disks, which is frustrating. :( A rough layout sketch is below.
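For illustration only (FreeNAS builds the pool through the GUI, and the device names below are placeholders), the 5 x 6-wide RAIDZ2 option would look roughly like this:

```
# 5 RAIDZ2 vdevs of 6 disks each; every vdev is 4 data + 2 parity disks,
# so raw usable space is 5 x 4 x 12TB = 240TB (da0-da29 are placeholder names)
zpool create tank \
  raidz2 da0  da1  da2  da3  da4  da5  \
  raidz2 da6  da7  da8  da9  da10 da11 \
  raidz2 da12 da13 da14 da15 da16 da17 \
  raidz2 da18 da19 da20 da21 da22 da23 \
  raidz2 da24 da25 da26 da27 da28 da29
```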
 
Last edited:

adrianwi

Guru
Joined
Oct 15, 2013
Messages
1,231

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Ah but then I have no hot spares. :(
Unless you are not monitoring the system, or the system is at a remote location, there's no need for hot spare drives. Keeping a couple pre-tested cold spares is plenty.

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
An alternative is 5 x 7-way RAIDZ2

And a single spare. Basically leaves a bay open for replacement.

300TB.
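Quick arithmetic check on that figure, assuming 2 parity disks per RAIDZ2 vdev:

```
# 5 vdevs x (7 disks - 2 parity) x 12TB = 300TB raw usable, from 35 of the 36 bays
echo $(( 5 * (7 - 2) * 12 ))   # prints 300
```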
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
SLOG
2 x Samsung PM1725a 1.6TB HHHL | 5 DWPD | PCIe
Am I the only one that thinks this is crazy? Are there no better options?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Am I the only one that thinks this is crazy? Are there no better options?
The maximum that ZFS will use is 16GB per SLOG device, so 1.6TB is a stupendous waste. There are many better, less wasteful, options.
The best option for SLOG is the Intel Optane P4800X 375GB, but they are running around $1600.
We just ordered a new server at work with one of those for a SLOG.
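As a rough back-of-envelope sketch (the numbers are assumptions, not a rule): a SLOG only has to absorb the sync writes still in flight, roughly the incoming throughput multiplied by a few seconds of transaction group time.

```
# Assume all 4 x 10Gb/s links are saturated with sync writes (~5GB/s) and
# roughly 5 seconds of data is in flight before it is committed to the pool.
echo $(( 4 * 10 / 8 * 5 ))   # ~25 (GB) - a few tens of GB at most, nowhere near 1.6TB
```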
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
Just a point on rebuilds... I noticed when doing one recently that a rebuild (resilver) doesn't necessarily confine itself to the vdev involved. I raised a thread here to discuss it, and the advice was that a resilver for a drive replacement also does a scrub, so all disks in the pool get hammered anyway, even those not in the vdev being rebuilt.


I have no evidence to the contrary, nor any real knowledge, but based on disk load and the time taken, I would agree with your thoughts. Pretty sure all of my rebuilds have been tremendously long and stressful.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Found this post from @joeschmuck. He talks about tuning the scrub/resilver tunables, which could be adjusted to prevent an unresponsive/slow system during rebuilds.
It might be good to describe what is being changed.

The Default Values for the above parameters are:
vfs.zfs.scrub_delay=4
vfs.zfs.top_maxinflight=32
vfs.zfs.resilver_min_time_ms=3000
vfs.zfs.resilver_delay=2

A description of each tunable is listed below: (from the FreeBSD Handbook)

vfs.zfs.scrub_delay - Number of ticks to delay between each I/O during a scrub. To ensure that a scrub does not interfere with the normal operation of the pool, if any other I/O is happening the scrub will delay between each command. This value controls the limit on the total IOPS (I/Os Per Second) generated by the scrub. The granularity of the setting is determined by the value of kern.hz which defaults to 1000 ticks per second. This setting may be changed, resulting in a different effective IOPS limit. The default value is 4, resulting in a limit of: 1000 ticks/sec / 4 = 250 IOPS. Using a value of 20 would give a limit of: 1000 ticks/sec / 20 = 50 IOPS. The speed of scrub is only limited when there has been recent activity on the pool, as determined by vfs.zfs.scan_idle. This value can be adjusted at any time with sysctl(8).

vfs.zfs.top_maxinflight - Maximum number of outstanding I/Os per top-level vdev. Limits the depth of the command queue to prevent high latency. The limit is per top-level vdev, meaning the limit applies to each mirror, RAID-Z, or other vdev independently. This value can be adjusted at any time with sysctl(8).

vfs.zfs.resilver_min_time_ms - Minimum time allocated to the resilver process.

vfs.zfs.resilver_delay - Number of milliseconds of delay inserted between each I/O during a resilver. To ensure that a resilver does not interfere with the normal operation of the pool, if any other I/O is happening the resilver will delay between each command. This value controls the limit of total IOPS (I/Os Per Second) generated by the resilver. The granularity of the setting is determined by the value of kern.hz which defaults to 1000 ticks per second. This setting may be changed, resulting in a different effective IOPS limit. The default value is 2, resulting in a limit of: 1000 ticks/sec / 2 = 500 IOPS. Returning the pool to an Online state may be more important if another device failing could Fault the pool, causing data loss. A value of 0 will give the resilver operation the same priority as other operations, speeding the healing process. The speed of resilver is only limited when there has been other recent activity on the pool, as determined by vfs.zfs.scan_idle. This value can be adjusted at any time with sysctl(8).

So by making these changes as indicated, you will be reducing the responsiveness of the FreeNAS server to outside requests during a scrub or resilver operation. That is not a bad trade-off if you have to resilver a large hard drive back into your system. You could also adjust those settings to half the difference and see how that affects things.
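As a sketch of how these could be applied at the command line (the values are illustrative only, not the exact numbers from that post):

```
# Query the current values; all four can be changed live with sysctl(8)
sysctl vfs.zfs.scrub_delay vfs.zfs.resilver_delay \
       vfs.zfs.top_maxinflight vfs.zfs.resilver_min_time_ms

# Illustrative direction only: lowering the delays speeds up a scrub/resilver at
# the cost of responsiveness (resilver_delay=0 gives it the same priority as
# other I/O, per the handbook text above); raising them does the opposite.
sysctl vfs.zfs.resilver_delay=0
sysctl vfs.zfs.scrub_delay=0
sysctl vfs.zfs.resilver_min_time_ms=5000
```

On FreeNAS you would normally persist these as sysctl-type Tunables under System so they survive a reboot.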
 
Joined
Feb 2, 2016
Messages
574
HBA - LSi SAS HBA 9300-4i4e SGL

Are you going to have external drives? If not, the SAS 9300-8i gives you twice as many internal ports at a lower power draw. Depending on which backplane the server has, you may be able to get double the bandwidth or multiple paths by using the 8i instead of the 4i4e.

Also, I don't fully understand your use case. That's a beastly amount of CPU for a backup server. If you're not running your backup software on the FreeNAS server itself, you probably don't need that much horsepower. The Xeon 5122 is a high-dollar chip that is going to require a high-dollar motherboard. CPU isn't going to be your bottleneck even with a much less powerful, and less expensive, chip.

Otherwise, sweet machine, Rob.

Cheers,
Matt
 

Rob_SAN

Dabbler
Joined
Apr 26, 2018
Messages
13
The maximum that ZFS will use is 16GB per SLOG device, so 1.6TB is a stupendous waste. There are many better, less wasteful, options.

The best option for SLOG is the Intel Optane P4800X 375GB but they are running around $1600.
We just ordered a new server at work with one of those for a SLOG.

I know, I can get the 1.6TB at a better price than the P4800 375GB. Odd hey?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I know, I can get the 1.6TB at a better price than the P4800 375GB. Odd hey?
How does the performance compare?

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

Rob_SAN

Dabbler
Joined
Apr 26, 2018
Messages
13
Are you going to have external drives? If not, the SAS 9300-8i gives you twice as many internal ports at a lower power draw. Depending on which backplane the server has, you may be able to get double the bandwidth or multiple paths by using the 8i instead of the 4i4e.

Also, I don't fully understand your use case. That's a beastly amount of CPU for a backup server. If you're not running your backup software on the FreeNAS server itself, you probably don't need that much horsepower. The Xeon 5122 is a high-dollar chip that is going to require a high-dollar motherboard. CPU isn't going to be your bottleneck even with a much less powerful, and less expensive, chip.

Otherwise, sweet machine, Rob.

Cheers,
Matt

The plan is to use the external connector with additional shelving. I may end up going with the SAS 9300-8i internally and buying a SAS 9300-8e for the shelving. I've also been toying with the idea of running dual HBA cards for multipathing to the disks.

The backup software in use constantly checks the data for consistency. I also intend to add more shelving as time goes on and wanted to ensure we have enough CPU for the next 3-5 years.
 