Storage optimisation


beezone

Cadet
Joined
Feb 7, 2018
Messages
5
Hi everyone,
I have a problem but don't even know how to start solving it.

I have a Time Machine backup server running FreeNAS-11.1-U1: CPU E5-2609 v3 @ 1.90GHz, 96 GB RAM, 36x 6 TB 7200 rpm Toshiba Enterprise SATA drives. The disks are split into six RAIDZ1 volumes of six drives each. Each backup user has a personal dataset to isolate them from one another. Every day I hit the same problem: the disk busy counters sit at almost 100% during working hours, and as a result backup performance is terrible. The Macs use AFP shares. Some backups consist of 40,000+ 8 MB files.
[Attached graphs: 24h disk busy time by volume, plus da2 busy %, pending I/O requests, disk operations, and latency]


Is it possible to increase the performance of the disk subsystem? Which of these might help (or not): adding a ZIL/SLOG device, tuning ZFS settings, or maybe changing the block size? Which advanced benchmarks or stats can I collect to understand what to do?
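
So far I've only looked at the reporting graphs. I guess I could collect something like this during the busy window, if these are the right tools (both ship with FreeBSD, as far as I know):
Code:
# per-vdev I/O statistics, refreshed every 5 seconds
zpool iostat -v vol1 5

# per-disk busy %, queue length and latency
gstat -p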

Or can nothing help me, and do I just need to find a cozy corner and cry?
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
Let's get some more details about your system and use case.

How many users/clients are we talking about here? Are you doing full backups, or incremental? Are you using dedup? Compression? Are you configuring all the backups to run simultaneously, or are they staggered?

How full is your pool? Can you provide the output from zpool status?
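
If it helps, something like this should cover most of those questions (standard ZFS commands; substitute your actual pool and dataset names):
Code:
zpool list                                  # size, allocated, free and capacity per pool
zpool status                                # vdev layout and health
zfs get compression,dedup,recordsize vol1   # relevant properties for a pool/dataset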
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
^^ All of the above will help. That said, 7200RPM drives are good for about 100 IOPS... and that's about what you're doing. I'm guessing you have a fair number of users - you may have no choice but to move to a different RAID configuration (striped mirrors) to get the performance you want.

In all honesty, your RAID configuration needs improvement regardless. RAIDZ1 is dead for drives of that size - you should be running RAIDZ2. Do you really have separate pools for each vdev? Or are all of the vdevs in one pool?
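
Just to illustrate, a striped-mirror layout built from the shell would look roughly like this (hypothetical device names, and on FreeNAS you'd normally build it through the Volume Manager GUI instead):
Code:
# one pool of 2-way mirrors; each mirror vdev adds roughly one disk's worth of write IOPS
zpool create tank \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5
# ...and so on for the remaining pairs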
 

beezone

Cadet
Joined
Feb 7, 2018
Messages
5
Let's get some more details about your system and use case.

How many users/clients are we talking about here? Are you doing full backups, or incremental? Are you using dedup? Compression? Are you configuring all the backups to run simultaneously, or are they staggered?

How full is your pool? Can you provide the output from zpool status?
There are about 50 clients. Most backups are incremental. Dedup is off, compression is the default lz4, CPU usage is about 20% on the reporting graph, load average 6.34, 4.97, 4.31. Apple does its own magic with Time Machine and doesn't give you the ability to change the backup frequency (System Integrity Protection); by default it starts every hour. So most of the time the backups run simultaneously.
Code:
zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:01:18 with 0 errors on Mon Feb  5 03:46:18 2018
config:

	NAME		STATE	 READ WRITE CKSUM
	freenas-boot  ONLINE	   0	 0	 0
	  mirror-0  ONLINE	   0	 0	 0
		ada1p2  ONLINE	   0	 0	 0
		ada0p2  ONLINE	   0	 0	 0

errors: No known data errors

  pool: vol1
 state: ONLINE
  scan: scrub repaired 0 in 0 days 16:04:01 with 0 errors on Sun Jan 28 16:04:04 2018
config:

	NAME											STATE	 READ WRITE CKSUM
	vol1											ONLINE	   0	 0	 0
	  raidz1-0									  ONLINE	   0	 0	 0
		gptid/e6abddf8-3f3a-11e7-a9ad-0cc47a820950  ONLINE	   0	 0	 0
		gptid/e77675a2-3f3a-11e7-a9ad-0cc47a820950  ONLINE	   0	 0	 0
		gptid/e841a9ad-3f3a-11e7-a9ad-0cc47a820950  ONLINE	   0	 0	 0
		gptid/e913b3c3-3f3a-11e7-a9ad-0cc47a820950  ONLINE	   0	 0	 0
		gptid/e9e67228-3f3a-11e7-a9ad-0cc47a820950  ONLINE	   0	 0	 0
		gptid/eab07c94-3f3a-11e7-a9ad-0cc47a820950  ONLINE	   0	 0	 0

errors: No known data errors

The other volumes are also online and healthy.

^^ All of the above will help. That said, 7200RPM drives are good for about 100 IOPS... and that's about what you're doing. I'm guessing you have a fair number of users - you may have no choice but to move to a different RAID configuration (striped mirrors) to get the performance you want.
I assume that RAIDZ2 will not improve performance. But in case of a failed disk there will be less to worry about while it slowly rebuilds.

In all honesty, your RAID configuration needs improvement regardless. RAIDZ1 is dead for drives of that size - you should be running RAIDZ2. Do you really have separate pools for each vdev? Or are all of the vdevs in one pool?
All six pools are separate. Each consists of six disks connected to the HBA.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
All six pools are separate.
That really seems like a bad idea, and probably is causing a good portion of your problem. To increase IOPS, you add vdevs. With only one vdev per pool, each pool has only the IOPS capability of a single disk. You'd likely improve performance quite a bit by putting all the disks into the same pool.
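
As a rough sketch of what I mean (using the RAIDZ2 width tvsjr suggested; device names are hypothetical, on FreeNAS you'd build this through the Volume Manager rather than the shell, and it means destroying the existing pools first):
Code:
# one pool made of multiple RAIDZ2 vdevs - IOPS scale roughly with the vdev count
zpool create tank \
  raidz2 da0 da1 da2 da3 da4 da5 \
  raidz2 da6 da7 da8 da9 da10 da11
# ...and so on for the remaining vdevs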
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Those charts look like high numbers, but not bottlenecks (should be straight horizontal lines if we're hitting a technical limit).

It's clear that a non-zero pending I/O reading is not great when sustained as in your chart, but danb35's suggestion may help to reduce that.

Have you looked at the network? (perhaps that's where you're bottlenecking)

If you have large numbers of reasonably large sized files (and not too many small files), you may benefit slightly by having a larger block size.
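
If you want to experiment with that, recordsize is a per-dataset property (1M needs the large_blocks pool feature, which FreeNAS 11 supports); the dataset name here is just an example:
Code:
# only affects blocks written after the change; existing data keeps its old record size
zfs set recordsize=1M vol1/someuser
zfs get recordsize vol1/someuser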
 

beezone

Cadet
Joined
Feb 7, 2018
Messages
5
I'm quite confused.

You'd likely improve performance quite a bit by putting all the disks into the same pool.
[Screenshot of the Storage view showing the top-level vol1 and a second-level vol1 beneath it]

If I understood correctly, everything is already configured as you said. The top-level vol1 is a vdev of 6 disks, and the second-level vol1 is a single pool on that vdev. Do you suggest increasing the vdev count from the current 6 to 12 (or more)? Do I have any other options?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
If I understood correctly, everything is already configured as you said.
I don't think you do understand correctly.
The top-level vol1 is a vdev of 6 disks, and the second-level vol1 is a single pool on that vdev.
No, top vol1 is a pool, and second level vol1 is a dataset--the implicit or root dataset that exists on every ZFS pool (it's always existed with ZFS, but wasn't shown in the GUI before FN 9.3). The screen shot you posted shows nothing about the vdev layout--to see that, click on the top vol1, and then on the Volume Status button below (it looks like a blank sheet of notebook paper). Edit: Never mind--the output of zpool status you posted above shows the vdev layout.

What I'm proposing is that you have a single pool (what FreeNAS calls a Volume) consisting of all your disks, in multiple vdevs. This will increase performance--IOPS will increase roughly linearly with the number of vdevs. It will also simplify storage administration--you'll have a single volume with all your space there. You won't need to manage free space across multiple volumes (pooled storage is one of the big selling points of ZFS). It will also increase risk--as @sretalla notes below, when any single vdev fails, you lose your pool. That's why we'd recommend RAIDZ2 over RAIDZ1. Making this change would require that you destroy and rebuild your pool.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You could probably also benefit from more RAM (before thinking about ZIL).

To clarify on your confusion...

A pool with multiple vdevs will perform better than a pool with only one vdev, so if you had a single pool containing all of your vdevs, the pool's potential performance would multiply, since writes can be striped across all of the vdevs.

Careful! Losing one vdev in this configuration loses the whole pool (all vdevs).

You would then use that single pool (you could call it Pool1 or tank or whatever) and add datasets (vol1, vol2, and so on) to get back to the logical separation of your current setup.
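
For example (pool and dataset names are just placeholders):
Code:
# one pool, per-user datasets for logical separation
zfs create tank/vol1
zfs create tank/vol2
zfs set quota=2T tank/vol1   # optional: cap how much space each dataset can use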
 

beezone

Cadet
Joined
Feb 7, 2018
Messages
5
Those charts look like high numbers, but not bottlenecks (should be straight horizontal lines if we're hitting a technical limit).

It's clear that a non-zero pending I/O reading is not great when sustained as in your chart, but danb35's suggestion may help to reduce that.

Have you looked at the network? (perhaps that's where you're bottlenecking)
The network doesn't seem to be the bottleneck. lagg0 peaks at 500-600 Mbit/s, but it can handle up to 2 Gbit/s in theory. Our network infrastructure can handle this easily.
[Attached graph: lagg0 throughput]

If you have large numbers of reasonably large sized files (and not too many small files), you may benefit slightly by having a larger block size.
The backup file structure is 40,000-60,000+ 8 MB data files plus about 10 small config files.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If the vast majority of your files are large (seems to be what you are saying), then a much larger block size could help (a bit).

I agree with danb35 that a multiple (more than 1) VDEV per pool strategy is what you need to really improve performance.

Make sure you understand and assess your need for redundancy as you would effectively increase the chances of losing everything by reducing to a single pool.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Also remember that with LAGG, 1 Gbit + 1 Gbit = 1 Gbit per client as the maximum, not 2; but overall you should be able to see 2 Gbit/s of throughput on the server end if enough clients are running at full speed.
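
If you want to confirm how the traffic actually spreads across the physical ports, you can watch them individually (the interface name here is just an example):
Code:
# per-interface throughput, one-second samples
systat -ifstat 1
netstat -I igb0 -w 1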
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
You're also reaching the limits of that CPU. The E5-2609 v3 is a 6-core part... load averages north of 6 are reasons for concern.
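
Worth checking per-core usage too, since a load average across 12 threads can hide a few pegged cores:
Code:
# per-CPU usage breakdown in FreeBSD top
top -P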
 

beezone

Cadet
Joined
Feb 7, 2018
Messages
5
Also remember that with LAGG, 1 Gbit + 1 Gbit = 1 Gbit per client as the maximum, not 2; but overall you should be able to see 2 Gbit/s of throughput on the server end if enough clients are running at full speed.
I know; at the moment both physical interfaces are loaded equally.

You're also reaching the limits of that CPU. The E5-2609 v3 is a 6-core part... load averages north of 6 are reasons for concern.
My mistake, it's a dual-CPU config = 12 cores. So about half of the CPU power is still available.
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
An outside thought is that you are hitting the limits of your memory. You have over 200 TB of raw storage with only 96 GB of memory. However, I'm not very experienced in tuning systems at this scale, so my intuitions about how much memory you need could very well be wrong.
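
One way to sanity-check that would be to look at the ARC counters during the backup window; the raw numbers are exposed via sysctl (hit ratio = hits / (hits + misses)):
Code:
sysctl kstat.zfs.misc.arcstats.size     # current ARC size in bytes
sysctl kstat.zfs.misc.arcstats.hits
sysctl kstat.zfs.misc.arcstats.misses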
 