Performance and stability help

turnrye

Cadet
Joined
Feb 1, 2020
Messages
8
Hi!

I have a FreeNAS install with some decent hardware for my homelab. I access it over NFSv4 from other hardware for compute. I'm having issues where all of the databases hosted on it end up getting locked over time, and iostat shows extremely high iowait. Even when I don't have any applications running against it, the best performance I can get moving large single files (30 GB) is around 90 MB/s on the NAS itself.

Also, I find that after a few days the FreeNAS web UI stops being responsive, and SSH starts hanging as well. Even the local console hangs after displaying the options menu -- I cannot reboot or drop into a shell. I end up having to clear this by shutting down all of my NFS clients and then hard power-cycling the NAS.

I have one ZFS pool that contains 4 vdevs, each a simple two-way mirror. There's a 512 GB NVMe 970 EVO used as cache (L2ARC) and a 480 GB Intel Optane SSD 900P used as log (SLOG). I find that the cache and log have a negligible impact on my performance. The vdev drives are 8 TB each, all 7200 RPM NAS or enterprise variety, and SMART tests pass cleanly on all of them.

The base hardware is an HP DL380p Gen8 with 32 GB of RAM. FreeNAS is installed on an internal SD card (Transcend media rated at 400 MB/s read). I've bypassed the built-in HP RAID controller and am using an LSI SAS2308 controller plugged into the built-in HP SAS expander using both ports.

I'm using an SFP+ DAC, and iperf shows the network is not the bottleneck, with 7+ Gbps TCP in both the send and receive directions.

Code:
NAME                                     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
freenas-boot                            29.5G  1021M  28.5G        -         -     0%     3%  1.00x  ONLINE  -
  da8p2                                 29.5G  1021M  28.5G        -         -     0%     3%
media                                     29T  14.2T  14.8T        -         -     3%    48%  1.00x  ONLINE  /mnt
  mirror                                7.25T  3.04T  4.21T        -         -     4%    41%
    gptid/2fa3a238-dcb9-11e9-9b80-2c44fd94cfbc      -      -      -        -         -      -      -
    gptid/30974b70-dcb9-11e9-9b80-2c44fd94cfbc      -      -      -        -         -      -      -
  mirror                                7.25T  3.06T  4.19T        -         -     3%    42%
    gptid/3194dbe8-dcb9-11e9-9b80-2c44fd94cfbc      -      -      -        -         -      -      -
    gptid/328c9c6e-dcb9-11e9-9b80-2c44fd94cfbc      -      -      -        -         -      -      -
  mirror                                7.25T  4.59T  2.66T        -         -     6%    63%
    gptid/82e892f9-ebb5-11e9-80ce-2c44fd94cfbc      -      -      -        -         -      -      -
    gptid/84066d34-ebb5-11e9-80ce-2c44fd94cfbc      -      -      -        -         -      -      -
  mirror                                7.25T  3.50T  3.75T        -         -     2%    48%
    gptid/252f1109-3af5-11ea-a237-2c44fd94cfbc      -      -      -        -         -      -      -
    gptid/27917278-3af5-11ea-a237-2c44fd94cfbc      -      -      -        -         -      -      -
log                                         -      -      -         -      -      -
  gptid/af8851cf-fb7f-11e9-80c0-2c44fd94cfbc   444G   684K   444G        -         -     0%     0%
cache                                       -      -      -         -      -      -
  gptid/ad393d45-fb7f-11e9-80c0-2c44fd94cfbc   466G  12.6G   453G        -         -     0%     2%


Here are some stats with compression turned off, log and cache still in place.

Write
root@freenas[/mnt/media/test]# dd if=/dev/zero of=/mnt/media/test/blank bs=4M count=10000
10000+0 records in
10000+0 records out
41943040000 bytes transferred in 58.968424 secs (711279650 bytes/sec)

Read
root@freenas[/mnt/media/test]# dd of=/dev/zero if=/mnt/media/test/blank bs=4M count=10000
10000+0 records in
10000+0 records out
41943040000 bytes transferred in 77.398446 secs (541910620 bytes/sec)

With the cache and log removed:

Write
root@freenas[/mnt/media/test]# dd if=/dev/zero of=/mnt/media/test/blank bs=4M count=10000
10000+0 records in
10000+0 records out
41943040000 bytes transferred in 60.318331 secs (695361410 bytes/sec)

Read
root@freenas[/mnt/media/test]# dd of=/dev/zero if=/mnt/media/test/blank bs=4M count=10000
10000+0 records in
10000+0 records out
41943040000 bytes transferred in 81.975561 secs (511652979 bytes/sec)

So, single writes and reads on the server itself seem great, but add my real workload and things go south. I imagine it's due to having one pool covering both DBs and large media. Maybe I got some bad advice and wrongly assumed the log + cache would soften the impact there.

Here's what I'm thinking of doing: move the Optane to its own pool, back it up to the 970 EVO, and shift all of my fragmented + IO-heavy use over there, keeping the bigger sequential storage on the spinning media. Before I go through that effort, is this the "right" layout I should consider? Or am I missing a completely separate issue in my setup?
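Sketching that out as raw ZFS commands (the nvd0/nvd1 device names and the pool/dataset names are just placeholders on my part, and in practice I'd build this through the FreeNAS UI):

Code:
# Hypothetical layout: Optane 900P (nvd1) as its own fast pool for the DBs,
# the 970 EVO (nvd0) as a separate single-disk pool used only as a backup target.
zpool create fast nvd1
zpool create fastbackup nvd0

# Dataset for the databases; a small recordsize tends to suit random DB I/O.
zfs create -o recordsize=16K fast/db

# Back up the fast pool to the EVO via snapshot replication.
zfs snapshot -r fast@nightly
zfs send -R fast@nightly | zfs receive -F fastbackup/fast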
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hi,

From what I see, I doubt your plan will help you that much...

You did not mention anything about deduplication. I hope you are not using it, but if you are, you have a serious suspect right there.

The thing that does show up is your pool usage. Looking at your vdevs, one is above 60% full and the average is close to 50%. With more than 50% of the space used, every time FreeNAS needs to write something there is a better-than-even chance of landing on an already-used block, so it has to seek again for a different spot. Even then, there is another ~50% chance of hitting a used block, so roughly one time in four it needs a third attempt or more; as a rule of thumb, the expected number of allocation attempts works out to about 1/(1 - fullness), i.e. about two attempts at 50% full.

I expect defragmenting the pool would be a very short-term benefit. ZFS is copy-on-write: when something new has to be written, FreeNAS searches for a spot to save a complete copy of it before releasing the previous blocks (unless they are locked by snapshots...). So yes, for a short time a defrag would help it find large free spots. But with databases being heavy on writes and NFS writing long files, I expect fragmentation would come back with a vengeance in no time.

There are performance measurements posted here and there in the forum about that; @jgreco did many of them. Performance drops have been measured at as little as 10% usage, become significant at 25%, and can be major by the time the pool is 50% full. If you are doing mostly reads from a pool, you can have good performance up to and beyond that. But for a heavy writer, having plenty of free space is important.
 

turnrye

Cadet
Joined
Feb 1, 2020
Messages
8
@Heracles thanks for the response. I'm exploring moving one of my DB loads over to the new SSD pool, but I haven't had a chance to note the impact yet.

Thanks for the tips regarding utilization. In my application, while I want to retain the data, I'm willing to accept a bit of risk on this main storage volume. This makes me think I could switch to a raidz2 setup, get about 41 TB of usable storage, and drop my usage down to ~35%. What're your thoughts on that idea? That would be a bigger utilization impact than just adding another mirror to my current layout, and even if I buy a spare it'd cost less.
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hi again Turnrye,

Moving from mirrors to RAIDZ2 will reduce the disk space usage for sure, but there is a catch... Mirrors are way better than RAIDZ at providing IOPS, and IOPS are what you need for random access. A database does random access (as opposed to sequential reads and writes), so it benefits from the mirrors. You can think of a pool as having the random-access performance of as many vdevs as you put in it: with 4 mirrors, the pool performs like 4 drives; built as a single RAIDZ vdev, it performs like a single drive.

Between the two, I would stay with mirrors, but that is more instinct than anything. Objective performance measurement is so difficult to achieve that it is almost impossible to do.

Upgrading the 2 drives in your vdev that is loaded at 60+% would probably be better (rough command sketch at the end of this post):
--You stay with mirrors
--The most heavily loaded vdev is the one that gets relieved the most, and instantly
--Once online, ZFS will slowly re-balance by spreading the load every time it needs to write
--Only 2 drives are needed
--No need for more bays
...

Adding an extra mirror would be another option if you can.

But switching to RAIDZ2 is surely not a clear winner in your case, if it is a winner at all...
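For reference, the in-place upgrade looks roughly like this from the shell. The gptid labels below are placeholders for your actual old and new disks, and in FreeNAS you would normally do the replacement through the GUI so the middleware keeps track of it:

Code:
# Let the mirror grow automatically once both disks have been replaced.
zpool set autoexpand=on media

# Replace the two disks of the 60%+ mirror one at a time, waiting for each
# resilver to finish before starting the next (check with zpool status).
zpool replace media gptid/OLD-DISK-1 gptid/NEW-DISK-1
zpool status media                     # wait for the resilver to complete
zpool replace media gptid/OLD-DISK-2 gptid/NEW-DISK-2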
 

turnrye

Cadet
Joined
Feb 1, 2020
Messages
8
Thanks Heracles. I'll snag a pair of 12 TB drives and throw them in as a new mirror; we'll then see the impact on overall performance.

Like you anticipated, I found that shifting the DBs over to a separate SSD-only pool didn't have a significant impact on the overall performance issues.

PS: I'm trying to shift the SSDs back now, and FreeNAS shows an entry named "HOLE" on the Media pool's status page. Whenever I try to extend the pool to add the log or the cache, I now get this error:

Code:
Error: Traceback (most recent call last):

  File "/usr/local/lib/python3.7/site-packages/tastypie/resources.py", line 219, in wrapper
    response = callback(request, *args, **kwargs)

  File "./freenasUI/api/resources.py", line 1421, in dispatch_list
    request, **kwargs

  File "/usr/local/lib/python3.7/site-packages/tastypie/resources.py", line 450, in dispatch_list
    return self.dispatch('list', request, **kwargs)

  File "./freenasUI/api/utils.py", line 252, in dispatch
    request_type, request, *args, **kwargs

  File "/usr/local/lib/python3.7/site-packages/tastypie/resources.py", line 482, in dispatch
    response = method(request, **kwargs)

  File "/usr/local/lib/python3.7/site-packages/tastypie/resources.py", line 1384, in post_list
    updated_bundle = self.obj_create(bundle, **self.remove_api_resource_names(kwargs))

  File "/usr/local/lib/python3.7/site-packages/tastypie/resources.py", line 2175, in obj_create
    return self.save(bundle)

  File "./freenasUI/api/utils.py", line 493, in save
    form.save()

  File "./freenasUI/storage/forms.py", line 282, in save
    return False

  File "./freenasUI/storage/forms.py", line 273, in save
    pool = c.call('pool.update', add.id, {'topology': topology}, job=True)

  File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 513, in call
    return jobobj.result()

  File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 276, in result
    raise ClientException(job['error'], trace={'formatted': job['exception']})

middlewared.client.client.ClientException: 'HOLE'
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I would try removing the SLOG and L2ARC devices. Turn sync off on the dataset you have your database on and see if that helps. This is not a good way to run things, but it will test the best-case scenario. If it works, you know your system can handle it; if it doesn't, you are overloading your system and you'll need more tuning.
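From the shell that would look something like this; the log/cache gptids are the ones from your zpool list output, and I'm assuming the database dataset is called media/db, so adjust the names to match your setup:

Code:
# Remove the SLOG and L2ARC devices (log and cache vdevs can be removed live).
zpool remove media gptid/af8851cf-fb7f-11e9-80c0-2c44fd94cfbc   # log (Optane)
zpool remove media gptid/ad393d45-fb7f-11e9-80c0-2c44fd94cfbc   # cache (970 EVO)

# Disable sync writes on the database dataset -- for testing only.
zfs set sync=disabled media/db

# Put it back to the default once you've seen the results.
zfs set sync=standard media/db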
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
I would add more memory before any other changes are made. You should at least double the memory you have now before you try anything else.
 

turnrye

Cadet
Joined
Feb 1, 2020
Messages
8
@Jailer makes sense; I just upgraded to 80 GB of RAM with some memory out of a donor machine. Let's see what difference it makes in the application after a day or so.

I still haven't been able to add the log and cache back due to that "HOLE" issue I mentioned in my previous post. Would love tips from anyone experienced with what this may be.
 

turnrye

Cadet
Joined
Feb 1, 2020
Messages
8
Living with 80 GB of RAM for a day: no noticeable improvement.

Clearly my performance is limited by something else, but I don't know how to troubleshoot it any better.

How can I benchmark this fragmented read/write load?

Disabling sync had a noticeable improvement in performance, but it wasn't immediate.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Disabling sync had a noticeable improvement in performance, but it wasn't immediate.
What do you mean by not immediate? If this made things usable, then we know the problem is your SLOG: it's just not fast enough to handle the operations.
 

turnrye

Cadet
Joined
Feb 1, 2020
Messages
8
Things improved, but only after about 30 minutes. This gives performance comparable to when I had the Intel Optane running as a log. I still have not been able to add the device back due to the error noted in post #5 of this thread. Any thoughts on that issue?

Note that I'm still relying on real users and perceptions here rather than an intentional benchmarking strategy. I'd love any advice on how to simulate these fragmented, IO-intensive operations so that I can try more changes to the system (perhaps my SAS card?).
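For the record, this is roughly the kind of fio run I'm considering to approximate that load. fio isn't installed by default (pkg install fio), and the block size, file size and job count are arbitrary guesses on my part:

Code:
# Small-block, sync-heavy random read/write mix on the pool; --fsync=1 forces
# a sync after every write so the SLOG (or its absence) is actually exercised.
fio --name=dbsim --directory=/mnt/media/test \
    --rw=randrw --rwmixread=70 --bs=8k --size=4G \
    --numjobs=4 --fsync=1 --runtime=120 --time_based --group_reporting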
 

turnrye

Cadet
Joined
Feb 1, 2020
Messages
8
So, the additional drives arrived, and this time I went for 14 TB drives to save some space in the chassis. Unfortunately, I am now facing the same "HOLE" error when trying to even extend the pool. Please see the screenshot, which references the same stack trace as in my post above. Any help from those more familiar with ZFS and FreeNAS is appreciated.

Screenshot 2020-02-06 at 19.06.12.png
 

ralphte

Cadet
Joined
Jun 1, 2014
Messages
5
I have just updated to FreeNAS 11.3 and I am running into the exact same problem. I have ensured the drives are wiped. Full error output:

Code:
Error: Traceback (most recent call last):

  File "/usr/local/lib/python3.7/site-packages/tastypie/resources.py", line 219, in wrapper
    response = callback(request, *args, **kwargs)

  File "./freenasUI/api/resources.py", line 1421, in dispatch_list
    request, **kwargs

  File "/usr/local/lib/python3.7/site-packages/tastypie/resources.py", line 450, in dispatch_list
    return self.dispatch('list', request, **kwargs)

  File "./freenasUI/api/utils.py", line 252, in dispatch
    request_type, request, *args, **kwargs

  File "/usr/local/lib/python3.7/site-packages/tastypie/resources.py", line 482, in dispatch
    response = method(request, **kwargs)

  File "/usr/local/lib/python3.7/site-packages/tastypie/resources.py", line 1384, in post_list
    updated_bundle = self.obj_create(bundle, **self.remove_api_resource_names(kwargs))

  File "/usr/local/lib/python3.7/site-packages/tastypie/resources.py", line 2175, in obj_create
    return self.save(bundle)

  File "./freenasUI/api/utils.py", line 493, in save
    form.save()

  File "./freenasUI/storage/forms.py", line 282, in save
    return False

  File "./freenasUI/storage/forms.py", line 273, in save
    pool = c.call('pool.update', add.id, {'topology': topology}, job=True)

  File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 513, in call
    return jobobj.result()

  File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 276, in result
    raise ClientException(job['error'], trace={'formatted': job['exception']})

middlewared.client.client.ClientException: 'HOLE'
 

Attachments: 2020-02-07_09-37-39.png, 2020-02-07_09-38-04.png

turnrye

Cadet
Joined
Feb 1, 2020
Messages
8
Thanks @ralphte. This was indeed the problem I faced. I worked around it by doing a send/receive to a new pool and rebuilding there. Hope this thread helps anyone else who upgrades and hits this problem; the workaround is to build a new pool and move your data, which sucks.
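For anyone searching later, the migration was roughly the following; "media2" is just the name I gave the temporary pool, and I did parts of this through the UI rather than the shell:

Code:
# Snapshot everything recursively and replicate it to the new pool.
zfs snapshot -r media@migrate
zfs send -R media@migrate | zfs receive -F media2

# Once verified, retire the old pool and (optionally) re-import the new one
# under the old name so shares keep their paths.
zpool export media
zpool export media2
zpool import media2 media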

Now, with my new drives added, my pool is at 29% utilization. I also installed the new memory, so the machine is at 192 GB of RAM. The Intel Optane is back in place as the log, with the 970 EVO SSD as L2ARC as well. I've checked all of the boxes: tons of RAM, low utilization, confirmed 10G network. Performance on single big reads and writes is fantastic, as you'd expect.

I still find, though, that basic applications like Plex have seconds of delay when initiating actions compared to when the same data is local on the server. It must be NFS-related (small reads and sync writes), but I don't know how to benchmark that or what common targets I should look into.
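If it helps anyone suggest targets, this is roughly how I plan to measure it from one of the NFS clients; the mount path is mine and the numbers are arbitrary, and fio's default completion-latency percentiles are really what I'm after:

Code:
# Run on an NFS client against the mounted share: small random reads plus
# fsync'd writes, so the numbers reflect latency rather than raw throughput.
fio --name=nfslat --directory=/mnt/nas/test \
    --rw=randrw --rwmixread=70 --bs=4k --size=1G \
    --numjobs=1 --fsync=1 --runtime=60 --time_based --group_reporting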

Any advice?
 