Scrub after 5 hours on FN 11.2-U6/U7 slows down

RegularJoe · Nov 28, 2019

Hi All,

I might be looking at the wrong data or in the wrong place. I am really just trying to establish a baseline, so I can see how this system performs as it ages and know where to look, and what to look for, when performance degrades.

I am working from the assumption that longer, slower scrubs mean slower disk performance for clients. If I know that 24 TB of data can be scrubbed in 6 hours, and after xx days the same data takes 12, 24 or 48 hours, then I need to look at fragmentation or some other issue.
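The baseline idea above reduces to average throughput: 24 TB in 6 hours is roughly 1,111 MB/s, and every doubling of scrub time halves it. A minimal sketch in shell/awk, assuming decimal units and a hypothetical 75% alert threshold (neither comes from ZFS), compares later scrubs against that baseline:

```shell
#!/bin/sh
# Sketch: compare a scrub's average throughput against a recorded baseline.
# Sizes use decimal TB/MB, matching the "24 TB in 6 hours" figure.
# The 75% alert threshold is an arbitrary assumption, not a ZFS rule.

# compare_scrub BASE_TB BASE_HOURS CUR_TB CUR_HOURS
# prints: <current MB/s> <current throughput as % of baseline>
compare_scrub() {
  awk -v btb="$1" -v bh="$2" -v ctb="$3" -v ch="$4" 'BEGIN {
    base = btb * 1e6 / (bh * 3600)    # baseline MB/s
    cur  = ctb * 1e6 / (ch * 3600)    # current MB/s
    printf "%.0f %.0f\n", cur, 100 * cur / base
  }'
}

# Same 24 TB of data, scrubbed in progressively longer times
for hours in 6 12 24 48; do
  set -- $(compare_scrub 24 6 24 "$hours")
  if [ "$2" -lt 75 ]; then verdict="check fragmentation/hardware"; else verdict="ok"; fi
  echo "24 TB in ${hours} h: $1 MB/s, $2% of baseline: $verdict"
done
```

Because `compare_scrub` takes the data size of each run separately, it still applies as the pool fills and the amount scrubbed grows.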

I looked at another pair of FN nodes that serve files over NFSv3 to VMware for VDP backups. They are so fragmented that scrubs take a long time even with very little data.

I have a server that I am testing before putting it into production. I filled the pool to 78% with random data and ran a scrub. So far I have tested a stripe of 12 three-way mirror vdevs as well as a 1-vdev stripe, across two HP SAS D2600 shelves with 3 TB Seagate SAS drives. The speed looks good for hours, then slows all the way to 0 megabytes per minute before climbing back to 1.45 gigabytes per minute on the zpool. Nothing else is using these pools while the scrub runs. The system is a Dell R720xd with 192 GB of RAM and LSI HBAs running firmware 20.

I also observed this issue on FN 11.2-U6.
No other SMART tests or scrubs are running at the same time.
I checked the hard drive temperature reporting in the legacy interface, and all drives look consistent.
A short SMART test runs every day at 7 am.
Netdata is not running as a service.
Compression is off.
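To pin down when the slowdown starts, it helps to record the scrub's progress at regular intervals and look at the rate per interval rather than the overall average. The sketch below assumes you keep a hypothetical two-column log (Unix time and the amount scanned, copied from `zpool status`); the log format, the `scrub_intervals` function and the threshold are illustrative, not part of FreeNAS:

```shell
#!/bin/sh
# Sketch: turn periodic scrub-progress samples into per-interval throughput
# and flag intervals below a threshold. The input format is an assumption
# (a log you collect yourself), not something ZFS writes for you:
#   <unix-seconds> <amount scanned, e.g. 0, 512G, 1.45T>
# K/M/G/T suffixes are treated as powers of 1024, as ZFS tools print them.

# scrub_intervals MIN_MIB_PER_S < samples
scrub_intervals() {
  awk -v min="$1" '
    function to_bytes(s,    n, u) {
      n = s + 0                                # numeric prefix, e.g. 1.45
      u = toupper(substr(s, length(s), 1))     # unit suffix, if any
      if (u == "K") return n * 1024
      if (u == "M") return n * 1024 ^ 2
      if (u == "G") return n * 1024 ^ 3
      if (u == "T") return n * 1024 ^ 4
      return n                                 # no suffix: plain bytes
    }
    {
      t = $1; b = to_bytes($2)
      if (NR == 1) { t0 = t }
      else {
        rate = (b - pb) / (t - pt) / 1024 ^ 2  # MiB/s over this interval
        printf "%.1f h: %.0f MiB/s%s\n", (t - t0) / 3600, rate, (rate < min + 0 ? " SLOW" : "")
      }
      pt = t; pb = b
    }'
}
```

For example, `scrub_intervals 200 < scrub-samples.txt` (hypothetical file name) prints one line per sampling interval and marks those under 200 MiB/s, so a collapse around the 5-hour mark shows up as a run of SLOW lines.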

You can try this yourself, but I think the key is a scrub that runs for at least 6 hours. The source file, testfile.dat, should live on a separate zpool that is very fast.
# bash: create a 10 GiB (10240 x 1 MiB) file of random data
dd if=/dev/random of=/testfile.dat bs=1M count=10240
# run from a dataset on the pool under test; writes 2,500 copies (about 24.4 TiB)
for i in {1..2500}; do cp -v /testfile.dat "testfile$i.dat"; done
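The recipe writes 2,500 copies of a 10,240 MiB file, about 24.4 TiB (26.8 TB). To hit a specific fill level such as 78% on a different pool, the copy count can be worked out from usable capacity. A sketch, where the 36 TB capacity in the example is a hypothetical figure for 12 mirror vdevs of 3 TB drives that ignores ZFS overhead:

```shell
#!/bin/sh
# Sketch: size the fill step. loop_total_tib reports how much the dd/cp
# recipe writes; copies_for_fill estimates how many copies reach a target
# fill level. Usable capacity is an input you must supply; real pools lose
# some space to metadata and reserved slop, so treat results as estimates.

# loop_total_tib COPIES FILE_MIB -> total written, in TiB
loop_total_tib() {
  awk -v n="$1" -v mib="$2" 'BEGIN { printf "%.2f\n", n * mib / 1024 / 1024 }'
}

# copies_for_fill CAPACITY_BYTES TARGET_PERCENT FILE_MIB -> copies needed (rounded up)
copies_for_fill() {
  awk -v cap="$1" -v pct="$2" -v mib="$3" 'BEGIN {
    need = cap * pct / 100 / (mib * 1024 * 1024)
    n = int(need); if (n < need) n++
    print n
  }'
}

loop_total_tib 2500 10240                  # the recipe as written
# Hypothetical pool: 12 mirror vdevs of 3 TB drives = 36e12 bytes usable
copies_for_fill 36000000000000 78 10240
```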

Tonight I am scrubbing both vola and volb at the same time to see whether the slowdown at the 5-hour mark hits both pools at the same moment.

RegularJoe · Dec 1, 2019

33 views and no suggestions? The only odd thing in this setup that might not be mainstream is that I am using an LSI 9201-16e (SAS2116, 4-port) HBA to run the external SAS shelves (HP D2600 SAS AJ950A 0150). I am not sure how to view the firmware version of the SAS expander in the shelves; camcontrol devlist does not seem to show it.
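On the firmware question: `camcontrol devlist` prints each device's SCSI INQUIRY data as `<vendor product revision>`, and the string quoted above fits that pattern (vendor HP, product D2600 SAS AJ950A, revision 0150). If that reading is right, 0150 is the revision the shelf's enclosure device reports, which usually tracks the expander firmware; treat that as an assumption to confirm with HP's documentation. The hypothetical helper below pulls the revision out of each devlist line, assuming the `<...> at scbusN ... (devices)` layout:

```shell
#!/bin/sh
# Sketch: list the INQUIRY revision that camcontrol devlist prints for each
# device. Assumes the "<vendor product revision> at ... (dev,dev)" layout;
# for an SES enclosure device the revision usually tracks expander firmware.

# inquiry_revisions < camcontrol-devlist-output
inquiry_revisions() {
  awk -F'[<>]' 'NF >= 3 {
    n = split($2, f, " ")                          # vendor, product words..., revision
    model = f[1]
    for (i = 2; i < n; i++) model = model " " f[i]
    devs = $3
    sub(/^[^(]*\(/, "", devs); sub(/\).*$/, "", devs)   # keep e.g. "ses0,pass24"
    printf "%s  rev %s  (%s)\n", model, f[n], devs
  }'
}
```

On the server itself this would be run as `camcontrol devlist | inquiry_revisions | grep ses`.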

JoeAtWork · Mar 1, 2022

Newer versions of TrueNAS have better disk metrics. On my Dell R720xd servers I see periods where disk busy and I/O climb sharply and then sawtooth down to something very low, which looks like caching. I have also seen cases where a PCIe 2 card sits in a PCIe 3 slot and the card uses a bridge chip to run two devices in one slot. The two examples I have are the LSI 9201-16e (6 Gb/s SAS/SATA) and a QLogic QLE2564 (8 Gb Fibre Channel).
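The slot-versus-card point can be checked with back-of-the-envelope numbers. After 8b/10b encoding, a PCIe 2.0 lane carries about 500 MB/s per direction and a 6 Gb/s SAS2 lane about 600 MB/s, and a PCIe 2.0 card in a PCIe 3.0 slot still links at PCIe 2.0 speed. The sketch assumes an x8 PCIe 2.0 host link, two x4 SAS2 cables and 24 drives in two 12-bay shelves; these are illustrative inputs, and the results are theoretical ceilings, not measured throughput:

```shell
#!/bin/sh
# Sketch: theoretical per-direction bandwidth after 8b/10b encoding.
# PCIe 2.0 = 5 GT/s per lane -> 500 MB/s; SAS2 = 6 Gb/s per lane -> 600 MB/s.
# Lane and drive counts passed in are assumptions about a given setup.

# link_budget PCIE2_LANES SAS2_LANES DRIVES
# prints: <host MB/s> <SAS MB/s> <MB/s per drive through the narrower side>
link_budget() {
  host=$(( $1 * 500 ))
  sas=$(( $2 * 600 ))
  narrow=$(( host < sas ? host : sas ))
  echo "$host $sas $(( narrow / $3 ))"
}

# x8 PCIe 2.0 HBA, two x4 SAS2 wide ports, two 12-bay shelves (assumed)
link_budget 8 8 24
```

With those inputs the host link, not the SAS side, is the narrower pipe, leaving roughly 166 MB/s per drive when all 24 are read at once, as during a scrub. A bridge chip that puts two devices behind one slot cannot raise that ceiling, because both devices share the slot's upstream link.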
