New build, problematic performance

Minxster

Dabbler
Joined
Sep 27, 2013
Messages
36
Specs (already purchased and built):
Motherboard: Supermicro X11SPM-TF (single-socket LGA3647)
CPU: Intel Xeon Silver 4108, 1.8 GHz, LGA3647 (Scalable), 8 cores / 16 threads, DDR4-2400
RAM: 64 GB (2 x Samsung 32 GB DDR4-2400 LRDIMM ECC Registered, CL17, dual rank)
NIC: onboard dual 10GbE
HBA: Broadcom (Avago/LSI) 9305-16i, 16 ports, latest firmware
Hard drives: 16 x HGST He10 10 TB SATA, 4Kn, 220 MB/s
Boot device: Samsung 960 EVO NVMe M.2 250 GB (PCIe 3.0)
FreeNAS: 11.1-U5

Array layout: I'm planning to run either 2 x 8-disk RAIDZ2 or 8 x 2-disk mirrors (RAID 10). For now I'm focusing on the mirror setup.
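As a rough point of reference for the two layouts, the arithmetic below gives idealized capacity and sequential-throughput ceilings. It assumes every disk delivers the quoted 220 MB/s, that a mirror vdev writes at one disk's speed but can serve reads from both members, and that a RAIDZ2 vdev streams at the speed of its data disks. Real pools land below these figures, and the function names are only illustrative.

```python
# Idealized ceilings for the two candidate layouts. Assumptions: 16 identical
# 10 TB disks at ~220 MB/s each (the quoted drive spec); no ZFS overhead,
# metadata, recordsize or controller limits are modelled.
DISK_TB = 10
DISK_MBPS = 220


def mirrors(pairs: int, width: int = 2) -> dict:
    """Stripe of `pairs` mirror vdevs, each `width` disks wide."""
    return {
        "usable_tb": pairs * DISK_TB,                # one disk's capacity per vdev
        "seq_write_mbps": pairs * DISK_MBPS,         # every member writes the same data
        "seq_read_mbps": pairs * width * DISK_MBPS,  # reads can be spread over all members
    }


def raidz(vdevs: int, width: int, parity: int) -> dict:
    """Stripe of `vdevs` RAIDZ vdevs, each `width` disks with `parity` parity disks."""
    data_disks = vdevs * (width - parity)
    return {
        "usable_tb": data_disks * DISK_TB,
        "seq_write_mbps": data_disks * DISK_MBPS,  # parity disks add no payload bandwidth
        "seq_read_mbps": data_disks * DISK_MBPS,
    }


if __name__ == "__main__":
    print("8 x 2 mirrors:", mirrors(pairs=8))
    print("2 x 8 RAIDZ2: ", raidz(vdevs=2, width=8, parity=2))
```

On these assumptions the mirrors trade a third of the capacity for a higher read ceiling, so a mirror pool that reads at a fraction of its write speed is the opposite of what the arithmetic predicts.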

This is not my first build, but it is the first one where I want to push performance as far as I can.

So, to the problem: the server has come together well, but I'm finding serious performance problems. Before I start looking into NIC performance, I've been running dd tests to benchmark the server itself in RAID 10.

Here is a dd test run on the 8 x 2 mirrored setup:
Source | Block size | Count | Total size | Test | Seconds | Bytes/sec | MiB/s | GiB/s
zero | 2048k | 100k | 200 GiB | write | 133.3 | 1610142624 | 1535.6 | 1.5
zero | 2048k | 100k | 200 GiB | read | 283.2 | 758075548 | 723.0 | 0.7
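The derived columns can be checked against the inputs. This sketch assumes dd's k suffix means 1024 (as in FreeBSD dd), so 100k blocks of 2048k make a 200 GiB file, and that the MiB/s and GiB/s columns are bytes/sec divided by 1024² and 1024³. The helper names are illustrative.

```python
# Recompute the dd table's derived columns from bs=2048k and count=100k,
# assuming binary multipliers for the k suffix.
BLOCK_BYTES = 2048 * 1024          # bs=2048k
BLOCK_COUNT = 100 * 1024           # count=100k
TOTAL_BYTES = BLOCK_BYTES * BLOCK_COUNT  # 200 GiB test file


def rates(bytes_per_sec: float) -> tuple[float, float]:
    """Convert a dd bytes/sec figure to (MiB/s, GiB/s)."""
    mib = bytes_per_sec / 1024**2
    return mib, mib / 1024


def implied_seconds(bytes_per_sec: float) -> float:
    """Elapsed time implied by the reported rate for the whole test file."""
    return TOTAL_BYTES / bytes_per_sec


if __name__ == "__main__":
    for label, bps in (("write", 1610142624), ("read", 758075548)):
        mib, gib = rates(bps)
        print(f"{label}: {implied_seconds(bps):.2f} s, {mib:.1f} MiB/s, {gib:.2f} GiB/s")
```

The times implied by the reported rates come out a fraction of a second above the Seconds column, which looks like truncation in the table rather than a different calculation.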

I've checked the CPU and it isn't being pushed hard at all, and even the busy % during reads only sits at around 20%.

For the life of me I can't see why this is so slow and/or what the bottleneck is...

Does anyone have experience with this hardware who can shed some light on what's happening?
 

Attachments

  • RAID10 Setup.png
Elliot
Joined
Dec 29, 2014
Messages
1,135
Before you get into debugging the pool performance, I would suggest that you test the drives using the CLI command diskinfo. I would definitely suggest removing the drives to be tested from the pool first, particularly if you are going to do write testing. See what kind of performance you get there first. That will likely give you an idea where to go next.
 

Minxster

Thanks Elliot. If I run the whole array as one big stripe, it performs well, with high utilisation of all disks.

I saw a post where someone found a bad disk by using gstat -p (?) to identify a disk that was too busy, but I can't see any one drive sticking out like a sore thumb. I did run short SMART tests on all drives and they showed no issues.

FYI: the unused onboard controllers are all turned off too.

 
Elliot
I am not as familiar with the gstat command, but I would be interested in seeing the performance of each disk from a diskinfo -wS to see if they all perform at the same level.
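Spotting a laggard among 16 disks is easier with a simple rule than by eye. The sketch below flags any disk slower than 80% of the median; the threshold, the function name and the sample figures are hypothetical, and the per-disk MB/s values would have to be copied from the diskinfo output by hand.

```python
# Flag disks whose throughput is well below their peers'. The input is a map of
# disk name -> MB/s transcribed from diskinfo; the 0.8 threshold is arbitrary.
from statistics import median


def slow_disks(mbps_by_disk: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Return the disks slower than `threshold` x the median of all disks."""
    mid = median(mbps_by_disk.values())
    return sorted(d for d, v in mbps_by_disk.items() if v < threshold * mid)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results from this thread.
    sample = {f"da{i}": 220.0 for i in range(16)}
    sample["da7"] = 140.0
    print(slow_disks(sample))
```

Using the median rather than the mean keeps one very slow disk from dragging the reference point down and hiding itself.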
 

Minxster

So I've managed to get some time on the server. I've run diskinfo -t daX on each disk and attached the results.

I've also checked the HBA firmware version with dmesg | grep mpr (output also attached); it's on version 16.0.

While I was at it, I also ran sas3ircu 0 DISPLAY (output attached).

Edit:
I couldn't get diskinfo -wS daX to work at first, but I've found my error and am now running it on all drives.
 

Attachments

  • diskinfo.txt
  • dmesg - mpr.txt
  • sas3ircu - info.txt

Minxster

So I've run more tests on the system, specifically diskinfo -wS on all disks, and I've attempted the solnet-array (v2) test.

The solnet-array test did not fully complete because the hotel Wi-Fi dropped while I was connected remotely through PuTTY. Sadly the server is not at my location, so everything has to be done remotely.

The diskinfo test: to me this looks perfectly fine, with consistent results across all of the disks?

The solnet-array test: from what it completed, it seems to indicate that the disks are good?

I'll attempt to run the solnet script remotely again soon, but is there anything else I'm missing?

I'm currently wondering about reinstalling FreeNAS with an earlier version to see if there is an actual FreeNAS issue?
 

Attachments

  • diskinfo-wS.txt
  • solnet-array-test-v2.txt

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Elliot said:
Before you get into debugging the pool performance, I would suggest that you test the drives using the CLI command diskinfo. I would definitely suggest removing the drives to be tested from the pool first, particularly if you are going to do write testing. See what kind of performance you get there first. That will likely give you an idea where to go next.
I second this. I once found that a single slow drive in my pool was bringing the overall performance down significantly. After I ran full-drive writes over it for a few days, it showed bad sectors, which I think were the source of the issue. Knowing that all the drives are up to par is a great starting point, and the only way to know for sure is to run data-destructive write/read/compare testing.
 

Minxster

Thanks Chris. After some extra digging, I suspect you're telling me to use the screen command?

I'll give this a try... What I find confusing is that if I just make one big striped array across all disks, it's very quick. If I had disk issues, why would they only show up when the pool is built as a stripe of 8 mirrors?
 

Minxster

OK, so I've now run the solnet v2 script for around 16 hours. I've checked the logs for errors, and certainly none have been emailed to me. I've attached the output from solnet and also the latest SMART report (from smart_report.sh).

I've now started 16 tmux sessions, all running the disk-burnin.sh script for each of the 16 disks.
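Starting 16 sessions by hand is error-prone, so the commands can be generated. This is a sketch under assumptions not stated in the thread: the disks are da0 to da15, disk-burnin.sh sits in the current directory and takes the bare device name as its only argument, and the burnin-daN session names are made up. It only prints the commands, since a burn-in destroys the data on each disk.

```python
# Build one detached tmux session per disk for the burn-in script.
# Assumed (hypothetical): script path, argument form, and session naming.
import shlex


def burnin_commands(disks: list[str], script: str = "./disk-burnin.sh") -> list[str]:
    """Return a `tmux new-session -d` command line for each disk."""
    cmds = []
    for disk in disks:
        session = shlex.quote(f"burnin-{disk}")
        inner = shlex.quote(f"{script} {disk}")  # run inside the new session
        cmds.append(f"tmux new-session -d -s {session} {inner}")
    return cmds


if __name__ == "__main__":
    # Printed for review only; nothing is executed here.
    for cmd in burnin_commands([f"da{i}" for i in range(16)]):
        print(cmd)
```

Reattaching later with tmux attach -t burnin-da0 (or listing everything with tmux ls) shows each disk's progress.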

While I'm still slogging (pardon the pun) my way through all of these tests, does anyone have any ideas about my issue?

I've attached a screen grab taken from within FreeNAS during dd testing. The first 4 blips on the graph are me running 2 read tests, then 2 write tests. The array was in a 2 x 8 RAIDZ2 configuration, and the test used bs=2048k and count=100k...

The next 3 blips are after I changed to 8 x 2-disk mirrors. The third blip is the small blue smudge that runs from just before 18:20 to 18:40. The write speed is exceptional, but the read just falls flat on its face.

Am I doing something stupid with my dd commands?
dd if=/dev/zero of=dav_test.txt bs=2048k count=100k
dd of=/dev/null if=dav_test.txt bs=2048k count=100k
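For comparison with the second dd command, here is a rough Python analogue that reads a file sequentially in 2 MiB chunks and reports MiB/s. Like dd, it measures whatever the OS delivers, so part of a file may come from ARC rather than the disks; the script name in the usage line is hypothetical.

```python
# Rough analogue of: dd of=/dev/null if=dav_test.txt bs=2048k
# Reads a file front to back in 2 MiB chunks and reports MiB/s.
import sys
import time

CHUNK = 2048 * 1024  # same block size as bs=2048k


def sequential_read_mibps(path: str, chunk: int = CHUNK) -> float:
    """Read `path` sequentially and return the average rate in MiB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids an extra Python-level buffer on top of the OS reads
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(chunk)
            if not block:  # empty bytes means end of file
                break
            total += len(block)
    elapsed = time.perf_counter() - start
    return total / 1024**2 / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__" and len(sys.argv) > 1:
    print(f"{sequential_read_mibps(sys.argv[1]):.1f} MiB/s")
```

Run it as, for example, python3 seqread.py dav_test.txt. A result close to dd's figure would point away from dd itself as the cause.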


I'm desperate to move things along, whether that means proving or disproving the bottleneck(s)... I've upgraded the OS to 11.2-BETA1, and once the burn-in tests are complete, I'll reinstall from fresh using 9.xx to see if there is a strange bug within FreeNAS.
 

Attachments

  • smart_report.txt
  • solnet-test-16 hours.txt
  • 2 x 8 RaidZ2 - vs - 8 x 2 disk mirrors - graphs.png

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Unless you've disabled compression on the dataset you're writing to, the dd tests are meaningless. The disks themselves seem to be working just fine, according to jgreco's script.
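Eric's point about compression can be seen in miniature: blocks of zeros, which is what dd reads from /dev/zero, shrink to almost nothing, whereas random data does not. ZFS normally uses lz4 and, with compression enabled, skips writing all-zero records altogether; zlib is used here only because it ships with Python, so the exact ratios are illustrative.

```python
# Compare how well zero-filled and random blocks compress. zlib stands in for
# ZFS's lz4 here; only the contrast between the two inputs matters.
import os
import zlib

BLOCK = 128 * 1024  # 128 KiB, the default ZFS recordsize


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher = more compressible)."""
    return len(data) / len(zlib.compress(data, 1))


if __name__ == "__main__":
    zeros = bytes(BLOCK)       # like reading from /dev/zero
    noise = os.urandom(BLOCK)  # effectively incompressible
    print(f"zeros:  {compression_ratio(zeros):.0f}x")
    print(f"random: {compression_ratio(noise):.2f}x")
```

With compression on, a dd write from /dev/zero therefore barely touches the disks, which is why the test only means something with compression off or with incompressible input.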
 

Minxster

Ericloewe said:
Unless you've disabled compression on the dataset you're writing to, the dd tests are meaningless. The disks themselves seem to be working just fine, according to jgreco's script.

Thanks Eric, yes, compression is off while I'm running the dd tests. Turning it off is the first thing I do on a build, and re-enabling it is (usually) the last thing before it goes into production.

On previous builds I've always started with some short dd testing, then moved on to the NICs to get the best performance without going OTT on tuning parameters. But this build has me completely stumped. And no, I have no tuning at all, and autotune is off; I usually don't use autotune.

For this build, I wanted to go the whole hog and run a stripe of mirrors, but fell at the first hurdle. I've double-checked that I'm creating the vdevs/pool correctly and even resorted to creating it manually, just in case something else was amiss. I spent a day manually creating the array so that the paired mirrors were running on separate channels on the HBA, just to rule that out, which took a few different configurations. But nothing I have done actually makes the striped mirrors work.
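Pairing mirror members across HBA connectors can be written down mechanically. In this sketch the disk-to-connector map is hypothetical: the 9305-16i exposes four internal connectors of four lanes each, but which daN sits on which connector has to be worked out on the system itself, for example by matching serial numbers from the sas3ircu DISPLAY output. "tank" is a placeholder pool name.

```python
# Pair disks so the two halves of every mirror sit on different connectors,
# then print (not run) the matching zpool create line.


def cross_connector_pairs(ports: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Pair the 1st connector with the 2nd, the 3rd with the 4th, and so on."""
    names = sorted(ports)
    if len(names) % 2:
        raise ValueError("need an even number of connectors")
    pairs = []
    for left, right in zip(names[0::2], names[1::2]):
        if len(ports[left]) != len(ports[right]):
            raise ValueError(f"{left} and {right} hold different disk counts")
        pairs.extend(zip(ports[left], ports[right]))
    return pairs


def zpool_create(pool: str, pairs: list[tuple[str, str]]) -> str:
    """One `mirror a b` vdev per pair, striped into a single pool."""
    vdevs = " ".join(f"mirror {a} {b}" for a, b in pairs)
    return f"zpool create {pool} {vdevs}"


if __name__ == "__main__":
    # Hypothetical map: connector A holds da0-da3, B holds da4-da7, etc.
    ports = {p: [f"da{4 * i + j}" for j in range(4)] for i, p in enumerate("ABCD")}
    print(zpool_create("tank", cross_connector_pairs(ports)))
```

FreeNAS normally expects pools to be created through its GUI, which partitions the disks and references them by gptid label, so the printed line is best treated as a description of the layout rather than a command to run as-is.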

I attached a screenshot of the vdevs/pool setup in a previous post, but to confirm what I'm doing, I've taken another image just before clicking 'Add Volume'.

I still have disk-burnin.sh running in tmux sessions, and it seems to think it'll be finished by 6am. The latest smart_report.sh output shows everything is OK so far.

I'd love to eat my hat because I've done something stupid, but it's a fresh install with only the basics :confused:
 

Attachments

  • StripedMirrors.png

Ericloewe

Do I understand correctly that you had a real-world workload that was behaving poorly, prior to these tests? If so, what was it?
 

Minxster

Ericloewe said:
Do I understand correctly that you had a real-world workload that was behaving poorly, prior to these tests? If so, what was it?

I didn't mention it before, but we ran real-world testing with DaVinci Resolve: purely uploading files, then attempting to read them back. The write was perfectly fine, but the read failed catastrophically, rendering the app useless.
 

Ericloewe

Okay, at least that's consistent. Do you get any messages while the benchmark is running?

Is the firmware on the HBA up to date? I'll try to think of other avenues of investigation.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Minxster said:
I've attached a screen grab taken from within FreeNAS during dd testing ... The write speed is exceptional, but the read just falls flat on its face.

While you've got a low-clocked CPU, if that was the root of the issue I think that would have choked your RAIDZ2 vdev reads as well. And you've done this with compression off as well, so there's even less to get in the way from that front.

The fact that just changing from Z2 to mirrors seems to be enough to trash your performance is what's got me really confused. It's normal to have lower sequential performance but not that much lower.

I hate to say it, but have you tried that reinstall yet? Or even (caution, heresy ahead) a different OS with ZFS still in play?
 

Chris Moore

Minxster said:
CPU: Intel Xeon Silver 4108, 1.8 GHz, LGA3647 (Scalable), 8 cores / 16 threads, DDR4-2400
I overlooked the clock speed on the CPU. What does CPU utilization look like during the test? The low clock could also affect network traffic when it comes to that.
 