Supermicro X10SRH-CLN4F Server Performance Tests - Weird Results?

Status
Not open for further replies.

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
So where is the backplane update? The suspense is killing me. :smile:
 

HeloJunkie

Patron
Joined
Oct 15, 2014
Messages
300

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
The TrueNAS has been great, though, and it's been a pleasure to work with iX support. It's a heck of a lot cheaper than the cool million we dropped on our EMC Isilon cluster, which has been an absolute nightmare to administer.

Whilst you're waiting @HeloJunkie for the BP :) Don't want to derail your thread but I noticed that @depasseg can see a lot of stuff about the enclosure here, but I don't think FreeNAS exposes any of this in the web pages.

@souporman what sort of enclosure monitoring does TrueNAS show in the web pages - apparently it's su? Can you blink the light on faulty units, determine exact drive location etc.?
 

HeloJunkie

Patron
Joined
Oct 15, 2014
Messages
300
@leonroy

I am not sure if there is some method (beyond what I use) to determine the drive locations, but I could not find one.

First, the sas2ircu does not work for me. It tells me that there are no controllers found when issuing the 'list' command. I assume that this is because the LSI card has been flashed to IT mode, but I do not know for sure. I can tell you that it does not work on any system that I have with an LSI card.

When I build my system and I get ready to do all of my testing, one of the first things that I do is run the drive_id script that @Bidule0hm wrote (read more about his scripts here). It is a great script that shows you the device, GPTID and the serial number of your drives in a nice printout:

Code:
[root@plexnas] ~# ./plexnas_drive_id.sh | tee plexnas_drive_id_report

+========+============================================+=================+
| Device | GPTID                                      | Serial          |
+========+============================================+=================+
| da0    | gptid/2ce67c64-e537-11e4-9e81-0cc47a31abcc | PK2338P4H9LXSC  |
+--------+--------------------------------------------+-----------------+
| da1    | gptid/2fc398e3-e537-11e4-9e81-0cc47a31abcc | PK2338P4HAN6RC  |
+--------+--------------------------------------------+-----------------+
| da2    | gptid/328d1f97-e537-11e4-9e81-0cc47a31abcc | PK2338P4HAMVHC  |
+--------+--------------------------------------------+-----------------+
| da3    | gptid/35587ffe-e537-11e4-9e81-0cc47a31abcc | PK1334PCJZNLWX  |
+--------+--------------------------------------------+-----------------+
| da4    | gptid/381b15ea-e537-11e4-9e81-0cc47a31abcc | PK2338P4HAMM8C  |
+--------+--------------------------------------------+-----------------+
| da5    | gptid/3ae87504-e537-11e4-9e81-0cc47a31abcc | PK1334PCK45B2S  |
+--------+--------------------------------------------+-----------------+
| da6    | gptid/3d92278a-e537-11e4-9e81-0cc47a31abcc | PK1334PCK8VD5X  |
+--------+--------------------------------------------+-----------------+
| da7    | gptid/404a1422-e537-11e4-9e81-0cc47a31abcc | PK1334PCK70VVX  |
+--------+--------------------------------------------+-----------------+
| da8    | gptid/430d39f3-e537-11e4-9e81-0cc47a31abcc | PK2338P4HANJRC  |
+--------+--------------------------------------------+-----------------+
| da9    | gptid/45d3d4e3-e537-11e4-9e81-0cc47a31abcc | PK1334PCKB8E2X  |
+--------+--------------------------------------------+-----------------+
| da10   | gptid/4898e09b-e537-11e4-9e81-0cc47a31abcc | PK2338P4HAMMXC  |
+--------+--------------------------------------------+-----------------+
| da11   | gptid/4b626dba-e537-11e4-9e81-0cc47a31abcc | PK2338P4H9LW8C  |
+--------+--------------------------------------------+-----------------+
| da12   | gptid/c3c694ad-de92-11e4-88e8-0cc47a31abcc |                 |
+--------+--------------------------------------------+-----------------+


Now I know exactly which device name corresponds to which GPTID and drive serial number. I also have a copy of that report (thanks to tee plexnas_drive_id_report), so the next time I need that info I just log in and look at the report file.
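
As a rough sketch of how the saved report could be queried later: the helper below pulls the GPTID and serial for one device out of the report file. The name lookup_drive is made up for illustration and is not part of @Bidule0hm's scripts, and it assumes the report keeps the table layout shown above.

```shell
# lookup_drive REPORT DEVICE
# Hypothetical helper: print "GPTID SERIAL" for DEVICE from a saved
# drive_id report. Assumes rows look like "| da0 | gptid/... | SERIAL |".
lookup_drive() {
    awk -F'|' -v dev="$2" '
        NF >= 4 {
            # Strip the padding spaces around each table cell.
            gsub(/ /, "", $2); gsub(/ /, "", $3); gsub(/ /, "", $4)
            if ($2 == dev) print $3, $4
        }' "$1"
}
```

For example, lookup_drive plexnas_drive_id_report da3 would print the GPTID and serial recorded for da3. It prints nothing if the device is not in the report.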

Once I have that information, I confirm the exact location of each drive in the chassis by running:

dd if=/dev/zero of=/dev/da0 bs=8M count=25000

Then I go and see which drive has its activity light on and mark that drive as drive '0'. I do this for each drive in the system, every time I build a system.


Now when I need to know where a particular drive is, I have the serial number of the drive, the device name, the GPTID and the exact location in the chassis. All of this information also goes into a spreadsheet that I keep for each server.

There might be an easier way, but this is how I do it.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Be careful not to rely on the device name (da0, da1, ...) because it can change from reboot to reboot. I've put it there just in case it is useful in some cases. I've now added the warning to the useful scripts thread.

I think the best thing you can do is to put a label on each drive with the first 4 or 5 digits of the GPTID and the last 4 or 5 digits of the serial. It doesn't take too many digits, so it's easy to print and easy to read (and easy is always good when you have a failing server on your hands... :p)
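
A minimal sketch of that labelling scheme, assuming 5 characters from each end. The function name make_label is hypothetical and is not part of the drive_id script.

```shell
# make_label GPTID SERIAL
# Hypothetical helper: build a short drive label from the first 5
# characters of the GPTID and the last 5 characters of the serial.
make_label() {
    uuid=${1#gptid/}    # drop the "gptid/" prefix if present
    prefix=$(printf '%s' "$uuid" | cut -c1-5)
    suffix=$(printf '%s' "$2" | awk '{ print substr($0, length($0) - 4) }')
    printf '%s-%s\n' "$prefix" "$suffix"
}
```

With the first drive from the report earlier in the thread, make_label gptid/2ce67c64-e537-11e4-9e81-0cc47a31abcc PK2338P4H9LXSC gives 2ce67-9LXSC.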

Also, when you do dd if=/dev/zero of=/dev/da0 bs=8M count=25000 it's risky because you destroy the data on this drive and if it's the wrong one...

What I'd recommend is to use a read operation dd if=/dev/da0 of=/dev/null bs=1M count=50k or a read/write of the same data dd if=/dev/da0 of=/dev/da0 bs=1M count=50k ;)
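
The read-only variant could be wrapped up roughly like this. It is only a sketch: the function name blink and the DRY_RUN switch are assumptions for illustration, and the default count of 50000 blocks simply mirrors the rough size used above.

```shell
# blink DEVICE [COUNT]
# Hypothetical helper: read COUNT blocks of 1M from /dev/DEVICE into
# /dev/null so the drive's activity light comes on. Nothing is written
# to the drive. Set DRY_RUN=1 to print the command instead of running it.
blink() {
    dev=$1
    count=${2:-50000}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'dd if=/dev/%s of=/dev/null bs=1M count=%s\n' "$dev" "$count"
    else
        dd if="/dev/$dev" of=/dev/null bs=1M count="$count"
    fi
}
```

Running it with DRY_RUN=1 first shows exactly which device is about to be read before anything touches the disk.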
 

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
@leonroy
First, the sas2ircu does not work for me. It tells me that there are no controllers found when issuing the 'list' command. I assume that this is because the LSI card has been flashed to IT mode, but I do not know for sure. I can tell you that it does not work on any system that I have with an LSI card.

Thanks for the very useful info. Guess as a community we need to figure out a better way though :)

Regarding the sas2ircu utility here's what I get with an IBM M1015 flashed to IT mode:

Code:
[root@mercury] /mnt/volume1/scripts# ./sas2ircu list
LSI Corporation SAS2 IR Configuration Utility.
Version 18.00.00.00 (2013.11.18)
Copyright (c) 2009-2013 LSI Corporation. All rights reserved.


         Adapter      Vendor  Device                       SubSys  SubSys
Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
-----  ------------  ------  ------  -----------------    ------  ------
   0     SAS2008     1000h    72h   00h:02h:00h:00h      1000h   3020h
SAS2IRCU: Utility Completed Successfully.
 

HeloJunkie

Patron
Joined
Oct 15, 2014
Messages
300
Well this is not good....new backplane arrived and I installed it - and this is what I see again:


Code:
[root@plexnas] ~# iostat -C -w 2 -d -t da /dev/da0
             da0              da1              da2             cpu
  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
 128.00 1296 162.03   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 128.00 1301 162.67   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 128.00 1299 162.42   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 128.00 1300 162.54   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 128.00 1338 167.29   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 128.00 1282 160.30   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 128.00 1300 162.48   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 
Second read started:

 128.00 592 73.96   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 128.00 342 42.79   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 128.00 242 30.23   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 128.00 322 40.23   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 128.00 303 37.86   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 128.00 325 40.60   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
 128.00 337 42.10   0.00   0  0.00   0.00   0  0.00   0  0  0  0 100
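
To put a number on the drop, the da0 MB/s column (the third field) can be averaged over each set of samples. This is just a sketch, and the function name avg_mbs is made up for illustration:

```shell
# avg_mbs
# Hypothetical helper: read iostat sample lines on stdin and print the
# mean of the third column (MB/s for the first device). Header lines are
# skipped because their third field is not a number.
avg_mbs() {
    awk '$3 ~ /^[0-9.]+$/ { sum += $3; n++ }
         END { if (n) printf "%.2f\n", sum / n }'
}
```

Fed the seven samples from the single read, this gives 162.82 MB/s. Fed the seven samples taken after the second read started, it gives 43.97 MB/s for da0.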


I am at a loss. This is what I tried so far:

1) Replaced the hard drives with drives from different manufacturers, of different sizes, different speeds and even different types (SAS vs. SATA).
2) Replaced the controller with an M1015 controller as opposed to using the onboard LSI3008 controller.
3) Replaced the motherboard with another motherboard also with an M1015 controller.
4) Replaced the SFF cables from the backplane to the controller with new ones (twice).


I have sent my vendor an email with these results asking them to get SM involved to see where we go from here.
 

HeloJunkie

Patron
Joined
Oct 15, 2014
Messages
300
Be careful to not rely on the device name (da0, da1, ...) because it can change from reboot to reboot. I've put here just in case it can be useful for some cases. I've now added the warning to the useful scripts thread.

What I found with my latest server is that the device name actually follows the drive. One of the tests I did was to swap a bunch of the drives around, and the device name (da0) followed the drive every time.

Also, when you do dd if=/dev/zero of=/dev/da0 bs=8M count=25000 it's risky because you destroy the data on this drive and if it's the wrong one...

This is not a problem. If you attempt this command on a drive that is part of a zpool it fails. For example:

Code:
+========+============================================+=================+
| Device | GPTID                                      | Serial          |
+========+============================================+=================+
| ada0   | gptid/174b45b2-e87f-11e4-a506-001517834640 | WD-WCC4EFFXU34F |
+--------+--------------------------------------------+-----------------+
| ada1   | gptid/1858365b-e87f-11e4-a506-001517834640 | WD-WCC4E4XK8S1F |
+--------+--------------------------------------------+-----------------+


[root@scruffy] ~# zpool status vol1
  pool: vol1
 state: ONLINE
  scan: none requested
config:

    NAME                                            STATE     READ WRITE CKSUM
    vol1                                            ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        gptid/174b45b2-e87f-11e4-a506-001517834640  ONLINE       0     0     0
        gptid/1858365b-e87f-11e4-a506-001517834640  ONLINE       0     0     0



Code:
[root@scruffy] ~# dd if=/dev/zero of=/dev/ada0
dd: /dev/ada0: Operation not permitted

[root@scruffy] ~# dd if=/dev/zero of=/dev/ada2
5568+0 records in
5567+0 records out
2850304 bytes transferred in 1.214688 secs (2346532 bytes/sec)
 

souporman

Explorer
Joined
Feb 3, 2015
Messages
57
Whilst you're waiting @HeloJunkie for the BP :) Don't want to derail your thread but I noticed that @depasseg can see a lot of stuff about the enclosure here, but I don't think FreeNAS exposes any of this in the web pages.

@souporman what sort of enclosure monitoring does TrueNAS show in the web pages - apparently it's su? Can you blink the light on faulty units, determine exact drive location etc.?

Yeah, I think it's su. You sure can blink the lights. It's one of my favorite features about TrueNAS. SAS2IRCU shouldn't work for the SAS3008. SAS3IRCU should, but it doesn't work in FreeBSD for some reason. When I boot up in the UEFI shell and load the EFI version of SAS3IRCU it works like a charm. It's not really useful, since if I have to reboot to use it I might as well just turn the thing off and look at my drives.

When I use a 9211-8i (SAS2008) I ID the drives by blinking them with SAS2IRCU. On most of my enclosures I use the onboard SAS3008, so I just write a big file to the enclosure and pull the one that's not lit up. Ez pz. Here's an interesting note: 5TB and 6TB drives do not light up at all unless there is activity going to them. Doesn't really matter to me, but it's an interesting note. I actually have a few enclosures with a mixture of 4, 5, and 6TB disks. It's kinda funny looking when a few random disks stay lit up.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
What I found with my latest server is that the device name actually follows the device. One of the tests I did was to swap a bunch of the drive around and the device name (da0) followed the drive every time.

AFAIK the device names are assigned in the order in which the drives are discovered, but in your case it may be the order in which the drives spun up. Well, in any case, just don't rely on it :)

This is not a problem. If you attempt this command on a drive that is part of a zpool it fails.

Right, I forgot about that.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Is the BP firmware version the same as your previous one?
Any BIOS or controller settings?
Maybe try a LiveCD instead of FreeNAS?
I'll try to clear out my server and run some tests. But it probably won't be for a week or so.
 

HeloJunkie

Patron
Joined
Oct 15, 2014
Messages
300
Trying to figure out the firmware issue. I don't see anywhere I can pull a FW revision off of it. I actually have a call in to SM to find out how, and whether it can be upgraded. There are no BIOS or controller settings that pertain to the BP that I can find.

Great idea on the live CD. The only thing I have not tried is a different OS. Even with a different MB and different controllers I used FreeNAS (and the same version) to test, but I thought the backplane was passive and didn't require any drivers. Was I wrong? What version of FreeNAS are you running? If the backplane is not passive and actually uses drivers, it could be a driver issue.

Well, I will try Linux tomorrow and see what happens.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
I'm not aware of any Backplane drivers, but isn't there firmware? There has to be a reason that our dmesg results are different.

And your BIOS is still a revision or so ahead of mine. Not sure if that has any impact.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
If you can send me one of your backplanes, I'll try it in mine.
 

HeloJunkie

Patron
Joined
Oct 15, 2014
Messages
300
The BIOS should not be an issue, since I had the exact same problem with the same backplane but a different motherboard and controller! As for firmware, I would imagine there has to be some, given how complicated these things look:

[Attached image: Screen Shot 2015-04-21 at 6.02.38 PM.png]


We have a call into Supermicro to figure out how to determine the firmware and how to upgrade it.
 

HeloJunkie

Patron
Joined
Oct 15, 2014
Messages
300
OH I DON'T BELIEVE THIS!!!!

Maybe try a LiveCD instead of FreeNAS?


Thanks @depasseg for the idea. I was so tied up in thinking I had a backplane issue, after using multiple controllers and multiple machines and seeing the same problem across the board, that I did not think to try a different OS! Well, when I installed Ubuntu 14.04 LTS Server and ran the same tests, I got completely different results. I see no performance degradation when running multiple reads like I do when running under FreeNAS. I have done this both with the integrated LSI3008 controller and with the M1015 controller. Both work flawlessly under 14.04 LTS, both fail under FreeNAS.

BUT - this opens up a whole other can of worms. I took a working M1015 from another Supermicro box running the same version of FreeNAS that I am currently running on plexnas (the box with the SAS expander) and ran the test in another server (but still connected to the SAS backplane in the plexnas server) and had the exact same problem I am seeing with the LSI3008 controller built into the motherboard. This is what led me to believe that it was the backplane. However, the new backplane behaves exactly the same as the old backplane under FreeNAS.

So that begs the question - is the driver that runs the LSI3008 the same driver that runs the M1015? I am running the V16 M1015 BIOS (had to upgrade due to the v15 vs. v16 alert in FreeNAS). If it is the same driver, then why do I not see the problem until I use the SAS expander, and then only in FreeNAS? And if it is a driver issue, why does the M1015 work just fine in FreeNAS while not attached to the SAS expander?

The M1015 running FreeNAS on two different PC machines (one with an Asus MB and one with an MSI motherboard) exhibits the same problem while connected to the SAS backplane. However, on those exact same machines running 14.04 LTS, the problem is gone.

So now this is pointing to this backplane, but only WITH FreeNAS. Does that seem logical..?

I am waiting on Supermicro to tell me if there is any way to determine the firmware version on the backplane and to see if there is some way to upgrade and/or downgrade it for further testing.
 

leonroy

Explorer
Joined
Jun 15, 2012
Messages
77
This sounds more like a driver issue than just a firmware issue. FreeNAS 9.3 insists on a v16 FW for the card since that is also what the shipping LSI driver is validated against. I bet the Ubuntu driver is quite a bit more recent than v16 and could well have a number of fixes in it to handle that particular backplane.

There was a thread on the forum where someone upgraded the LSI driver on 9.3. Could try that if you really want to go all out in testing this.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You can do that, but several people who have upgraded the LSI driver themselves have come back and apologized to the community for not just using P16 firmware like we asked.... P20 didn't work so well for them (and we knew this).
 

souporman

Explorer
Joined
Feb 3, 2015
Messages
57
The driver version for the SAS3008 is v5. It's different than the SAS2008 stuff (M1015, 9211-8i etc).
 

HeloJunkie

Patron
Joined
Oct 15, 2014
Messages
300
This sounds more like a driver issue than just a firmware issue.

Normally I would say this is the case, but @depasseg is running the same hardware, same server, same backplane and same drivers and does not see the issue on his server - I am really stumped. If it were drivers only, why would I see the issue on two different cards (3008 and M1015)?
 