design advice/sanity check

Status
Not open for further replies.

ItsValium

Cadet
Joined
Sep 21, 2015
Messages
9
I'm in the process of repurposing some hardware and building two identical machines for ZFS-based storage, and I need some advice regarding SLOG/L2ARC disk choice and layout. The intended use is NFS-based vSphere storage for medium-load VMs.

The vSphere cluster consists of 15 dual-CPU hosts, each connected via a dual-port Intel X520-DA2 card to Arista switches.

First, the hardware already available:
  • Supermicro 2U 219A-R920WB chassis (16x 2.5'' drive bays)
  • Supermicro X9DRW-iTPF+ motherboard
  • 2x Intel Xeon E5-2680 2.7 GHz (3.5 GHz Turbo) 8-core (16-thread) processors
  • 128 GB DDR3 ECC registered server memory (8x 16 GB PC3L-10600R)
  • 2x LSI 9200-8i
  • 1x LSI 9200-8e
  • Integrated dual-port Intel i350 1 Gbps Ethernet ports
  • Integrated dual-port Intel 82599 (same as X520) 10 Gbps SFP+ ports
  • Dual redundant Platinum-efficiency 920W PSUs
  • 2x IBM EXP2524 external chassis, each connected to one port of the LSI 9200-8e and each containing 24 10K RPM SAS drives, for a total of 48 drives.
FreeNAS 9.10 will be installed on two 64 GB disks, each connected to one of the internal LSI controllers. Due to the internal riser cards I can't use NVMe-based P3700 cards for cache, so I'm stuck with SATA/SAS on the internal controllers.

As I have no experience with multiple drives for both SLOG and L2ARC, I'm looking for advice. For SLOG drives I was thinking about 2 to 4 Intel S3710 200GB drives, and for L2ARC I have a lot of 240 GB Samsung SM863 drives available.

What about the layout? Can anyone advise? I have 14 drive slots, spread equally over both controllers, available for SLOG and L2ARC drives. Before I go out and buy the wrong SSDs, or the wrong number of them for the wrong layout, I could use some help.
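
To be clear about what I mean by layout: as I understand it, SLOG and L2ARC attach to an existing pool as separate vdevs, along these lines (pool and device names are purely illustrative; FreeNAS would normally do this through the volume manager rather than the shell):

zpool add tank log mirror da20 da21    # two S3710s mirrored as the SLOG
zpool add tank cache da22 da23         # SM863s as L2ARC; cache devices stripe and need no redundancy
zpool status tank                      # the log and cache vdevs show up as their own sections

The question is really how many of each, and whether to mirror the SLOG.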
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Due to the internal riser cards I can't use NVMe-based P3700 cards for cache.
What's the problem, exactly? The SSDs don't fit on the risers? If that's the case, a simple U.2 adapter card would probably fit and allow you to relocate the SSD (in 2.5" format) elsewhere.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
A build of this magnitude would probably benefit from advice by the resident grinch... @jgreco
 

ItsValium

Cadet
Joined
Sep 21, 2015
Messages
9
What's the problem, exactly? The SSDs don't fit on the risers? If that's the case, a simple U.2 adapter card would probably fit and allow you to relocate the SSD (in 2.5" format) elsewhere.

No, the risers are proprietary WIO parts from Supermicro, and according to the datasheet they don't support NVMe. (The riser is the RSC-R2UW+-2E16-2E8, and Supermicro's riser overview is at https://www.supermicro.com/ResourceApps/Riser.aspx.) Maybe I'm interpreting that wrong, though.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No, the risers are proprietary WIO parts from Supermicro, and according to the datasheet they don't support NVMe. (The riser is the RSC-R2UW+-2E16-2E8, and Supermicro's riser overview is at https://www.supermicro.com/ResourceApps/Riser.aspx.) Maybe I'm interpreting that wrong, though.

I think you're interpreting that wrong. It means the riser doesn't have NVMe ports. I'm not sure how you could selectively break PCIe so that NVMe wouldn't be possible but average PCIe cards would still work.

Try looking for the risers /with/ NVMe and take a look at them.

Or put it this way: all the NVMe stuff here is PCIe cards plugged into risers that say "No" on that compatibility list. Works.
 

ItsValium

Cadet
Joined
Sep 21, 2015
Messages
9
I think you're interpreting that wrong. It means the riser doesn't have NVMe ports. I'm not sure how you could selectively break PCIe so that NVMe wouldn't be possible but average PCIe cards would still work.

Try looking for the risers /with/ NVMe and take a look at them.

Or put it this way: all the NVMe stuff here is PCIe cards plugged into risers that say "No" on that compatibility list. Works.
Oooh, well, that changes things a bit.

I was thinking of the following layout:
two pools per machine, each consisting of 12 disks from each enclosure in mirrored vdevs (24 disks total per pool), with one P3700 as SLOG and four of the Samsungs as L2ARC.

This layout on both machines, all mounted to the vSphere cluster, seems like the best option for performance. Am I right, or is there a better way?
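
Roughly, per head server, each pool would be built along these lines (device names are purely illustrative, the real build repeats the mirror pairs out to 12, and FreeNAS would do all of this through the GUI; the P3700 should show up as nvd0 under FreeBSD's nvd driver):

zpool create Pool1 \
  mirror da0 da24 \
  mirror da1 da25 \
  mirror da2 da26                            # three of the twelve mirror pairs shown, one disk from each JBOD per mirror
zpool add Pool1 log nvd0                     # single P3700 as SLOG
zpool add Pool1 cache da12 da13 da14 da15    # four SM863s striped as L2ARC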
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Sounds like excessive L2ARC for 128GB of RAM, which isn't a good thing. 480GB sounds more in line with what is usually recommended.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I'm not sure how you could selectively break PCIe so that NVMe wouldn't be possible but average PCIe cards would still work.
The only detail would be the port distribution for which the board is wired. 2x x16 wouldn't be very useful for PCI-e SSDs, but it has to work.

What would be cool is a PCI-e card (say x16) with a PCI-e switch and a bunch of U.2 sockets, SAS expander style. I know Supermicro does this in their fancy new PCI-e SSD servers on the backplane, but I haven't seen any generic implementation.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Sounds like excessive L2ARC for 128GB of RAM, which isn't a good thing. 480GB sounds more in line with what is usually recommended.

I've been pushing 1TB L2ARC on 128GB and it's fine as long as the volblocksize isn't too small.
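
Rough arithmetic on why the record size dominates: every record cached in L2ARC keeps a header in ARC, and the per-record cost is somewhere on the order of 180 bytes depending on the ZFS version (180 is the pessimistic assumption used below):

echo "1099511627776 / 16384 * 180" | bc      # 1 TB of L2ARC at 16K records: ~12 GB of ARC eaten by headers
echo "1099511627776 / 131072 * 180" | bc     # the same 1 TB at 128K records: ~1.5 GB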

Seems processor-heavy, but if you can score some $80 Xeon E5-2670s or 2680s, that makes sense.

As for the x16 riser: will one of the x8 risers work on that board? I don't have time to go look, but you'd probably be better off with the left riser being a four-slot x8 riser, if that's supported.
 

ItsValium

Cadet
Joined
Sep 21, 2015
Messages
9
@jgreco : Just to clarify a bit:

Each machine already has two E5-2680s in it. Only 8 of the 24 RAM slots are in use, so RAM upgrades are an option too.

Regarding PCI-e slots, each machine has 6 PCI-e x8 slots: 4 on one riser (2x x16 and 2x x8 physical, but all wired as x8 internally) and an extra 2x x8 on the second riser.

Maybe I shouldn't spend the funds on SSDs but just add more RAM. However, I read somewhere that too much RAM might be a bad thing too. Has that changed? It's been a while.

@Ericloewe : I could of course 'cripple' the L2ARC SSDs to half their capacity, lowering the L2ARC to about 500 GB in total over four SSDs, and at the same time improve endurance and performance a bit.
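
For what it's worth, 'crippling' them would just mean partitioning each drive smaller than its full capacity and leaving the rest unallocated, something like this (device name illustrative; FreeNAS normally handles partitioning itself when you add a cache device through the GUI, and a secure erase beforehand helps the drive actually treat the unused space as spare area):

gpart destroy -F da12                  # wipe any existing partition table on the SM863
gpart create -s gpt da12
gpart add -t freebsd-zfs -s 120G da12  # use only ~half of the 240 GB
zpool add Pool1 cache da12p1           # add the smaller partition as L2ARC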
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
The only detail would be the port distribution for which the board is wired. 2x x16 wouldn't be very useful for PCI-e SSDs, but it has to work.

What would be cool is a PCI-e card (say x16) with a PCI-e switch and a bunch of U.2 sockets, SAS expander style. I know Supermicro does this in their fancy new PCI-e SSD servers on the backplane, but I haven't seen any generic implementation.

Yes. There is this quad M.2 adapter (not sure if it's PCIe 3.0, though); it supports four 4-lane M.2 slots on an x16 slot.
[Image: squid_left.jpg]


Intel has an x16 to 4x x4 U.2 adapter; not sure if it works in normal motherboards, though.

http://www.servethehome.com/4-solutions-tested-add-2-5-sff-nvme-current-system/

[Image: PCIe-to-SFF-8643-options.jpg]



I guess, technically, the ASUS HyperKit would work in the Amfeltec SQUID, but it'd use up a bunch of PCI slots because of its physical size.

The M.2 cards would be an option for 10 GB/s of L2ARC, though, if the Amfeltec is PCIe 3.0.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
@jgreco : Just to clarify a bit:

Each machine already has two E5-2680s in it. Only 8 of the 24 RAM slots are in use, so RAM upgrades are an option too.

Regarding PCI-e slots, each machine has 6 PCI-e x8 slots: 4 on one riser (2x x16 and 2x x8 physical, but all wired as x8 internally) and an extra 2x x8 on the second riser.

Maybe I shouldn't spend the funds on SSDs but just add more RAM. However, I read somewhere that too much RAM might be a bad thing too. Has that changed? It's been a while.

@Ericloewe : I could of course 'cripple' the L2ARC SSDs to half their capacity, lowering the L2ARC to about 500 GB in total over four SSDs, and at the same time improve endurance and performance a bit.

Perhaps this: http://www.zfsbuild.com/2012/03/02/when-is-enough-memory-too-much/

I believe that's been addressed. I don't think crippling the L2ARCs is necessary. Just add half of them, monitor your ARC to make sure you're not crushing it, and then add the other two if all looks good. Otherwise, use them as spares or a scratch pool or something like that.
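
A quick way to keep an eye on that from the shell while you ramp up: the stock FreeBSD ZFS counters are exposed as sysctls (sizes in bytes, counters cumulative since boot):

sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max    # current ARC size vs. its ceiling
sysctl kstat.zfs.misc.arcstats.l2_hdr_size                           # RAM currently consumed by L2ARC headers
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses   # ARC hit/miss counters

If l2_hdr_size starts crowding out the ARC, that's the sign to hold off on the other two.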
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I believe that's been addressed. I don't think crippling the L2ARCs is necessary.
I'm fairly certain that FreeBSD 10.3 includes some additional improvements in that area that weren't in 9.3, so I'd expect it to have been addressed.
 

ItsValium

Cadet
Joined
Sep 21, 2015
Messages
9
Well, that's good news then. I might just set the systems up without any SLOG/L2ARC at first and see how they do (the P3700s have to be ordered but will have to wait until the funds are cleared; the SM863s are readily available if needed). I could look at upgrading the memory first before springing for other hardware.

And everything has been delayed because one of the two Arista switches died in transit :(
 

ItsValium

Cadet
Joined
Sep 21, 2015
Messages
9
OK,

The replacement switches got delivered, the JBOD chassis have been moved to the location, and the head servers will be moved tonight/tomorrow and everything connected. No SSDs yet, but everything else should be up and running in the next two days. Anybody interested in numbers/benchmarks before I move these into production?
 

ItsValium

Cadet
Joined
Sep 21, 2015
Messages
9
First test with the first system booted up:
12 vdevs of 2-disk mirrors distributed over the two JBODs, no SLOG, no L2ARC.

write - no compression
/mnt/Pool1# dd if=/dev/zero of=tmp.dat bs=2048k count=50k
51200+0 records in
51200+0 records out
107374182400 bytes transferred in 69.396228 secs (1547262515 bytes/sec)

write - lz4 compression
/mnt/Pool1# dd if=/dev/zero of=tmp.dat bs=2048k count=50k
51200+0 records in
51200+0 records out
107374182400 bytes transferred in 30.858115 secs (3479609228 bytes/sec)



I will add more numbers as the systems are completed, for future reference.
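
One caveat on my own numbers: dd'ing /dev/zero into an lz4 dataset mostly measures how fast lz4 collapses zeroes, so the 3.4 GB/s figure is more a CPU number than a disk number. A rough sketch of a less flattering variant with incompressible data (paths and sizes are illustrative; generating from /dev/random is CPU-bound, and rereading the source file competes with the write, so treat the result as a floor):

dd if=/dev/random of=/mnt/Pool1/random.dat bs=2048k count=5k       # stage ~10 GB of incompressible data once
dd if=/mnt/Pool1/random.dat of=/mnt/Pool1/tmp_rand.dat bs=2048k    # rewrite it within the lz4 pool and time this run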
 

ItsValium

Cadet
Joined
Sep 21, 2015
Messages
9
Well, this is long overdue, but due to a big relationship crisis and its ending I haven't been around much to deal with all this.

I finally have time, am back to a normal mental state, and am able to tinker some more.

I got both systems in the racks and up and running, unfortunately with no SSDs in them yet, so I started testing the systems with VMware's I/O Analyzer.

The setup is, hardware-wise, exactly as described in the first post. I tested with one head server connected to two JBODs via a single LSI card (each external port to one JBOD).

FreeNAS 9.10 is installed on mirrored 2.5-inch drives in the head server, which is connected with both on-board 10 Gbit ports to the Arista switches (LACP enabled). Two IPs on different subnets are configured on the NIC as aliases.

Two pools are configured, each with 12 mirrored vdevs containing one disk from each enclosure, so all 48 drives are spread evenly over both pools. Each pool is exported through NFS on one of the separate subnets to make the most of LACP.

10 ESXi nodes are connected over dual 10 Gbit NICs (X520-DA2) to the Arista switches. NFS connections are made on one port depending on the subnet, with failover to the other port in case a link fails. Each node has two VMs configured from the VM I/O Analyzer template; one VM is placed on the first NFS share and the second on the other NFS share.

This way each host pushes each pool separately, on a separate subnet, through the NFS shares, the load is spread evenly over both pools, and a huge load is generated on the FreeNAS server. Results will appear below as I get them in, and I will update this post when I have more results and when the SSDs are installed later this week or early next week.
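
From the ESXi side, each host simply mounts the two exports from different alias IPs, something like this (IP addresses, export paths and datastore names are illustrative):

esxcli storage nfs add -H 10.10.1.10 -s /mnt/Pool1 -v FN3_Pool1    # first pool over subnet 1
esxcli storage nfs add -H 10.10.2.10 -s /mnt/Pool2 -v FN3_Pool2    # second pool over subnet 2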

I'm seeing really good transfer speeds on the NICs with sync disabled, so the network bandwidth is there and the pools can push some serious data. Now I'm curious and impatient to get the SSDs in and see what speeds I get with sync enabled and each pool having an S3710 as its SLOG device. See the graph for NIC speeds without sync.

However, testing with sync=always is another story, so I'm desperately awaiting the arrival of the SSDs. What does seem a little strange to me is the difference in read speeds between sync=always and sync=standard...
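
For reference, the sync settings behind those runs are just the per-pool/per-dataset ZFS property (FreeNAS exposes the same knob in the dataset options; pool name as in my setup):

zfs set sync=disabled Pool1    # acknowledge writes from RAM: fast, but unsafe for VM storage
zfs set sync=standard Pool1    # honour the client's sync requests (ESXi over NFS issues them for nearly every write)
zfs set sync=always Pool1      # force every write through the ZIL/SLOG
zfs get sync Pool1

Given that ESXi requests sync writes for practically everything over NFS, standard and always should behave much the same for writes here; the SLOG is what should make both of them bearable.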
 

Attachments

  • NIC_Graph.JPG
  • FN3_MixedWorkLoad_NOSYNC.JPG
  • FN3_MAXThroughput_NOSYNC.JPG
  • FN3_MAXIOPS_NOSYNC.JPG
  • FN3_MixedWorkLoad_SYNC_ALWAYS.JPG
  • FN3_MAXThroughput_SYNC_ALWAYS.JPG
  • FN3_MAXIOPS_SYNC_ALWAYS.JPG
  • FN3_MAXIOPS_SYNC_STANDARD.JPG
  • FN3_MAXThroughput_SYNC_STANDARD.JPG
  • FN3_MixedWorkLoad_SYNC_STANDARD.JPG
