Plans to build a 12 drive SSD array


Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
I have questions and concerns about this project, including:
  • TRIM
  • Using multiple SATA controllers & how that will impact single network client performance
  • Write cache/buffer
  • Under-provisioning
Server Hardware:
Server Software:
  • FreeNAS
    • ZFS raidz2 (RAID6)
I plan to use the Intel DC S4500 960GB (subject to change).

Questions/Concerns:
  • TRIM
    • In the early days of consumer SATA SSDs it was known that hardware RAID controllers disabled the TRIM function.
      • What about software RAID?
      • What about FreeNAS ZFS RAID specifically? The only difference I know of between software and hardware RAID is that hardware RAID uses a dedicated controller while software uses system CPU/memory.
    • Does the OS need to support TRIM to use it or is it solely a feature of the SSD itself?
      • Would the SSD run TRIM itself on its own schedule or does the OS have to tell it to?
  • Multiple SATAIII Controllers
    • When drives are spanned evenly across multiple controllers is it possible for a single client to see speeds that exceed the limit of a single controller?
      • In theory it should be possible but I won't be surprised if it doesn't work out that way. With multiple clients though there should be perfect scaling.
  • Write Cache/Buffer
    • In the past, when I put an SSD in the server, I saw abnormal write behavior: a transfer would start out strong and then drop radically; after a short time it would shoot up again and repeat this cycle until the move completed. It acted as if a high-speed cache or buffer were being filled, then emptied to slower permanent media, then refilled with more data, and so on. Can anyone explain this activity?
  • Provisioning
    • Would any form of provisioning/under-provisioning be recommended or just set up all the usable space in one giant raidz2/RAID6 volume and not worry about degradation?
If anybody has any other input or things I should be wary of don't be afraid to mention it.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
I'll try to knock out some of this...
ZFS is copy-on-write, so TRIM is less of an issue. When rewriting part of a file, ZFS will copy the "block" and write the new version to a new location before it updates the block pointer. You never write over existing blocks.
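That said, FreeBSD does support TRIM under ZFS, and you can see what it's doing from the shell. A minimal check, assuming the stock FreeBSD/FreeNAS ZFS sysctls (exact names can vary between versions):

  # Is ZFS TRIM enabled? (1 = enabled)
  sysctl vfs.zfs.trim.enabled
  # Running counters of TRIM activity: bytes trimmed, successes, failures,
  # and "unsupported" for devices that ignore the command
  sysctl kstat.zfs.misc.zio_trim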
Using multiple SATA controllers & how that will impact single network client performance
This all depends on your network performance. Even with 10Gb/s Ethernet, you should have little trouble saturating your network unless you're doing low-queue-depth random IO, but you haven't told us how your system will be used, so I can't speculate too much.
Write cache/buffer
Not a thing in ZFS. The closest you will get is the ZIL. That only improves performance for synchronous IO, and only if the SLOG that holds the ZIL is faster than your pool for random IO. In your case you would be looking at top-end Optane drives. Regardless, you need to do a LOT of reading on this subject.
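If, after that reading, you did decide to add a SLOG, the pool-side command is straightforward. A sketch, with a hypothetical pool name "tank" and hypothetical NVMe device names:

  # Attach a dedicated SLOG device to hold the ZIL (device names are examples)
  zpool add tank log nvd0
  # Or mirror the SLOG so a single device failure can't lose in-flight sync writes
  zpool add tank log mirror nvd0 nvd1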
Under-provisioning
Most SSDs are rated in drive writes per day (DWPD) over a set number of years. This could be an issue depending on your workload. Please tell us more about how you plan to use your file server.
If you're looking for the max performance with SSDs... Again depending on use case, you may consider higher clock speeds or even just one CPU to save money and power.
Depends on how you're going to use the system... That said this is a good starting number.
SATA Controller: 3xSATAIII SAS 9207-8i HBA
This is not a SATA controller. It's a SAS controller that's compatible with SATA. This particular one is fairly popular from what I recall.
When drives are spanned evenly across multiple controllers is it possible for a single client to see speeds that exceed the limit of a single controller?
Yes. Your system will see all the disks and controllers and will manage writing to them. The clients will see your shares as presented by FreeNAS. This abstraction is what ZFS is built for.
With multiple clients though there should be perfect scaling.
There is never perfect scaling. Not with storage, and not with networking.
In the past, when I put an SSD in the server, I saw abnormal write behavior: a transfer would start out strong and then drop radically; after a short time it would shoot up again and repeat this cycle until the move completed. It acted as if a high-speed cache or buffer were being filled, then emptied to slower permanent media, then refilled with more data, and so on. Can anyone explain this activity?
This is cache thrashing. It happens with poor design. ZFS has a number of tunable parameters to prevent this (not in the GUI); there are settings like TXG timeouts and write rate limits.
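To give you an idea of the kind of knobs I mean, these are FreeBSD sysctls/loader tunables you can at least read to see the current behavior; treat the names as examples to research rather than values to copy, since they vary by ZFS version:

  # How often a transaction group (TXG) is forced out to disk, in seconds
  sysctl vfs.zfs.txg.timeout
  # How much dirty (not-yet-written) data ZFS will buffer before throttling writers
  sysctl vfs.zfs.dirty_data_max
  sysctl vfs.zfs.dirty_data_max_percent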
Would any form of provisioning/under-provisioning be recommended or just set up all the usable space in one giant raidz2/RAID6 volume and not worry about degradation?
You can make a RAIDZ2 (like RAID 6) that is 20 drives "wide" if you want, but it's generally considered a bad idea. The more drives in a vdev, the more likely one will fail during a rebuild. This is especially a concern if you are using large drives (3TB+), as the rebuild times can become extremely long. Two RAIDZ2 vdevs striped would be safer, but the level of redundancy/risk you care to take depends on you and what you're using FreeNAS for.
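For illustration, this is roughly what the two layouts look like at creation time from the command line (pool and device names are placeholders; in practice FreeNAS builds this through the GUI):

  # One wide 12-disk RAIDZ2 vdev: most usable space, worst rebuild exposure
  zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11
  # Two 6-disk RAIDZ2 vdevs striped together: less usable space,
  # but each vdev rebuilds from only five other disks
  zpool create tank \
    raidz2 da0 da1 da2 da3 da4 da5 \
    raidz2 da6 da7 da8 da9 da10 da11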

Unless you're running 10Gb Ethernet or need extremely high IOPS, a 12-drive all-flash array seems a bit unnecessary. Also, don't expect to use all four network ports and get 4x the throughput to any single client, even if they are 10Gb, as there are a number of quirks with LACP and how aggregate bandwidth works, depending on how you plan to use your NAS. As you may have noticed, most of our feedback is dependent on how you plan to utilize your system.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
FreeBSD supports TRIM/UNMAP with ZFS. OpenZFS support for it is in the works on other OSes.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
Again depending on use case, you may consider higher clock speeds or even just one CPU to save money and power.
I should mention that the whole server is built and has been running since January of 2016. This whole thread is just to inquire about building the array, but I figured people would like to know the hardware being used, so I listed & linked it.

That said this is a good starting number.
Ha, yeah I went overkill here. I bought it before RAM prices seriously inflated, so I'm kind of glad I was able to stock up.

This is cache thrashing. It happens with poor design. ZFS has a number of tunable parameters to prevent this (not in the GUI); there are settings like TXG timeouts and write rate limits.
It was a random desktop SSD which I put in the server for no more reason than to test its speed potential. It was never meant to be a permanent solution for anything, but I was curious as to why it performed the way it did when writing to it over the network.

Alright, as for everything else I didn't quote: I see I was wrong about a lot of things, and you helped clarify a lot. And as you asked at the end, "What is the application?"
Simply put, it is a very, very, very overkill NAS (with a couple of bhyve VMs) which I've just slowly built up over the years.

Questions you might have:

Why SSDs?
I've grown bored of the limitations of mechanical storage so I want to try going full solid state.

Do you need it? The throughput? The IOPS?
No, but I want to try it.

You know you could try it in a much cheaper fashion, right?
Yes, but that wouldn't be as fun.

Do you even have a network that could support it?
To clarify the network connection: it's 20Gbit to one client with SMB 3.0 Multichannel using a dual-port SFP+ card. I have peak read speeds of 1.6GB/s and sustained speeds around 1.35GB/s from files cached in server RAM. Works quite well.
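For anyone who wants to try the same setup: SMB multichannel is not on by default in Samba, so on the FreeNAS side it has to be enabled through an auxiliary smb.conf parameter (it was still marked experimental in Samba at the time), something along these lines:

  # smb.conf / FreeNAS "Auxiliary parameters" (global section)
  server multi channel support = yes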
 

logan893

Dabbler
Joined
Dec 31, 2015
Messages
44
It was a random desktop SSD which I put in the server for no more reason that to test speed potential. It was never to be a permanent solution for anything but I was curious as to why it performed how it did when writing to it over the network.

If you are using consumer/desktop SSDs it's a good idea to over-provision them (secure erase and then only partition 80-90% of the usable space), as these types of drives usually come with very small spare areas. The small spare area, and the need to periodically perform clean-up/"garbage collection", are what hold many such drives back. Over-provisioning will help such drives with performance consistency even when TRIM is supported.
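As a concrete example of doing that on FreeBSD/FreeNAS: after the secure erase, just leave the last 10-20% of the drive unpartitioned. A sketch with a hypothetical ~1TB SSD on da0 (adjust the device name and size to your drive):

  # Wipe any old partition table and create a fresh GPT
  gpart destroy -F da0
  gpart create -s gpt da0
  # Partition only ~80% of the drive for ZFS and leave the rest untouched,
  # so the controller can use the unpartitioned space as extra spare area
  gpart add -t freebsd-zfs -a 1m -s 800G da0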

In contrast, enterprise drives usually have a much larger spare area to not only provide a greater endurance, but also a more consistent level of performance over time.

The SSD controller also plays a large role in performance consistency over time. Check out some old AnandTech SSD reviews (e.g. for the Samsung 850 Pro) where they test the performance consistency of consumer SSDs. They (used to) include results for one or two additional levels of over-provisioning (12% and 25% reductions in partitioned space), in addition to the "default".
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
If you are using consumer/desktop SSDs it's a good idea to over-provision them (secure erase and then only partition 80-90% of the usable space), as these types of drives usually come with very small spare areas. The small spare area, and the need to periodically perform clean-up/"garbage collection", are what hold many such drives back. Over-provisioning will help such drives with performance consistency even when TRIM is supported.

In contrast, enterprise drives usually have a much larger spare area to not only provide a greater endurance, but also a more consistent level of performance over time.

The SSD controller also plays a large role in performance consistency over time. Check out some old AnandTech SSD reviews (e.g. for the Samsung 850 Pro) where they test the performance consistency of consumer SSDs. They (used to) include results for one or two additional levels of over-provisioning (12% and 25% reductions in partitioned space), in addition to the "default".
This may prove useful to know should I ever put a desktop SSD in a server again for any reason.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
When it is something already done:

I don't get why you are having, "questions and concerns"...
Well you see, my good fellow, the title of the thread is "Plans to build a 12 drive SSD array", NOT "Plans to build a 12 drive SSD Server". I deliberately said "array" and kept the original post focused on the array (not the server as a whole, though I did list its components) to prevent people from confusing the server as a whole with the array alone, which is the project in question. It seems I failed to be specific enough, as you're not the only one who misinterpreted what I said.

As for the questions and concerns: SSDs are a little bit pickier about their application than HDDs. My questions and concerns revolve around the feasibility of the project with the hardware/software specified.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
If @jgreco were around these parts more he would be a good resource to tap for this. He has experience with all flash arrays.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
If @jgreco were around these parts more he would be a good resource to tap for this. He has experience with all flash arrays.
I've never RAIDed solid state anything, ever. I would expect the throughput potential to be spectacular, provided compatible SSDs are being used for the specific application and are configured to get maximum performance (i.e. RAID levels, over-provisioning, compression, encryption, block size, etc.).

If jgreco pops in, that'd be great. The more opinions I can get the better; the more I see opinions overlap, the clearer it is which decisions will yield the best possible outcome for my configuration.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Would there be some wisdom in striping the SSDs, focusing on the desired performance boost (assuming a low failure rate), and replicating the datasets to a smaller number of spinning disks to mitigate the risk? (Snapshot frequency would then determine how much time could be lost in case of a drive failure.)
 
Joined
May 10, 2017
Messages
838
I've been using a 12-SSD pool for some time, on two different controllers, with no issues with TRIM, and performance has been constant; my only concern is consistent read and write speeds over 10GbE, even when the pool is being used by other processes.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
Would there be some wisdom in striping the SSDs, focusing on the desired performance boost (assuming a low failure rate), and replicating the datasets to a smaller number of spinning disks to mitigate the risk? (Snapshot frequency would then determine how much time could be lost in case of a drive failure.)
Since this would be an upgrade from the existing 8-drive mechanical raidz2 I have, I had the idea to use that as a backup array (otherwise it'd be doing nothing once I upgraded). However, what seems like a potential waste of redundancy is this: if each drive is capable of ~500MB/s reads/writes and I have 12 of them, that's a max potential of 6,000MB/s (6GB/s) in RAID0, which goes well beyond the 20Gbit link I have. Since I don't need the full 12TB, and I don't think RAID5/6 cuts performance by 50% or more, it should be possible for me to go raidz2, encrypted, with lz4 compression and still see performance pushing the edge of 2GB/s... theoretically. I'd have to test it.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
I've been using a 12-SSD pool for some time, on two different controllers, with no issues with TRIM, and performance has been constant; my only concern is consistent read and write speeds over 10GbE, even when the pool is being used by other processes.
So you see frequent dips below 10Gbit? I have a 20Gbit link, and part of my desire is to utilize that. My hope is that 12 drives will be enough to get the speed I desire while still including redundancy. I can only assume frequent dips below 10Gbit would be a result of the type or size of the files being copied.

Not to mention if the CPU starts to become a bottleneck. I would expect, though, that by using multiple NICs it should distribute the load across at least 2 threads if not 4. I'll probably put the whole array in RAID0 just to see what I get, but I don't know of any command-line benchmarking tools I could run on the array. I wrote up a dd script and ran it on the mechanical array, testing how it handled block sizes between 512 bytes & 64MB, but it only showed me the performance of one drive. I was using urandom, and lz4 was on, if that means anything. Interestingly enough, though, if I multiplied the number I got by the number of drives of total usable storage (6 out of an 8-drive raidz2), it gave me a number pretty close to the performance I see when writing data to the array. I don't know if that's a good benchmarking method for arrays.
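For reference, a trimmed-down version of the kind of dd sweep I mean is below (paths and sizes are just examples; as noted, with lz4 on, /dev/zero compresses away to nothing and /dev/urandom can bottleneck on the CPU before the pool does, so the numbers are rough at best):

  #!/bin/sh
  # Rough sequential-write sweep against a dataset mountpoint (example path)
  TESTDIR=/mnt/tank/benchmark
  TOTAL_MB=4096   # write ~4GiB per pass so caching doesn't hide everything
  for BS_MB in 1 4 16 64; do
      COUNT=$((TOTAL_MB / BS_MB))
      echo "block size: ${BS_MB}m"
      dd if=/dev/urandom of="$TESTDIR/testfile" bs=${BS_MB}m count=$COUNT 2>&1 | tail -1
      rm -f "$TESTDIR/testfile"
  done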
 
Joined
May 10, 2017
Messages
838
So you see frequent dips below 10Gbit?

When transferring larger files I see consistent speeds of around 600MB/s reads and 800MB/s writes; the limit is likely the NIC I'm using, as Mellanox cards are not the recommended ones for FreeNAS, but I already had it and it's close enough for me.

The pool itself is faster than that, e.g., a scrub takes around 20 minutes @ 1.6GB/s.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
When transferring larger files I see consistent speeds of around 600MB/s reads and 800MB/s writes; the limit is likely the NIC I'm using, as Mellanox cards are not the recommended ones for FreeNAS, but I already had it and it's close enough for me.

The pool itself is faster than that, e.g., a scrub takes around 20 minutes @ 1.6GB/s.
I was previously using Mellanox cards before upgrading to Broadcom controllers. The 57810S supports RSS & RDMA. The Mellanox cards had the potential to transfer at a full 1.123GB/s, but I can't give you comparative read/write performance until I build the array.

Also, in theory, if the driver is only good enough to give one interface 600/800MB/s, then using more than one might give you a little more, though the cost will probably outweigh what you gain, if you gain anything at all.
 
Joined
May 10, 2017
Messages
838
The Mellanox cards had the potential to transfer at a full 1.123GB/s, but I can't give you comparative read/write performance until I build the array.

Yep, I can get 1.1GB/s when transferring to/from a Linux server with the same NIC, but 600/800MB/s is good enough for me for now, especially since my desktop can't currently sustain 1GB/s writes.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
Yep, I can get 1.1GB/s when transferring to/from a Linux server with the same NIC, but 600/800MB/s is good enough for me for now, especially since my desktop can't currently sustain 1GB/s writes.
It is kind of funny sitting at home thinking "Oh, how horrible is this? I can only transfer my files across the network at a measly ~700MB/s. It's horrible." Meanwhile your friend down the road is content with 125MB/s (gigabit).
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
I'm happy with 50MB/s... With 4k random sync writes and low queue depths...:D
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Windows7ge said:
I don't think RAID5/6 cuts performance by 50% or more. It should be possible for me to go raidz2, encrypted, with lz4 compression and still see performance pushing the edge of 2GB/s...theoretically. I'd have to test it.

I'd be interested to see the outcome of that test, as I have seen many discussions on a theoretical level cover the fact that a RAID vdev (RAIDZ1/2 or other types) is only as fast as one of its member devices (in terms of IOPS), hence everybody says that a pool of many mirrored vdevs is the best way to get performance and keep some redundancy.

According to the theory, RAIDZ1/2 should kill over 90% of your potential performance with 12 devices in a single VDEV.

Notably, RAID0 or in FreeNAS terms "stripe" should not be subject to that rule as it is the same as each device in the pool being its own VDEV.
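For completeness, the layout that theory favours for performance is striped mirrors; a sketch with placeholder pool/device names:

  # Six 2-way mirrors striped together: ~50% usable space,
  # but random IO scales with the number of vdevs (six here) rather than one
  zpool create tank \
    mirror da0 da1  mirror da2 da3  mirror da4 da5 \
    mirror da6 da7  mirror da8 da9  mirror da10 da11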
 