Raid5 performance

Status
Not open for further replies.

el_pedriyo

Explorer
Joined
Jun 24, 2018
Messages
65
Hello,

I have a few questions before starting with FreeNAS. I have worked with other NAS operating systems and wanted to try this one. First, I would like to know whether a RAID5 of, for example, 5 disks can reach read and write speeds of 10 Gbps. I am not asking about the 10 Gbps PCIe Ethernet card I would also need, only about the disk performance of a RAID5. If I am not wrong, RAID5 improves read speed but not write speed, I suppose because parity has to be read and written on every write. At the very least, I would like to know whether anyone has reached 10 Gbps on reads. As for writes, I would like to know whether I could use an SSD to cache all incoming data and then, say 10 minutes later, start moving it to the regular HDDs. I know this risks losing data on a power failure, but I just want to know whether FreeNAS lets the user do this, as other NAS software does, to bring RAID5 write speed up to read speed.

Kind regards,

Pedro
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
FreeNAS doesn't use RAID5. It uses RAIDZx or mirrors. So any performance metrics will be using those terms.

For your question about cache: yes, FreeNAS uses a SLOG drive, but not everyone needs it. Unless you know what you are doing, a SLOG might lower performance or even lead to data loss. It depends on your use case, which you haven't clearly put forth.
 

el_pedriyo

Explorer
Joined
Jun 24, 2018
Messages
65
Hello,

Well, I imagine there is some RAIDZx level similar to RAID5 that stripes the data across all disks and keeps parity. Could you tell me how it performs and whether it can reach the speeds I mentioned above? I will use the NAS for simple servers, such as a media server for films and series and a web server, and as network storage for all the devices in my home. That is why I want good write speed: if I ever want to write at 4 Gbps, I should be able to. It is also why I asked about a cache drive. Other NAS operating systems, such as Unraid, offer this to increase the write speed of a RAID5.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
FreeNAS uses a SLOG drive
SLOG != cache
The SLOG is not in the write path. It's only ever read from if there is a crash/power failure etc... and the data in RAM (the last Transaction Group or two) didn't make it to disk. It is not used as a cache, just a backup for sync writes in RAM.
ZFS does use parity, but not in fixed stripe sizes like traditional RAID. It's block-based, with extra checksums (yes, in addition to parity).

The SLOG only matters for synchronous writes. Otherwise RAM acts as the "cache", but even this is limited to prevent thrashing. As you can see, ZFS is MUCH more sophisticated than RAID. The other side of this is the read cache (the ARC), and it is a true read cache. It also lives in RAM and can be as big as the memory you can reliably keep free. This is why you see FreeNAS systems with 256GB of RAM; it's not always for VMs or jails. The other nice thing about the ARC is that it's not just most recently used with a little read-ahead. It's both most recent AND most frequent. If you have 200TB worth of files but 60% of the time people are working with the same 20GB, it all stays in RAM. Let's see a traditional RAID5 touch that performance for the same working set and price! Not to mention you still get full integrity checking and healing.
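To make the "most recent AND most frequent" idea concrete, here is a toy sketch. It is not the real ARC algorithm (which adapts the balance between its lists and remembers recently evicted entries); the class name `TwoListCache` and its methods are invented purely for illustration.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy cache: entries seen once live in `recent`, entries seen again
    are promoted to `frequent`. Eviction takes from `recent` first, so a
    one-off scan of many files does not push out the hot working set."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.recent = OrderedDict()    # touched exactly once
        self.frequent = OrderedDict()  # touched two or more times

    def get(self, key):
        if key in self.frequent:
            self.frequent.move_to_end(key)      # refresh recency
            return self.frequent[key]
        if key in self.recent:
            value = self.recent.pop(key)        # second touch: promote
            self.frequent[key] = value
            return value
        return None                             # cache miss

    def put(self, key, value):
        if key in self.frequent:
            self.frequent[key] = value
            self.frequent.move_to_end(key)
        elif key in self.recent:
            self.recent.pop(key)
            self.frequent[key] = value
        else:
            self.recent[key] = value
        self._evict()

    def _evict(self):
        # Drop the oldest once-seen entry first; fall back to `frequent`.
        while len(self.recent) + len(self.frequent) > self.capacity:
            victim = self.recent if self.recent else self.frequent
            victim.popitem(last=False)


cache = TwoListCache(capacity=3)
cache.put("hot_file", "data")
cache.get("hot_file")                 # second access marks it as frequent
for name in ("scan1", "scan2", "scan3", "scan4"):
    cache.put(name, "data")           # a long one-off scan
print(cache.get("hot_file") is not None)  # the hot file is still cached
```

A plain most-recently-used cache of the same size would have dropped `hot_file` during the scan, which is the behaviour the post is contrasting against.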

On the subject of data healing: if you have a RAID1 or even a RAID10, you may find that two disks don't agree on a few bits. How do you know which disk is right? You don't. With that magic checksum, ZFS can find the correct copy, repair the incorrect disk, and let you know that it fixed your on-disk error, all without causing a hiccup.
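The healing idea can be sketched in a few lines. This is a simplified illustration, not ZFS code: the function names are hypothetical, the "disks" are just in-memory byte arrays, and SHA-256 is used only because it is convenient in Python.

```python
import hashlib
from typing import List


def checksum(data: bytes) -> str:
    """Checksum stored separately from the data it protects."""
    return hashlib.sha256(data).hexdigest()


def read_with_repair(copies: List[bytearray], expected: str) -> bytes:
    """Return the first copy whose checksum matches, and overwrite any
    copy that disagrees with it (the 'self-healing' step)."""
    good = None
    for copy in copies:
        if checksum(bytes(copy)) == expected:
            good = bytes(copy)
            break
    if good is None:
        raise IOError("no copy matches the stored checksum")
    for copy in copies:
        if bytes(copy) != good:
            copy[:] = good  # repair the bad copy in place
    return good


block = b"family photos"
stored = checksum(block)          # recorded when the block was written
disk_a = bytearray(block)
disk_b = bytearray(block)
disk_b[0] ^= 0x01                 # simulate a flipped bit on one mirror

data = read_with_repair([disk_a, disk_b], stored)
print(data == block, bytes(disk_b) == block)
```

Without the stored checksum, a two-way mirror can only tell that the copies differ, not which one is correct.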
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
just in case I want to write at 4gbps I am able to do it, that it is why I was asking for a cache drive, like for example other OS NAS like unraid, has the ability of this just to increase the write speed of a RAID5
If you're using SMB, unless you configure otherwise, it's async, meaning the SLOG will not be used at all. All writes will be processed at the speed of the network (that's the bottleneck) for about 5 seconds, as that's the default timeout for a transaction group in FreeNAS. This is tunable, but the default is a compromise between preventing thrashing and limiting data loss in the event of a power failure.
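As a rough back-of-the-envelope check of how much data can be "in flight" during one transaction group interval, the arithmetic can be sketched as below. The helper name is made up, and it assumes the network is the only limit; in practice ZFS also closes a TXG early when it fills up, so treat the result as an upper bound.

```python
def in_flight_bytes(link_gbps: float, txg_timeout_s: float = 5.0) -> float:
    """Upper bound on bytes received over the network in one TXG interval."""
    bits_per_second = link_gbps * 1e9      # Gbps means 10^9 bits per second
    bytes_per_second = bits_per_second / 8
    return bytes_per_second * txg_timeout_s


for link in (1, 10):
    gigabytes = in_flight_bytes(link) / 1e9
    print(f"{link} Gbps for 5 s -> up to {gigabytes} GB in RAM, not yet on disk")
```

On a 10 Gbps link this comes to at most 6.25 GB per 5 second interval, which is the amount an async workload could lose in a crash.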
 

el_pedriyo

Explorer
Joined
Jun 24, 2018
Messages
65
So, I have been reading this and some other articles on the internet. I will now explain more or less what I have understood, and I would be grateful if you could correct anything I have misunderstood. The whole point of the ZIL and SLOG is to avoid losing any kind of data if there is a power loss or the system hangs, and by default this only applies to synchronous writes. When you transfer data to a disk, it is not written at that exact moment; it takes some time to be completely written (I understood this more or less). The SLOG solves this issue because the information is written first to the SLOG and then to the disks, so it survives a power loss. The thing is that, obviously, if I need to write everything first to another drive to use the SLOG, and the SLOG then writes this information 5 seconds later, performance would be limited to the write speed of the SLOG, and it would drop after 5 seconds because reads from the SLOG would start in order to copy everything to the disks.
So, considering all this, here are my questions:
  • If this is true, synchronous writes with a SLOG won't reach the array's best performance, but if a power loss occurs, little or no data would be corrupted (only data being written at that moment).
  • If this is true, synchronous writes without a SLOG will reach the array's best performance, but if a power loss occurs, some data could be corrupted (data being written at that moment).
  • If this is true, for asynchronous writes with a SLOG I really do not know whether the array's best performance will be reached, but little or no data would be corrupted (data being written at that moment).
  • If this is true, asynchronous writes without a SLOG will reach the array's best performance, but more data than with synchronous writes could be corrupted (data being written at that moment).
  • If I only copy files from my computer to the NAS, using it as a storage device with no databases or other critical workloads, does it make any sense to enable the SLOG? How long does it take for a file to be safely saved on the disks once I have finished transferring it, given that I also have parity? In other words, how long until that file is safe from a power loss?
  • I am also wondering what happens if the SLOG is faster than the array. I assume the write speed will be limited, so I won't be able to write to the SLOG faster than to the array, and the best performance I can get is whatever the array can deliver. Taking all this into account, there is no write-caching method that speeds up writes beyond what the array can handle.
 

el_pedriyo

Explorer
Joined
Jun 24, 2018
Messages
65
So the difference between RAIDZ and RAID5, for example, is that RAIDZ has everything we have been talking about, while RAID5 behaves like an asynchronous storage system without a SLOG. Is that right?
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
write everything first to another drive for activating SLOG and then the SLOG will be writting this information 5 seconds after


synchronous writes with a SLOG won't reach the array's best performance, but if a power loss occurs, little or no data would be corrupted (only data being written at that moment)
True - Assuming you are using a power-loss-protected drive. The drive needs to flush its buffers to flash, and that's why many enterprise SSDs have supercapacitors. They keep the drive running just long enough to flush any on-disk buffers/cache to the flash memory.
synchronous writes without a SLOG will reach the array's best performance, but if a power loss occurs, some data could be corrupted (data being written at that moment)
False - sync writes will halt until the write is confirmed on disk, then continue to the next.
for asynchronous writes with a SLOG I really do not know whether the array's best performance will be reached, but little or no data would be corrupted (data being written at that moment)
This is the most likely to see data loss, as writes may still be in RAM while the system thinks they are on disk. POOF, data gone. This is also generally the fastest, as it will not wait to confirm any writes before continuing, at least until the TXG (transaction group) is full or times out (default 5 sec).
asynchronous writes without a SLOG will reach the array's best performance, but more data than with synchronous writes could be corrupted (data being written at that moment)
ALL sync writes wait to be confirmed before continuing. That's the purpose of sync writes. But yes, async without SLOG (if you have one, it just skips it) is fastest.
So the best performance I can get is whatever the array can deliver. Taking all this into account, there is no write-caching method that speeds up writes beyond what the array can handle
False - The transaction group acts as the write cache. In the case of sync, we wait until the write is verified on disk (SLOG or pool). With async, we fill the TXG and flush in an optimised order every 5 seconds. You don't want to "cache" more than that (you can), as that data is "in flight" and "at risk" if the system crashes. Also, if we fill a LARGE TXG, we have to wait until the data is flushed to start a new one, and all IO stops. All caching/buffering works in a similar way. There is no magic trick to making an array write a continuous stream faster than the disks will support. ZFS does an exceptional job of keeping things smooth and writing in an optimised way, assuming enough free space to do so.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
RAID5 will be like an asynchronous storage system without a SLOG. Won't it?
Not sure what you mean here. RAID usually does not have more than a few GB of battery-backed cache, and that is only helpful if the power comes back and the system is up again before the battery dies. There are better systems, but they get insanely expensive.

Sync and async depend on the application, or on whether it's forced by the filesystem. Most databases use synchronous writes. Most general user applications are async. I run 2 ESXi hosts on my FreeNAS storage and I force all writes to that storage to be sync. I don't want all my VMs to be corrupted if my UPS has an issue or FreeNAS crashes for an unforeseen reason.
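To illustrate what "depends on the application" means, here is a minimal sketch of the two styles from the application's side. The function names are hypothetical; `os.fsync` is the standard call an application uses to ask the OS to commit a file to stable storage, and on ZFS that kind of request is what gets handled as a sync write.

```python
import os
import tempfile


def write_async(path: str, data: bytes) -> None:
    """Buffered write: returns once the OS has accepted the data,
    which may still be sitting in RAM."""
    with open(path, "wb") as f:
        f.write(data)


def write_sync(path: str, data: bytes) -> None:
    """Write, then block until the OS reports the data as committed."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's own buffer down to the OS
        os.fsync(f.fileno())   # wait for the OS to commit it to storage


with tempfile.TemporaryDirectory() as tmp:
    write_async(os.path.join(tmp, "movie.bin"), b"bulk media data")
    write_sync(os.path.join(tmp, "db.journal"), b"transaction record")
    print(sorted(os.listdir(tmp)))
```

A database would typically use the second style for its journal, while a plain file copy usually uses the first.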
 

el_pedriyo

Explorer
Joined
Jun 24, 2018
Messages
65
False - sync writes will halt until the write is confirmed on disk, then continue to the next.
Yes, I presume only the data being transferred at that moment would be corrupted; the other data, already confirmed on disk, should be fine, shouldn't it?

This is the most likely to see data loss as writes may still be in RAM but the system thinks they are on the disk POOF data gone. This is also generally the fastest as it will not wait to confirm any writes before continuing at least until the TXG (transaction group) is full or times out (default 5 sec).
So this would be the fastest way, because I won't be limited by the array speed until I fill the SLOG disk's capacity, which I assume is what you call the TXG.

ALL sync writes wait to be confirmed before continuing. That's the purpose of sync writes. But yes async without SLOG (if you have one it just skips it) is fastest.
It will be fast, but in this case I will be limited by the array speed, won't I? If I add a SLOG, I should hit the SLOG's maximum write speed. For example, with an SSD writing at 500mb/s I should be able to reach 5gbps on a 10gbps network, and with an M.2 drive writing at 1500mb/s I should be able to reach the full 100gbps. Am I right?

Another thing: if, for example, I am copying a 100gb file to a SLOG drive (I thought it was also called TXG, but maybe a TXG is a group of several SLOGs in RAID0 or RAID1?), the copy will need to finish before the transfer to the disks starts, won't it?

The drive needs to clear its buffers to flash and that's why many enterprise SSDs have super capacitors.
Why do I need an SSD with supercapacitors? Isn't a sync-mode array with a SLOG enough to keep everything safe if a power loss occurs?

I don't want all my VMs to be corrupted if my UPS has an issue or FreeNAS crashes for an unforeseen reason.
If I want to run VMs on my FreeNAS system, can I split the RAIDZ into different storage units with different sync/async settings and SLOGs? For example, sync for VMs and async with a SLOG for a media server or home storage.

Thank you for all the help :D
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
The thing is that, obviously, if I need to write everything first to another drive to use the SLOG, and the SLOG then writes this information 5 seconds later, performance would be limited to the write speed of the SLOG, and it would drop after 5 seconds because reads from the SLOG would start in order to copy everything to the disks.

No.

SLOG is only read on reboot (etc) in the event of a crash.

The information committed to the SLOG is flushed to disk with the next TXG.

Essentially ZFS turns the sync write into an async write and then it batches async writes and turns them into sync writes.

RaidZ1 is similar to Raid5, but solved the “write hole” problem that Raid5/6 has.

You can see some benchmarking of my 6 disk raidz2 system here:
https://forums.freenas.org/index.php?threads/testing-the-benefits-of-slog-using-a-ram-disk.56561/

That thread uses sync writes, but I also ran tests with sync = disabled.

You don’t need a slog if you don’t care about preserving the last txg in a crash. Just set sync = disabled.
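For reference, `sync` is a per-dataset ZFS property that accepts `standard`, `always`, or `disabled`, and it is set with `zfs set`. The sketch below only builds and prints the commands by default; the helper names and the dataset names `tank/media` and `tank/vms` are placeholders, not anything from this thread.

```python
import subprocess
from typing import List

VALID_SYNC_MODES = ("standard", "always", "disabled")


def zfs_set_sync_command(dataset: str, mode: str) -> List[str]:
    """Build the `zfs set sync=<mode> <dataset>` command as an argument list."""
    if mode not in VALID_SYNC_MODES:
        raise ValueError(f"sync must be one of {VALID_SYNC_MODES}, got {mode!r}")
    return ["zfs", "set", f"sync={mode}", dataset]


def apply_sync(dataset: str, mode: str, dry_run: bool = True) -> List[str]:
    """Print the command, and only run it when dry_run is False."""
    cmd = zfs_set_sync_command(dataset, mode)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # needs root and an existing dataset
    return cmd


# Hypothetical layout: bulk media does not need sync, VM storage does.
apply_sync("tank/media", "disabled")
apply_sync("tank/vms", "always")
```

Because the property is set per dataset, different datasets in the same pool can use different settings.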
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Yes, I presume only the data being transferred at that moment would be corrupted; the other data, already confirmed on disk, should be fine, shouldn't it?
Even data "in flight" could not be corrupted, as the system never thought the data was there. The data will be missing but not technically corrupted. For example, if you were in the middle of saving a file to the server over the network, you would get an error that the file could not be saved. If this was an async operation, FreeNAS may have said "yes! I saved that file!" but in reality part or none of the file was actually saved.
So this would be the fastest way, because I won't be limited by the array speed until I fill the SLOG disk's capacity, which I assume is what you call the TXG.
The default TXG timeout is 5 seconds. In practice the SLOG only ever sees a few GB at a time. Remember the TXG stays in memory for both sync and async. In the case of sync, the SLOG is kind of a backup for RAM, and therefore we cannot write faster than the SLOG device.
It will be fast, but in this case I will be limited by the array speed, won't I? If I add a SLOG, I should hit the SLOG's maximum write speed. For example, with an SSD writing at 500mb/s I should be able to reach 5gbps on a 10gbps network, and with an M.2 drive writing at 1500mb/s I should be able to reach the full 100gbps. Am I right?
Yes and no. Take a look at Stux's thread to see real numbers and better explanations. Also, please be sure to use correct units: 100MB != 100mb, since, as you know, there are 8 bits (b) per byte (B).
Why do I need an SSD with supercapacitors? Isn't a sync-mode array with a SLOG enough to keep everything safe if a power loss occurs?
SSDs have RAM to buffer the data, so data in that buffer would still be lost in the event of a crash, defeating the purpose of a SLOG.
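On the units point above, the conversion is easy to check. This small sketch assumes decimal (SI) prefixes, so 1 GB = 1000 MB; the function name is made up for illustration.

```python
def megabytes_to_gigabits_per_s(mb_per_s: float) -> float:
    """Convert a drive speed in MB/s (megabytes) to Gbps (gigabits)."""
    bits_per_byte = 8
    return mb_per_s * bits_per_byte / 1000  # MB/s -> Mbps -> Gbps


for drive_speed in (500, 1500):
    gbps = megabytes_to_gigabits_per_s(drive_speed)
    print(f"{drive_speed} MB/s is {gbps} Gbps")
```

So a 500 MB/s SATA SSD corresponds to 4 Gbps and a 1500 MB/s M.2 drive to 12 Gbps, which is why the capital B matters when comparing drive specs with network speeds.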
 

el_pedriyo

Explorer
Joined
Jun 24, 2018
Messages
65
Even data "in flight" could not be corrupted as the system never thought the data was there. The data will be missing but not technically corrupted. For example, if you were in the middle of saving a file to the server over the network, you would get an error that the file could not be saved. If this was an Async operation, FreeNAS may have said "yes! I saved that file!" but in reality part or none of the file was actually saved.

The data was not corrupted because it was not even written; it could still be on the SLOG, which is why writing to the disks had not even started. So a SLOG is effectively useless if you do not have an SSD with a battery inside it, isn't it? I mean useless for the purpose of not losing data, but useful for the purpose of transferring data at high speed.

The default TXG timeout is 5 seconds. In practice the SLOG only ever sees a few GB at a time. Remember the TXG stays in memory for both sync and async. In the case of sync, the SLOG is kind of a backup for RAM, and therefore we cannot write faster than the SLOG device.

If I only have 8 GB of RAM, can I configure FreeNAS to take only the minimum it needs, instead of keeping in RAM the data that, as you said, the SLOG backs up, and then just use the SLOG on an SSD?

A good way to picture sync writes might be how Windows behaves when you select a bunch of files and copy and paste them somewhere else: it processes them one by one and does not move on to the next until it has finished the current one. And finally, even if I set up async mode, some programs could force sync mode, am I right?
 