FreeNAS + NFS benchmark feedback... CACHE does matter!


JCL

Dabbler
Joined
Nov 18, 2015
Messages
14
I would like to share my feedback on FreeNAS and NFS. I currently have several FreeNAS systems, mainly used to store data from astrophysical simulations. I am going to focus on the one with the following specifications:

FREENAS system:
  • Motherboard SUPER MICRO X10DRi-T (http://www.supermicro.com/products/motherboard/xeon/c600/x10dri-t.cfm)
  • RAM: 130941 MB (~128 GB)
  • HARD DISKS: 30 x 6 TB (SATA III, 6 Gb/s)
    • option tested: 1 x SSD (Kingston HyperX Savage 120 GB) used for the SLOG
  • FREENAS : FreeNAS-9.3-STABLE-201511040813
  • LSI cards : 2 options tested
    • LSI 2308 MPT-fusion (HBA)
    • LSI 2208 Megaraid (JBOD mode)
  • Network card : Intel Corporation Ethernet Controller 10-Gigabit X540-AT2
Our FreeNAS systems are connected to a 48-port 10 GbE DELL N4032F switch. Approximately 25 Linux machines (each with a 10 GbE card) mount NFS shares from FreeNAS through this switch. These machines run analysis programs on data stored on the FreeNAS systems, so there are a lot of read and write operations. The overall performance is really good, but as we will see later, there are some critical exceptions, and I want to share their explanations with the FreeNAS community.
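
For reference, a typical client-side mount looks roughly like the sketch below; the export path, mount point and options are illustrative, not my exact settings:

# hypothetical example of how a Linux client mounts a FreeNAS export over NFSv3
mount -t nfs -o vers=3,rsize=1048576,wsize=1048576,hard freenas01:/mnt/tank/data /data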

I ran several benchmarks on 4 different configurations (LSI controller, pure HBA, memory cache, SLOG):
  1. C1:
    • controller LSI 2308 MPT-Fusion, pure HBA card, WITHOUT SLOG
  2. C2:
    • controller LSI 2308 MPT-Fusion, pure HBA card, WITH SLOG
  3. C3:
    • controller LSI 2208 MegaRAID (1 GB cache memory) in JBOD mode, WITHOUT SLOG
  4. C4:
    • controller LSI 2208 MegaRAID (1 GB cache memory) in JBOD mode, WITH SLOG
Each configuration has one zpool composed of 3 RAIDZ2 (RAID6-like) vdevs of 10 hard disks each: 3 x 10 x 6 TB disks.

Configurations C2 and C4 have an extra SSD used as the SLOG device.

The LSI 2208 card is configured in JBOD mode, which means FreeNAS manages the RAID (ZFS) layer itself. It is also important to note that this card has 1 GB of cache memory.

Each test was run from a single Linux machine (DELL, 20-core i7, 64 GB RAM, Intel 10-Gigabit X540-AT2). During the tests, only this machine mounted an NFS share from FreeNAS, so it had all the network bandwidth available for the test.

I focused my tests mainly on NFS write speed from the Linux machine to the FreeNAS system.

TEST #1: bandwidth test using the dd command to write from the Linux client to FreeNAS through NFS
  • dd if=/dev/zero of=qq bs=64M count=500 (about 34 GB transferred from Linux to FreeNAS)
C1 : 723 MB/s -> yes, that is 723 megabytes per second, very fast :)
C2 : 564 MB/s
C3 : 874 MB/s
C4 : 908 MB/s​

The overall result is quite good, whatever the configuration.

We get the best performance with the C4 configuration. We are close to the maximum speed of 10 Gb Ethernet! With this configuration, there is almost no penalty for writing such a big file to the FreeNAS zpool through NFS.
  • dd if=/dev/zero of=qq bs=6M count=500 (about 3.1 GB transferred from Linux to FreeNAS)
C1 : 594 MB/s
C2 : 507 MB/s
C3 : 722 MB/s
C4 : 605 MB/s​

This time we transfer a smaller amount of data, 3.1 GB, via NFS. The overall performance is still good.

In both tests with the pure HBA card (C1 and C2), we notice that the transfer speed is faster when there is no SLOG.

TEST #2: development environment

This test is very important because it reveals some non-obvious behaviours of FreeNAS+NFS in specific situations.

I am currently working on a fairly big C++ project using Qt5 and OpenGL, with a lot of source files and libraries. Depending on the FreeNAS configuration (see C1, C2, C3 and C4), you will see very different times to complete the same task in a development environment.

All the following tests were run several times from a Linux machine which mounted an NFS share from a FreeNAS system (C1, C2, C3 or C4).

Here we measure the time required to check out an SVN project:
C1 : 212 sec !!!
C2 : 23 sec
C3 : 36 sec
C4 : 2 sec​

As you can see, the C1 configuration took 212 seconds to complete a simple svn co command! I suspect that the svn command performs a lot of small updates on files, which drastically increases the time (tons of syncs). This bad behaviour is due to the lack of cache in the C1 configuration (pure HBA and no SLOG); the ZIL therefore has to be updated many times during the transaction. zilstat reports a lot of I/O operations (see the sketch below).

C2 (pure HBA + SLOG) and C3 (RAID controller with memory cache) complete the command in comparable times.
C4 (RAID controller with memory cache + SLOG) is the fastest.
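
A minimal sketch of how that ZIL activity can be observed (zilstat ships with FreeNAS 9.3; exact option syntax may vary between zilstat versions, and the repository URL is a placeholder):

# on the FreeNAS host: print ZIL statistics once per second while the test runs
zilstat 1
# on the Linux client, in parallel:
time svn checkout <repository URL> trunk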
  • cd trunk/build && time cmake ..

Here we measure the time to complete the cmake command, a tool which generates Makefiles.

C1 : 22 sec
C2 : 4 sec
C3 : 7 sec
C4 : 2 sec

Once again, with this kind of operation, the lack of a cache and/or SLOG increases the completion time.

  • time make -j 6
Here we measure compilation time.

C1 : 95 sec
C2 : 42 sec
C3 : 48 sec
C4 : 38 sec

Once again, C1 took more than twice as long to complete the compilation.

  • del bin/glnemo2 && time make
This test is very interesting and surprising. Here we generate the application binary by just linking all the object files and libraries together.

C1 : 35 sec
C2 : 4 sec
C3 : 8 sec
C4 : 2 sec

This is a really weird result. Although the created file is very small (24 MB), it looks like the linking operation needs tons of sync write operations, which drastically increases the time by more than a factor of 10! Once more, cache does matter.

I have noticed the same slowdown when compiling/linking any kind of Fortran or C program; the sketch below shows one way to see it from the client side.
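
One way to see where the time goes is to compare the NFS client operation counters before and after the re-link (a rough sketch; nfsstat output format varies between Linux distributions):

# snapshot NFS client RPC counters, redo the link, then compare
nfsstat -c > before.txt
rm bin/glnemo2 && time make        # re-link only
nfsstat -c > after.txt
diff before.txt after.txt          # look in particular at the WRITE and COMMIT counts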

CONCLUSION:
As we can see, FreeNAS+NFS needs a cache mechanism to speed up sync writes. For some people this is obvious, but it was not for me.

Until now I was just using FreeNAS to copy and read files through simple read/write operations from simulations and/or analysis programs. That worked fine and fast without any cache mechanism. But once I started to use FreeNAS to store and compile my programs, it was no longer possible to work (waiting 30 seconds for every link to complete is just impossible!).

So a cache mechanism is mandatory:

FreeNAS recommends using a pure HBA controller + SLOG on a fast SSD (like the C2 configuration).

It looks like a RAID controller with memory cache used in JBOD mode also gives good performance (C3 configuration). However, with this configuration your FreeNAS system will not be as reliable as C1 in case of power loss: you might lose operations stored in the RAID memory cache that have not yet been flushed to the ZIL. As I am not a bank handling financial transactions, this is not a big deal for me :). The main advantage of the C3 configuration is that you have one more disk available, because you don't dedicate a disk to the SLOG.
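
For anyone reproducing this, the relevant settings can be checked from the FreeNAS shell; a minimal sketch (the pool and dataset names are placeholders, and disabling sync should only ever be a temporary diagnostic, since it gives up exactly the safety discussed above):

# does the pool have a separate log (SLOG) vdev?
zpool status tank
# how are sync writes handled on the dataset exported over NFS?
zfs get sync tank/home
# diagnostic only: confirm that sync writes are the bottleneck, then restore the default
zfs set sync=disabled tank/home
# ... re-run the svn/cmake/make tests ...
zfs set sync=standard tank/home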

I am sure I might be wrong about some of what I said, so please, ZFS experts, correct me... What I would like to know is whether the C3 configuration is a good configuration for FreeNAS. I have also noticed that the SMART monitoring daemon (smartd) does not work in this configuration (it fails to start), and I am wondering what happens in case of a disk failure: will I get an email from FreeNAS?

Useful links :
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
That's one of the common complaints and one of the biggest reasons not to use a RAID card in JBOD mode... the SMART data typically isn't made available to FreeNAS (or any underlying OS), so you have no way to see that data.
You're using /dev/zero to generate the data for your testing... assuming you left the default compression enabled, you're writing very little actual data to the disks (streams of zeroes compress very well). Try it again using a pregenerated file of random data and see what happens.
Also, the choice of the Kingston drive for SLOG may be a poor one - I'm not familiar with that model. Most recommendations tend to center around the Intel data center grade drives.
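
For example, something along these lines (paths and sizes are just illustrative):

# generate an incompressible test file on the client's local disk first
dd if=/dev/urandom of=/tmp/random.bin bs=1M count=4096       # ~4 GB of random data
# then time the transfer to the NFS mount, so compression can't flatter the numbers
time cp /tmp/random.bin /mnt/freenas/random.bin
time sync                                                    # make sure the data really left the client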
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
What I would like to know is if C3 configuration is a good configuration for freenas ?

The 2208 has a well-known issue with VMware VSAN in JBOD mode, as in an "eats all your data" issue, and I'd be extremely wary of trusting it here.

Practical and most-approved solution: Configuration 2 - 2308 as an HBA, proper fast SLOG device (Intel DC series SSD in SATA or NVMe)

Crazier solution: 2308 as an HBA, 2208 in RAID mode with a pair of fast spinning drives, 1GB of cache set to catch writes only, and use that as your SLOG device. Your first 1GB of sync writes will go in at ridiculous speeds, then the rest will slow down to the sustained ingest rate of the disks behind it.
 

JCL

Dabbler
Joined
Nov 18, 2015
Messages
14
You're using /dev/zero to generate the data for your testing... assuming you left the default compression enabled, you're writing very little actual data to the disks (streams of zeroes compress very well). Try it again using a pregenerated file of random data and see what happens.
Also, the choice of the Kingston drive for SLOG may be a poor one - I'm not familiar with that model. Most recommendations tend to center around the Intel data center grade drives.

Actually, I did not leave compression enabled, so the /dev/zero stream is not compressed.
Yes, the Kingston drive is not the best, you are definitely right, but it is not too bad either (Savage model, see http://www.kingston.com/us/ssd/v#shss3).
 

JCL

Dabbler
Joined
Nov 18, 2015
Messages
14
dev zero as datasource? star trek needs a dead star...

Yes, /dev/zero as the data source, because /dev/zero is served directly from host memory, so there is no penalty for generating a big stream of data, as opposed to reading it from a file, which matters when the purpose of your benchmark is to test write performance...

/home/jcl> df -h /dev/zero
Filesystem Size Used Avail Use% Mounted on
devtmpfs 5,9G 0 5,9G 0% /dev

https://en.wikipedia.org/wiki//dev/zero
 

JCL

Dabbler
Joined
Nov 18, 2015
Messages
14
The 2208 has a well-known issue with VMware VSAN in JBOD mode, as in an "eats all your data" issue, and I'd be extremely wary of trusting it here.

Thanks for the info, but could you be more specific about the "eats all your data" issue?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Thanks for the info, but could you be more specific about the "eats all your data" issue?

Literally as I wrote it, the card will suffer a failure and will cause data loss. See source: http://www.synchronet.com/blog/critical-bug-for-lsi-pass-through-on-vsan/

Symptoms of LSI Pass-through Bug
Symptoms logged in system IPMI:
  • disks disconnecting
  • complete loss of all drives from the controller
VMware Level Symptoms:
  • disks dropping and reporting unhealthy
  • loss of objects/quorum and loss of data
  • Objects will continue to exist and use space, but ownership will not be owned/presented
  • Clearing ownership in vSAN does not prompt a new owner to be elected
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I cringed as I read this thread tonight. And I debated whether to even bother posting a response. Why? Because this thread is basically dead... it's 2 weeks old and not likely to have more responses. And frankly, I'd kind of rather this kind of thing be forgotten because of how "bad" this is. Not to offend the OP, but the testing parameters, the way the testing was done, and the results are really devoid of value because they don't affect realworld workloads.

First off, HBAs are very important because:

1. Let's you manage SMART (VERY, VERY important for reliability).
2. ZFS needs to be able to manage your storage, NOT something else (like a RAID controller).
3. You do NOT want ZFS and something else to both attempt to manage your storage.

If your workloads are not demanding, using a RAID controller will give you a very noticeable performance boost in testing environments. The keywords are "not demanding", "noticeable performance boost" and "testing environments".

Nobody tests their workloads by doing horribly strenuous tests on their server with massive amounts of data. People just don't. You create some small test suite with some kind of test data and go with it. Nobody fills their zpool to 80%, etc.

The results are so striking that it is hard to deny the conclusion that "RAID cache must be AMAZING and these people recommending HBAs aren't keeping up with the current state of technology". It's overwhelming. You saw more than an order of magnitude higher performance. That kind of thing just *cannot* be ignored, right?

The problem: With hardware RAID you have two IO schedulers: ZFS and the hardware RAID controller.

Once you start putting real-world workloads on your server, you *will* hit the point where you tax your system heavily for a few seconds, and that is when the fun begins. ZFS will try to be smart, and your RAID controller will try to be smarter. What happens? BOTH fall all over each other, your server performance tanks, and it may never recover, because now that performance is in the toilet, even the small workload still running in the background will keep it on its knees, and you cannot recover without a reboot to flush out all of the IO.

This is no different than when we tell people not to do lots of other things because the long-term implications are well known, well understood, and can be easily avoided by listening to a few pros here.

This is very similar to RAIDZ2 running workloads that are very random in reads and writes.

Take 10 disks and put them in a RAIDZ2 (only lose 2 disks worth of capacity) and put a VM or two on there. Run some benchmarks and you'll get very good performance numbers. Do the same tests with the same 10 disks in a mirrored array and you'll often get performance numbers that are similar to the RAIDZ2, but now you've lost 50% of your raw disk space.

If you are like most people, losing only 20% of your disk space instead of 50% seems like a big cost savings, and you'll promptly decide that you want to do RAIDZ2. You'll easily conclude that RAIDZ2 is the better way to go, ignoring all of the posts on the forum telling you that mirrors are the way to go. Of course, anyone who reads the forums knows that IO is handled on a per-vdev basis, so having 5 vdevs instead of 1 means 5 times the IO.

So you destroy the mirrored zpool and go back to RAIDZ2, start copying VMs over and suddenly you find ESXi is sending you warnings, your VMs are powering off for no obvious reason, and your currently transferring VM that is 1TB is only transferring at 15MB/sec.

So what happened? Your tests weren't stressing your storage subsystem for long enough and/or hard enough to see what the limitations were in relation to the real-world workload you were going to place on your server. You had bad testing parameters and then made a bad assumption based on those testing parameters.

Eventually, after reading a crapload of posts, and potentially creating "yet another one of those damn posts complaining about stuff the pros have already told you not to do 1000 times" you learn your lesson and go back to mirrors. If you are like some people I've seen, you've lost a month or more of production time, lost some data and had to do a restore from backup, and people are often asked why they are still working at the business because they failed to do appropriate testing, failed to do appropriate analysis of the results, failed to listen to the pros, and cost the company quite a bit of money and time because of a mistake.

At the end of the day, no amount of benchmarks shown on the forums will convince me that things are "so much better" in the performance arena (even ignoring all of the other fatal problems with going with ZFS and hardware RAID) because I've watched so many people have these problems, and the solution was obvious (and actually solved the problem). If your workload lets you run with a RAID controller (and you choose to ignore the fatal problems like the lack of SMART support) and you want to ignore us, go for it. ;)

Just don't be surprised when you're begging for forgiveness from the ZFS gods later when you're having problems with reliability and performance, once you finally start relying on your server for heavy workloads and the server keeps falling over on you. It'll only piss you off once you realize it's your own fault for deciding to use the hardware RAID. :P

Case in point: your test write was 34GB... but your FreeNAS machine has 128GB of RAM. Because of that, your test set should have been no less than 1TB. 34GB fits in RAM, thereby invalidating any and all results. That's just the most basic example of why benchmarking is best left to the pros (no offense intended).
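
A minimal sketch of a write test sized past the server's RAM (paths and sizes are illustrative, and conv=fsync assumes GNU dd on the client):

# write well past the server's 128 GB of RAM so the ARC can't hide the disks,
# and don't report success until the data is flushed
dd if=/dev/zero of=/mnt/freenas/bigtest bs=64M count=16384 conv=fsync   # ~1 TB
# with compression enabled on the dataset, use incompressible input instead of /dev/zero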
 

JCL

Dabbler
Joined
Nov 18, 2015
Messages
14
Dear Cyberjock "Pro"fessor (no offence intended)

I wrote a post on this forum to give my feedback and to get help. Isn't that what a forum is supposed to be for?

For the record, I manage 3 Open-E NAS systems and, for 2 years now, 4 FreeNAS systems, for a total of 350 terabytes. In 20 years of computing I have not lost a single bit of data yet.

All 7 NAS systems are interconnected with a cluster of 40 Linux workstations (InfiniBand network and 10 GbE) and share their data through the NFS protocol.

We produce tons of data from N-body simulations, which are then analyzed several times by the entire cluster. During an analysis campaign, the network bandwidth used by all these computing machines and storage devices is HUGE, and everything runs fine and fast. So yes, I have experience with HEAVY workloads, and I am always looking for the best solution (reliability vs speed vs management time vs cost).

But by the way, did you read my post in its entirety?

It is not easy to post a "real workload" benchmark, so I posted TEST #1, which according to you is not a good/valid/useful test. OK, shame on me, my mistake then... (and please, suggest a good test...)

But the real reason for my post was to give my feedback on TEST #2, which you did *not* comment on. I am a system manager, but most of the time I am also a developer. TEST #2 consists of real tasks used by people who do development (Git, SVN, compilation with gcc, gfortran, g++, make, cmake, scons, Docker, etc.) and who run these tools a hundred times per day. In these tests I wanted to show, with real numbers (timings), that in a FreeNAS+NFS configuration it is mandatory to have a CACHE mechanism for sync write operations (again, for this kind of workload). For the cache I used/compared a pure HBA + SLOG versus a RAID controller with memory cache in JBOD mode.

This mandatory CACHE requirement was not obvious to me before I started mounting users' home directories, with their development environments, from a FreeNAS dataset (the Linux clients mount the home dataset from FreeNAS via NFS). This is essentially what I wanted to show: that working on a development project or managing a repository can be 10 times slower without a cache, and under those conditions it is not possible to work at all!

I then left open the question of using a SLOG (pure HBA) or a RAID card in JBOD mode with memory cache. Yes, I have such systems (both, actually), and yes, I am not sure the latter is good/reliable, because 1) I have read everything and its opposite about this solution, and 2) I am not an expert in FreeNAS. That is why I posted benchmark timings and asked for advice. You see?

Merry Christmas.

PS:
I think I understand:
1- the total reliability of FreeNAS + pure HBA without SLOG
2- the slightly lower reliability of FreeNAS + pure HBA with SLOG (the SSD can die in the middle of a power outage)

Nobody has been able to explain to me why a RAID controller in JBOD mode only (ZFS manages the entire RAID layer itself, and the RAID controller only manages a write cache) is not reliable. How, in such a configuration, is it possible to lose the entire pool in case of a power outage?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Nobody has been able to explain to me why a RAID controller in JBOD mode only (ZFS manages the entire RAID layer itself, and the RAID controller only manages a write cache) is not reliable. How, in such a configuration, is it possible to lose the entire pool in case of a power outage?

To be fair, you only really asked "is configuration 3 OK" which was the LSI2208 in JBOD mode with 1GB of cache and no SLOG device.

And I answered pretty clearly that no, I wouldn't trust that card in JBOD mode based on its prior behavior with VMware vSAN setups - it drops disks and results in an inconsistent state of the pool.

But let's pretend that the LSI2208 doesn't have a history of fouling data, and just pretend it's an HBA with a write cache. It's still a bad idea to use it with ZFS, because the card will still interfere with the ability of ZFS to guarantee that the data has been safely stored to the disks themselves.

Let's say you write 1GB of data to your zpool.
The LSI card caches it.
ZFS fires off a "flush your cache" command to the drives to ensure that the data is on stable storage.
The LSI card says "I've got it in cache, I'll just lie, no one will know any different" and says "okay, disks are flushed."
ZFS trusts it. Transaction 1 finished.
The LSI card starts flushing its cache to the actual disks when it feels like it.
POWER GOES OUT.
As far as ZFS knows, transaction 1 finished. This is not true; transaction 1 is still partly on disk and partly on the LSI card's cache.

When you power that system back on, you're betting your pool's safety on that card coming back, picking up exactly where it left off, and finishing that transaction correctly. ZFS can't offer any insight like it might be able to with a proper SLOG device (I was X bytes into processing the txg flush, verify those and continue from byte X+1) - you're just trusting your RAID card.

Now, let's go one further. That power outage just took out your RAID card. Oops. You don't have a completed transaction group on disk. But ZFS thinks you do. Unless of course that update to the metadata was also in cache, waiting to be written ... then who knows what ZFS thinks?

RAID adds layers of obfuscation and confusion that are anathema to ZFS's mandate of "protect the bits."
 

JCL

Dabbler
Joined
Nov 18, 2015
Messages
14
In the case of FreeNAS+NFS we have sync writes, and whatever the cache mechanism (SLOG or JBOD RAID controller + RAM cache), the ZIL is always committed to validate the write operation. Then, in case of a power outage, the latest good transaction is "rebuilt" from the latest ZIL, no?

At least, this is what I understand when I read this post :
https://forums.freenas.org/index.php?threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

I quote from paragraph "what is the ZIL":

"If a crash, reboot, power loss, or other catastrophic event occurs before the transaction group is committed to the pool, the ZIL allows ZFS to read back the intent log, rebuild what was supposed to happen, and commit that to the pool. This is the only important time that the ZIL is read by ZFS, and when that happens, the data will be used to update the pool as appropriate. Under normal usage, data to the ZIL is written and then discarded as soon as the associated transaction group is safely committed."

And now about LSI2208 card + 1GB cache (see my C3 configuration), we can read from the same post, see paragraph "What is a good choice for a SLOG device?":

"An interesting but unorthodox alternative for SLOG is to use a RAID controller with battery backed write cache, along with conventional hard disks. Normally RAID controllers are frowned upon with ZFS, but here is an opportunity to take advantage of the capabilities: Since the cache absorbs sync writes and writes them as the disk allows, rotational latency becomes a nonissue, and you gain a SLOG device that can operate at the speed the drives are capable of writing at. In the case of a LSI 2208 with 1GB cache, and a pair of slowish ~50MB/sec 2.5" hard drives, it was interesting to note that a burst of ZIL writes could be absorbed by the cache at lightning speed, and then ZIL writes would slow down to the 50MB/sec that the drives were capable of sustaining. With the nearly unlimited endurance of battery-backed RAM and conventional hard drives, this is a very promising technique."

So I am really really confused......
 

zambanini

Patron
Joined
Sep 11, 2013
Messages
479
jcl, everybody above is saying that using a RAID controller with your pool is stupid, and the reasons should be pretty clear.

Take a look at the ZFS pyramid: https://blogs.oracle.com/brendan/entry/test

For a ZIL, you usually have no contention for resources (ZFS only reads it when your system went down in an unexpected event).
ZFS systems require more knowledge; you need to understand them. It is not a dumb Linux box where you just hope your data will be safe.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
In the case of FreeNAS+NFS we have sync writes, and whatever the cache mechanism (SLOG or JBOD RAID controller + RAM cache), the ZIL is always committed to validate the write operation. Then, in case of a power outage, the latest good transaction is "rebuilt" from the latest ZIL, no?

At least, this is what I understand when I read this post :
https://forums.freenas.org/index.php?threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

I quote from paragraph "what is the ZIL":

"If a crash, reboot, power loss, or other catastrophic event occurs before the transaction group is committed to the pool, the ZIL allows ZFS to read back the intent log, rebuild what was supposed to happen, and commit that to the pool. This is the only important time that the ZIL is read by ZFS, and when that happens, the data will be used to update the pool as appropriate. Under normal usage, data to the ZIL is written and then discarded as soon as the associated transaction group is safely committed."

And now about LSI2208 card + 1GB cache (see my C3 configuration), we can read from the same post, see paragraph "What is a good choice for a SLOG device?":

"An interesting but unorthodox alternative for SLOG is to use a RAID controller with battery backed write cache, along with conventional hard disks. Normally RAID controllers are frowned upon with ZFS, but here is an opportunity to take advantage of the capabilities: Since the cache absorbs sync writes and writes them as the disk allows, rotational latency becomes a nonissue, and you gain a SLOG device that can operate at the speed the drives are capable of writing at. In the case of a LSI 2208 with 1GB cache, and a pair of slowish ~50MB/sec 2.5" hard drives, it was interesting to note that a burst of ZIL writes could be absorbed by the cache at lightning speed, and then ZIL writes would slow down to the 50MB/sec that the drives were capable of sustaining. With the nearly unlimited endurance of battery-backed RAM and conventional hard drives, this is a very promising technique."

So I am really really confused......
I think jgreco's post is about using a battery-backed RAID card / array as a SLOG device (in its entirety). I think that's a slightly different use case, and one that seems like it shouldn't cause problems (the way jgreco was proposing to do it). Using the card for your data drives is another deal entirely. Ignore zambanini; he's mostly just a troll.

I know it sometimes feels like you're trying to prove a negative here on the forums (that a particular config won't lead to issues with your zpool). We're all pretty conservative because the consequences of a screw-up are usually pretty bad. There have been cases where people ignored the hardware recommendations and lost company data, leading to the company going under or the poster getting fired.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
And now about LSI2208 card + 1GB cache (see my C3 configuration), we can read from the same post, see paragraph "What is a good choice for a SLOG device?":

"An interesting but unorthodox alternative for SLOG is to use a RAID controller with battery backed write cache, along with conventional hard disks. Normally RAID controllers are frowned upon with ZFS, but here is an opportunity to take advantage of the capabilities: Since the cache absorbs sync writes and writes them as the disk allows, rotational latency becomes a nonissue, and you gain a SLOG device that can operate at the speed the drives are capable of writing at. In the case of a LSI 2208 with 1GB cache, and a pair of slowish ~50MB/sec 2.5" hard drives, it was interesting to note that a burst of ZIL writes could be absorbed by the cache at lightning speed, and then ZIL writes would slow down to the 50MB/sec that the drives were capable of sustaining. With the nearly unlimited endurance of battery-backed RAM and conventional hard drives, this is a very promising technique."

So I am really really confused......

Yes, you are. The paragraph you quoted talks about a good choice for a SLOG device. Your C3 configuration proposes using the controller for the general pool drives. These are two entirely different things.

The mfi driver under FreeBSD SUCKS GIANT @55. It hides most aspects of the health of the underlying disks from the OS, and has some pathetic performance quirks. If you use a 2208 for pool disks, ZFS is likely to throw immense transaction groups at it, flooding the relatively modest 1GB cache, and the read cache will never be substantially useful since ZFS is already better at that. It is a waste of a controller. Worse, it's actually a really bad hardware choice.

The point I was making about the 2208 and a HDD for SLOG was that there's an option to gain a high endurance, low latency SLOG device. SLOG writes are not likely to flood the cache, and if they do, it means useful work is being done and the backing HDD isn't keeping up.
 

JCL

Dabbler
Joined
Nov 18, 2015
Messages
14
OK, things are pretty clear now. I was indeed really confused, but I need to understand what I am doing.

So I will banish the LSI 2208 (it is actually in one FreeNAS configuration in my laboratory), and I will add an Intel DC S3700 SSD as the SLOG device to all FreeNAS systems (with the pure HBA LSI 2308).
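
For the record, adding the SLOG later boils down to attaching a log vdev to the pool; a minimal sketch from the shell ("tank" and "da30" are placeholder pool/device names, and on FreeNAS it is usually safer to do this through the GUI so the device gets partitioned and labelled consistently):

# attach the S3700 as a separate log (SLOG) vdev
zpool add tank log da30
# verify that the log vdev shows up
zpool status tank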

Thanks again to all of you, and sorry for taking up your time.

JC
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
We don't mind straightening you out. This stuff is complicated.
 

kspare

Guru
Joined
Feb 19, 2015
Messages
508
Just some feedback...

We switched from iSCSI to NFS using an Intel PCIe 750 400GB SSD.

It works great! Seat-of-the-pants performance with our terminal servers is better.

We are migrating our data off to local (DAS) storage today to work on implementing replication, and the migration is faster.

With iSCSI we would only see about 30% saturation of a 10 Gb link, whereas now we're at about 80-90% saturation on both servers.

Pretty impressive.
 