writes with sync are abysmal, even with SSD log devices

Status
Not open for further replies.

whosmatt

Dabbler
Joined
Jun 6, 2012
Messages
20
A little background:

We've been using a ZFS storage system for main storage for about 3 years now. It's a Supermicro chassis, 36x 2TB Seagate Constellations, Intel SRCSASJV controller with BBU, OCZ SAS SSDs for log devices. Without going into too many specifics, we've been running OpenIndiana with much success. Lately though, hardware failures have plagued us. We lost a SAS expander which involved downtime, and recently we had three drives fail within a 24 hour period, which put us at risk of data loss; thankfully it didn't happen, though the resilver time on the raidz2 pool was excruciating. (we also have a mirror/stripe pool which recovered in hours instead of days)

So, we bought a new Supermicro storage server to act in a replication role. Ideally, we'd like to have one production pool on each server, and a backup pool. The servers would replicate to each other, and in the event of a catastrophe, we'd have a replica of each on the other.
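
For the curious, the replication itself would just be ordinary ZFS snapshot send/receive, whether driven by FreeNAS replication tasks or by hand. Roughly something like this, with made-up pool and snapshot names:

    zfs snapshot -r tank/prod@snap1
    zfs send -R tank/prod@snap1 | ssh server2 zfs recv -Fu backup/prod-replica
    # later runs only send the delta between the previous and current snapshot
    zfs send -R -i tank/prod@snap1 tank/prod@snap2 | ssh server2 zfs recv -Fu backup/prod-replica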

The new server is the 6047R-E1R24L. Highlights are an LSI 2308 HBA in IT mode, 64GB RAM, 24x WD RE SAS 4TB disks, and 2x Intel Pro 2500 Series SSDs.

I'm evaluating FreeNAS because OpenIndiana has an issue with the HBA / WD disks where it thinks they are over temp and faults them.
So far, I'm liking what I'm seeing. I haven't evaluated FreeNAS in about 4 years; we chose OI over it then because of the superiority of Solaris's CIFS server over Samba, though that's no longer much of a concern for us.

But --
Abysmal writes. I set up a single stripe/mirror pool using 22 of the 24 WD disks, with two as spares. It's encrypted. I took a bit of a circuitous route in setting up the log devices, preferring to do it at the CLI with small partitions (kind of like this: http://mark.nellemann.nu/2013/01/31/zfs-log-and-cache-on-sliced-disks/) so I didn't have to waste the entire 240GB SSDs on a log that will maybe ever use 2GB. The log is a mirror of two 2GB slices on the SSDs. I'm using a 2TB zvol presented to an ESXi host via iSCSI. With sync=always set on the zvol, I can see that the log devices are getting written to, but I can't see any performance improvement over not using them at all. If sync is off (rather, set to sync=standard, which amounts to async with ESXi and iSCSI as I understand it), I get about the performance I'd expect given the underlying hardware and the network.
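
For what it's worth, this is roughly how I'm confirming the log devices are actually in play during the tests (pool and zvol names are just examples):

    zfs get sync tank/esxi-zvol    # confirm sync=always is really set on the zvol
    zpool iostat -v tank 1         # the log mirror vdev shows write activity while the test runs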

It's difficult for me to compare with my other system, as the OS is different, and the HBA is different. We never experienced this kind of problem, even before adding SSD log devices on the OI system, but having a BBU write cache probably helped a lot.

Sorry if this is a rambling first post. I'll provide as much detail as needed but figured this might get me started. This is a non production system, so I can manhandle it as I please for the time being for testing purposes.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You broke our cardinal rule... don't do stuff from the CLI. :P Now for the reason why that was bad.

You cannot use an SSD as both an L2ARC and an slog and expect either one to perform very well. That's why you cannot use the WebGUI to set up two slices or partitions for your slog and L2ARC on the same device, nor should you do such things. The ZFS scheduler has no idea that ada1p1 and ada1p2 (or whatever equivalent you are dealing with) are actually on the same physical disk. So ZFS expects that if it writes data to ada1p1, nothing else is competing for those resources. Except ada1p2 may have reads or writes of its own going on, which fubars the performance characteristics that both L2ARC and slog depend on.

Use an SSD for an slog or an L2ARC, never for both.
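
To make "one device, one function" concrete, the layout should end up looking like the rough CLI equivalent below, although you should build it through the WebGUI rather than typing this yourself (device names are just placeholders):

    zpool add tank log mirror da24 da25   # two whole SSDs, mirrored, dedicated to the slog
    zpool add tank cache da26             # a separate SSD dedicated to L2ARC, if you use one at all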

That article you linked to is a classic example of what not to do; if you already know the subject you'd recognize that, but as a reader you probably can't tell that his advice is terrible and should be ignored. ;) There's lots of bad advice on the internet. It's easy to give bad advice and not realize it, but hard to figure out which advice is the good advice.

You are correct that the BBU write cache probably resolved the performance issue, since the RAID controller's scheduler absorbed the conflicting I/O, but it also broke POSIX compliance, because sync writes weren't necessarily on non-volatile storage media when they were acknowledged. You may or may not care about this, but I'm mentioning it because it is something that many people will care about... deeply. Many companies won't even use a product that can't assure POSIX compliance.

Anyway, to come back full circle: don't try to outsmart the system by using the CLI. We can't stop you from using the CLI (obviously), but the CLI definitely gives you the power to do things that can (and often will) blow up in your face and result in lost data. If the WebGUI can do something (changing network settings, creating the zpool, etc.), you should be using the WebGUI, without exception. If the WebGUI won't let you do something (for example, use the same physical device as an L2ARC and an slog), there's probably a really good reason why it won't. If in doubt, ask in IRC. There's almost always someone there who can tell you whether your idea is horribad or not.

But this idea... horribad. One device, one function. ;)

Also, as a general rule with FreeNAS (I can't vouch for OpenIndiana or Nexenta, as their settings may be different): FreeNAS/FreeBSD currently forces transaction group commits at 5 second intervals. So unless you have multiple 10Gb links (or something faster than that), 8GB of slog space is plenty, and it's not physically possible to fill it to 100%.
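
Back-of-the-envelope, assuming a single 10GbE link running flat out: 10 Gb/s is roughly 1.25 GB/s, and 1.25 GB/s over a 5 second transaction group interval is about 6.25 GB, so even 8GB is more than a single transaction group can ever put into the slog.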
 

whosmatt

Dabbler
Joined
Jun 6, 2012
Messages
20
Got it. OK, now all the CLI gpart stuff is undone, and the SSDs are added via the GUI as mirrored log devices. No L2ARC. Also no performance change: writes are approximately 1/7 of the throughput with sync=always on the zvol compared to sync=standard.
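
For reference, I'm toggling the property and re-running the same benchmark from a VM on the iSCSI datastore each time (zvol name is just an example):

    zfs set sync=standard tank/esxi-zvol   # baseline run
    zfs set sync=always tank/esxi-zvol     # sync run; this is the one at roughly 1/7 the throughput
    zpool iostat -v tank 1                 # the mirrored log vdev shows activity during the sync run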
 
L

Guest
@whosmatt I have a lot of OI and Solaris background. If you would like to contact me, we can talk through what you're seeing.
 