Performance horror


divB

Dabbler
Joined
Aug 20, 2012
Messages
41
zvol on RAIDz incredibly slow

Hi,

Given: a new HP ProLiant MicroServer N40L (4GB RAM) with a newly created RAIDz1 of 3x2TB drives. I created a 500G zvol and exported it via iSCSI (Gigabit LAN). On the Linux iSCSI initiator:

Code:
# hdparm -tT /dev/sdd

/dev/sdd:
 Timing cached reads:   1608 MB in  2.00 seconds = 803.78 MB/sec
 Timing buffered disk reads: 222 MB in  3.02 seconds =  73.46 MB/sec


Everything looks OK. Then, creating ext4 on this volume:

Code:
# mkfs.ext4 /dev/sdd
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=1 blocks, Stripe width=256 blocks
32768000 inodes, 131072000 blocks
6553600 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
4000 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000

Writing inode tables: 1800/4000


It hangs (or makes very slow progress).

In the meantime:

Code:
# hdparm -tT /dev/sdd

/dev/sdd:
 Timing cached reads:   1216 MB in  2.00 seconds = 608.15 MB/sec
 Timing buffered disk reads:   2 MB in 122.25 seconds =  16.75 kB/sec


I have done this twice now; it's reproducible.
Last week, I did performance experiments with smaller disks (1GB) to see whether this setup would meet my requirements. That gave me end-to-end performance (rsync) of 30-40 MB/s, which is acceptable.

What is suddenly going wrong here?

Regards,
divB

EDIT: I did some experiments and got results I can't explain: I created a 500G zvol and exported it via iSCSI. I did linear writes with dd in 4M blocks, once with a single drive and once with the 3x2TB RAIDz1 ("force 4k sectors" enabled).
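Roughly, the write test looked like this (a sketch, not my exact command line; the device name /dev/sdd and the oflag=direct flag are assumptions):

Code:
# Sketch of the linear-write test, run on the Linux initiator.
# /dev/sdd is assumed to be the iSCSI-exported zvol; oflag=direct
# bypasses the initiator's page cache so only the target's speed is measured.
dd if=/dev/zero of=/dev/sdd bs=4M count=2000 oflag=direct
# From another shell, SIGUSR1 makes dd print intermediate throughput:
kill -USR1 $(pidof dd)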

The resulting diagrams are attached: while I get realistic performance without RAID (>50 MB/s), performance drops horribly with RAIDz1. Except for the high initial throughput (I guess from filling buffers), the throughput decreases roughly linearly afterwards. I stopped the experiment at 7.8 GB, by which point it was down to 2.7 MB/s.

[Attachments: zoom.jpg, alldata.jpg]

I can imagine that more random writes, such as formatting the volume with mkfs.ext4, would quickly drop performance to a few kB/s.

Is there anything I can do about it? This is totally different from e.g. http://forums.freenas.org/showthrea...amarks-and-Cache&p=24532&viewfull=1#post24532 ...

I have read that RAIDz1 does not perform particularly well, but these numbers are far from usable at all.

The disks are all from different vendors, but all are SATA 3Gb/s (300 MB/s):

Code:
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <SAMSUNG HD204UI 1AQ10001> ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <ST32000542AS CC34> ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD20EARX-00PASB0 51.0AB51> ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)


One more EDIT: For testing, I did not use the zvol+iSCSI path but instead created a dataset and copied data to it via rsync+ssh (same network!). df now shows 86776412k blocks used since I started 50 minutes ago, which works out to about 28 MB/s and would be acceptable. What on earth is going on here? Maybe there is a problem with istgt? Or with zvols?
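The copy was basically of this form (a sketch; the host name, source path and dataset path are placeholders, not my real ones):

Code:
# Copy a directory tree over ssh to a dataset on the FreeNAS box
# (host name and paths are placeholders).
rsync -a --progress /data/ root@freenas:/mnt/plvl5i0/backup/
# On the FreeNAS side, watch used blocks grow to estimate throughput:
df -k /mnt/plvl5i0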
 

divB

Dabbler
Joined
Aug 20, 2012
Messages
41
I did further tests and found that it is not the hardware, not the network, and not istgt: using zvols on RAIDz seems to be the problem. But why?

Could anybody try to reproduce these results?

500G zvol on mirror:

Code:
# zpool list
NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
plvl1i0  1.81T  1.97G  1.81T     0%  ONLINE  /mnt
# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
plvl1i0            500G  1.30T   112K  /mnt/plvl1i0
plvl1i0/zvtest     500G  1.78T  1.97G  -
# dd if=/dev/zero of=/dev/zvol/plvl1i0/zvtest bs=2048k count=1000
1000+0 records in
1000+0 records out
2097152000 bytes transferred in 17.318348 secs (121094230 bytes/sec)
#


So I get 115.48 MB/s, which is good for me (similar result for a single drive).

Now the same stuff with the RAIDz1 setup described above:

Code:
# dd if=/dev/zero of=/dev/zvol/plvl5i0/zvtest bs=2048k count=1000

1000+0 records in
1000+0 records out
2097152000 bytes transferred in 700.126725 secs (2995389 bytes/sec)
#


which gives about 2.85 MB/s.
As shown above, RAIDz1 with a dataset and rsync+ssh also gives acceptable results.
 

divB

Dabbler
Joined
Aug 20, 2012
Messages
41
Really nobody has any idea left? :( :( It's a catastrophe, and I don't know how to debug this any further.

Writing 1GB linearly to a 500G zvol on the following pools gives:

ada2 + ada3 (mirror): 105 MB/s
ada1 + ada3 (mirror): 128 MB/s
ada1 + ada2 (mirror): 117 MB/s
ada1 + ada2 + ada3 (raidz): 2.5 MB/s
ada1 + ada2 + ada3 (mirror): 112 MB/s

This is unbelievable!!

(ada1=SAMSUNG HD204UI, ada2=ST32000542AS, ada3=WDC WD20EARX).

It's not the drives, it's not raidz by itself, it's not the number of drives, it's not zvols by themselves, it's not the hardware... it must be a bug :-( I am desperate.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That's abysmally slow. You might want to check out bug #1531 and the related discussion to see if what you're seeing is related. There are some mitigation steps, but fundamentally ZFS is kind of a strange fit in some ways for some workloads.

ZFS is a copy-on-write filesystem so one distressing thing will be that your iSCSI volume's blocks in the pool are probably going to get fragmented over time as random writes occur (metadata updates, etc). That doesn't explain what you're seeing though.
 

divB

Dabbler
Joined
Aug 20, 2012
Messages
41
Hi,

Thank you for the reply.
I am aware of the problems related to ZFS (and also fear the fragmentation of the iSCSI volume).

Still, this problem must be something very strange: it's new hardware, new drives, and a completely new setup, nothing special! I can't see a connection to bug #1531. Remember, I get a reproducible 2.5 MB/s in a completely new setup in one particular configuration, while everything else gives a "normal" 60-120 MB/s.

The only things I found which sound similar are:

http://lists.freebsd.org/pipermail/freebsd-fs/2008-March/004557.html
http://lists.freebsd.org/pipermail/freebsd-stable/2009-January/047300.html ff.

However, these relate to FreeBSD 7.2, and the issues should have been solved in 8, so I am wondering even more.

I guess you have already used zvols on top of RAIDz1? Have you? And never had problems like these?

At the moment I am really at a loss, and it seems like I need to drop FreeNAS and go for a Debian-based solution. This is bad for two reasons: first, no more fun with snapshots, and second, I have already done half of the migration (which means I would need several more days).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I haven't tried a configuration with only three drives, though I can say that RAIDZ with three drives should be an okay configuration (power-of-two data drives plus parity).

You're encouraged to tell us what you're seeing if you run gstat while this is going on. It may help point out an underlying hardware issue or other bottleneck.
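Something like this, in a second session while the dd is running, is enough (a sketch; the pool and zvol names are taken from your earlier posts):

Code:
# Session 1: the slow linear write to the zvol.
dd if=/dev/zero of=/dev/zvol/plvl5i0/zvtest bs=2048k count=1000
# Session 2: per-disk statistics; -a restricts the display to active devices.
gstat -a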
 

divB

Dabbler
Joined
Aug 20, 2012
Messages
41
I haven't tried a configuration with only three drives, though I can say that RAIDZ with three drives should be an okay configuration (power-of-two data drives plus parity).

Thanks. Yes, it is an okay configuration. Writing to the dataset is 60 MB/s in my configuration.

You're encouraged to tell us what you're seeing if you run gstat while this is going on. It may help point out an underlying hardware issue or other bottleneck.

I have done this; however, I am not sure how to interpret it. It can't be a hardware error, because writing to the dataset works fine (only writing to the zvol is the issue). My observations:

When doing dd if=/dev/zero of=/mnt/mypool/testfile bs=2048k, all three drives show high %busy and kBps values (about 60 MB/s).

When doing dd if=/dev/zero of=/dev/zvol/mypool/zvtest bs=2048k, most of the time only one of the drives is "active", and that drive only reaches about 5 MB/s. When all drives are accessed simultaneously, they even drop to a few kB/s each.

This must be a BUG, there is no other explanation :-( :-( :-(
 

divB

Dabbler
Joined
Aug 20, 2012
Messages
41
To clarify, I took a few screenshots of typical situations. In the zvol case, most of the time only 1-2 disks are accessed (occasionally 3), at only a few MB/s:

[Attachments: zvol1.jpg, zvol2.jpg, zvol3.jpg]

In the dataset case, all disks are accessed simultaneously at full speed:

[Attachments: dataset2.jpg, dataset1.jpg]
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The useful and interesting bits (for this issue, at this point) are:

1) How busy the drives are. The "%busy" is a system-invented measure of how busy the system thinks the drives are, but it is usually a close estimate. A drive can be very busy and still be only pushing 1MB/sec by seeking heavily; this will be reflected in "%busy" and "ms/w (millisec/write)" or "ms/r"... high numbers here will mean likely latencies from hardware.

You should look at that from both the read and write perspective. Might yield some clues.

2) Does the activity rotate amongst the drives, or is it persistent on a single drive?

3) hmm... well let's leave it there for now.
 

divB

Dabbler
Joined
Aug 20, 2012
Messages
41
1.) In both cases, the drives are busy (99%-100%), as indicated on the screenshots.

still be only pushing 1MB/sec by seeking heavily

There should be almost no seeking. As I said, I freshly created the arrays every time, and I am writing linearly with dd (once to the zvol device, once to a file on the dataset).

Concerning "ms/w (millisec/write)": as can be seen on the screenshots, the values are higher in the dataset case (800-900), and lower and more variable in the zvol case (100-700).

You should look at that from both the read and write perspective. Might yield some clues.

Until now, I had only looked at writing. I have now tested reading as well: it works, although the zvol is again much slower: 100 MB/s compared to 170 MB/s for a file on the dataset. In both cases, the drives are not at 100% but roughly around 50%. Example:

[Attachment: dataset-reading2.jpg]

Does the activity rotate amongst the drives, or is it persistent on a single drive?

It seems to be balanced across all drives.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The difference between "should be" and "is" not seeking sometimes turns out to be a factor. Those of us who've been working with storage for a long time often curse at the addition of abstractions that sometimes result in the unexpected. We used to be able to walk our disk subsystems around the floor by making them seek... now you often can't even hear them doing so.

I'm guessing, though, that the drives aren't maxing out. It's hard to see from your screenshots, but green is generally good, purple is busy, and red is mostly full busy. Judging from the lack of red, let's guess that the drives aren't completely swamped or anything like that. That's a good thing.

What tuning variables did you use, if any? At 4GB, you're theoretically under what is suggested for ZFS on FreeNAS, though generally it seems that the suggestion of 6-8GB is more about the developers not focusing on the memory footprint than any actual inherent limitation. It could be a factor. I've found that an insanely fast machine, lots of RAM, and slowish disks can be a real ZFS performance gotcha.
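If you want to experiment, the usual knobs live in /boot/loader.conf; the values below are only illustrative for a 4GB box, a sketch rather than a recommendation, and they need a reboot to take effect:

Code:
# /boot/loader.conf -- illustrative tunables for a 4GB ZFS box.
vfs.zfs.arc_max="2048M"        # cap the ARC so the rest of the system keeps some RAM
vfs.zfs.prefetch_disable="1"   # prefetch is commonly disabled on low-memory systems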

But...

I'm kind of leaning towards waving my hands a bit and suggesting that it's something like the zvol is doing sync writes, but that doesn't explain the lower read speeds. Can't figure out where the setting is, it doesn't seem to be listed under "zfs get all foo", but I also feel fine pointing you at that and letting you see if it gives you any further avenues to explore. I do actually have real work (of my own!) to do today ;-)
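If your ZFS version is new enough (v28-era) to have the per-dataset "sync" property, checking it would look something like this; a sketch, and if the property isn't there at all, that would match it not showing up in "zfs get all":

Code:
# Check how synchronous writes are handled for the zvol
# (the sync property only exists on newer ZFS versions).
zfs get sync plvl5i0/zvtest
# For testing only -- this trades data safety for speed:
zfs set sync=disabled plvl5i0/zvtest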
 

divB

Dabbler
Joined
Aug 20, 2012
Messages
41
suggesting that it's something like the zvol is doing sync writes

I searched hard for this and found nothing, except that these issues are supposed to have been solved in FreeBSD 8.

I think I'll give up. Thanks for the help.
FreeNAS/ZFS looked so promising :-(
 

divB

Dabbler
Joined
Aug 20, 2012
Messages
41
Hmm, strange. I can't stop; further research, this time using only the command line:

Code:
zpool create -m /mnt/plvl5i0 plvl5i0 raidz1 /dev/ada1 /dev/ada2 /dev/ada3
zfs create -V 500g plvl5i0/zvtest
dd if=/dev/zero of=/dev/zvol/plvl5i0/zvtest bs=2048k count=2000
4194304000 bytes transferred in 140.715098 secs (29807065 bytes/sec)


-> 28 MB/s. Still not fast, but ten times (!) the speed I get when the pool is created with the GUI. This raises a few questions:

1.) What is the exact command line the GUI uses to create the zpool and the zvol? There is nothing in the logs.
2.) zdb does not show an entry for the new pool. Why? That also means I can't check the value of ashift (see the zdb sketch after this list) ...
3.) "zpool list" does not show an entry for ALTROOT. This is set to "/mnt" when the pool is created with the GUI. Why?

4.) If doing it from the command line really solves the problem, then the issue is indeed FreeNAS-related. Any ideas how it could be fixed?
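For question 2, this is how I would expect to check ashift (a sketch; that the FreeNAS GUI keeps its pool cache at /data/zfs/zpool.cache is an assumption on my part):

Code:
# Dump the cached pool configuration and look for ashift
# (9 = 512-byte sectors, 12 = 4096-byte sectors).
zdb -C plvl5i0 | grep ashift
# If the pool is not in the default cache file, point zdb at another one;
# /data/zfs/zpool.cache is assumed to be where the GUI keeps its cache.
zdb -U /data/zfs/zpool.cache -C plvl5i0 | grep ashift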

/EDIT: WTF???? If I use gnop to force 4096-byte sectors, I again get those 2.8 MB/s. With the plain command above (i.e., mixing 512-byte and 4096-byte sectors), I get 28 MB/s. However, writing directly to a file in the pool again gives 61 MB/s. That's all so damn strange.
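For reference, forcing 4096-byte sectors by hand is roughly this gnop dance (a sketch of what I understand the GUI option to do, not taken from the FreeNAS code):

Code:
# Create 4096-byte-sector gnop overlays and build the pool on them so ZFS
# picks ashift=12, then recycle the pool onto the raw devices.
gnop create -S 4096 /dev/ada1 /dev/ada2 /dev/ada3
zpool create -m /mnt/plvl5i0 plvl5i0 raidz1 ada1.nop ada2.nop ada3.nop
zpool export plvl5i0
gnop destroy /dev/ada1.nop /dev/ada2.nop /dev/ada3.nop
zpool import plvl5i0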
 

divB

Dabbler
Joined
Aug 20, 2012
Messages
41
Hmm, maybe it's related; I read some posts, but as I said, those issues should all have been solved in FreeBSD 8.

But please tell me what's "obscure" about having a zvol on a raidz1??

If this is obscure, all the zvol stuff can be dropped completely!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Hmm, maybe it's related; I read some posts, but as I said, those issues should all have been solved in FreeBSD 8.

But please tell me what's "obscure" about having a zvol on a raidz1??

If this is obscure, all the zvol stuff can be dropped completely!

"obscure" --> a really small number of people who have tried this

Compare and contrast with CIFS, which most FreeNAS users use.
 