Performance Bottleneck?

Status
Not open for further replies.

Nexitus

Dabbler
Joined
Sep 22, 2011
Messages
27
Hi,

I'm running FreeNAS 8.3.1 using iSCSI with the following hardware.

CPU: E-350
RAM: 2x4GB
HDD: 8x 2TB WD Green (10TB Raidz2)

I'm having some trouble streaming video. The files aren't lossless or anything with a crazy high bit rate, but a lot of 720p videos still run into buffering issues. With this hardware I'd think I should be okay, so given the problems I'm seeing, maybe it's time to add more RAM?

Suggestions?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
More RAM might or might not help. Spend a little time to characterize the problem. While you are having issues, are the disks very busy? ("gstat" and "zpool iostat 1" are helpful tools) Is the system running out of CPU? ("top") etc.
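For reference, something like this from an SSH session covers all three (these are stock FreeBSD tools, nothing FreeNAS-specific; run them one at a time or in separate sessions):

Code:
# per-disk load; look for any drive with %busy pegged near 100
gstat

# pool-wide I/O statistics, one line per second
zpool iostat 1

# CPU and memory usage, refreshing every 2 seconds
top -s 2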

You're using iSCSI, so what's actually serving the video? A filesystem layered on top of iSCSI may not be the friendliest way to let ZFS perform well. Say you're running a full and fairly fragmented NTFS filesystem on that iSCSI drive: ZFS has no clue what the next blocks of the file you're streaming are. It might speculatively read ahead, but if it isn't reading the right thing, so what? And if the host mounting the iSCSI isn't doing any read-ahead of its own, then playback can indeed pause while it asks for the next blocks over iSCSI and ZFS has to go seek somewhere to fetch that data.
 

Nexitus

Dabbler
Joined
Sep 22, 2011
Messages
27
Thanks for the suggestions!

Truth be told, it's only recently that I've been running into iSCSI and performance issues, on the newer 8.3.1 builds (with constant ***ERROR*** lu_disk_lbwrite() failed messages). It might be iSCSI itself that's broken at this point. Still, how do you go about defragging an NTFS filesystem that has ZFS running underneath it? Treat it like a normal NTFS drive and defrag away?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The lu_disk_lbwrite errors usually come down to sluggishness with iSCSI, which in turn is caused by ZFS being insufficiently responsive; if that's the case, see bug 1531. It's not really a bug so much as the fact that FreeNAS comes out of the box tuned for performance, not responsiveness. There is no magic fix, but you can exchange some performance for responsiveness and wind up with a system that stays responsive under read and write loads, even during a scrub.
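As a rough sketch of the sort of thing that ticket covers (the tunables and values below are illustrative only, not lifted from bug 1531; read the ticket before changing anything), you shorten how long ZFS sits on dirty data and cap how deep the per-vdev queue gets:

Code:
# flush transaction groups more often, so writes come in smaller bursts
sysctl vfs.zfs.txg.timeout=5

# fewer outstanding I/Os per vdev keeps individual requests snappier
# (this tunable exists in the ZFS shipped with 8.x; later releases changed it)
sysctl vfs.zfs.vdev.max_pending=4

# to make either setting survive a reboot, add it under System -> Sysctls
# (or Tunables) in the GUI instead of editing the read-only root filesystem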

It isn't possible to meaningfully defrag an NTFS filesystem sitting on top of ZFS. ZFS reads ahead through mechanisms such as DMU prefetch and the vdev cache, and for files stored natively on ZFS the two together tend to result in efficient prefetch. However, because ZFS is a copy-on-write filesystem, blocks written to the iSCSI extent get scattered wherever the ZFS allocator finds space. Even if NTFS has contiguous blocks 1, 2, and 3 and only has to update block 2, that updated block won't end up contiguous with the others on the underlying pool. So all a defrag will really accomplish is more fragmentation at the ZFS level.

My guess is that if you were to run arc_summary, you would see disappointing numbers under "DMU Efficiency".
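(If you haven't run it before: arc_summary is a Python script that ships on the FreeNAS image; I don't recall the exact path on 8.3.1, so locate it first.)

Code:
# find the script; its location differs between releases
find / -name 'arc_summary*' 2>/dev/null

# then run it and page down to the DMU Efficiency section
python /path/to/arc_summary.py | less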
 

Nexitus

Dabbler
Joined
Sep 22, 2011
Messages
27
This is my DMU Efficiency:
Code:
DMU Efficiency:                                 610.23k                         
        Hit Ratio:                      98.06%  598.36k                         
        Miss Ratio:                     1.94%   11.87k                          
                                                                                
        Colinear:                               11.87k                          
          Hit Ratio:                    0.06%   7                               
          Miss Ratio:                   99.94%  11.86k                          
                                                                                
        Stride:                                 118.56k                         
          Hit Ratio:                    100.00% 118.56k                         
          Miss Ratio:                   0.00%   0  


Should I be worried about the Colinear hit/miss ratio? It obviously looks very bad...

This is where "top" is sitting, too. It doesn't make any sense to me...

Code:
last pid:  4183;  load averages:  1.90,  1.80,  1.67    up 0+00:36:02  19:15:30 
28 processes:  1 running, 27 sleeping                                           
CPU:  4.1% user,  0.0% nice, 35.0% system,  8.3% interrupt, 52.6% idle          
Mem: 149M Active, 60M Inact, 3918M Wired, 2124K Cache, 192M Buf, 3380M Free     
ARC: 3407M Total, 1926M MFU, 1256M MRU, 16K Anon, 157M Header, 67M Other        
Swap: 16G Total, 16G Free                                                       
 
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No, the percentages actually look pretty okay, but the numbers themselves seem like they're too low. Are you maybe using a device extent instead of a file extent?

The top output seems to suggest your system is a bit busier than I would expect for merely serving iSCSI at a leisurely pace. What does it say is eating all the CPU?
 

Nexitus

Dabbler
Joined
Sep 22, 2011
Messages
27
Code:
last pid: 12458;  load averages:  1.44,  1.31,  1.37    up 0+12:37:52  07:17:20 
26 processes:  2 running, 24 sleeping                                           
CPU:  5.0% user,  0.0% nice, 37.1% system,  9.9% interrupt, 47.9% idle          
Mem: 153M Active, 57M Inact, 4029M Wired, 2124K Cache, 199M Buf, 3268M Free     
ARC: 3333M Total, 29M MFU, 3123M MRU, 304K Anon, 131M Header, 49M Other         
Swap: 16G Total, 16G Free                                                       
                                                                                
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND    
 2544 root          5  47    0 34724K 20292K RUN     0  51:28  5.18% istgt      
12026 root          2  44    0 52392K 13400K select  1   0:07  1.27% python     
 2740 root          6  45    0   189M 98468K uwait   0   1:19  0.68% python     
 2895 root          7  44    0 70560K 10788K ucond   0   0:38  0.00% collectd   
 2218 root          1  44    0  6784K  1464K select  1   0:08  0.00% syslogd    
 2465 root          1  44    0 11672K  2840K select  0   0:03  0.00% ntpd       
 3711 root          1  76    0 86780K 34680K ttyin   0   0:01  0.00% python     
 3703 root          1  76    0 86780K 34680K ttyin   1   0:01  0.00% python     
 4952 www           1  44    0 14408K  4840K kqread  0   0:01  0.00% nginx      
 3015 root          1  76    0  7840K  1508K nanslp  0   0:00  0.00% cron       
 3411 root          1  44    0  7844K  1552K select  0   0:00  0.00% rpcbind    
12435 root          1  44    0  9240K  2156K CPU1    1   0:00  0.00% top        
12433 root          1  44    0  7024K  2744K wait    1   0:00  0.00% bash       
 2799 root          1  44    0 14408K  4356K pause   0   0:00  0.00% nginx      
 1995 root          1  44    0  5252K  3200K select  0   0:00  0.00% devd       
 3707 root          1  76    0  6780K  1272K ttyin   0   0:00  0.00% getty  


Here is the full "top" output. It is set up as a device extent.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I think your system is just getting very busy with I/O. The load averages are high, but the CPU that's actually in use is mostly system time, which points at the kernel pushing I/O rather than any one process.

Might try running "gstat" and seeing how busy the disks are when it burps.
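For what it's worth, filtering gstat down to the raw disks makes it easier to spot one drive lagging the rest (the pattern below assumes ada/da-style device names; adjust it to whatever your drives show up as):

Code:
# refresh once per second, whole disks only (no partitions or labels)
gstat -I 1s -f '^a?da[0-9]+$'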
 

papageorgi

Explorer
Joined
Jul 16, 2013
Messages
51
I hope you found your solution, but in case you haven't: I had a similar issue, and gstat showed me that one drive was always in the red (very busy). Once it was replaced, the issue was resolved.
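(If anyone else lands here with one disk constantly pegged in gstat, it's worth pulling that drive's SMART data before buying a replacement; the device name below is just an example.)

Code:
# look for non-zero Reallocated_Sector_Ct or Current_Pending_Sector counts
smartctl -a /dev/ada3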
 