How to troubleshoot and avoid disk overload?

Status
Not open for further replies.

logan893

Dabbler
Joined
Dec 31, 2015
Messages
44
I'm testing the viability and performance of FreeNAS as game storage using a 2x4TB Seagate NAS mirrored pool. If it works well enough, I am planning to move 4x640GB WD Black drives to FreeNAS and run them in a striped mirror (RAID10) pool, mainly for game storage.

Currently I'm testing the pool with Dishonored, installed from Steam, on a Windows 7 PC. The pool is newly created, and only filled to 2% (94.6 GB out of 3.4 TB), using lz4 compression, dedupe off, and atime off. I have no other pool activity concurrent with running the game. I monitor the drives with arcstat, zilstat, and mainly gstat.

FreeNAS is running from a VM in ESXi 6.0, with 10 GB RAM and pass-through of an LSI SAS9212-4i4e in IT mode. Two 4TB Seagate NAS drives form the mirrored pool. The pool is shared over CIFS.

The game loads fine and well, no discernable difference in speed compared with a local HDD. Pool, according to gstat, does not get overloaded in this scenario. The issue I have arise from on-demand loading of certain in-game assets. Especially noticable during in-game cut scenes with spoken dialog. Most of the asset files are not particularly large.

The relevant data is requested from the network storage (over CIFS), but is not delivered for several seconds (typically 4-5 seconds!). When the data is finally loaded in the game, I finally see activity in the gstat monitoring window. As gstat reports it, the ops/s is low (2-4), the data read is not very large (10-30 MB), and yet the %busy is in the hundreds. With atime on, it showed both disks at around 300-400% busy. With atime off, it shows often only one disk as busy, the delay remains fairly constant, and the %busy is increased to 800-1100%.

ARC usage is hovering around 3-4 GB.

Initially I had atime on, however switching it off didn't produce any noticable improvement. I've also tested with adding one Intel X25-M 80GB SSD partitioned to 20GB as cache, as well as one 2GB VM drive from VMware's SSD datastore (Samsung 850 Pro 240 GB @ 20% additional OP) as log. Neither has changed the situation.

Changing vfs.zfs.txg.timeout from 5 (default) to 1 did not help.

Performance in general is fine. Even downloading via torrents, using qBittorrent from a Windows 10 VM, can easily reach 20-25 MB/s during downloads (basically maxing out my 250 Mbps download speed) and display 25-35% busy in gstat.

How can I proceed with troubleshooting this extreme read delay (4-5 seconds)?

Are there any relevant tunables I can tweak?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
And it has the same problem when FreeNAS is running on the bare metal?
 

logan893

Dabbler
Joined
Dec 31, 2015
Messages
44
Yes.

I just verified that I see the same behavior with FreeNAS running from a USB drive, on bare metal, with the same pool.
 

logan893

Dabbler
Joined
Dec 31, 2015
Messages
44
Here's one example from log.smbd (full logging enabled), when loading a UI info box for a particular skill. Many are loaded instantly. This one took several seconds the first time (and subsequent loads are instant once it's loaded in RAM on my Windows PC, or ARC on the NAS.)

Note the 8 second delay between (open_file) @ 21:17:09, and first (smb2_read_complete) @ 21:17:17.

This is not always the case with the delayed asset loading, however. All previous open_file were quick, even though there were some noticable delays here and there. Some open_file also had a higher numopen count. It is merely the most obvious example I have so far found.

Code:
[2016/02/15 21:17:09.149925,  3] ../source3/smbd/vfs.c:1137(check_reduced_name)
  check_reduced_name [SteamLibrary/steamapps/common/Dishonored/Binaries/Win32/UI_Tuto_ShadowKill_SF] [/mnt/data_mirrored/homes/logan893]
[2016/02/15 21:17:09.150099,  3] ../source3/smbd/vfs.c:1267(check_reduced_name)
  check_reduced_name: SteamLibrary/steamapps/common/Dishonored/Binaries/Win32/UI_Tuto_ShadowKill_SF reduced to /mnt/data_mirrored/homes/logan893/SteamLibrary/steamapps/common/Dishonored/Binaries/Win32/UI_Tuto_ShadowKill_SF
[2016/02/15 21:17:09.150182,  3] ../source3/smbd/dosmode.c:163(unix_mode)
  unix_mode(SteamLibrary/steamapps/common/Dishonored/Binaries/Win32/UI_Tuto_ShadowKill_SF) returning 0666
[2016/02/15 21:17:09.169369,  3] ../source3/smbd/vfs.c:1137(check_reduced_name)
  check_reduced_name [SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk.uncompressed_size] [/mnt/data_mirrored/homes/logan893]
[2016/02/15 21:17:09.169537,  3] ../source3/smbd/vfs.c:1267(check_reduced_name)
  check_reduced_name: SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk.uncompressed_size reduced to /mnt/data_mirrored/homes/logan893/SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk.uncompressed_size
[2016/02/15 21:17:09.169647,  3] ../source3/smbd/dosmode.c:163(unix_mode)
  unix_mode(SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk.uncompressed_size) returning 0666
[2016/02/15 21:17:09.170644,  3] ../source3/smbd/vfs.c:1137(check_reduced_name)
  check_reduced_name [SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk] [/mnt/data_mirrored/homes/logan893]
[2016/02/15 21:17:09.170725,  3] ../source3/smbd/vfs.c:1267(check_reduced_name)
  check_reduced_name: SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk reduced to /mnt/data_mirrored/homes/logan893/SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk
[2016/02/15 21:17:09.170777,  3] ../source3/smbd/dosmode.c:163(unix_mode)
  unix_mode(SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk) returning 0666
[2016/02/15 21:17:09.170962,  2] ../source3/smbd/open.c:1002(open_file)
  logan893 opened file SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk read=Yes write=No (numopen=232)
[2016/02/15 21:17:17.167488,  3] ../source3/smbd/smb2_read.c:391(smb2_read_complete)
  smbd_smb2_read: fnum 2187328483, file SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk, length=32768 offset=0 read=32768
[2016/02/15 21:17:17.167720,  3] ../source3/smbd/smb2_read.c:391(smb2_read_complete)
  smbd_smb2_read: fnum 1433264082, file SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/Bank_UI.pck, length=65536 offset=2031616 read=65536
[2016/02/15 21:17:17.169596,  3] ../source3/smbd/smb2_read.c:391(smb2_read_complete)
  smbd_smb2_read: fnum 2187328483, file SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk, length=32040 offset=40960 read=32040
[2016/02/15 21:17:17.170631,  3] ../source3/smbd/smb2_read.c:391(smb2_read_complete)
  smbd_smb2_read: fnum 1433264082, file SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/Bank_UI.pck, length=4096 offset=2097152 read=4096
[2016/02/15 21:17:17.170718,  3] ../source3/smbd/smb2_read.c:391(smb2_read_complete)
  smbd_smb2_read: fnum 1433264082, file SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/Bank_UI.pck, length=61440 offset=2101248 read=61440
[2016/02/15 21:17:17.175669,  3] ../source3/smbd/vfs.c:1137(check_reduced_name)
  check_reduced_name [SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk.uncompressed_size] [/mnt/data_mirrored/homes/logan893]
[2016/02/15 21:17:17.175804,  3] ../source3/smbd/vfs.c:1267(check_reduced_name)
  check_reduced_name: SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk.uncompressed_size reduced to /mnt/data_mirrored/homes/logan893/SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk.uncompressed_size
[2016/02/15 21:17:17.175896,  3] ../source3/smbd/dosmode.c:163(unix_mode)
  unix_mode(SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk.uncompressed_size) returning 0666
[2016/02/15 21:17:17.176703,  3] ../source3/smbd/smb2_read.c:391(smb2_read_complete)
  smbd_smb2_read: fnum 2187328483, file SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/UI_Tuto_ShadowKill_SF.upk, length=8192 offset=32768 read=8192
 

logan893

Dabbler
Joined
Dec 31, 2015
Messages
44
Some further debug logging of samba shows more detail surrounding the stalling reads. Below is an example of a 3.5 second stall (very common length, from my highly empirical observation of gstat "ms/r" column when the problem is triggering.)

Code:
[2016/02/15 22:28:48.443640, 10, pid=12476, effective(1001, 1001), real(0, 0)] ../source3/smbd/smb2_server.c:1778(smbd_smb2_request_verify_creditcharge)
  mid 4922, CreditCharge: 1, NeededCharge: 1
[2016/02/15 22:28:48.443653, 10, pid=12476, effective(1001, 1001), real(0, 0)] ../source3/smbd/smb2_read.c:439(smbd_smb2_read_send)
  smbd_smb2_read: SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/Textures.tfc - fnum 2262348439
[2016/02/15 22:28:48.443667, 10, pid=12476, effective(1001, 1001), real(0, 0)] ../source3/smbd/aio.c:702(schedule_smb2_aio_read)
  smb2: read size (4096) too small for minimum aio_read of 0
[2016/02/15 22:28:48.443682, 10, pid=12476, effective(1001, 1001), real(0, 0), class=vfs] ../source3/modules/vfs_streams_xattr.c:906(streams_xattr_pread)
  streams_xattr_pread: offset=1208578048, size=4096
[2016/02/15 22:28:51.904322, 10, pid=12476, effective(1001, 1001), real(0, 0)] ../source3/smbd/fileio.c:98(read_file)
  read_file (SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/Textures.tfc): pos = 1208578048, size = 4096, returned 4096
[2016/02/15 22:28:51.904386, 10, pid=12476, effective(1001, 1001), real(0, 0)] ../source3/smbd/smb2_read.c:557(smbd_smb2_read_send)
  smbd_smb2_read: file SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/Textures.tfc, fnum 2262348439, offset=1208578048 len=4096 returned 4096
[2016/02/15 22:28:51.904407,  3, pid=12476, effective(1001, 1001), real(0, 0)] ../source3/smbd/smb2_read.c:391(smb2_read_complete)
  smbd_smb2_read: fnum 2262348439, file SteamLibrary/steamapps/common/Dishonored/DishonoredGame/CookedPCConsole/Textures.tfc, length=4096 offset=1208578048 read=4096
[2016/02/15 22:28:51.904428, 10, pid=12476, effective(1001, 1001), real(0, 0)] ../source3/smbd/smb2_server.c:2502(smbd_smb2_request_done_ex)
  smbd_smb2_request_done_ex: idx[1] status[NT_STATUS_OK] body[16] dyn[yes:4096] at ../source3/smbd/smb2_read.c:154
[2016/02/15 22:28:51.904446, 10, pid=12476, effective(1001, 1001), real(0, 0)] ../source3/smbd/smb2_server.c:873(smb2_set_operation_credit)
  smb2_set_operation_credit: requested 1, charge 1, granted 1, current possible/max 1/512, total granted/max/low/range 512/8192/4923/512


Samba is not using async IO at all with default configuration. Perhaps I'll look into turning it on.

Any other ideas?
 

logan893

Dabbler
Joined
Dec 31, 2015
Messages
44
Preliminarily the problem is caused, or at least impacted, by (in my oppinion overly aggressive) drive sleep functionality, possibly specific to these Seagate NAS drives, with Advanced Power Management at 127 or lower.

During yesterday's continued troubleshooting I decided to run a "long" SMART test, and after it was finished I noticed that even just reading the SMART data (smartctl -a /dev/daX) would take several seconds to get going the first time. Subsequent runs would be instantaneous, and waiting 20 seconds or more again caused the delay.

I was using a setting of 127 to enable spin-down, as part of my burn-in test until I put the drives into "production". Reading the SMART data after the initial delay, the "Load Cycle Count" (variable 193) is the only variable stepped compared with previous read. However it is not merely due to drive head parking. With an APM setting of 128, the drive heads are also parked (after 30 seconds, instead of 20), but the SMART command execution is just barely delayed.

HDD Standby is set to 20 (minutes), and "Start Stop Count" (variable 4) does not step in either case, as expected.

Drives: Seagate NAS 4TB ST4000VN000
Settings during following test:
da1 APM=127
da2 APM=128

APM=127; First run after 20+ seconds idle:
Code:
# date; time smartctl -a /dev/da1 > /dev/null
Wed Feb 17 09:25:32 CET 2016
0.010u 0.000s 0:03.61 0.2%  976+3376k 0+0io 0pf+0w


APM=127; Second run, within 20 seconds of first run:
Code:
# date ; time smartctl -a /dev/da1 > /dev/null
Wed Feb 17 09:25:48 CET 2016
0.010u 0.000s 0:00.18 5.5%  488+1688k 0+0io 0pf+0w


APM=128; First run after 30+ seconds idle:
Code:
# date ; time smartctl -a /dev/da2 > /dev/null
Wed Feb 17 09:25:56 CET 2016
0.005u 0.005s 0:00.39 0.0%  0+0k 0+0io 0pf+0w


APM=128; Second run, within 30 seconds of first run:
Code:
# date ; time smartctl -a /dev/da2 > /dev/null
Wed Feb 17 09:26:05 CET 2016
0.000u 0.010s 0:00.19 5.2%  0+0k 0+0io 0pf+0w


Tonight I will turn off APM completely and hopefully the game will run smoothly.
To be continued...
 
Last edited:

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
I dont know many people who run games via smb.. as for apm and sleep stuff should all be OFF.. i wouldn't of changed anything related to that as it'll skew your results.. go back to defaults.. if its still slow beset the server for raw game storage versus loading games..
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I switched my entire steam library from a wd black to my nas and couldn't be happier. It's way faster than a hdd when it comes to random access. Verifying the local data went from 30min for some games to just 5min.

Why on earth did you put your drives to sleep? If you care about performance this is just something you don't do. Also putting them to sleep will kill them sooner.
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Is this the actual steam library? I apologize I didn't see that.. So what is in the library? Seems like physical game data?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Is this the actual steam library? I apologize I didn't see that.. So what is in the library? Seems like physical game data?
at the start they said 'testing with a game installed from steam' so i'm assuming they installed it to a network drive.

I recently moved my games over(2 days ago) and have had good success so far. Most people will say the performance will suck because of network bottleneck but when using a HDD to load games you never get 100MB/s anyways. With my nas i easily get 100MB/s because it has multiple spindles and is better at random IO. Now a SSD will be best if you can get one big enough, in my case HDD<NAS<SSD.
 

logan893

Dabbler
Joined
Dec 31, 2015
Messages
44
Game performs without a hitch after switching off APM.

I initially set the APM as part of my burn-in test, to generate a few Start/Stop cycles, and drive head parkings, over several weeks of light to medium use. I had not expected it to be this detrimental to performance, especially for the highest APM setting which still enables spindown. I'm relieved that the debilitating periodic latency (and "disk overload") is not permanent.

My vision is to achieve basically the same, and preferably better, performance compared with local single HDD storage, for games. I expect a multi-drive pool in my NAS will provide good read and better write and random IO performance, as SweetAndLow appears to have already experienced. After the game assets have been loaded from the pool once or twice, especially for games played often, subsequent requests for the same assets should be very quick when served from ARC or SSD L2ARC.

Even if the performance were not at the level of local HDD storage, as long as it is consistent and smooth, I would very much consider using the NAS for most games. I want to remove the spinning drives from my otherwise cool and quiet desktop PC. And perhaps the next computer I buy will be smaller, or portable.

I added the mounted network drive as an alternative Steam library folder, and so far only installed Dishonored there. I wonder if I can share the same network drive based Steam library folder between PCs...
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I wonder if I can share the same network drive based Steam library folder between PCs...
Yes you can, not sure if it requires anything special though. It should just work.
 
Status
Not open for further replies.
Top