I'm testing the viability and performance of FreeNAS as game storage using a 2x4TB Seagate NAS mirrored pool. If it works well enough, I am planning to move 4x640GB WD Black drives to FreeNAS and run them in a striped mirror (RAID10) pool, mainly for game storage.
Currently I'm testing the pool with Dishonored, installed from Steam, on a Windows 7 PC. The pool is newly created, and only filled to 2% (94.6 GB out of 3.4 TB), using lz4 compression, dedupe off, and atime off. I have no other pool activity concurrent with running the game. I monitor the drives with arcstat, zilstat, and mainly gstat.
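For reference, this is roughly how I run the monitoring (a sketch; the disk names and refresh intervals are examples from my setup, not a prescription):

```shell
# Per-disk I/O and %busy, refreshing every second; -p limits output to
# physical providers, -f filters to the pool's member disks (example names).
gstat -p -I 1s -f 'da[01]'

# ARC hit/miss statistics, one line per second.
arcstat.py 1

# ZIL write activity, one line per second.
zilstat 1
```

I usually keep all three running in separate terminals while triggering the in-game loads.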
FreeNAS is running from a VM in ESXi 6.0, with 10 GB RAM and pass-through of an LSI SAS9212-4i4e in IT mode. Two 4TB Seagate NAS drives form the mirrored pool. The pool is shared over CIFS.
The game itself loads fine, with no discernible difference in speed compared with a local HDD. According to gstat, the pool does not get overloaded in this scenario. The issue I have arises from on-demand loading of certain in-game assets, and is especially noticeable during in-game cut scenes with spoken dialogue. Most of the asset files are not particularly large.
The relevant data is requested from the network storage (over CIFS), but is not delivered for several seconds (typically 4-5 seconds!). Only when the data finally reaches the game do I see activity in the gstat monitoring window. As gstat reports it, the ops/s are low (2-4) and the amount of data read is not very large (10-30 MB), yet the %busy is in the hundreds. With atime on, it showed both disks at around 300-400% busy. With atime off, it often shows only one disk as busy, the delay remains fairly constant, and the %busy increases to 800-1100%.
ARC usage is hovering around 3-4 GB.
Initially I had atime on, but switching it off didn't produce any noticeable improvement. I've also tested adding one Intel X25-M 80GB SSD, partitioned to 20GB, as cache (L2ARC), as well as one 2GB VM disk from VMware's SSD datastore (Samsung 850 Pro 240GB @ 20% additional OP) as log (SLOG). Neither changed the situation.
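For the record, the cache and log devices were attached with commands along these lines (a sketch; the pool name "tank" and the device names are placeholders for my actual ones):

```shell
# Attach the 20GB partition of the X25-M as an L2ARC (cache) device.
zpool add tank cache gpt/l2arc0

# Attach the 2GB VM disk as a separate log (SLOG) device.
zpool add tank log da2

# Verify the resulting vdev layout.
zpool status tank
```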
Changing vfs.zfs.txg.timeout from 5 (default) to 1 did not help.
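The txg timeout change was applied as a runtime sysctl, roughly as below (on FreeNAS it can also be added under System → Tunables to persist across reboots):

```shell
# Shorten the transaction-group commit interval from the default 5s to 1s.
sysctl vfs.zfs.txg.timeout=1

# Confirm the current value.
sysctl vfs.zfs.txg.timeout
```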
Performance in general is fine. Even torrent downloads, using qBittorrent from a Windows 10 VM, easily reach 20-25 MB/s (basically maxing out my 250 Mbps connection) while showing only 25-35% busy in gstat.
How can I proceed with troubleshooting this extreme read delay (4-5 seconds)?
Are there any relevant tunables I can tweak?