Huge IOwait Issues With VMs

brando56894

Wizard
Joined
Feb 15, 2014
Messages
1,537
I decided to move back over to FreeNAS since I've always liked ZFS and have never found a better UI compared to whatever Linux has to offer, even though I'm a Linux guy at heart. I run all the standard Usenet apps, and lately I've taken to running them in Docker. I haven't really messed with Rancher that much and just wanted to get things up and running, so I created an Arch VM and set up everything in there with my compose scripts.

It was fine at first, but once I had 15 containers running and downloading a bunch of things at 1 Gbps, everything started to slow to a crawl and freeze. I noticed I had set the storage driver to AHCI instead of VirtIO, so I switched it to VirtIO and rebuilt my initrd... no real change. I then realized that all my write activity was hitting my main pool, which only has 2 VDEVs (RAIDZ2, 6 drives wide each), so I created a new pool on a Samsung 970 Pro NVMe drive and moved the VM zvol over to that. Initially there was a large difference, but once everything picked up again it slowed to a crawl. I moved the downloads directory from the rotational pool to the NVMe pool, and download speeds and performance picked up for a bit... and then started to suffer again.
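For what it's worth, verifying that the VirtIO switch actually took effect on the Arch guest looks something like this (the module list below is the generic VirtIO set, not necessarily exactly what I used):

Code:
# confirm the guest disk now shows up as a VirtIO device (vda) instead of SATA/AHCI (sda)
lsblk -o NAME,SIZE,MODEL

# make sure the VirtIO modules end up in the initramfs (Arch, mkinitcpio)
# in /etc/mkinitcpio.conf:
#   MODULES=(virtio virtio_blk virtio_pci virtio_net virtio_scsi)
sudo mkinitcpio -P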

When I look at htop, unrar, docker, sonarr, and usually nzbget are all in state D, i.e. uninterruptible sleep, "waiting for IO".
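If you want to see the same thing without htop, something along these lines pulls the D-state processes straight out of ps:

Code:
# list everything stuck in uninterruptible sleep (state D), plus what it's waiting on
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'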

Theoretically I should have no issues with IO since the NVMe bus can handle a ton, and I haven't had any issues in Linux, but FreeBSD seems to be different. I did an uncompressed test write in FreeNAS itself and got "only" about 700 MB/sec, which is nothing to sneeze at. I think the bhyve storage drivers are to blame for the poor performance.
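The test write was along these lines: an uncompressed dataset plus dd, with placeholder pool/dataset names:

Code:
# dataset with compression off so ZFS can't inflate the numbers (names are placeholders)
zfs create -o compression=off nvme/speedtest

# write ~8 GiB of zeros; dd prints the throughput when it finishes
dd if=/dev/zero of=/mnt/nvme/speedtest/testfile bs=1m count=8192

# clean up afterwards
zfs destroy nvme/speedtest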

Hardware
Asus X99-WS/IPMI
Liquid cooled Xeon E5-1650 (6 cores/12 Threads @ 3.7 GHz)
32 GB Samsung DDR4 ECC RAM
EVGA 1KW PSU
EVGA GeForce GTX 1070
NZXT H440 Case
Storage: 42 TB RAIDZ2 pool: 6x 6 TB HDDs (VDEV1), 6x 6 TB HDDs (VDEV2)
OS: Samsung 840 Pro SSD
Flash: Samsung 970 Pro NVMe for the VM zvol and Usenet downloads

I'm still getting call traces too, which I've only ever seen when virtualizing Linux under bhyve. Is it still pretty poor compared to Linux KVM? Ever since it was integrated back in late 9.x or 10 I've never really had any luck with it. I could do FreeBSD jails, but I'm aware the push is over to Rancher, and the VM performance just seems to be severely lacking.

Code:
tsc: Marking TSC unstable due to clocksource watchdog
[Tue Jun 18 14:25:58 2019] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[Tue Jun 18 14:25:58 2019] sched_clock: Marking unstable (226027407471, -20007071)<-(226197930906, -190530541)
[Tue Jun 18 14:25:58 2019] clocksource: Switched to clocksource hpet
[Tue Jun 18 14:28:20 2019] INFO: task kworker/3:0:36 blocked for more than 122 seconds.
[Tue Jun 18 14:28:20 2019]       Not tainted 5.1.9-arch1-1-ARCH #1
[Tue Jun 18 14:28:20 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 18 14:28:20 2019] kworker/3:0     D    0    36      2 0x80000000
[Tue Jun 18 14:28:20 2019] Workqueue: events slab_caches_to_rcu_destroy_workfn
[Tue Jun 18 14:28:20 2019] Call Trace:
[Tue Jun 18 14:28:20 2019]  ? __schedule+0x30b/0x8b0
[Tue Jun 18 14:28:20 2019]  ? insert_work+0x39/0x70
[Tue Jun 18 14:28:20 2019]  schedule+0x32/0x80
[Tue Jun 18 14:28:20 2019]  schedule_timeout+0x311/0x4a0
[Tue Jun 18 14:28:20 2019]  ? _raw_spin_lock_irqsave+0x25/0x50
[Tue Jun 18 14:28:20 2019]  ? preempt_count_add+0x79/0xb0
[Tue Jun 18 14:28:20 2019]  ? _raw_spin_lock_irqsave+0x25/0x50
[Tue Jun 18 14:28:20 2019]  wait_for_common+0x15f/0x190
[Tue Jun 18 14:28:20 2019]  ? wake_up_q+0x70/0x70
[Tue Jun 18 14:28:20 2019]  rcu_barrier+0x169/0x1d0
[Tue Jun 18 14:28:20 2019]  slab_caches_to_rcu_destroy_workfn+0x8f/0xe0
[Tue Jun 18 14:28:20 2019]  process_one_work+0x1eb/0x410
[Tue Jun 18 14:28:20 2019]  worker_thread+0x2d/0x3d0
[Tue Jun 18 14:28:20 2019]  ? process_one_work+0x410/0x410
[Tue Jun 18 14:28:20 2019]  kthread+0x112/0x130
[Tue Jun 18 14:28:20 2019]  ? kthread_park+0x80/0x80
[Tue Jun 18 14:28:20 2019]  ret_from_fork+0x35/0x40
[Tue Jun 18 14:30:23 2019] INFO: task kworker/3:0:36 blocked for more than 245 seconds.
[Tue Jun 18 14:30:23 2019]       Not tainted 5.1.9-arch1-1-ARCH #1
[Tue Jun 18 14:30:23 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 18 14:30:23 2019] kworker/3:0     D    0    36      2 0x80000000
[Tue Jun 18 14:30:23 2019] Workqueue: events slab_caches_to_rcu_destroy_workfn
[Tue Jun 18 14:30:23 2019] Call Trace:
[Tue Jun 18 14:30:23 2019]  ? __schedule+0x30b/0x8b0
[Tue Jun 18 14:30:23 2019]  ? insert_work+0x39/0x70
[Tue Jun 18 14:30:23 2019]  schedule+0x32/0x80
[Tue Jun 18 14:30:23 2019]  schedule_timeout+0x311/0x4a0
[Tue Jun 18 14:30:23 2019]  ? _raw_spin_lock_irqsave+0x25/0x50
[Tue Jun 18 14:30:23 2019]  ? preempt_count_add+0x79/0xb0
[Tue Jun 18 14:30:23 2019]  ? _raw_spin_lock_irqsave+0x25/0x50
[Tue Jun 18 14:30:23 2019]  wait_for_common+0x15f/0x190
[Tue Jun 18 14:30:23 2019]  ? wake_up_q+0x70/0x70
[Tue Jun 18 14:30:23 2019]  rcu_barrier+0x169/0x1d0
[Tue Jun 18 14:30:23 2019]  slab_caches_to_rcu_destroy_workfn+0x8f/0xe0
[Tue Jun 18 14:30:23 2019]  process_one_work+0x1eb/0x410
[Tue Jun 18 14:30:23 2019]  worker_thread+0x2d/0x3d0
[Tue Jun 18 14:30:23 2019]  ? process_one_work+0x410/0x410
[Tue Jun 18 14:30:23 2019]  kthread+0x112/0x130
[Tue Jun 18 14:30:23 2019]  ? kthread_park+0x80/0x80
[Tue Jun 18 14:30:23 2019]  ret_from_fork+0x35/0x40
[Tue Jun 18 14:31:05 2019] audit: type=1130 audit(1560882666.686:360): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=haveged comm="systemd" exe="/usr/lib/systemd/systemd" hostname=?>
[Tue Jun 18 14:32:26 2019] INFO: task kworker/3:0:36 blocked for more than 368 seconds.
[Tue Jun 18 14:32:26 2019]       Not tainted 5.1.9-arch1-1-ARCH #1
[Tue Jun 18 14:32:26 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 18 14:32:26 2019] kworker/3:0     D    0    36      2 0x80000000
[Tue Jun 18 14:32:26 2019] Workqueue: events slab_caches_to_rcu_destroy_workfn
[Tue Jun 18 14:32:26 2019] Call Trace:
[Tue Jun 18 14:32:26 2019]  ? __schedule+0x30b/0x8b0
[Tue Jun 18 14:32:26 2019]  ? insert_work+0x39/0x70
[Tue Jun 18 14:32:26 2019]  schedule+0x32/0x80
[Tue Jun 18 14:32:26 2019]  schedule_timeout+0x311/0x4a0
[Tue Jun 18 14:32:26 2019]  ? _raw_spin_lock_irqsave+0x25/0x50
[Tue Jun 18 14:32:26 2019]  ? preempt_count_add+0x79/0xb0
[Tue Jun 18 14:32:26 2019]  ? _raw_spin_lock_irqsave+0x25/0x50
[Tue Jun 18 14:32:26 2019]  wait_for_common+0x15f/0x190
[Tue Jun 18 14:32:26 2019]  ? wake_up_q+0x70/0x70
[Tue Jun 18 14:32:26 2019]  rcu_barrier+0x169/0x1d0
[Tue Jun 18 14:32:26 2019]  slab_caches_to_rcu_destroy_workfn+0x8f/0xe0
[Tue Jun 18 14:32:26 2019]  process_one_work+0x1eb/0x410
[Tue Jun 18 14:32:26 2019]  worker_thread+0x2d/0x3d0
[Tue Jun 18 14:32:26 2019]  ? process_one_work+0x410/0x410
[Tue Jun 18 14:32:26 2019]  kthread+0x112/0x130
[Tue Jun 18 14:32:26 2019]  ? kthread_park+0x80/0x80
[Tue Jun 18 14:32:26 2019]  ret_from_fork+0x35/0x40
[Tue Jun 18 14:34:28 2019] INFO: task kworker/3:0:36 blocked for more than 491 seconds.
[Tue Jun 18 14:34:28 2019]       Not tainted 5.1.9-arch1-1-ARCH #1
[Tue Jun 18 14:34:28 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 18 14:34:28 2019] kworker/3:0     D    0    36      2 0x80000000
[Tue Jun 18 14:34:28 2019] Workqueue: events slab_caches_to_rcu_destroy_workfn
[Tue Jun 18 14:34:28 2019] Call Trace:
[Tue Jun 18 14:34:28 2019]  ? __schedule+0x30b/0x8b0
[Tue Jun 18 14:34:28 2019]  ? insert_work+0x39/0x70
[Tue Jun 18 14:34:28 2019]  schedule+0x32/0x80
[Tue Jun 18 14:34:28 2019]  schedule_timeout+0x311/0x4a0
[Tue Jun 18 14:34:28 2019]  ? _raw_spin_lock_irqsave+0x25/0x50
[Tue Jun 18 14:34:28 2019]  ? preempt_count_add+0x79/0xb0
[Tue Jun 18 14:34:28 2019]  ? _raw_spin_lock_irqsave+0x25/0x50
[Tue Jun 18 14:34:28 2019]  wait_for_common+0x15f/0x190
[Tue Jun 18 14:34:28 2019]  ? wake_up_q+0x70/0x70
[Tue Jun 18 14:34:28 2019]  rcu_barrier+0x169/0x1d0
[Tue Jun 18 14:34:28 2019]  slab_caches_to_rcu_destroy_workfn+0x8f/0xe0
[Tue Jun 18 14:34:28 2019]  process_one_work+0x1eb/0x410
[Tue Jun 18 14:34:28 2019]  worker_thread+0x2d/0x3d0
[Tue Jun 18 14:34:28 2019]  ? process_one_work+0x410/0x410
[Tue Jun 18 14:34:28 2019]  kthread+0x112/0x130
[Tue Jun 18 14:34:28 2019]  ? kthread_park+0x80/0x80
[Tue Jun 18 14:34:28 2019]  ret_from_fork+0x35/0x40
[Tue Jun 18 14:35:06 2019] audit: type=1325 audit(1560882907.285:361): table=nat family=2 entries=63
[Tue Jun 18 14:35:06 2019] audit: type=1325 audit(1560882907.292:362): table=filter family=2 entries=59
[Tue Jun 18 14:35:06 2019] audit: type=1325 audit(1560882907.295:363): table=nat family=2 entries=62
[Tue Jun 18 14:35:06 2019] veth8b35267: renamed from eth0
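
For context on why I'm blaming IO rather than CPU: while those hung-task messages pile up, a quick look inside the guest with vmstat/iostat (rough sketch, both are stock procps/sysstat tools) shows the time going to IO wait:

Code:
# "b" = tasks blocked in uninterruptible sleep, "wa" = % CPU time spent waiting on I/O
vmstat 2 10

# per-device utilization and latency, every 2 seconds (iostat is in Arch's sysstat package)
iostat -x 2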
 

dlavigne

Guest
Have you considered reporting this upstream for the bhyve devs?
 