Just want to start by saying I have searched the forum on performance issues for the past few months, and the threads all come back to failing drives. I have run all the SMART tests, etc., and all the drives seem healthy. So I feel like it's time to reach out and see if anyone can help.
The issue at hand... When copying large amounts of files to the system (200GB+), the transfer bursts extremely fast (900MB/s+) for a few seconds, as expected. Then the speed settles down to about 250MB/s for about a minute. Then the fun begins. The system becomes unresponsive: the drive activity lights stop flashing, SSH disconnects, the WebGUI stops responding, and jails hang or die. It can stay hung for upwards of 5 minutes, which sometimes causes applications to fail along with any file transfers in progress. I have been scratching my head over this, as I cannot figure out what's causing it.
Just as a frame of reference, when I first got the box and it had Windows Server 2012 R2 installed, I could sustain 250MB/s to it all day without a single hiccup. Now that I have enabled JBOD and installed FreeNAS, the system is extremely unstable. I'm looking for help troubleshooting the hang. I have searched the logs and don't see anything obvious, so any help would be appreciated!
System build...
Intel dual SFP+ card configured for LACP to Nexus Core switching
Code:
OS Version: FreeNAS-11.2-U8 (Build Date: Feb 14, 2020 15:55)
Processor: Intel(R) Xeon(R) CPU E5607 @ 2.27GHz (8 cores)
Memory: 72 GiB
I have two 500GB HDDs connected to the motherboard SATA ports in a ZFS mirror. Then I have 24 Seagate Constellations configured as follows... The RAIDZ2 layout is needed to meet drive-loss requirements set by others...
Code:
root@freenas[~]# zpool list
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
Data          43.5T  15.1T  28.4T        -         -    24%    34%  1.48x  ONLINE  /mnt
freenas-boot   464G  2.98G   461G        -         -      -     0%  1.00x  ONLINE  -

root@freenas[~]# zpool status
  pool: Data
 state: ONLINE
  scan: scrub repaired 0 in 1 days 04:56:26 with 0 errors on Mon May 4 04:56:36 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        Data                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/ba5b8d42-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/bb44aabe-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/bc29615c-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/bd218228-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/be12168e-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/bef2d192-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/bff8ace4-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/c0e54728-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/c1efc57f-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/c2e5c00e-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/c3d7d3af-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/c4c47384-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
          raidz2-2                                      ONLINE       0     0     0
            gptid/c5bb9669-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/c6afba1d-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/c7a915ef-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/c8928c80-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/c994b8c2-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/ca9c8587-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
          raidz2-3                                      ONLINE       0     0     0
            gptid/cba20d8a-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/cc9edb60-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/cd968c08-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/cea7f25b-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/cfa689fd-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0
            gptid/d0a8b450-9abc-11e9-8ddb-001e672bf05c  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:34 with 0 errors on Wed May 13 03:45:35 2020
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            ada0p2    ONLINE       0     0     0
            ada1p2    ONLINE       0     0     0

errors: No known data errors
During the hang I cannot check gstat, as the system is completely unresponsive. Again, I'm at a loss as to how to troubleshoot this further.
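Since gstat cannot be run interactively once the box locks up, one option might be to start a logging loop before the transfer so that the last samples before the hang are kept on disk. The following is only a rough sketch, not something that has been run on this system: the function names `log_snapshot` and `watch_io`, the log path `/var/tmp/hang-debug.log`, and the 5-second interval are all made up for illustration, and the log should live somewhere other than the Data pool.

```shell
#!/bin/sh
# Sketch: append timestamped command output to a log file so that the
# samples taken just before a hang can be read afterwards.
# ASSUMPTION: the log path is hypothetical; override it by setting LOG.
LOG="${LOG:-/var/tmp/hang-debug.log}"

# Run the given command and append its output to $LOG under a
# timestamped header line that also records the command itself.
log_snapshot() {
    {
        printf '=== %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
        "$@" 2>&1
    } >> "$LOG"
}

# Sample disk and pool activity every N seconds (default 5) until killed.
watch_io() {
    interval="${1:-5}"
    while :; do
        # gstat -b: batch mode, print one sample and exit; -p: physical providers only
        log_snapshot gstat -b -p
        # Two one-second reports; the first is the average since boot
        log_snapshot zpool iostat -v Data 1 2
        sleep "$interval"
    done
}
```

To use it, the file would be sourced in a shell on the console (or inside tmux) and `watch_io 5 &` started just before kicking off the copy. If the loop itself stalls during the hang, the gap between timestamps in the log would at least show when the stall began and what the disks were doing right before it.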
Thanks in advance!