Source NAS
- TrueNAS-13.0-U5.3
- Host: Fujitsu PRIMERGY RX2530 M5
- CPU: Dual Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz
- RAM: 192GB
- Boot Disk: M.2-SSD
- Pool Disk: 60x Western Digital Ultrastar 16TB SAS
- JBOD Shelf: Western Digital 4U60 G2
- SAS Card: 2x Broadcom MegaRAID 9480-8i8e SAS
- Network: 2x Intel X710-DA2 (2x 10Gb SFP+ each)
Destination NAS
- TrueNAS-13.0-U5.3
- Host: Fujitsu PRIMERGY RX2530 M5
- CPU: Dual Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz
- RAM: 192GB
- Boot Disk: M.2-SSD
- Pool Disk: 90x Western Digital Ultrastar 16TB SAS
- JBOD Shelf: Western Digital 4U102 G2
- SAS Card: 2x Broadcom MegaRAID 9480-8i8e SAS
- Network: 2x Intel X710-DA2 (2x 10Gb SFP+ each)
Hi all, second ever post here - so thank you for hearing me out.
I have built the above two monsters for one of our datacenter product lines, and they are working great. The Destination NAS triggers a snapshot and replication task on the Source and copies the data over into our second datacenter, making full use of our 20Gbps pipe. It is all pure backup data, nothing else - no VMs, no apps running, just raw backup storage.
Just today I have had to transfer around 80-90TB of data to the Source NAS over NFS. The transfer starts at 7.8Gb/s and sustains that for 4-5 hours, then drops to next to nothing for 1-2 hours before picking back up for another 2 hours at 7.8Gb/s - rinse and repeat.
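When the next stall hits I'll try to grab some stats on the Source NAS to see whether the disks or nfsd go quiet. Roughly along these lines - just a sketch, 'tank' is a placeholder for the real pool name and each command is run on its own:

    # Run on the Source NAS (TrueNAS CORE / FreeBSD shell) during a stall.

    # Per-vdev pool throughput, refreshed every 5 seconds
    zpool iostat -v tank 5

    # Per-disk busy % and latency at the GEOM layer (physical providers only)
    gstat -p

    # NFS server RPC counters - run twice a few seconds apart to see if they move
    nfsstat -s

    # Are the nfsd kernel threads doing anything? (-S system procs, -H threads)
    top -SH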
If I restart the transfer, I get peak speeds again for a few hours, and the same happens if I stop and restart the NFS service on the Source NAS.
Network-wise, there is no other traffic going through the ToR switching, and if I run test transfers between other machines DC to DC I get 7-8Gb/s while the NASes are stalled, so I am happy it isn't a network infrastructure issue.
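(For clarity, by DC-to-DC checks I mean raw host-to-host throughput tests across the same ToR switches - something along these lines, assuming iperf3 is available on both ends; the hostname is a placeholder:)

    # On a host in the second datacenter (placeholder name dc2-host):
    iperf3 -s

    # From a host in the first datacenter, push 4 parallel streams for 60 seconds:
    iperf3 -c dc2-host -P 4 -t 60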
I have searched around the forums and can see posts about NFS performance from back in the FreeNAS days. In one of them, the chap found during troubleshooting that if he logged into the CLI and ran any command whatsoever, performance picked back up. I just tried it - and slap me down with a wet trout - after running an ls command, the transfer speed pops right back up to 7.8-8Gb/s.
Thread here: https://www.truenas.com/community/threads/nfs-dies-under-load.14346/
I don't think that thread ever concluded what the issue was, and I guess this won't be too big of a problem once I get the 80TB across, as we won't be using NFS this way very often. Just wondering if anyone else has come across this, or has any ideas on what the cause / fix might be?
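In the meantime, as a crude stopgap to get the 80TB across, I'm half tempted to just automate the "run any command" trick from a tmux session on the Source NAS. Purely a sketch of the idea, not a fix:

    #!/bin/sh
    # Crude stopgap, not a fix: poke the shell every 5 minutes, since an
    # interactive command appears to wake NFS throughput back up.
    while true; do
        ls / > /dev/null 2>&1
        sleep 300
    done

Obviously that only papers over whatever is actually wedging nfsd, but it should keep the transfer moving.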
All the best.
Mox