Hi All,
This is just a bit of knowledge sharing, so I hope somebody finds it helpful.
For a long time now, all my systems have used LZ4 compression during replication, because testing (a long time ago) suggested it was a good idea. I should add that all my datasets use LZ4 by default.
Recently, however, I was troubleshooting one of my replica boxes that was randomly rebooting. It appears the IPMI watchdog was hard resetting it during the replication window. I watched it during replication the other night and noticed LZ4c using 90-100% CPU, so I assumed this was freezing the system and causing the watchdog to reset it. I then looked at all my other systems, and both the send and receive sides also showed very high CPU usage from LZ4c, but I figured it had been like this for a while, so it wasn't an issue. However, I've never been able to replicate quickly even though all systems have 10Gb network connections, and I'm often stuck somewhere between 100Mbps and about 500Mbps.
I decided to re-investigate compression during replication and essentially disabled it, and now all my systems are pushing 2Gbps. ssh now consumes about 80-90% CPU (up from about 5% before), but all seems happy. Interestingly, LZ4c has vanished from the CPU usage stats during replication.
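To put those rates in perspective, here is a rough back-of-the-envelope sketch in Python. The 1 TB dataset size is purely a hypothetical figure for illustration (it is not one of my datasets), and the function names are my own. It only converts the link rates quoted above into MB/s and an estimated transfer time, and assumes the rate is sustained for the whole run.

```python
def mbps_to_mbytes_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (decimal units)."""
    return mbps / 8.0


def transfer_hours(size_bytes: float, mbps: float) -> float:
    """Hours needed to move size_bytes at a sustained rate of mbps."""
    bits = size_bytes * 8
    seconds = bits / (mbps * 1_000_000)
    return seconds / 3600


# Hypothetical dataset size, chosen only for illustration.
DATASET_BYTES = 1_000_000_000_000  # 1 TB (decimal)

# Rates taken from the observations above.
RATES_MBPS = [
    ("with compression, low end", 100),
    ("with compression, high end", 500),
    ("compression disabled", 2000),
]

for label, rate in RATES_MBPS:
    print(
        f"{label}: {mbps_to_mbytes_per_s(rate):.1f} MB/s, "
        f"{transfer_hours(DATASET_BYTES, rate):.2f} h per TB"
    )
```

By this estimate, a terabyte that takes most of a day at 100Mbps goes through in a little over an hour at 2Gbps, which is why the difference matters for a nightly replication window.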
Anyway, take from it what you will, but it might be worth revisiting your replication setup if you feel you should be getting the sort of speeds I am now.
All the best.