FYI, in the ticket with iXsystems it was suggested that one or more pods are overloading the system and that I should try to narrow down the problem process/pod.
Since reproducing this seems to take about two weeks on average, I can't justify disabling an app and then waiting two weeks without it.
For now I have set the following script to run every 5 minutes, so hopefully the next time this happens I will capture something useful.
Code:
#!/bin/bash
# Define the directory where the stats files will be stored
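output_directory="/root/dockerstats_output/"
# Create the directory if it does not exist yet, so the append below cannot fail
mkdir -p "$output_directory"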
# Define the file name format with date
output_file="${output_directory}$(date +"%Y%m%d").txt"
# Get current timestamp
timestamp=$(date +"%Y-%m-%d %H:%M:%S")
# Get top processes with headers
top_processes=$(top -b -n 1 | head -n 22)
# Run docker stats command and filter top 10 containers by memory usage
docker stats --format "table {{.ID}}\t{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" --no-stream | {
    echo "Timestamp: $timestamp"
    echo -e "\nDocker Stats:"
    echo -e "CONTAINER ID\tNAME\tCPU %\tMEM USAGE (LIMIT)"
    tail -n +2 | sort -k4 -h -r | head -n 10
    echo -e "\nTop Processes:"
    echo "$top_processes"
} >> "$output_file"
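For anyone who wants to schedule something similar, the 5-minute interval can be expressed as a cron entry roughly like the sketch below. The script path /root/dockerstats.sh is only a placeholder (the actual location isn't given above), and the install command is left commented out because it modifies the current user's crontab.

```shell
#!/bin/bash
# Placeholder path: adjust to wherever the stats script is actually saved.
script_path="/root/dockerstats.sh"

# Standard cron schedule: "*/5" in the minute field means every 5 minutes.
cron_line="*/5 * * * * ${script_path}"

# Print the entry so it can be reviewed before installing it.
echo "$cron_line"

# To install it, append the entry to the current user's existing crontab:
# ( crontab -l 2>/dev/null; echo "$cron_line" ) | crontab -
```

The script also needs to be executable (chmod +x) for cron to run it directly.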