FYI, in the ticket with iXsystems it was suggested that one or more pods are overloading the system and that I should try to narrow down the offending process/pod.
Since replicating this seems to take on average 2 weeks, I am not able to justify disabling an app and then waiting two weeks without that application.
For now I have set the following script to run every 5 minutes, and hopefully the next time this happens I will capture something useful.
Code:
#!/bin/bash

# Define the directory where the stats files will be stored
output_directory="/root/dockerstats_output/"

# Define the file name format with date
output_file="${output_directory}$(date +"%Y%m%d").txt"

# Get current timestamp
timestamp=$(date +"%Y-%m-%d %H:%M:%S")

# Get top processes with headers
top_processes=$(top -b -n 1 | head -n 22)

# Run docker stats command and filter top 10 containers by memory usage
docker stats --format "table {{.ID}}\t{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" --no-stream | {
    echo "Timestamp: $timestamp"
    echo -e "\nDocker Stats:"
    echo -e "CONTAINER ID\tNAME\tCPU %\tMEM USAGE (LIMIT)"
    tail -n +2 | sort -k4 -h -r | head -n 10
    echo -e "\nTop Processes:"
    echo "$top_processes"
} >> "$output_file"
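If anyone wants to sanity-check the memory sort on its own, here is a minimal sketch of just that step. The function name top_by_memory and the sample rows are made up for illustration (they are not real measurements), and it assumes GNU sort, whose -h option orders values by unit suffix first and then by number.

```shell
#!/bin/bash

# Hypothetical helper: keep the N rows with the highest memory usage.
# Expects whitespace-separated rows shaped like the docker stats output:
#   ID NAME CPU% MEM_USED / MEM_LIMIT
top_by_memory() {
    local count="${1:-10}"
    # Field 4 is the used-memory figure (e.g. 512MiB).
    # -h compares human-readable sizes, -r puts the largest first.
    sort -k4,4 -h -r | head -n "$count"
}

# Made-up sample rows, only to exercise the sort.
printf '%s\n' \
    'aaa111 app-one 1.20% 512MiB / 16GiB' \
    'bbb222 app-two 0.50% 1.5GiB / 16GiB' \
    'ccc333 app-three 3.00% 64MiB / 16GiB' | top_by_memory 2
```

With these rows the 1.5GiB container should come out ahead of the 512MiB one. Limiting the key to field 4 with -k4,4 keeps the limit column from influencing the order.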