Volumes offline after shutdown + reports problems

rmont

Dabbler
Joined
Jun 18, 2020
Messages
42
I started having some issues with my Scale (22.02-RC2) server. All operations on starting/editing/stopping apps became extremely slow.

I shut down and after restart 2 volumes remained offline. I also received these errors:
- Reporting database used size 2.51 GiB is larger than 1.01 GiB.
- Core files for the following executables were found: /usr/bin/python3.9 (Fri Dec 31 10:45:48 2021).
- NTP health check failed - No NTP peers: [{'80.211.137.82': 'REJECT'}, {'212.237.31.130': 'REJECT'}, {'85.199.214.99': 'REJECT'}, {'95.110.248.206': 'REJECT'}]
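The NTP alert embeds its peer list as a Python-style list of single-key dicts, so it can be split apart to see which peers were rejected. This is a minimal sketch, assuming the alert always has the form quoted above. The helper name `parse_ntp_alert` is hypothetical and not part of TrueNAS.

```python
import ast
import re

def parse_ntp_alert(message: str) -> dict:
    """Return {peer_address: status} from an 'NTP health check failed' alert."""
    # Assumption: the peer list is the bracketed "[{...}, {...}]" literal in the text.
    match = re.search(r"\[\{.*\}\]", message)
    if match is None:
        return {}
    result = {}
    for entry in ast.literal_eval(match.group(0)):
        result.update(entry)
    return result

alert = ("NTP health check failed - No NTP peers: "
         "[{'80.211.137.82': 'REJECT'}, {'212.237.31.130': 'REJECT'}, "
         "{'85.199.214.99': 'REJECT'}, {'95.110.248.206': 'REJECT'}]")
statuses = parse_ntp_alert(alert)
rejected = [addr for addr, state in statuses.items() if state == "REJECT"]
print(rejected)
```

If every configured peer ends up in `rejected`, the alert is telling you that no usable time source was available at the moment of the check.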

After another reboot (no shutdown), the volumes came back online and ntp (ntpq -p) did not show any issue.
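For reference, `ntpq -p` marks the peer currently used for synchronisation with a leading `*` and other acceptable candidates with `+`. Below is a small sketch of checking that from captured output. The sample text and the helper name `usable_peers` are illustrative assumptions, not output from the server in this thread.

```python
def usable_peers(ntpq_output: str) -> list:
    """Return remote names whose tally code is '*' (system peer) or '+' (candidate)."""
    peers = []
    for line in ntpq_output.splitlines():
        # Skip blank lines, the column header, and the '====' separator row.
        if not line.strip() or line.lstrip().startswith(("remote", "=")):
            continue
        if line[0] in "*+":
            peers.append(line[1:].split()[0])
    return peers

# Hypothetical sample using documentation-range addresses.
sample = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   33   64  377    0.512    0.021   0.015
+192.0.2.11      192.0.2.10       2 u   12   64  377    1.204   -0.113   0.087
 192.0.2.12      .INIT.          16 u    -   64    0    0.000    0.000   0.000
"""
print(usable_peers(sample))
```

An empty result right after boot would match the "No NTP peers" alert, since no peer has been selected yet.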

But none of the reports are working:

1640949839905.png



I shut down again and the same problem happened (NTP error, volumes offline).
After a restart without shutdown, I get the same situation with volumes online and reports not working.

I don't know how to debug this...
 

crkinard

Explorer
Joined
Oct 24, 2019
Messages
80
I have been getting the "NTP health check failed - No NTP peers" errors too since updating to RC2.
 

rmont

Dabbler
Joined
Jun 18, 2020
Messages
42
I'm doing some more investigation on the issue.

The system crashed today, so I tried shutting down and restarting the server multiple times.
It always hangs for 15 minutes at the stage:

"A start job is running for import ZFS pools (XXs / 15 min 11s)"

After the wait is over, the system booted:
- first time with 2 volumes offline
- second time with 1 volume offline. After a while, the offline disk came back online
- third time all disks online (but the boot process still got stuck for the full 15 minutes)
- fourth time - same as second time

"zpool status" now shows everything is OK once all disks are online.
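A quick way to check pool health from a script is `zpool status -x`, which prints `all pools are healthy` when no pool has problems. This is a minimal sketch, assuming `zpool` is on the PATH. The helper names are hypothetical.

```python
import subprocess

def pools_healthy(status_output: str) -> bool:
    """True when `zpool status -x` reported no problem pools."""
    return status_output.strip() == "all pools are healthy"

def check_pools() -> bool:
    """Run `zpool status -x` and interpret its output (requires the ZFS tools)."""
    proc = subprocess.run(["zpool", "status", "-x"],
                          capture_output=True, text=True, check=False)
    return pools_healthy(proc.stdout)
```

Calling `check_pools()` shortly after boot and again a few minutes later would show whether a pool only becomes healthy after the delayed import.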

The console shows a number of errors for the volume that only comes online after boot.

1641139103364.png


When I shut down, the system also showed an error on the same volume that had more difficulty coming online: "failed unmounting /mnt/DISKNAME"
Every time I reboot, I also get the NTP error.

(Edit with additional info)
I wiped the disk that was causing the most issues, removed the volume and disconnected the cable.
The boot still hangs for 15 minutes.

I reconnected the disk and did a quick test. The test hangs at 99% forever.
1641207083481.png




Thanks for helping
 

leeroy

Dabbler
Joined
Dec 23, 2017
Messages
29
I'm running into this as well with RC2. Has an issue been opened for this that I can contribute to?

This is currently a test system that is going into production in February, so it is in a lab environment on a slower switch/network, but this has never caused issues for anything else. The drive not reporting is concerning, but it hasn't been an issue since.

Oddly, I had one disk disconnect on a test pool. I disconnected and reconnected it. It started up fine, but then I got this alert:
Code:
NTP health check failed - No NTP peers: [{'162.159.200.123': 'REJECT'}, {'205.206.70.40': 'REJECT'}, {'158.69.254.196': 'REJECT'}]
2022-01-27 10:44:51 (America/Los_Angeles)

I also see a network error when getting chart data:

tempsnip.png
 

nickspacemonkey

Dabbler
Joined
Jan 13, 2022
Messages
22
I'm getting the same issue on RC2.

NTP health check failed - No NTP peers: [{'81.21.65.168': 'REJECT'}, {'77.68.29.174': 'REJECT'}, {'51.89.151.183': 'REJECT'}]
2022-02-10 01:25:53 (Etc/UTC)

No crashes or anything; I just wake up to this alert every few days.

 