Pools going offline and then coming back almost immediately

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
In my sig there is a NAS called ScaleNAS. I keep getting messages such as:
New alerts:
* Pool GoodHDD state is OFFLINE: None
* Pool SingleSSD state is OFFLINE: None

Current alerts:
* Pool GoodHDD state is OFFLINE: None
* Pool SingleSSD state is OFFLINE: None

Followed almost immediately by
These alerts have been cleared:
* Pool GoodHDD state is OFFLINE: None
* Pool SingleSSD state is OFFLINE: None

I have checked a variety of logs (middleware, kern, daemon, messages, debug, syslog) but can't see much of any relevance. There are errors, but none seem relevant.
This is in a 4U case with a proper fan wall, so airflow should not be a problem.
The server does almost nothing; it is just a test box, so it is not heavily loaded.

Anyone got any ideas for where to look?

I have a mix of SATA (motherboard) and SAS via an HBA and expander, and this is affecting both at random. I don't have any pools that combine the motherboard ports with the HBA ports.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I would have expected messages to contain some references to CAM activity or something like that.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
So would I - but nothing that I can see
 

Ppriorfl

Dabbler
Joined
May 22, 2021
Messages
46
I had this same message this morning: one of my pools was reported as OFFLINE, and right after that came a message that the alert had been cleared.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Glad (I think) that I am not the only person. But I have no idea what to do about it - or even if it's real.
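One way to check whether the pools really drop out, rather than the alert being spurious, would be to poll pool health independently of the middleware and log any change with a timestamp. The following is only a rough sketch: it assumes the standard `zpool list -H -o name,health` command is available on the box, and the function names and the 5-second interval are made up for illustration.

```python
import subprocess
import time
from datetime import datetime


def parse_health(output):
    """Parse `zpool list -H -o name,health` output into {pool: health}.

    With -H the fields are separated by a single tab.
    """
    states = {}
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


def changed_pools(previous, current):
    """Return {pool: (old, new)} for every pool whose health differs."""
    changes = {}
    for pool in sorted(set(previous) | set(current)):
        old, new = previous.get(pool), current.get(pool)
        if old != new:
            changes[pool] = (old, new)
    return changes


def watch(interval_seconds=5):
    """Poll pool health forever and print every state change (hypothetical helper)."""
    previous = {}
    while True:
        result = subprocess.run(
            ["zpool", "list", "-H", "-o", "name,health"],
            capture_output=True, text=True, check=False,
        )
        stamp = datetime.now().isoformat(timespec="seconds")
        if result.returncode != 0:
            # A failed command is logged but not treated as "all pools gone".
            print(f"{stamp} zpool failed: {result.stderr.strip()}", flush=True)
        else:
            current = parse_health(result.stdout)
            for pool, (old, new) in changed_pools(previous, current).items():
                print(f"{stamp} {pool}: {old} -> {new}", flush=True)
            previous = current
        time.sleep(interval_seconds)
```

Calling `watch()` in a tmux or screen session and comparing its output with the alert emails should show whether ZFS itself ever reports the pool as anything other than ONLINE at those times. The first pass prints each pool as `None -> ONLINE`, which is just the initial state being recorded.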
 

papa7775

Cadet
Joined
Jun 12, 2017
Messages
1
I'm getting the exact same thing. I believe it started after the last update; we had no issues before. I'm trying to figure it out, because nothing is actually going down. It's just sending the alert via email.
 

adam7288

Cadet
Joined
Jun 18, 2022
Messages
7
Glad this thread exists. Same thing happening to me. Here are my middleware logs from the time it happened:

[2022/06/18 04:32:34] (WARNING) middlewared._loop_monitor_thread():1625 - Task seems blocked:
File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1129, in nf
res = await f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 431, in is_upgraded
return await self.is_upgraded_by_name((await self.get_instance(oid))['name'])
File "/usr/lib/python3/dist-packages/middlewared/plugins/pool.py", line 448, in is_upgraded_by_name
proc = await Popen(
File "/usr/lib/python3.9/asyncio/subprocess.py", line 216, in create_subprocess_shell
transport, protocol = await loop.subprocess_shell(
... + 10 lines below ...
[2022/06/18 04:33:59] (DEBUG) google_auth_httplib2.__call__():115 - Making request: POST https://oauth2.googleapis.com/token
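
For anyone wanting to line these log entries up against the alert emails, here is a rough sketch that pulls WARNING/ERROR entries out of the middleware log around a given time. It assumes every entry starts with the `[date time] (LEVEL) source():line - message` header seen above, and that the lines in between (the traceback) belong to the preceding entry. The function names and the 5-minute window are my own choices, not anything from TrueNAS.

```python
import re
from datetime import datetime, timedelta

# Header format assumed from the log lines quoted above.
HEADER = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] \((\w+)\) (\S+?)\(\):(\d+) - (.*)$"
)


def parse_entries(lines):
    """Group log lines into entries; non-header lines attach to the last entry."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADER.match(line)
        if match:
            stamp, level, source, lineno, message = match.groups()
            entries.append({
                "time": datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"),
                "level": level,
                "source": source,
                "line": int(lineno),
                "message": message,
                "details": [],
            })
        elif entries and line.strip():
            # Continuation line, e.g. part of a traceback.
            entries[-1]["details"].append(line.strip())
    return entries


def entries_near(entries, when, window=timedelta(minutes=5),
                 levels=("WARNING", "ERROR")):
    """Return entries of the given levels within `window` of `when`."""
    return [
        e for e in entries
        if e["level"] in levels and abs(e["time"] - when) <= window
    ]
```

To use it, open the middleware log (I am assuming `/var/log/middlewared.log` here, check the path on your system), pass the file object to `parse_entries`, and then call `entries_near` with the time the alert email arrived. If a "Task seems blocked" warning shows up next to every alert, that would be worth including in a bug report.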
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
This thread may exist - but it's not going anywhere.
I have decommissioned the server on which I was getting the errors.
It needs someone to post a Jira bug report and a debug so that iX can see whether there is an underlying issue or it's just a fake error.
 

adam7288

Cadet
Joined
Jun 18, 2022
Messages
7
I apologize - but is there a guide on how to make a proper Jira bug report and debug?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Not that I am aware of. Just click on Report a Bug, log in or sign up to Jira, and create an issue.
 