'Disk removed by administrator' 2 minutes later alert is cleared

bmf614z

Dabbler
Joined
Aug 23, 2019
Messages
10
Hello,

I noticed that since I installed the latest stable update (12.0) I have been getting emails like this:

New alerts:
* Pool Pool state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
Disk ATA ST4000NM0033-9ZM Z1Z6EDSM is REMOVED

Current alerts:
* Pool Pool state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
Disk ATA ST4000NM0033-9ZM Z1Z6EDSM is REMOVED

then 2 minutes later it says this:

The following alert has been cleared:

* Pool Pool state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
Disk ATA ST4000NM0033-9ZM Z1Z6EDSM is REMOVED

It concerns me because it says it was removed by the administrator, and not that it failed or whatever. Is this a security thing where someone is offlining my disks remotely?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
That's the same disk each time... time to run smartctl -a /dev/daX on it and/or check dmesg to see what the CAM messages are telling you.
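
A rough sketch of those checks from a shell session; daX is a placeholder for whichever device the alert named, not a real device node:

Code:
# List pool members and the devices the kernel currently sees
zpool status
camcontrol devlist

# Full SMART report for the suspect disk (swap daX for your device)
smartctl -a /dev/daX

# Recent kernel/CAM messages mentioning that device
dmesg | grep daX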

I suspect something is on its way to death.

It concerns me because it says it was removed by the administrator, and not that it failed or whatever. Is this a security thing where someone is offlining my disks remotely?
Not at all... the system itself (the middleware) is doing that as root, so it seems like the "administrator" did it.
 

Pabs

Explorer
Joined
Jan 18, 2017
Messages
52
I have been experiencing the same issue. Were you able to solve it? If so, how?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
How are your disks connected?

What hardware are you using?
 

Pabs

Explorer
Joined
Jan 18, 2017
Messages
52
How are your disks connected?

What hardware are you using?
Connected via an LSI 9211-8i 6G SAS HBA, FW P20, IT mode.

WD 16TB enterprise drives and a SuperMicro motherboard; would you need specifics for those?

Thx
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
would you need specifics for those?
Forum rules say yes, but not really.
What would be interesting to know is airflow and temperature... that sounds like an overheating HBA to me.
 

Pabs

Explorer
Joined
Jan 18, 2017
Messages
52
Forum rules say yes, but not really.
What would be interesting to know is airflow and temperature... that sounds like an overheating HBA to me.
But wouldn't the alerts within TrueNAS keep a record of this issue?
Thx
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
But wouldn't the alerts within TrueNAS keep a record of this issue?
Alerts aren't a record-keeping system.

You may find some clues in /var/log/messages (or one of its bzipped archives).
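
Something along these lines should pull the relevant entries out of both the live log and the rotated archives (the search pattern and the .0.bz2 archive suffix are just examples):

Code:
# Disk detach / CAM error messages in the current log
grep -E 'CAM|detached|REMOVED' /var/log/messages

# Rotated copies are bzip2-compressed; bzcat feeds them to grep
bzcat /var/log/messages.0.bz2 | grep -E 'CAM|detached|REMOVED'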
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Yesterday, I experienced the same symptom (for hardware details see signature). The corresponding part of /var/log/messages shows the following:

Code:
Sep 19 21:46:54 nas3 isci: 1663616814:544693 ISCI isci: bus=0 target=3 lun=0 cdb[0]=359fc1f0 terminated
Sep 19 21:46:54 nas3 (da3:isci0:0:3:0): READ(16). CDB: 88 00 00 00 00 01 19 0d 8f 10 00 00 00 10 00 00
Sep 19 21:46:54 nas3 (da3:isci0:0:3:0): CAM status: CCB request terminated by the host
Sep 19 21:46:54 nas3 (da3:isci0:0:3:0): Retrying command, 3 more tries remain
Sep 19 21:46:54 nas3 (da3:isci0:0:3:0): Invalidating pack
Sep 19 21:46:54 nas3 GEOM_ELI: g_eli_read_done() failed (error=6) gptid/???.eli[READ(offset=2412079030272, length=8192)]
Sep 19 21:46:54 nas3 da3 at isci0 bus 0 scbus0 target 3 lun 0
Sep 19 21:46:54 nas3 da3: <ATA ST16000NM001G-2K SN03>  s/n ??? detached
Sep 19 21:46:54 nas3 GEOM_ELI: g_eli_read_done() failed (error=6) gptid/???.eli[READ(offset=270336, length=8192)]
Sep 19 21:46:54 nas3 GEOM_ELI: g_eli_read_done() failed (error=6) gptid/???.eli[READ(offset=15998752399360, length=8192)]
Sep 19 21:46:54 nas3 GEOM_ELI: g_eli_read_done() failed (error=6) gptid/0b195f5a-1ad0-11eb-b5e9-0cc47a052e3c.eli[READ(offset=15998752661504, length=8192)]
Sep 19 21:46:54 nas3 GEOM_MIRROR: Device swap0: provider da3p1 disconnected.
Sep 19 21:46:55 nas3 GEOM_ELI: Device gptid/???.eli destroyed.
Sep 19 21:46:55 nas3 GEOM_ELI: Detached gptid/???.eli on last close.
Sep 19 21:46:55 nas3 (da3:isci0:0:3:0): Periph destroyed
Sep 19 21:49:58 nas3 da3 at isci0 bus 0 scbus0 target 3 lun 0
Sep 19 21:49:58 nas3 da3: <ATA ST16000NM001G-2K SN03> Fixed Direct Access SPC-3 SCSI device
Sep 19 21:49:58 nas3 da3: Serial Number ????
Sep 19 21:49:58 nas3 da3: 300.000MB/s transfers
Sep 19 21:49:58 nas3 da3: Command Queueing enabled
Sep 19 21:49:58 nas3 da3: 15259648MB (31251759104 512 byte sectors)
Sep 19 21:49:58 nas3 GEOM_MIRROR: Device mirror/swap2 launched (3/3).
Sep 19 21:49:59 nas3 GEOM_ELI: Device mirror/swap2.eli created.
Sep 19 21:49:59 nas3 GEOM_ELI: Encryption: AES-XTS 128
Sep 19 21:49:59 nas3 GEOM_ELI:     Crypto: hardware

I interpreted this as a "glitch" in the communication between the motherboard (where the drive is connected) and the drive, and after a reboot this morning things appear(!) to be OK again.

SMART reports no errors. I have initiated an extra scrub and await the results. Will report back.
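
For anyone wanting to do the same from the shell, a minimal sketch (assuming the pool is named tank):

Code:
# Start a manual scrub (replace tank with the actual pool name)
zpool scrub tank

# Check progress and any read/write/checksum errors it finds
zpool status -v tank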
 