Help investigating system freezes when unlocking a volume

Pitfrr

Wizard
Joined
Feb 10, 2014
Messages
1,523
Hello,

I have an annoying problem on my backup system and I'd like some help investigating it.

System:
C2750 motherboard with 32GB of RAM
7x 2TB drives in RAIDz1 with encryption
M1015 HBA card
FreeNAS 9.10.2-U6

Issue:
I boot up the system and get to the FreeNAS interface.
The volume is locked, but so far everything is working fine.
I can stay in that state (locked volume) for several hours (for example, I've been running long SMART tests on the drives).

But as soon as I unlock the volume, the system freezes (the drives get decrypted and that's it): the web interface and the terminal are unresponsive, and ping gets no answer.
I have access to IPMI, but power cycle (or reset, or any other power action) does not work. Even the power switch (long press of more than 4 seconds) does not react. I have to unplug the system to cut the power.

Question:
I suspect the disks, but the long SMART tests complete without errors.
Could it be because I use some WD Blue drives (which lack the TLER feature)? I'm not sure whether the lack of TLER can result in such behavior.

I'd be happy to have your input on how I could investigate further...
I checked the logs but couldn't find any entry around that timestamp. Granted, I have only limited knowledge of the system log files (which one to look at, and so on). There is no message on the console either.

My next step:
I want to try creating a new volume with other drives I have lying around to see if I can reproduce the behavior.
I can also delete the current volume and recreate it with the current drives; that's not an issue (since it's my backup).
But first I'd like to try to understand where this is coming from.

Thanks for any feedback on how to dig into this issue.
 

Some updates:

I unplugged the disks and added two new ones to create an encrypted mirrored volume, but I couldn't reproduce the behavior.
The volume can be unlocked without any freezing problem.
My conclusion at this point is that this seems to confirm the cause lies with the (other) drives (either a problem at the drive level or some sort of incompatibility between a drive and the system, for whatever reason).

The only way I can come up with to find out is a rather tedious one: since the volume is RAIDz1, I'll unplug one drive at a time to see whether the problem persists... only 6 drives to go! :smile:
(let's hope this issue is not caused by two drives!! :tongue:)
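The one-drive-at-a-time elimination above can be sketched as a small Python planning aid. It assumes the pool is a single RAIDZ1 vdev (which tolerates exactly one missing member) and uses hypothetical device names `da0` to `da6`; `single_removal_plan` and `interpret` are illustrative helpers, not FreeNAS tools. It also shows why two faulty drives would defeat this method: pulling either one still leaves the other in the pool, so every run would still freeze.

```python
from typing import Dict, List, Tuple


def single_removal_plan(drives: List[str]) -> List[Tuple[str, List[str]]]:
    """One test run per drive: (drive pulled, drives left attached).

    Assumption: a single RAIDZ1 vdev, which survives exactly one missing
    member, so each run pulls one drive and the pool runs degraded.
    """
    return [(pulled, [d for d in drives if d != pulled]) for pulled in drives]


def interpret(froze: Dict[str, bool]) -> str:
    """froze[d] is True if the system still froze while drive d was pulled."""
    suspects = [d for d, still_froze in froze.items() if not still_froze]
    if len(suspects) == 1:
        # Only pulling this drive stopped the freeze.
        return f"suspect: {suspects[0]}"
    if not suspects:
        # Every run still froze: two or more drives may be involved,
        # or the cause is not a single drive at all.
        return "inconclusive: several drives involved, or not drive-related"
    # The freeze stopped for several different pulls, which does not fit
    # a single faulty drive (possibly an intermittent fault).
    return "inconsistent: freeze stopped for more than one pulled drive"


if __name__ == "__main__":
    # Hypothetical device names for the 7 pool members.
    drives = [f"da{i}" for i in range(7)]
    for pulled, remaining in single_removal_plan(drives):
        print(f"pull {pulled}, keep {', '.join(remaining)}")
```

Recording one True/False result per run in a dict and passing it to `interpret` gives the verdict; with 7 drives that is 7 runs.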
 

Continuing with some updates:

I created a volume (mirror) with two WD Blue drives and I could reproduce the behavior.
The system was stable for a few hours, so I started replicating from my main server to the backup one. After 2 hours... the system became unresponsive. No message on the console.
In that scenario, I didn't have to unlock the volume because I had just created it, created an encryption key, and started the replication (I didn't want to lock the volume or restart the system yet; I thought I'd do that later on... but it didn't last that long...).

So this seems to confirm my suspicions about the WD Blue drives.
But I'd still like to find out, from the system's perspective: why does it freeze, and can I see that in some log?
And what bugs me is that the IPMI power actions don't work!
 

Now I'm back to square one...
I created a volume with 6 disks (without the WD Blue drives) and shortly afterwards I started experiencing the same behavior again!
As soon as I unlock the volume, the system freezes.
I don't know where to look. If you have a suggestion, I'd be happy to hear it.
 

By the way, encryption doesn't seem to have any effect; it just delays the occurrence.
I just created a volume without encryption, and when the wizard finished, the system froze...
 