The volume tank (ZFS) state is UNAVAIL


Bostjan

Contributor
Joined
Mar 24, 2014
Messages
122
I was copying some files from FreeNAS to my computer when suddenly the transfer stopped. I was the only one using FreeNAS.


I didn’t do any upgrades and I didn’t replace any disks. The last scrub of the data pool ran 15 days ago; the scrub of the 'freenas-boot' pool ran one day ago.


When the transfer stopped, I got these emails:


------Critical Alerts---------
The volume fn (ZFS) state is UNAVAIL: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
---------------


------Critical Alerts---------
The volume fn (ZFS) state is UNAVAIL: One or more devices are faulted in response to IO failures.
---------------


I have FreeNAS 9.3.

I cannot log in through the browser; I get ERR_EMPTY_RESPONSE.

I cannot access FreeNAS over Samba; I tried on Windows and on Android (where it usually works).

I tried SSH. It asks for my username and password and then hangs. No command works; nothing is echoed at the cursor.

Ping works.


I don’t know what to do anymore. What do the errors I got by email mean?
How can I connect to FreeNAS?
 

TinTIn

Contributor
Joined
Feb 9, 2015
Messages
142
Can you connect a screen to the box and see what it's doing?


Sent from my iPhone using Tapatalk
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, your main pool went offline. When it did, your .system dataset went bye-bye, and with it all your services (WebGUI, SSH, etc.) basically went crazy because their working space disappeared. You'll need to see if you can get local access and check the status of the server, then reboot it and check again.

Let's just hope no data is lost, but a zpool status that is UNAVAIL is not an encouraging sign.
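If you can get a shell at the local console (the Shell option in the FreeNAS console menu), a minimal first look, assuming the pool is named fn as in the alert emails, would be:

zpool status -v fn    # pool state and per-device read/write/checksum error counters
dmesg | tail -n 50    # recent kernel messages; look for CAM/ahci disk errors

If the console is wedged too, power-cycle and run the same two commands after the reboot.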
 

TinTIn

Contributor
Joined
Feb 9, 2015
Messages
142
What's the configuration of the pool, i.e. raid level?


Sent from my iPhone using Tapatalk
 

Bostjan

Contributor
Joined
Mar 24, 2014
Messages
122
I connected a screen to the box. It showed this:
vm_fault: pager read error, pid 87337 (nginx)
swap_pager: I/O error - pagein failed; blkno 2622307, size 8192, error 6


These two lines repeated down the whole screen.
The swap_pager line (swap_pager: I/O error - pagein failed; blkno 2622307, size 8192, error 6) stayed the same the whole time.
In the vm_fault line (vm_fault: pager read error, pid 87337 (nginx)), the pid kept incrementing, sometimes by one, sometimes by around 300.
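For what it's worth, error 6 on FreeBSD is ENXIO ("Device not configured"): the kernel was trying to page nginx back in from swap and the disk backing that swap partition had disappeared. After a reboot, a quick way to look for the culprit (standard commands; device names will differ per system):

dmesg | grep -iE 'cam|ahci|error'    # controller/disk errors name the device (ada0, ada1, ...)
zpool status -v                      # see whether ZFS flagged the same disk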
 

Bostjan

Contributor
Joined
Mar 24, 2014
Messages
122
I've managed to reboot FreeNAS. So far everything works OK.

I ran zpool status -v.
Among other things, it reported these errors:
errors: Permanent errors have been detected in the following files:
/var/db/system/update/MANIFEST
/mnt/fn/jails/plexmediaserver_1/var/db/plexdata/Plex Media Server/Plug-ins/WebClient.bundle/Contents/Resources/js/plex.js
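Both damaged files look replaceable (an update manifest and a Plex web-client script), so a plausible cleanup, sketched on the assumptions that the pool is named fn and that the MANIFEST is simply re-fetched on the next update check:

rm /var/db/system/update/MANIFEST    # assumption: regenerated by the next update check
# restore plex.js by reinstalling/updating the Plex plugin in the jail
zpool clear fn
zpool scrub fn
zpool status -v fn                   # entries can take a completed scrub (sometimes two) to drop off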
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Check your disk health and replace the disk that is having problems. Your system shouldn't just go offline like that.
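A hedged starting point with smartmontools, which ships with FreeNAS (ada0 is an example device name; repeat per disk):

smartctl -a /dev/ada0         # check Reallocated_Sector_Ct, Current_Pending_Sector,
                              # Offline_Uncorrectable, and UDMA_CRC_Error_Count (cabling)
smartctl -t long /dev/ada0    # start a long self-test; results show up in -a output later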
 

TinTIn

Contributor
Joined
Feb 9, 2015
Messages
142
I'm not sure if this comment is true or related, but it's worth a read.

"The default setting in FreeNAS is to create a 2GB swap partition on every drive which acts like striped swap space (I am not making this up, this is the default setting). So if any one of the drives fails it can take FreeNAS down."

https://b3n.org/freenas-vs-omnios-napp-it/
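That layout is easy to verify from a shell (standard FreeBSD tools; output varies by system):

gpart show     # on FreeNAS 9.x each data disk typically shows a ~2GB freebsd-swap partition before the freebsd-zfs partition
swapinfo -h    # the active swap devices, one per data disk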


Sent from my iPhone using Tapatalk
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Since swap is supposed to be empty, it shouldn't take down a system. Of course, lack of RAM could lead to exactly that scenario.

Any swap usage in FreeNAS means something needs urgent attention, as swap is only there as a last resort. The per-disk swap partition is mostly just a convenient way of sidestepping issues with replacement drives that are slightly (a few blocks) smaller than the ones they're supposed to replace.
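A quick way to check whether a box is actually leaning on swap (standard FreeBSD tools):

swapinfo    # the Used column should stay at or near zero
vmstat 5    # under load, watch free memory and the pi/po (page-in/page-out) columns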
 

Rabuyik

Cadet
Joined
Nov 16, 2016
Messages
7
I know this post is quite old, but I encountered this exact same problem while running a replication task, and I've never found any other post about it.

I can reproduce it easily every time I try to replicate my main pool into a secondary backup pool (both local). Everything is exactly as the OP describes it, except it's the backup pool that ends up UNAVAIL. The whole system (web GUI, SSH, etc.) stops working and I have to force-shutdown the computer, as the local console stops working too. After the reboot there is no data loss on the main pool.

Ericloewe and SweetAndLow: I referred to this problem in this thread: https://forums.freenas.org/index.ph...ync-of-smb-shared-datasets.55427/#post-387052 (I decided to give replication another try in the hope of retiring my rsync-based backup scheme).

I'm starting to think I might lack RAM/swap for all my pools, or that something else hardware-related is going on. I tried replicating to several different HDDs and they all end up producing this problem.

- Main pool is 4x4TB WD Red in Z2 (about 600GB used) and the backup pool is 1x3TB Samsung (but I also tried with 1x2TB WD Green and 1x1TB old WD).
- 16GB of ECC RAM, ASRock C2550D4I (using the Intel SATA controller only).
- OS is stored on a Samsung EVO SSD.
- I have never had any other problem with this box so far.

Any ideas?

Thanks again!
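(One way to narrow this down, sketched with hypothetical pool names tank and backup: run the same transfer by hand from a shell, bypassing the GUI replication task, while watching the local console.)

zfs snapshot -r tank@repro
zfs send -R tank@repro | zfs receive -F backup/tank
# if the box wedges here too, the GUI replication task is not the trigger;
# if it completes, try progressively larger datasets to find where it breaks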
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Maybe open your own thread for your issue. You need to test local write performance, then test network performance, and see whether either of those two things alone triggers the problem. Normally I would say the Marvell ports can cause issues, but you say you are not using them; can you double-check, please?
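A rough version of those two tests (paths and addresses are examples; note that /dev/zero compresses to nothing on an lz4 dataset, which inflates write numbers):

dd if=/dev/zero of=/mnt/backup/ddtest bs=1M count=20000    # ~20GB sequential write to the backup pool
rm /mnt/backup/ddtest
iperf -s                         # on the NAS
iperf -c 192.168.1.100 -t 30     # from a client, pointed at the NAS IP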
 

Rabuyik

Cadet
Joined
Nov 16, 2016
Messages
7

The only drive using a Marvell port is the OS SSD (since I wanted to keep the Intel ones for the data drives).

I tried searching for "local write performance test", but I'm not sure exactly what you meant. As for network tests, I have copied many GB of data in and out of this NAS several times, maxing out the NIC speed...

I'm looking through the log files for some indication of what could be happening.

I will create a separate thread if nothing comes of it (as you suggested).

Thanks!
 

Bostjan

Contributor
Joined
Mar 24, 2014
Messages
122
I found that Marvell controllers are usually the root of the problem.

Try connecting all disks to the Intel controllers and see if the problem goes away.
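Before re-cabling, it's worth confirming which controller each disk actually hangs off (standard FreeBSD commands; names will vary):

camcontrol devlist     # lists each disk with its scbus number
dmesg | grep -i ahci   # maps ahci controllers and channels to those scbus numbers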
 

Rabuyik

Cadet
Joined
Nov 16, 2016
Messages
7

The only drive using the Marvell controller is the OS SSD.

Another hint that the OS controller doesn't seem related: the replication problem only occurs when I replicate my main Z2 pool to a backup pool. Replicating from one of my backup pools to another works fine at >100MB/s.

I could try moving my OS SSD to the Intel controller and give it another shot.
 