The volume tank (ZFS) state is UNAVAIL


Bostjan

Contributor
Joined
Mar 24, 2014
Messages
122
I was copying some files from FreeNAS to my computer when suddenly the transfer stopped. I was the only one using FreeNAS.


I didn’t do any upgrades and I didn’t replace any disks. The last scrub of the data pool ran 15 days ago; the scrub of the 'freenas-boot' pool ran one day ago.


When the transfer stopped, I got these emails:


------Critical Alerts---------
The volume fn (ZFS) state is UNAVAIL: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
---------------


------Critical Alerts---------
The volume fn (ZFS) state is UNAVAIL: One or more devices are faulted in response to IO failures.
---------------


I have FreeNAS 9.3.

I cannot log in through the browser; I get ERR_EMPTY_RESPONSE.

I cannot access FreeNAS over Samba; I tried on Windows and on Android (where it usually works).

I tried SSH. It asks for my username and password and then hangs. No command works; nothing is echoed at the cursor.

Ping works.


I don’t know what to do anymore. What do the errors I got by email mean?
How can I connect to FreeNAS?
 

TinTIn

Contributor
Joined
Feb 9, 2015
Messages
142
Can you connect a screen to the box and see what it's doing?


Sent from my iPhone using Tapatalk
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, your main pool went offline. When it did, your .system dataset went bye-bye, and with it all your services (WebGUI, SSH, etc.) basically went crazy because their working space disappeared. You'll need to see if you can get local access and check the status of the server, then reboot it and check again.

Let's just hope no data is lost, but a zpool status that is UNAVAIL is not an encouraging sign.
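If you can get a shell at the local console (the Shell option in the FreeNAS console menu), a minimal first look, assuming the pool is named fn as in the alert emails, would be:

zpool status -v fn    # pool state and per-device read/write/checksum error counters
dmesg | tail -n 50    # recent kernel messages; look for CAM/ahci disk errors

If the console is wedged too, power-cycle and run the same two commands after the reboot.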
 

TinTIn

Contributor
Joined
Feb 9, 2015
Messages
142
What's the configuration of the pool, i.e. raid level?


Sent from my iPhone using Tapatalk
 

Bostjan

Contributor
Joined
Mar 24, 2014
Messages
122
I connected a screen to the box. It showed this:
vm_fault: pager read error, pid 87337 (nginx)
swap_pager: I/O error - pagein failed; blkno 2622307, size 8192, error 6


These two lines repeated down the whole screen.
The swap_pager line (swap_pager: I/O error - pagein failed; blkno 2622307, size 8192, error 6) stayed the same the whole time.
In the vm_fault line (vm_fault: pager read error, pid 87337 (nginx)), the pid kept incrementing, sometimes by one, sometimes by around 300.
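For what it's worth, error 6 on FreeBSD is ENXIO ("Device not configured"): the kernel was trying to page nginx back in from swap and the disk backing that swap partition had disappeared. After a reboot, a quick way to look for the culprit (standard commands; device names will differ per system):

dmesg | grep -iE 'cam|ahci|error'    # controller/disk errors name the device (ada0, ada1, ...)
zpool status -v                      # see whether ZFS flagged the same disk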
 

Bostjan

Contributor
Joined
Mar 24, 2014
Messages
122
I've managed to reboot FreeNAS. So far everything works OK.

I ran zpool status -v.
Among other things, it reported these errors:
errors: Permanent errors have been detected in the following files:
/var/db/system/update/MANIFEST
/mnt/fn/jails/plexmediaserver_1/var/db/plexdata/Plex Media Server/Plug-ins/WebClient.bundle/Contents/Resources/js/plex.js
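Both damaged files look replaceable (an update manifest and a Plex web-client script), so a plausible cleanup, sketched on the assumptions that the pool is named fn and that the MANIFEST is simply re-fetched on the next update check:

rm /var/db/system/update/MANIFEST    # assumption: regenerated by the next update check
# restore plex.js by reinstalling/updating the Plex plugin in the jail
zpool clear fn
zpool scrub fn
zpool status -v fn                   # entries can take a completed scrub (sometimes two) to drop off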
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Check your disk health and replace the disk that is having problems. Your system shouldn't just go offline like that.
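A hedged starting point with smartmontools, which ships with FreeNAS (ada0 is an example device name; repeat per disk):

smartctl -a /dev/ada0         # check Reallocated_Sector_Ct, Current_Pending_Sector,
                              # Offline_Uncorrectable, and UDMA_CRC_Error_Count (cabling)
smartctl -t long /dev/ada0    # start a long self-test; results show up in -a output later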
 

TinTIn

Contributor
Joined
Feb 9, 2015
Messages
142
I'm not sure if this comment is true or related, but it's worth a read.

"The default setting in FreeNAS is to create a 2GB swap partition on every drive which acts like striped swap space (I am not making this up, this is the default setting). So if any one of the drives fails it can take FreeNAS down."

https://b3n.org/freenas-vs-omnios-napp-it/
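That layout is easy to verify from a shell (standard FreeBSD tools; output varies by system):

gpart show     # on FreeNAS 9.x each data disk typically shows a ~2GB freebsd-swap partition before the freebsd-zfs partition
swapinfo -h    # the active swap devices, one per data disk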


Sent from my iPhone using Tapatalk
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Since swap is supposed to be empty, it shouldn't take down a system. Of course, lack of RAM could lead to exactly that scenario.

Any swap usage in FreeNAS means something needs urgent attention, as swap is only there as a last resort. The per-disk swap partition is mostly just a convenient way of sidestepping issues with replacement drives that are slightly (a few blocks) smaller than the ones they're supposed to replace.
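A quick way to check whether a box is actually leaning on swap (standard FreeBSD tools):

swapinfo    # the Used column should stay at or near zero
vmstat 5    # under load, watch free memory and the pi/po (page-in/page-out) columns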
 

Rabuyik

Cadet
Joined
Nov 16, 2016
Messages
7
I know this post is quite old, but I encountered this exact same problem while running a replication task, and I've never found any other post about it.

I can reproduce it easily every time I try to replicate my main pool into a secondary backup pool (both local). Everything is exactly as the OP describes it, except it's the backup pool that ends up UNAVAIL. The whole system (web GUI, SSH, etc.) stops working and I have to force-shutdown the computer, as the local console stops working too. After the reboot there is no data loss on the main pool.

Ericloewe and SweetAndLow: I referred to this problem in this thread: https://forums.freenas.org/index.ph...ync-of-smb-shared-datasets.55427/#post-387052 (I decided to give replication another try in the hope of retiring my rsync-based backup scheme).

I'm starting to think I might lack RAM/swap for all my pools, or that something else hardware-related is going on. I tried replicating to several different HDDs and they all end up producing this problem.

- Main pool is 4x4TB WD Red in Z2 (about 600GB used) and the backup pool is 1x3TB Samsung (but I also tried with 1x2TB WD Green and 1x1TB old WD).
- 16GB of ECC RAM, ASRock C2550D4I (using the Intel SATA controller only).
- OS is stored on a Samsung EVO SSD.
- I have never had any other problem with this box so far.

Any ideas?

Thanks again!
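(One way to narrow this down, sketched with hypothetical pool names tank and backup: run the same transfer by hand from a shell, bypassing the GUI replication task, while watching the local console.)

zfs snapshot -r tank@repro
zfs send -R tank@repro | zfs receive -F backup/tank
# if the box wedges here too, the GUI replication task is not the trigger;
# if it completes, try progressively larger datasets to find where it breaks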
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Maybe open your own thread for your issue. You need to test local write performance, then test network performance, and see whether either of those two things alone triggers the problem. Normally I would say the Marvell ports can cause issues, but you say you are not using them; can you double-check, please?
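A rough version of those two tests (paths and addresses are examples; note that /dev/zero compresses to nothing on an lz4 dataset, which inflates write numbers):

dd if=/dev/zero of=/mnt/backup/ddtest bs=1M count=20000    # ~20GB sequential write to the backup pool
rm /mnt/backup/ddtest
iperf -s                         # on the NAS
iperf -c 192.168.1.100 -t 30     # from a client, pointed at the NAS IP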
 

Rabuyik

Cadet
Joined
Nov 16, 2016
Messages
7

The only drive using a Marvell port is the OS SSD (since I wanted to keep the Intel ones for the data drives).

I tried searching for "local write performance test", but I'm not sure exactly what you meant. As for network tests, I have copied many GB of data in and out of this NAS several times, maxing out the NIC speed...

I'm looking through the log files for some indication of what could be happening.

I will create a separate thread if nothing comes of it (as you suggested).

Thanks!
 

Bostjan

Contributor
Joined
Mar 24, 2014
Messages
122
I found that Marvell controllers are usually the root of the problem.

Try connecting all disks to the Intel controllers and see if the problem goes away.
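Before re-cabling, it's worth confirming which controller each disk actually hangs off (standard FreeBSD commands; names will vary):

camcontrol devlist     # lists each disk with its scbus number
dmesg | grep -i ahci   # maps ahci controllers and channels to those scbus numbers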
 

Rabuyik

Cadet
Joined
Nov 16, 2016
Messages
7

The only drive using the Marvell controller is the OS SSD.

Another hint that the OS controller doesn't seem related: the replication problem only occurs when I replicate my main Z2 pool to a backup pool. Replicating from one of my backup pools to another works fine at >100MB/s.

I could try moving my OS SSD to the Intel controller and give it another shot.
 