Hello,
Something unexpected happened on my test setup (ASRock C2750 - 2x8GB (KVR16E11/8I) ECC - 6x2TB RAIDZ2) and I'd be glad to have your opinion on it, to know if I understood it correctly.
Not sure if this is relevant, but I updated my test server from 9.2.1.8 to 9.10.2 last week, as a fresh install.
So here is what happened:
I logged into the GUI of the test server and, after entering the credentials, got a screen saying "Forbidden (403) CSRF verification failed. Request aborted." When I tried with a wrong password, I got the correct behaviour (an incorrect-login message).
I tried with another browser and from another computer as well: same issue.
Then I tried to log in through PuTTY but could not get past the password prompt (i.e. I entered the password and then nothing happened).
So I connected to the IPMI and started the remote control console.
I got the expected FreeNAS menu, where I selected the shell.
I checked the status of the volume with "zpool status" and saw that 3 drives (out of 6) were unavailable. I assumed it was a connection problem (there were no specific SMART warnings on those drives) and wanted to check the BIOS and the cables afterwards, so I decided to restart the system.
I don't know why, but I used the command "reboot" (instead of going back to the FreeNAS menu and selecting reboot there, though it should do the same). I typed "reboot", pressed Enter, and got no feedback at all, no reaction. After a while I tried "Remote control/Server Power Control/Power off server - Orderly shutdown" in the IPMI GUI, which was more effective: some processes were ended, some others couldn't be, so after a while the system was still up.
Then... well, it was late already, so I did a "Power off server - Immediate"! :-O
And the next day I powered up the server and... everything worked fine. The disks were all available, the volume was online, and a scrub ran without errors.
Almost like nothing happened.
Code:
# zpool status
  pool: Nasse
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 64.1M in 0h27m with 0 errors on Fri Jan 6 18:10:20 2017
config:

        NAME                                                STATE     READ WRITE CKSUM
        Nasse                                               ONLINE       0     0     0
          raidz2-0                                          ONLINE       0     0     0
            gptid/79317b32-4be6-11e4-807c-002590d5437f.eli  ONLINE       0     0     3
            gptid/7a001215-4be6-11e4-807c-002590d5437f.eli  ONLINE       0     0     0
            gptid/7a80b426-4be6-11e4-807c-002590d5437f.eli  ONLINE       0     0     0
            gptid/811506ff-a4ab-11e4-b618-002590d5437f.eli  ONLINE       0     0     0
            gptid/7c1e4f48-4be6-11e4-807c-002590d5437f.eli  ONLINE       0     0     0
            gptid/7cde438e-4be6-11e4-807c-002590d5437f.eli  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors
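If I read that status output correctly, once I'm confident the drive itself is healthy I can reset those checksum counters with the command it suggests — "Nasse" being the pool name on this box:
Code:
# clear the READ/WRITE/CKSUM error counters on the pool
zpool clear Nasse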
And here is what I understand:
Well I'm not so worried about the 3 disks becoming unavailable... at least not for now. ;-)
But what concerns me a bit more is that the system didn't respond when I tried to log in. Even with 3 disks (or more) unavailable, shouldn't I have been able to connect to the GUI or log in with PuTTY?
My thinking: if the system stores data in a dataset on the volume and the volume becomes unavailable (which is what happened), then the system might run into problems. But I couldn't find a .system dataset in the volume. On the production server I have a .system dataset, but not on the test server.
After searching a while (with "df -h"), I found a .system dataset, but I'm a bit confused and unsure: does that mean the .system dataset is mounted as /var/db/system and therefore does not show up in "ll /mnt/Nas"?
Code:
# df -Th
Filesystem                                            Type     Size  Used  Avail  Capacity  Mounted on
freenas-boot/ROOT/default                             zfs      7.1G  639M  6.4G   9%        /
devfs                                                 devfs    1.0K  1.0K  0B     100%      /dev
tmpfs                                                 tmpfs    32M   8.5M  23M    27%       /etc
tmpfs                                                 tmpfs    4.0M  8.0K  4.0M   0%        /mnt
tmpfs                                                 tmpfs    5.3G  106M  5.2G   2%        /var
freenas-boot/grub                                     zfs      6.5G  6.5M  6.4G   0%        /boot/grub
fdescfs                                               fdescfs  1.0K  1.0K  0B     100%      /dev/fd
Nas                                                   zfs      1.0T  384K  1.0T   0%        /mnt/Nas
Nas/Document                                          zfs      2.3T  1.3T  1.0T   55%       /mnt/Nas/Document
Nas/.system                                           zfs      1.0T  400K  1.0T   0%        /var/db/system
Nas/.system/cores                                     zfs      1.0T  1.2M  1.0T   0%        /var/db/system/cores
Nas/.system/samba4                                    zfs      1.0T  1.0M  1.0T   0%        /var/db/system/samba4
Nas/.system/syslog-eab18b758b91471d95803a91d80bfcda   zfs      1.0T  1.0M  1.0T   0%        /var/db/system/syslog-eab18b758b91471d95803a91d80bfcda
Nas/.system/rrd-eab18b758b91471d95803a91d80bfcda      zfs      1.0T  272K  1.0T   0%        /var/db/system/rrd-eab18b758b91471d95803a91d80bfcda
Nas/.system/configs-eab18b758b91471d95803a91d80bfcda  zfs      1.0T  1.0M  1.0T   0%        /var/db/system/configs-eab18b758b91471d95803a91d80bfcda
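I suppose I could also have listed the datasets directly, which would show their mountpoints regardless of where they sit under /mnt — a sketch, with "Nas" being the pool name here:
Code:
# list every dataset in the pool together with its mountpoint;
# .system should show up here even though it is mounted under /var/db/system
zfs list -r -o name,used,mountpoint Nas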
Would that explain the behaviour I observed or did I miss something?
Bonus question: in such cases, when something unexpected occurs, I'd check the logs. But I'm not very familiar with FreeNAS's logs. They are located in /var, but I don't know where to start looking.
Any advice/pointers on how I could learn more about handling the logs?
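So far the only thing I've tried is poking around by hand, something like this — I'm assuming /var/log/messages is the main system log here, as on stock FreeBSD:
Code:
# see which log files exist
ls -l /var/log
# skim the main system log for disk/controller errors
grep -i error /var/log/messages
# or watch it live while reproducing a problem
tail -f /var/log/messages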
Thank you for your help.