Post 9.2.1.8 upgrade issue

Status
Not open for further replies.

Scott Mercer

Dabbler
Joined
Oct 12, 2014
Messages
12
After upgrading to 9.2.1.8, my FreeNAS array begins exhibiting a bizarre issue shortly after rebooting. All files written to the disks (both ZFS and UFS), whether by CIFS, rsync, or even kernel logging, end up showing as 8192PB in size. When attempting any manipulation (ls, rm, etc.), I receive the error "Value too large to be stored in data type". Internal log rotation jobs fail as well.

I can use dd to overwrite the files and eventually delete them; however, the condition persists.

I also get many errors sent by e-mail:

newsyslog
bzip2: I/O or other error, bailing out. Possible reason follows.
bzip2: Value too large to be stored in data type
Input file = /var/log/messages.0, output file = /var/log/messages.0.bz2
newsyslog: `bzip2 -f /var/log/messages.0' terminated with a non-zero status (1)

save_rrds.sh
mv: /data/rrd_dir.tar.bz2: set owner/group (was: 0/0): Invalid argument
mv: /data/rrd_dir.tar.bz2: set mode (was: 0644): Invalid argument
mv: /data/rrd_dir.tar.bz2: set flags (was: 00000000): Invalid argument
mv: /data/rrd_dir.tar.bz2: set times: Invalid argument

I am copying data off the array as quickly as possible, in case I need to completely burn it down and start over. What could cause this? Nothing in /var/log jumps out at me.
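
As a rough sanity check on the numbers above, here is a minimal sketch of the arithmetic. It assumes "8192PB" uses binary prefixes (1 PB taken as 2**50 bytes) and that file sizes are held in a signed 64-bit field; neither is confirmed anywhere in this thread. Under those assumptions the reported size is exactly 2**63 bytes, one more than such a field can hold, which would be consistent with an EOVERFLOW-style "Value too large" error.

```python
import errno
import os

# Assumption: "8192PB" is a binary-prefix figure, so 1 PB is 2**50 bytes here.
PIB = 2 ** 50
reported_size = 8192 * PIB

# Largest value a signed 64-bit file size/offset field can hold.
OFF_T_MAX = 2 ** 63 - 1

# Both comparisons are pure integer arithmetic, so the results are exact.
print(reported_size == 2 ** 63)   # True
print(reported_size > OFF_T_MAX)  # True

# The symbolic name is the same everywhere; the message text printed by
# os.strerror() differs between operating systems, so none is assumed here.
print(errno.errorcode[errno.EOVERFLOW])
print(os.strerror(errno.EOVERFLOW))
```

This only shows that the size is suspiciously close to a 64-bit boundary. It says nothing about what is actually corrupting the metadata.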
 

dlavigne

Guest
Huh, that is weird. Please create a bug report at bugs.freenas.org and post the issue number here.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
What's your hardware? Can you post a debug?
 

Scott Mercer

Dabbler
Joined
Oct 12, 2014
Messages
12
I'm running on an HP Proliant Microserver N40L, with 8GB RAM. I've been running this hardware for about 3 years now and have not seen this symptom before.

I've created a bug report:
Bug #6334
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
The debug file... it's a button in the System settings part of the WebGUI.
 

Scott Mercer

Dabbler
Joined
Oct 12, 2014
Messages
12
I don't think it worked...

Request Method: GET
Request URL: http://freenas/system/debug/
Software Version: FreeNAS-9.2.1.8-RELEASE-x64 (e625626)
Exception Type: IOError
Exception Value: [Errno 2] No such file or directory: u'/mnt/public/.system/ixdiagnose/ixdiagnose.tgz'
Exception Location: /usr/local/www/freenasUI/../freenasUI/system/views.py in debug, line 692
Server time: Mon, 13 Oct 2014 12:15:39 -0700
Traceback

Request information
GET
No GET data

POST
No POST data

FILES
No FILES data

COOKIES
VariableValue
csrftoken'mpQfhLGAUD0Q4ip3UDEkuPqcP3T2WJFu'
sessionid'3hjjjhs7l799lnfixidmpr045miuebs3'
fntreeSaveStateCookie'root%2Croot%2F62%2Croot%2F62%2F71%2Croot%2F62%2F71%2F80%2Croot%2F58%2F64%2Croot%2F8%2Croot%2F58%2F67%2F76%2Croot%2F58%2F67%2F68%2Croot%2F58%2F67%2F72%2Croot%2F103%2Croot%2F103%2F114%2Croot%2F103%2F114%2F118%2Croot%2F103%2F114%2F130%2Croot%2F103%2F114%2F133%2Croot%2F91%2Croot%2F91%2F100%2Croot%2F1%2Croot%2F90%2Croot%2F90%2F99%2Croot%2F58%2Croot%2F58%2F65%2Croot%2F58%2F65%2F74%2Croot%2F58%2F85%2Croot%2F91%2F95%2Croot%2F106%2Croot%2F59%2Croot%2F59%2F110%2Croot%2F148%2Croot%2F148%2F187%2Croot%2F148%2F187%2F189%2Croot%2F110%2Croot%2F127%2Croot%2F127%2F131%2Croot%2F8%2F9%2Croot%2F8%2F20%2Croot%2F127%2F168%2Croot%2F127%2F168%2F170'
META
VariableValue
wsgi.multiprocessFalse
HTTP_REFERER'http://freenas/'
REDIRECT_STATUS'200'
SERVER_SOFTWARE'nginx/1.4.4'
SCRIPT_NAMEu''
REQUEST_METHOD'GET'
PATH_INFOu'/system/debug/'
SERVER_PROTOCOL'HTTP/1.1'
QUERY_STRING''
CONTENT_LENGTH''
HTTP_USER_AGENT'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.58 Safari/537.36'
HTTP_CONNECTION'keep-alive'
HTTP_COOKIE'sessionid=3hjjjhs7l799lnfixidmpr045miuebs3; fntreeSaveStateCookie=root%2Croot%2F62%2Croot%2F62%2F71%2Croot%2F62%2F71%2F80%2Croot%2F58%2F64%2Croot%2F8%2Croot%2F58%2F67%2F76%2Croot%2F58%2F67%2F68%2Croot%2F58%2F67%2F72%2Croot%2F103%2Croot%2F103%2F114%2Croot%2F103%2F114%2F118%2Croot%2F103%2F114%2F130%2Croot%2F103%2F114%2F133%2Croot%2F91%2Croot%2F91%2F100%2Croot%2F1%2Croot%2F90%2Croot%2F90%2F99%2Croot%2F58%2Croot%2F58%2F65%2Croot%2F58%2F65%2F74%2Croot%2F58%2F85%2Croot%2F91%2F95%2Croot%2F106%2Croot%2F59%2Croot%2F59%2F110%2Croot%2F148%2Croot%2F148%2F187%2Croot%2F148%2F187%2F189%2Croot%2F110%2Croot%2F127%2Croot%2F127%2F131%2Croot%2F8%2F9%2Croot%2F8%2F20%2Croot%2F127%2F168%2Croot%2F127%2F168%2F170; csrftoken=mpQfhLGAUD0Q4ip3UDEkuPqcP3T2WJFu'
SERVER_NAME'localhost'
REMOTE_PORT'46025'
wsgi.url_scheme'http'
SERVER_PORT'80'
SERVER_ADDR'192.168.1.200'
DOCUMENT_ROOT'/usr/local/etc/nginx/html'
DOCUMENT_URI'/system/debug/'
wsgi.input<flup.server.fcgi_base.InputStream object at 0x80b8d8710>
HTTP_DNT'1'
HTTP_HOST'freenas'
wsgi.multithreadTrue
REQUEST_URI'/system/debug/'
HTTP_ACCEPT'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
wsgi.version(1, 0)
GATEWAY_INTERFACE'CGI/1.1'
wsgi.run_onceFalse
wsgi.errors<flup.server.fcgi_base.TeeOutputStream object at 0x810e76dd0>
REMOTE_ADDR'192.168.1.103'
HTTP_ACCEPT_LANGUAGE'en-US,en;q=0.8'
CONTENT_TYPE''
CSRF_COOKIEu'mpQfhLGAUD0Q4ip3UDEkuPqcP3T2WJFu'
HTTP_ACCEPT_ENCODING'gzip,deflate,sdch'
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Nope. That tells me that your file system file size problems are just a symptom of something bigger. Have you done a RAM test lately?
 

Scott Mercer

Dabbler
Joined
Oct 12, 2014
Messages
12
Looking at the filesystem directly:

[root@freenas] /mnt/public/.system/ixdiagnose/ixdiagnose/log# ls -al
ls: 3ware_raid_alarms.today: Value too large to be stored in data type
ls: 3ware_raid_alarms.yesterday: Value too large to be stored in data type
ls: auth.log: Value too large to be stored in data type
ls: dmesg.today: Value too large to be stored in data type
ls: dmesg.yesterday: Value too large to be stored in data type
ls: lpd-errs: Value too large to be stored in data type
ls: mount.yesterday: Value too large to be stored in data type
ls: pbid.log: Value too large to be stored in data type
ls: pf.today: Value too large to be stored in data type
ls: ppp.log: Value too large to be stored in data type
ls: security: Value too large to be stored in data type
ls: ups.log: Value too large to be stored in data type
ls: userlog: Value too large to be stored in data type
ls: utx.lastlogin: Value too large to be stored in data type
ls: utx.log: Value too large to be stored in data type
ls: wtmp: Value too large to be stored in data type
ls: wtmp.0: Value too large to be stored in data type
ls: wtmp.1: Value too large to be stored in data type
ls: wtmp.2: Value too large to be stored in data type
ls: xferlog: Value too large to be stored in data type
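
For anyone wanting to list the affected files without reading `ls` errors by hand, here is a minimal sketch. The function name `find_suspect_files` and the 1 PiB threshold are hypothetical choices made for illustration; it simply calls `os.lstat()` on every file under a directory and records the ones that fail or report an implausible size.

```python
import errno
import os

# Hypothetical threshold: any file of 1 PiB or more is treated as implausible.
SUSPECT_SIZE = 2 ** 50


def find_suspect_files(root, limit=SUSPECT_SIZE):
    """Return (path, reason) pairs for files whose metadata looks broken."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError as exc:
                # EOVERFLOW is the errno usually behind "Value too large" messages.
                if exc.errno == errno.EOVERFLOW:
                    suspects.append((path, "stat failed: EOVERFLOW"))
                else:
                    suspects.append((path, "stat failed: " + os.strerror(exc.errno)))
                continue
            if st.st_size >= limit:
                suspects.append((path, "implausible size: %d bytes" % st.st_size))
    return suspects
```

Calling `find_suspect_files("/var/log")` would return a list of path and reason pairs to print or save. Whether Python's own stat call fails in the same way as `ls` on an affected system is an assumption; it has not been tried on this box.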
 

Scott Mercer

Dabbler
Joined
Oct 12, 2014
Messages
12
Not recently, but I don't have a reason to believe that it would all of a sudden go bad after upgrading to 9.2.1.8.

What's odd is that reading/writing over NFS is unaffected. Once I've backed up the data I want to save off the unit, I can try a memory test, but I don't expect to find an issue.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
There is the chance it's just a coincidence since you upgraded, but I'd do a RAM test just to be sure everything is okay.

*Something* is very wrong with your setup, and I don't think it's limited to your pool.
 

Scott Mercer

Dabbler
Joined
Oct 12, 2014
Messages
12
I was able to do a warm reboot and get a debug. See attached.

Also, since I just rebooted, the system is acting normally for a short while. After some undetermined time, the weirdness will start. Is 9.2.1.9 close? Maybe I could try it?
 

Attachments

  • debug-freenas-20141013123809.tgz
    201.8 KB · Views: 278

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
What tunables and sysctls do you have set? I see at least one custom, so I'm sure there's more. :P
 

Scott Mercer

Dabbler
Joined
Oct 12, 2014
Messages
12
I agree that there is *something* **very** wrong.

As a Hail Mary, I could pull the USB boot media, start completely over, and import my zpools and a config. However, I don't have any indication that it would be any better than what I have now.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Yeah, the corruption is likely already on the pool, so starting over with a fresh USB stick isn't likely to fix the problem.

I'm still leaning heavily towards some kind of hardware problem, and RAM seems like the easiest to blame as well as rule out.
 

Scott Mercer

Dabbler
Joined
Oct 12, 2014
Messages
12
upload_2014-10-13_12-51-38.png


upload_2014-10-13_12-51-55.png
 

Scott Mercer

Dabbler
Joined
Oct 12, 2014
Messages
12
The baffling thing is that it happens on the UFS-formatted boot volume too, so I don't think it's necessarily a ZFS problem.

[root@freenas] ~# mount
/dev/ufs/FreeNASs1a on / (ufs, local, read-only)
devfs on /dev (devfs, local, multilabel)
/dev/md0 on /etc (ufs, local)
/dev/md1 on /mnt (ufs, local)
/dev/md2 on /var (ufs, local)
/dev/ufs/FreeNASs4 on /data (ufs, local, noatime, soft-updates)

I see weirdness in /var/log and other places. Wouldn't that rule out a ZFS bug, if it's happening on UFS filesystems as well?
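
To make the UFS point easier to check, here is a small sketch that groups mount points by filesystem type. The sample text is the `mount` output pasted in this post; the regular expression and the function name `mounts_by_fstype` are assumptions made for illustration and only cover lines in the `device on mountpoint (type, options)` form shown here.

```python
import re
from collections import defaultdict

# Sample copied from the `mount` output in this post.
MOUNT_OUTPUT = """\
/dev/ufs/FreeNASs1a on / (ufs, local, read-only)
devfs on /dev (devfs, local, multilabel)
/dev/md0 on /etc (ufs, local)
/dev/md1 on /mnt (ufs, local)
/dev/md2 on /var (ufs, local)
/dev/ufs/FreeNASs4 on /data (ufs, local, noatime, soft-updates)
"""

# Matches "<device> on <mountpoint> (<type>, <option>, ...)".
LINE_RE = re.compile(r"^(?P<dev>\S+) on (?P<mnt>\S+) \((?P<opts>[^)]*)\)$")


def mounts_by_fstype(text):
    """Group mount points by the first field inside the parentheses."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not in the expected form
        fstype = match.group("opts").split(",")[0].strip()
        groups[fstype].append(match.group("mnt"))
    return dict(groups)


print(mounts_by_fstype(MOUNT_OUTPUT))
# {'ufs': ['/', '/etc', '/mnt', '/var', '/data'], 'devfs': ['/dev']}
```

In this excerpt /var, where the log errors show up, is a UFS memory disk (/dev/md2), which is the basis for the question above.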
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Yes, but that doesn't rule out a RAM problem. Contrary to popular belief, when RAM goes bad all sorts of weird and f*cked up stuff starts happening. It's like a family member with Alzheimer's. She'll talk about back when she was in the civil war and you're like 'What!?'. Then she'll tell you that you aren't family and you're just baffled at what comes out of her mouth.

Bad RAM fubars pretty much everything. Doesn't have to be ZFS. Every time I've had bad RAM on a desktop it has required a reload (or restore from backup) of Windows because not only did the operating system files themselves get corrupted, but the file system was trashed too.

Ok, a couple of unrelated tidbits:

1. You *can* put 16GB of RAM in your box. So if you have wanted more RAM, that is an option.
2. An ARC that is only 1GB is terrifyingly small. The vfs.zfs.arc_max and the kmem_size both got my attention and clued me in that you had custom sysctls/tunables. BUT... I don't think that's your problem either. It could be, but I doubt it. But with 8GB of RAM, your ARC is going to be small regardless.
3. How long has this system been set up?
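
To put the ARC comment in numbers, here is a minimal sketch that reads `sysctl`-style "name: value" lines and works out what share of RAM the ARC cap represents. The sample values are hypothetical round figures (a 1 GiB `vfs.zfs.arc_max` and 8 GiB for `hw.physmem`) chosen to match the sizes mentioned in this thread; the real values are in the attached debug file and are not reproduced here. The helper names are also made up for illustration.

```python
# Hypothetical sample in `sysctl` "name: value" form, not taken from the debug file.
SAMPLE = """\
vfs.zfs.arc_max: 1073741824
hw.physmem: 8589934592
"""


def parse_sysctl(text):
    """Parse "name: value" lines, keeping only purely numeric values."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            values[name.strip()] = int(value)
    return values


def arc_share(values):
    """Fraction of physical memory that the ARC is allowed to use."""
    return values["vfs.zfs.arc_max"] / values["hw.physmem"]


vals = parse_sysctl(SAMPLE)
print("arc_max GiB:", vals["vfs.zfs.arc_max"] / 2 ** 30)    # arc_max GiB: 1.0
print("share of RAM: {:.1%}".format(arc_share(vals)))       # share of RAM: 12.5%
```

On a live system the input could come from `sysctl vfs.zfs.arc_max hw.physmem` instead of the sample string.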
 

Scott Mercer

Dabbler
Joined
Oct 12, 2014
Messages
12
Yes, I will probably consider more RAM at some point. However, the current config (when working properly) seems pretty performant, so I don't have a driving need right now. In terms of the ARC, I don't recall setting that setting. I do recall tweaking the kmem_size some time ago, when I upgraded from 2GB to 8GB RAM.

This system has been set up and running since the early days of FreeNAS 8. It looks like I have a couple of saved configs going back to January 2011. The oldest release I have archived is 8.0RC2. I'm pretty sure I was running this particular config even before that.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Hmm....

Well, I'd do a RAM test first since that's the easiest to do (as well as to rule out this particular problem).

I recommend Memtest86+ for 3 passes. Should take 2-6 hours depending on the speed of the machine.
 