Last night I got an email at 12:00am sharp from my FreeNAS server (bigbigrack). Here is the body of the email:
Subject: Cron <root@bigbigrack> /bin/sh /root/save_rrds.sh
bzip2/libbzip2: internal error number 1007.
This is a bug in bzip2/libbzip2, 1.0.5, 10-Dec-2007.
Please report it to me at: jseward@bzip.org. If this happened
when you were using some program which uses libbzip2 as a
component, you should also report this bug to the author(s)
of that program. Please make an effort to report this bug;
timely and accurate bug reports eventually lead to higher
quality software. Thanks. Julian Seward, 10 December 2007.
*** A special note about internal error number 1007 ***
Experience suggests that a common cause of i.e. 1007
is unreliable memory or other hardware. The 1007 assertion
just happens to cross-check the results of huge numbers of
memory reads/writes, and so acts (unintendedly) as a stress
test of your memory system.
I suggest the following: try compressing the file again,
possibly monitoring progress in detail with the -vv flag.
* If the error cannot be reproduced, and/or happens at different
points in compression, you may have a flaky memory system.
Try a memory-test program. I have used Memtest86
(www.memtest86.com). At the time of writing it is free (GPLd).
Memtest86 tests memory much more thorougly than your BIOSs
power-on test, and may find failures that the BIOS doesn't.
* If the error can be repeatably reproduced, this is a bug in
bzip2, and I would very much like to hear about it. Please
let me know, and, ideally, save a copy of the file causing the
problem -- without which I will be unable to investigate it.
The server has been working without issues since April (aside from random power-downs we never explained). Any idea what this means? Could this really mean we have a RAM issue? I tried googling save_rrds.sh, but I couldn't find any clues as to what was being compressed with bzip2.
When we first built the system we ran RAM tests and got bizarre results. We ran Memtest86 on the system for several days (I was gone for the weekend, so I let it run). It would run fine for 18+ passes, but then we'd randomly get tons of errors. By the end of the weekend we had 10k+ errors after 37 passes. We then found that adding a Kingston RAM cooler to blow air over the RAM sticks made the errors go away. Should we revisit the idea that the RAM may be bad?
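For reference, here is a rough sketch of the retest the bzip2 message suggests: compress the same data several times and see whether the results stay consistent. It assumes Python is available on the box and uses Python's standard bz2 module (which wraps libbzip2) instead of the bzip2 command line. The function name repeat_compress and the test payload are made up for illustration, since I don't know what save_rrds.sh actually compresses.

```python
import bz2
import hashlib
import os


def repeat_compress(data: bytes, rounds: int = 5) -> set:
    """Compress the same bytes `rounds` times and return the distinct results seen."""
    results = set()
    for _ in range(rounds):
        compressed = bz2.compress(data, compresslevel=9)
        # Round-trip check: decompressing must give back the exact input.
        if bz2.decompress(compressed) != data:
            results.add("roundtrip-mismatch")
        # Identical input should always produce identical compressed bytes.
        results.add(hashlib.sha256(compressed).hexdigest())
    return results


if __name__ == "__main__":
    # Hypothetical payload: 1 MiB of random bytes plus a highly compressible run.
    payload = os.urandom(1 << 20) + b"rrd" * 100_000
    results = repeat_compress(payload)
    if len(results) == 1:
        print("all rounds identical; no inconsistency observed")
    else:
        print(f"{len(results)} distinct results; memory or other hardware is suspect")
```

On healthy hardware every round should give the same digest, so more than one distinct result (or a crash partway through) would point toward flaky memory rather than a bzip2 bug. A clean run proves little by itself, given that our earlier memory test ran 18+ clean passes before showing errors.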
Thanks!