Email error problem.

tmacka88 · Sep 21, 2011

Hi, since I set up email notifications with smart extras '-m root', I receive two daily emails: one security run output and one daily run output. In the daily run output I am getting this:

"status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected."

This is the whole email:

Removing stale files from /var/preserve:

Cleaning out old system announcements:

Backup passwd and group files:
no /var/backups/master.passwd.bak
no /var/backups/group.bak

Verifying group file syntax:
/etc/group is fine

Disk status:
Filesystem             Size    Used   Avail Capacity  Mounted on
/dev/ufs/FreeNASs1a    927M    429M    424M      50%  /
devfs                  1.0K    1.0K      0B     100%  /dev
/dev/md0               4.3M    3.5M    420K      90%  /etc
/dev/md1               732K     16K    660K       2%  /mnt
/dev/md2                75M     20M     48M      30%  /var
/dev/ufs/FreeNASs4      20M    667K     18M       4%  /data
Volume1                3.6T    2.4T    1.2T      66%  /mnt/Volume1

Last dump(s) done (Dump '>' file systems):

Checking status of zfs pools:
pool: Volume1
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: none requested
config:

NAME                                            STATE     READ WRITE CKSUM
Volume1                                         ONLINE       0     0     0
  raidz1                                        ONLINE       0     0     0
    gptid/29204abe-d29d-11e0-b9cd-f46d04de02ec  ONLINE       0     0     2
    gptid/2a07f729-d29d-11e0-b9cd-f46d04de02ec  ONLINE       0     0     0
    gptid/2ad9f58b-d29d-11e0-b9cd-f46d04de02ec  ONLINE       0     0     0

errors: No known data errors

Checking status of ATA raid partitions:

Checking status of gmirror(8) devices:

Checking status of graid3(8) devices:

Checking status of gstripe(8) devices:

Network interface status:
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
alc0*  1500 <Link#1>      f4:6d:04:de:02:ec        0     0     0        0     0     0
sk0*   1500 <Link#2>      00:1b:11:0f:f1:73        0     0     0        0     0     0
em0    1500 <Link#3>      00:04:23:a8:ba:c8    27802     0     0     2169     0     0
em1    1500 <Link#4>      00:04:23:a8:ba:c8  1208712     3     0  1446663     0     0
lo0   16384 <Link#5>                            1375     0     0     1375     0     0
lo0   16384 fe80:5::1     fe80:5::1                0     -     -        0     -     -
lo0   16384 localhost     ::1                      0     -     -        0     -     -
lo0   16384 your-net      localhost             1356     -     -     1375     -     -
lagg0  1500 <Link#6>      00:04:23:a8:ba:c8  1236850     0     0  1448832     0     0
lagg0  1500 192.168.2.0   192.168.2.19       1191281     -     -  1447625     -     -

Security check:
(output mailed separately)

Checking for denied zone transfers (AXFR and IXFR):

Scrubbing of zfs pools:
skipping scrubbing of pool 'Volume1':
last scrubbing is 22 days ago, threshold is set to 30 days

Checking status of 3ware RAID controllers:
Alarms (most recent first):
+++ /var/log/3ware_raid_alarms.today 2011-09-22 03:01:24.000000000 +0930
@@ -0,0 +1 @@
+

-- End of daily output --

Yesterday, when I first got it, I thought it was because I had tested the email function by pulling out a data cable to see if an email would be sent (it wasn't), so I restarted the system hoping that would get rid of the problem. It seems it did not.

In the console this is what is being displayed:

len :0, unexpected EOF
read : Connection reset by peer
error reading message header : Connection reset by peer

Some of these lines keep repeating.

Any help would be great.

Thanks

ProtoSD · Sep 21, 2011

What do you see when you do a 'zpool status -v' from the command line?
It could be from pulling the cable, but?
Depending on what you see from the zpool status command above, you might want to run a smartctl command on the disk it complains about to see what error the drive might possibly have.
Tell us what the status command says first.
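
For reference, a rough sketch of the smartctl step mentioned above. The device name /dev/ada0 is only a placeholder (an assumption, not something from this thread); 'glabel status' lists which disk carries a given gptid label. The helper name smart_summary is made up for illustration, and attribute names differ between drive vendors, so the three below are common examples rather than a complete list.

```shell
#!/bin/sh
# Print name and raw value of a few SMART attributes that often matter
# after checksum errors. Reads `smartctl -A` output on stdin, where the
# attribute name is column 2 and the raw value is column 10.
# smart_summary is a hypothetical helper name.
smart_summary() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "UDMA_CRC_Error_Count" { print $2, $10 }'
}

# Usage on the NAS (device name is an assumption):
#   glabel status | grep 29204abe      # find the disk behind the gptid label
#   smartctl -A /dev/ada0 | smart_summary
```

A rising UDMA_CRC_Error_Count usually points at the cable or connector rather than the disk surface, which would fit a pulled data cable.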

joeschmuck · Sep 21, 2011

Have you tried 'zpool clear' yet? I don't know if that will fix it, but the error you created by pulling that SATA cable may be what is causing this message.

joeschmuck · Sep 21, 2011

protosd said:
What do you see when you do a 'zpool status -v' from the command line?
It could be from pulling the cable, but?
Depending on what you see from the zpool status command above, you might want to run a smartctl command on the disk it complains about to see what error the drive might possibly have.
Tell us what the status command says first.

Aren't you on vacation still? I'll PM you in the next day or two.

ProtoSD · Sep 21, 2011

I was going to suggest the clear & a scrub after seeing what the status was and looking with smartctl to see what kind of errors the drive might be reporting.

My vacation has been prematurely terminated :-( but I'm not back home yet....

tmacka88 · Sep 22, 2011

This is what I get when I type in 'zpool status -v':

pool: Volume1
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: none requested
config:

NAME                                            STATE     READ WRITE CKSUM
Volume1                                         ONLINE       0     0     0
  raidz1                                        ONLINE       0     0     0
    gptid/29204abe-d29d-11e0-b9cd-f46d04de02ec  ONLINE       0     0     2
    gptid/2a07f729-d29d-11e0-b9cd-f46d04de02ec  ONLINE       0     0     0
    gptid/2ad9f58b-d29d-11e0-b9cd-f46d04de02ec  ONLINE       0     0     0

errors: No known data errors

ProtoSD · Sep 22, 2011

So it's the drive on the first line with the '2' at the end:

gptid/29204abe-d29d-11e0-b9cd-f46d04de02ec ONLINE 0 0 2

It was probably just pulling the cable. I would just do the 'zpool clear' like @joeschmuck said and do a 'zpool scrub Volume1' after that.
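
As a sketch of the steps above, assuming the pool name Volume1 from this thread: the first helper picks out any device with a non-zero READ, WRITE or CKSUM counter from 'zpool status' output, and the second runs the clear, scrub and status sequence. Both function names are made up for illustration, and the parser assumes the five-column config layout shown in the posts above.

```shell
#!/bin/sh
# List devices whose READ, WRITE or CKSUM counter is non-zero.
# Reads `zpool status` text on stdin. Only plain integer counters are
# handled; the header line is skipped because its columns are not numeric.
errored_devices() {
    awk 'NF >= 5 &&
         $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ &&
         ($3 + $4 + $5) > 0 { print $1 }'
}

# Reset the error counters, start a scrub, then show the pool status.
# The scrub runs in the background; rerun `zpool status` to follow it.
clear_and_scrub() {
    pool=$1
    zpool clear "$pool" || return 1
    zpool scrub "$pool" || return 1
    zpool status -v "$pool"
}

# Usage on the NAS:
#   zpool status -v Volume1 | errored_devices
#   clear_and_scrub Volume1
```

If the CKSUM count stays at 0 once the scrub has finished, the pulled cable was most likely the cause. If it climbs again, the drive is worth a closer look.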
