Volume degraded until server restart, then online again for a while...

Status
Not open for further replies.

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
An earlier reply said:
The serial number issue sounds a bit like https://bugs.freenas.org/issues/5418

Don't know if it is a separate issue from the pool being degraded, though.

Edit: am I right to assume that the pool now complaining of being degraded is not the same pool as the one you took a disk out of in order to get rid of the SATA card?

No. My first pool is degraded, but that is expected: I removed one of its drives to free up a motherboard SATA port for the second pool, so I could get rid of the SATA card. That is how I can be sure the SATA card is not the problem; it is not even inside the case.
So pool 2 is now connected to the motherboard with all its drives and isn't degraded, but it is still giving this critical warning?!
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
So what do you guys think the problem could be? Any more ideas? :)
SMART data was provided earlier in the thread.
 

Sir.Robin

Guru
Joined
Apr 14, 2012
Messages
554
Is it always the same drive(s), or random drives?
If it's random drives, could it be a power issue?


 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
The information I get from the GUI doesn't say which hard drive is causing the problem. I don't know if the logs show more details; I have never checked the FreeNAS logs before.
But I know for sure it is not a power issue. I calculated very carefully before choosing a PSU. Even accounting for drive spin-up, the 750 W be quiet! delivers much more than needed.
Of course there is the possibility that the PSU has some kind of fault; I have had it for 3 years. But if that were the case, my other pool should be having issues too. So far the problem is isolated to the second pool!
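To make the spin-up argument concrete, here is a back-of-the-envelope sketch. The figures are assumptions, not numbers from this thread: about 2 A per 3.5" drive on the 12 V rail while spinning up, and 150 W for the rest of the system. Take the real spin-up current from your drive datasheets, and it is worth comparing against the PSU's 12 V rating, not only its total wattage.

```python
def spinup_peak_watts(n_drives, amps_12v_per_drive=2.0, base_watts=150.0):
    """Rough worst case: all drives spinning up at once, plus the rest of the system.

    amps_12v_per_drive and base_watts are ASSUMED placeholder values;
    replace them with datasheet figures and your own estimate of the base load.
    """
    return n_drives * amps_12v_per_drive * 12.0 + base_watts


def headroom_watts(psu_watts, n_drives, **kwargs):
    """Estimated spare capacity at spin-up; a negative result means undersized."""
    return psu_watts - spinup_peak_watts(n_drives, **kwargs)
```

For a hypothetical 10-drive box, headroom_watts(750, 10) gives 360.0 W of margin under these assumptions. A comfortable margin only rules out an undersized PSU, not a failing one.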
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
I am not an expert, but "zpool status <second pool name>" might show some useful information about what the error is. And I am fairly sure you can't totally rule out that the remaining error was introduced while the SATA card was in use. If we know what the error is, and whether any new errors develop, this may help to narrow it down.
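As a rough illustration of what to look for in "zpool status" output, here is a minimal Python sketch that reads the device table and lists the rows whose READ, WRITE or CKSUM counters are non-zero. It assumes the usual "NAME STATE READ WRITE CKSUM" header and that the command output has already been captured as text; the function names are hypothetical, not part of FreeNAS.

```python
import re

# Multipliers for abbreviated counters: zpool status can print large
# counts in a shortened form such as "1.2K" (that value is then rounded).
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6}


def parse_count(token):
    """Turn a READ/WRITE/CKSUM field such as '102' or '1.2K' into an int."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([KM]?)", token)
    if m is None:
        raise ValueError(f"not an error counter: {token!r}")
    return round(float(m.group(1)) * _SUFFIX[m.group(2)])


def devices_with_errors(status_text):
    """Return {name: (state, read, write, cksum)} for table rows with errors.

    Pool and vdev rows (e.g. 'raidz2-0') are reported as well as disks.
    """
    rows = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True          # start of the device table
            continue
        if fields[0].endswith(":"):   # 'errors:', 'pool:' etc. are not rows
            in_config = False
            continue
        if in_config and len(fields) >= 5:
            try:
                counts = [parse_count(t) for t in fields[2:5]]
            except ValueError:
                continue              # not a device row; skip it
            if any(counts):
                rows[fields[0]] = (fields[1], *counts)
    return rows
```

Trailing notes such as "too many errors" after the CKSUM column are ignored, so a faulted disk still shows up with its state and counts.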
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
Actually, I check the pool status all the time; it doesn't say much.
[Screenshot: pool status output]


About the remaining error, I didn't get what you mean.
But as things look now, I am not going to reintroduce the SATA card until I know for sure what the problem is.
I'm just narrowing down the possible issues one by one until I find the faulty link :cool:
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
Is it always the same disk that shows up with the errors? If not, it is unlikely to be the disk at fault, but it is important to find out if you still get new errors on different disks after removing the SATA card. If it is always the same disk then repeating smartctl -a on it might be revealing. If it is different disks each time it does begin to look like some other sort of hardware error.

Edit: my remark about 'remaining error' would only apply if this checksum error remained the same on the same disk, without any new errors occurring, but smartctl showed no explanation for it.

Edit2: in case I am not being clear, the immediate cause of your warning message is the 103 checksum errors on one disk. It will be useful to note the ID of the disk and reboot and see if the same disk produces the same error in zpool status. There is probably a better way than rebooting, but that is what I would do!
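The "same disk or different disks?" check described above can be kept as a small log. Below is a minimal sketch, assuming you write down, at each check (for example after each reboot), which devices had non-zero error counters; the function name and the verdict strings are hypothetical. Following the reasoning in this post, the same disk every time points at that disk (repeat smartctl -a on it), while errors moving between disks point at some other hardware.

```python
def error_pattern(snapshots):
    """Summarise which devices showed errors across several checks.

    snapshots: one collection of device names per check, listing the
    devices whose error counters were non-zero at that check.
    Returns (every_time, at_least_once, verdict).
    """
    checks = [set(s) for s in snapshots if s]   # ignore checks with no errors
    if not checks:
        return set(), set(), "no errors seen"
    every_time = set.intersection(*checks)
    at_least_once = set.union(*checks)
    if every_time == at_least_once:
        verdict = "same disk(s) each time"
    else:
        verdict = "errors on different disks"
    return every_time, at_least_once, verdict
```

For example, error_pattern([{"ada1"}, set(), {"ada1"}]) reports "same disk(s) each time" for a hypothetical disk ada1.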
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
If you mean the CKSUM value (102 in the picture above), then I can keep an eye on that after each warning I get in the GUI.
But I don't know which drive that value belongs to: gptid/3e9fee0b-bf38-11e4-bc37-002590f5b804? Where can I see which drive that long name refers to?

Agreed about isolating the fault to find out whether it is a specific drive or something else.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
You can use my script (see the link in my sig) or directly the glabel status command ;)
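For the glabel status route, here is a minimal Python sketch that maps each gptid label to its disk, assuming the usual three-column layout (Name, Status, Components) and that the output was saved as text. It strips the partition suffix, so a component like ada1p2 becomes ada1; the example device names and the function name are hypothetical.

```python
import re


def gptid_to_disk(glabel_text):
    """Map each label in 'glabel status' output to the disk it lives on.

    Data lines look like 'gptid/...  N/A  ada1p2'. The partition suffix
    ('p2') is removed so the result names the whole disk ('ada1').
    """
    mapping = {}
    for line in glabel_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] == "Name":
            continue                  # skip the header and unexpected lines
        label, _status, component = fields
        m = re.fullmatch(r"(.+?)p\d+", component)
        mapping[label] = m.group(1) if m else component
    return mapping
```

Looking up the gptid from zpool status in the returned dict gives the device name to pass to smartctl -a.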
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
It seems quite likely that the checksum problem is different from your original drive disappearing problem. The latter could still have been due to the SATA card!

There is a way to clear the checksum error, unless or until the drive develops further faults (but it seems to have passed the smartctl assessment if it was one you gave the result for earlier). I am a bit vague on how to clear it, but someone will tell you!
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
Can I ask a question? Should you do a scrub after an error like this to maintain maximum redundancy?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Errors, once found by ZFS, are supposed to be corrected automatically. But, if the error condition continues to reoccur, then the correction may not have corrected the issue. A scrub is definitely a good idea if it doesn't make things too inconvenient for your workload.
If you keep getting errors, keep clearing them and doing a scrub and the errors just come back, then you have a problem that definitely needs to be identified and corrected.
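The clear-then-scrub cycle described above can be written down as a small sketch. It uses the real "zpool clear" and "zpool scrub" commands; the function name and the pool name "tank" are hypothetical. By default it only returns the commands, so nothing runs until run=True is passed. After the scrub finishes, check zpool status again to see whether the errors come back.

```python
import subprocess


def clear_and_scrub(pool, run=False):
    """Return (and optionally run) the clear-then-scrub commands for a pool.

    With run=False nothing is executed, so the commands can be reviewed first.
    """
    commands = [
        ["zpool", "clear", pool],  # reset the pool's READ/WRITE/CKSUM counters
        ["zpool", "scrub", pool],  # start a scrub; it continues in the background
    ]
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raise if a command fails
    return commands
```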
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
Bidule0hm said:
You can use my script (see the link in my sig) or directly the glabel status command ;)

Very useful scripts you have!
I haven't gone through all of them yet. But is there any way to get hard drive temperature reports by email at customized times and intervals?
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
cyberjock said:
Errors, once found by ZFS, are supposed to be corrected automatically. But, if the error condition continues to reoccur, then the correction may not have corrected the issue. A scrub is definitely a good idea if it doesn't make things too inconvenient for your workload.
If you keep getting errors, keep clearing them and doing a scrub and the errors just come back, then you have a problem that definitely needs to be identified and corrected.

So how do I go about finding out what is wrong and fixing it? I am scrubbing often now :)
What happens now is that every few days one of the drives shows as faulty. When I see that I panic and usually restart the server, and suddenly the drive is back online and being resilvered.
This doesn't feel good, especially because I store some sensitive data there. The question is: what can I do?
How do I identify the problem? :rolleyes:
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
You're welcome ;)

Yes, you can look at how the other scripts that send emails are made and customize the temperature script the same way. The times and intervals are simply a matter of setting up a CRON task according to what you want ;)
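As a rough sketch of the temperature part only (not Bidule0hm's script), the following reads the current temperature from "smartctl -A" output and formats a plain-text report that a CRON job could mail. It assumes the standard attribute table, where the 10th column is RAW_VALUE, and prefers attribute 194 (Temperature_Celsius) over 190 (Airflow_Temperature_Cel). Collecting the output per disk and sending the email are left to your own setup; the function names are hypothetical.

```python
def drive_temperature(smart_attributes):
    """Read the current temperature (deg C) from 'smartctl -A' output.

    Uses attribute 194 (Temperature_Celsius) and falls back to 190
    (Airflow_Temperature_Cel); returns None if neither is present.
    """
    found = {}
    for line in smart_attributes.splitlines():
        fields = line.split()
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] in ("194", "190") and fields[9].isdigit():
            found[fields[0]] = int(fields[9])
    return found.get("194", found.get("190"))


def temperature_report(temps):
    """Format {disk: temperature or None} as a plain-text email body."""
    return "\n".join(
        f"{disk}: {t} C" if t is not None else f"{disk}: no reading"
        for disk, t in sorted(temps.items())
    )
```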
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
Here is an update on the issue in this thread.
Since I removed the SATA card, the second pool has been working fine! So I think we can say the cause was the SATA card.
Now I am having issues with my first pool. Some drives go faulty from time to time. Usually restarting the server brings the drive back online and it starts to resilver. I don't know how to go about identifying what is wrong?!

Regarding the SATA card, I will buy a new one that is supposed to work fine with FreeNAS.
According to threads here, the IBM ServeRAID M1015 should work without any issues after being flashed. Can someone confirm?
If that is the case, I will purchase either the IBM ServeRAID M1015 46M0831 or the IBM ServeRAID M1015 90Y4556. I didn't find any information about the difference between those, only that the latter is cheaper. Maybe someone else knows?
I guess I will be limited to 8 additional SATA ports with this card?
Or is there an alternative card with more ports? Open to suggestions :)
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
If you want more ports, you can
  1. Get more HBAs (M1015s, LSI SAS 9207 or SAS 9211)
  2. Get HBAs with more ports (LSI SAS 9201)
  3. Get an SAS expander
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
arameen said:
Here is an update on the issue in this thread.
Since I removed the SATA card, the second pool has been working fine! So I think we can say the cause was the SATA card.
Now I am having issues with my first pool. Some drives go faulty from time to time. Usually restarting the server brings the drive back online and it starts to resilver. I don't know how to go about identifying what is wrong?!

You need to find out what is causing this. Is it always the same drive? It does sound like an actual hardware fault, as your current setup should be completely compatible with FreeNAS. As with your previous problem, finding out whether it is always the same drive is important to narrow it down.
 

arameen

Contributor
Joined
Sep 4, 2014
Messages
145
Ericloewe said:
If you want more ports, you can
  1. Get more HBAs (M1015s, LSI SAS 9207 or SAS 9211)
  2. Get HBAs with more ports (LSI SAS 9201)
  3. Get an SAS expander
You don't happen to know the difference between the IBM ServeRAID M1015 46M0831 and the IBM ServeRAID M1015 90Y4556?
 