Resilvering to an UNAVAIL disk? What's going on?

Status
Not open for further replies.

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Disks won't exceed 45 / 50c at the very hottest summer day now.
If the temps are routinely exceeding 40C, the temp issue isn't solved.
Surely this means I have in fact corrupted some data, correct?
Only if data is on one or more of those 16 sectors. 'zpool status' should report if there are data errors.
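For anyone who wants to check this from a script rather than by eye, here is a minimal sketch. It assumes the `zpool status` text has been saved to a string and that the healthy case reads exactly "errors: No known data errors", as in the output pasted in this thread. The function name `has_data_errors` is made up for illustration.

```python
def has_data_errors(status_text: str) -> bool:
    """Return True if the 'errors:' line reports anything other than no known errors."""
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("errors:"):
            return line != "errors: No known data errors"
    # No 'errors:' line at all: refuse to guess instead of reporting "healthy".
    raise ValueError("no 'errors:' line in zpool status output")


sample = """pool: ARRAY
state: DEGRADED
errors: No known data errors
"""
print(has_data_errors(sample))  # False
```

If the function returns True, `zpool status -v` should list the affected files.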
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
Well, something bad is occurring. I'm hearing disk thrash coming out of my little server cupboard..... I fear this is the end, surely?
ETA has gone up a huge huge amount suddenly, performance dropped heavily. No errors listed (yet)


[root@freenas] ~# zpool status
pool: ARRAY
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Mar 22 11:51:10 2016
11.0G scanned out of 24.0T at 57.7M/s, 120h59m to go
1.81G resilvered, 0.04% done
config:

NAME STATE READ WRITE CKSUM
ARRAY DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/7a3a09f4-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/7aaa3c1a-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/2a07e64b-9fc5-11e4-8140-28924a2d5aca ONLINE 0 0 0
gptid/7b8d77f7-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/c366811f-efc5-11e5-aa4f-28924a2d5aca ONLINE 0 0 0 (resilvering)
2815441331035575542 UNAVAIL 0 0 0 was /dev/gptid/9f120a1c-5531-11e5-95cd-28924a2d5aca

errors: No known data errors


EDIT:
How can it go from 1.81G (above) to 19.3M (below)? Also, the time of the resilver has apparently changed.
This is ... very bad, surely?
[root@freenas] ~# zpool status
pool: ARRAY
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Mar 22 11:55:14 2016
125M scanned out of 24.0T at 2.41M/s, (scan is slow, no estimated time)
19.3M resilvered, 0.00% done
config:

NAME STATE READ WRITE CKSUM
ARRAY DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/7a3a09f4-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/7aaa3c1a-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/2a07e64b-9fc5-11e4-8140-28924a2d5aca ONLINE 0 0 0
gptid/7b8d77f7-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/c366811f-efc5-11e5-aa4f-28924a2d5aca ONLINE 0 0 0 (resilvering)
2815441331035575542 UNAVAIL 0 0 0 was /dev/gptid/9f120a1c-5531-11e5-95cd-28924a2d5aca

errors: No known data errors
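A rough sanity check on why the ETA swings so wildly: the estimate appears to be little more than the remaining data divided by the current scan rate, so a drop in rate inflates it enormously. The sketch below assumes the K/M/G/T suffixes are binary units (powers of 1024) and uses the rounded figures from the two outputs above, so it only approximates what `zpool status` printed. The helper names are hypothetical.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a value such as '24.0T' or '57.7M' to bytes (binary units assumed)."""
    return float(text[:-1]) * UNITS[text[-1]]


def eta_hours(scanned, total, rate_per_sec):
    """Estimate hours left as (total - scanned) / rate."""
    remaining = to_bytes(total) - to_bytes(scanned)
    return remaining / to_bytes(rate_per_sec) / 3600


# First output: 11.0G of 24.0T at 57.7M/s, reported as 120h59m to go.
print(f"{eta_hours('11.0G', '24.0T', '57.7M'):.0f} h")  # 121 h

# Second output: 125M of 24.0T at 2.41M/s, reported as "scan is slow".
print(f"{eta_hours('125M', '24.0T', '2.41M'):.0f} h")   # roughly 2,900 hours
```

The first figure lands within a few minutes of the reported 120h59m, which suggests the assumption is close enough for a back-of-the-envelope check.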



[root@freenas] ~# zpool status
pool: ARRAY
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Mar 22 11:59:21 2016
3.46G scanned out of 24.0T at 42.7M/s, 163h32m to go
586M resilvered, 0.01% done

EDIT: and again, the resilver has reset itself. Is this common?


Another edit (very sorry): Here it is .... as expected?
[root@freenas] ~# zpool status
pool: ARRAY
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Mar 22 12:01:32 2016
257M scanned out of 24.0T at 5.84M/s, (scan is slow, no estimated time)
28.0M resilvered, 0.00% done
config:

NAME STATE READ WRITE CKSUM
ARRAY DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/7a3a09f4-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/7aaa3c1a-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/2a07e64b-9fc5-11e4-8140-28924a2d5aca ONLINE 0 0 0
gptid/7b8d77f7-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/c366811f-efc5-11e5-aa4f-28924a2d5aca ONLINE 0 3.60K 0 (resilvering)
2815441331035575542 UNAVAIL 0 0 0 was /dev/gptid/9f120a1c-5531-11e5-95cd-28924a2d5aca
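The restarts are visible in the "in progress since" timestamp, which moves forward each time. As a minimal sketch, the snippet below counts those jumps across a series of saved `zpool status` outputs. It assumes the date format shown above and an English locale for the day and month names. The function names are made up for illustration.

```python
import re
from datetime import datetime

SCAN_RE = re.compile(r"scan: (resilver|scrub) in progress since (.+)")


def scan_start(status_text):
    """Return (kind, start_time) from the 'scan:' line, or None if no scan is running."""
    m = SCAN_RE.search(status_text)
    if m is None:
        return None
    start = datetime.strptime(m.group(2).strip(), "%a %b %d %H:%M:%S %Y")
    return m.group(1), start


def count_restarts(snapshots):
    """Count how often the reported start time moves forward between snapshots."""
    starts = [s for s in (scan_start(t) for t in snapshots) if s is not None]
    return sum(1 for prev, cur in zip(starts, starts[1:]) if cur[1] > prev[1])


snapshots = [
    "scan: resilver in progress since Tue Mar 22 11:51:10 2016",
    "scan: resilver in progress since Tue Mar 22 11:55:14 2016",
    "scan: resilver in progress since Tue Mar 22 11:59:21 2016",
    "scan: resilver in progress since Tue Mar 22 12:01:32 2016",
]
print(count_restarts(snapshots))  # 3
```

Three restarts in about ten minutes, each one throwing away the progress made so far, matches the "resilvered" figure dropping back to a few megabytes.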
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
It's a HP Microserver N54L with 8GB of Non ECC. It's an AMD Turion CPU, 2.2ghz.
I specifically terminated all my jails and all my disk processes that talk to the server for this activity, so that it wouldn't strain it :/
NOTE: this machine has been running fine for over 18 months; the only issue was dying disks every 4 or 5 months, primarily due to me stupidly cooking them.

I'm unsure how to proceed now. Maybe I should swap some SATA cables. It seems super unlikely ANOTHER disk is faulty? Can I even shut the damn thing off in the process it's currently in (seems bad news to me)
The resilver IS just reading from disks 1,2,3,4 and writing to 5, right? So if I cancel it, presumably data on 1,2,3,4 is still ok?
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
It kicked the disk out, claims no data errors, hurmmm.

[root@freenas] ~# zpool status
pool: ARRAY
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.

action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: scrub in progress since Tue Mar 22 12:05:47 2016
6.00G scanned out of 24.0T at 66.0M/s, 105h43m to go
0 repaired, 0.02% done
config:

NAME STATE READ WRITE CKSUM
ARRAY DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/7a3a09f4-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/7aaa3c1a-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/2a07e64b-9fc5-11e4-8140-28924a2d5aca ONLINE 0 0 0
gptid/7b8d77f7-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/c366811f-efc5-11e5-aa4f-28924a2d5aca FAULTED 3 43.7K 0 too many errors
2815441331035575542 UNAVAIL 0 0 0 was /dev/gptid/9f120a1c-5531-11e5-95cd-28924a2d5aca

errors: No known data errors
Can I cancel the scrub?
(I was lucky enough to pick up 2 spare disks......)
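For reference, the per-device READ / WRITE / CKSUM columns are what got the disk kicked out. A minimal sketch for pulling out any device row with a nonzero counter from saved `zpool status` text follows. The regular expression is an assumption based on the layout pasted above, and the counters are kept as strings (for example "43.7K") rather than converted, since the exact meaning of the suffix is not confirmed here.

```python
import re

# Matches rows such as: "gptid/... FAULTED 3 43.7K 0 too many errors"
COUNT = r"(\d[\d.]*[KMGT]?)"
ROW_RE = re.compile(
    r"^\s*(\S+)\s+([A-Z]+)\s+" + COUNT + r"\s+" + COUNT + r"\s+" + COUNT + r"(?:\s|$)"
)


def devices_with_errors(status_text):
    """Return {device: (state, read, write, cksum)} for rows with a nonzero counter."""
    flagged = {}
    for line in status_text.splitlines():
        m = ROW_RE.match(line)
        if m is None:
            continue  # header, prose, or blank line
        name, state, read, write, cksum = m.groups()
        if any(c != "0" for c in (read, write, cksum)):
            flagged[name] = (state, read, write, cksum)
    return flagged


sample = """NAME STATE READ WRITE CKSUM
ARRAY DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/7b8d77f7-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/c366811f-efc5-11e5-aa4f-28924a2d5aca FAULTED 3 43.7K 0 too many errors
2815441331035575542 UNAVAIL 0 0 0 was /dev/gptid/9f120a1c-5531-11e5-95cd-28924a2d5aca
"""
print(devices_with_errors(sample))
```

Only the FAULTED replacement disk is flagged here; the four original disks and the UNAVAIL placeholder all show zero counters.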
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
I have decided not to cancel the scrub, I'm going to let it run (it's slowly speeding up)
In the meantime, I've got the second and final replacement disk, in a regular desktop, I'm going to run a h2testw write/read test on it, then I'll do a SMART short and long test, through multiple programs too. I want to be exceedingly sure the disk is flawless before it goes in my server; I'm already on thin ice with my disk supplier over returning faulty drives. I'm putting them through a LOT at the moment.

If the disk comes back a-ok without ever going in the server, then I'm putting it in the server to see what happens. (When I do, I'll clean out the dust and replace several cables too.)

As it stands, I will be on 4 disks (with 1 starting to get bad sectors) for at least the next 48 to 72 hours. Oh joy.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
It's a HP Microserver N54L with 8GB of Non ECC. It's an AMD Turion CPU, 2.2ghz.
http://h20195.www2.hp.com/v2/GetDoc...kspecs&doclang=EN_US&searchquery=&cc=us&lc=en

Is this your system?


Power Supply is 150W
Internal Storage = SATA 8.0TB (4 x 2TB) Maximum (If this is correct; then how did you cram 6 drives in there?) :)
It takes ECC; but you got Non-ECC to work?

Poor little server is all kinds of abused... Kinda feel sorry for it... :(
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
That's the one, it's a very very common home server. I did in fact purchase 16GB of ECC at one point from a US seller and the guy sent the wrong stuff, very frustrating situation.
Nonetheless, she works - also, I can confirm it uses about 80 to 85W at the wall.
As for the 2 extra hard drives, very common for this model - there are 6 SATA ports, it's just that only 4 of them are in the regular drive cage.

Going to give it a thorough dust clean and cable change when this scrub is done. By then, I'll hopefully have some idea of the actual reliability of these disks too.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Have you run a memory test?

A resilver that keeps restarting may indicate a corrupted pool.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
A scrub is a waste of time in a pool with no redundancy. Better just start resilvering with your new disk.

But number one, start thinking about what could be wrong! If the same port is failing repeatedly, don't use it! Try the new drive on the other failed port/drive. Or determine the system is failed and move the pool to better hardware.

The 16 pending sectors are not 8.9MB. But apparently, 8.9MB is as far as it got with the prior resilver before it dropped out. I guess in that instance it didn't stop the resilver (because of two resilvers at same time?)

The 16 pending sectors may amount to nothing. If they amount to anything, it will probably be next-to-nothing. That you completed a 20+ hour resilver without error is very good.
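To put a rough number on that: sixteen sectors is a tiny amount of data whichever sector size the drive reports. The sketch below assumes 512-byte or 4096-byte sectors, the two common sizes. The thread does not say which one these drives use, and the function name is made up for illustration.

```python
PENDING_SECTORS = 16  # from the SMART report discussed in this thread


def pending_bytes(sectors, sector_size):
    """Bytes covered by a given number of pending sectors."""
    return sectors * sector_size


for size in (512, 4096):  # assumed sector sizes
    total = pending_bytes(PENDING_SECTORS, size)
    print(f"{size}-byte sectors: {total} bytes ({total / 1024:.0f} KiB)")
# 512-byte sectors: 8192 bytes (8 KiB)
# 4096-byte sectors: 65536 bytes (64 KiB)
```

Either way the total is orders of magnitude below 8.9MB, so the two figures cannot be the same thing.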
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
A scrub is a waste of time in a pool with no redundancy.
I disagree; at least it will indicate if your data is good (and if not, where the faults are).
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
In the meantime, I've got the second and final replacement disk, in a regular desktop, I'm going to run a h2testw write/read test on it, then I'll do a SMART short and long test, through multiple programs too.
I think this has already been asked here--did you do burn-in testing on the new disks before installing them? And after removing the apparently-failed disks, were you able to read them on another computer?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Unfortunately the OP can't easily replace the PSU.

At this point, if the OP wants to stay with FreeNAS, I'd buy a new server that's built to properly handle the quantity of drives he wants to use.

Cooling has been an issue for awhile. IIRC, at one point his drives had gotten up to 59 C. He needs to address this issue.


 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
I disagree; at least it will indicate if your data is good (and if not, where the faults are).
I have at this point, only lost a single file which was part of a queue in sabnzbd - hugely fortunate.

The machine is due to finish its scrub in about 2 hours. I'm then going to shut it down, unplug all hard disks, clean out dust, replace 3 cables. Then fire it back up and run at LEAST a 3 or 4 pass memtest on her.
While that happens, I'm still running a very thorough SMART test on the second and final replacement disk (which hasn't been exposed to my 'evil' server) at this point.
I have encountered something which may be the cause of the replacement disks dying, but it's very complicated (as usual for me) and I'd rather not delve into it at this point, in case I'm wrong. It would detract from the thread as it stands.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
Unfortunately the OP can't easily replace the PSU.

At this point, if the OP wants to stay with FreeNAS, I'd buy a new server that's built to properly handle the quantity of drives he wants to use.

Cooling has been an issue for awhile. IIRC, at one point his drives had gotten up to 59 C. He needs to address this issue.



I can actually replace PSU, there's a standard part# for it, sub $150 shipped IIRC. However testing to be sure the PSU is the problem is an issue.
Cooling has been an issue and while many of you will give me a hard time for it (perhaps rightfully so) in my opinion, with the external fan on it, 24/7 as it has been, disks have been in the "that's a bit warm but ok" range, rather than critical (59 is awful)
I don't feel that heat is the problem that's going on right now, something else is at hand.

P.S. Does anyone know of bootable testing software, besides memtest, that will thrash a machine (HDDs excluded) - perhaps the CPU / chipset?
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
That doesn't seem to be a bootable testing software unfortunately.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215