Resilvering to an UNAVAIL disk? What's going on?

Status
Not open for further replies.

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
ZFS doesn't have a database of every disk ever made that it consults to figure out the size of the drive; it just asks the drive what size it is. If drives lied about their size, everything would go to hell ... which some people have experienced with counterfeit USB sticks.
I agree with the logic entirely - but I did get h2testw to spit out this error on my replacement Toshiba hard disk with the new firmware...

Media has filled up earlier than expected!
In the beginning there were 4769030 MByte free, but only
4769023 MByte could be written.
Warning: Only 4769030 of 4769305 MByte tested.
Writing speed: 158 MByte/s
H2testw v1.4

Yes, I ran h2testw on a 5TB hard disk - and I'm gonna quote that line again. Media has filled up earlier than expected!
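
For scale, the figures in that h2testw report can be worked through directly. The sketch below is only arithmetic on the three numbers quoted above (the function names are made up for illustration): the write fell 7 MByte short of the free space h2testw saw at the start, and 275 MByte of the drive was never part of the test at all.

```python
# Figures copied from the h2testw output above, in MByte as h2testw reports them.
FREE_AT_START = 4_769_030   # "In the beginning there were 4769030 MByte free"
WRITTEN = 4_769_023         # "but only 4769023 MByte could be written"
TOTAL = 4_769_305           # "Only 4769030 of 4769305 MByte tested"


def shortfall(free_at_start: int, written: int) -> int:
    """MByte that h2testw expected to write but could not."""
    return free_at_start - written


def untested(total: int, free_at_start: int) -> int:
    """MByte of the drive that were never covered by the test."""
    return total - free_at_start


if __name__ == "__main__":
    missing = shortfall(FREE_AT_START, WRITTEN)
    print(f"short by {missing} MByte "
          f"({missing / FREE_AT_START:.6%} of the free space)")
    print(f"not tested: {untested(TOTAL, FREE_AT_START)} MByte")
```

A gap that small fits either a small accounting error in the tool or a few MByte of filesystem overhead; the numbers alone do not say which.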
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Does WD define 5TB as exactly the same amount of space as Toshiba?
The thing to look for is the LBA count on the label on the drive. If it's the same or larger, it'll work. It's been quite a while since I saw different LBA counts for drives of nominally equal size.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
No, the software works fine for flash drives and my other Toshiba drive which has had bad sectors already marked.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Well, there are two possibilities: the drive is lying about its size, or there's a bug in the software. The former would break an awful lot of stuff, so I think the latter is more likely.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
Well, we'll know soon, 3h:21m to go, soon the nightmare will be over. (I still feel I then need to do a scrub, due to disk 0)
Then in about 5 or 6 days when I get my replacement disks, re-add disk 6 and then same again (resilver, scrub)
then finally when that's done, replace disk #0

In theory in about 10 or 12 days, all 6 disks should be 100% ......
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
(I still feel I then need to do a scrub, due to disk 0)
Unnecessary--the resilver necessarily reads everything in the vdev, so if there are any data errors on disk 0, they'll be found.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
Oh lovely, thanks - well that will save some time, it's certainly not a short process - 25 hours as it stands.
I'll update the thread when it's all over with I guess - currently, 1 file lost (un-important) .... phew.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
The drive makers set standards for how many LBAs there are in a given capacity, so equal sizes are interchangeable between makes. This was not always the case.

Because of that, I think ZFS does allow for a variation in size for a replace, on the order of megabytes.

I also think the software is buggy on your HD test.
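
The standard being referred to is most likely the IDEMA LBA count convention. That is an assumption, since the post does not name it. A minimal sketch of that formula for 512-byte sectors, as commonly quoted, together with the "same or larger" check suggested earlier in the thread (function names are illustrative only):

```python
SECTOR_BYTES = 512  # assumes 512-byte logical sectors


def idema_lba_count(capacity_gb: int) -> int:
    """LBA count for an advertised capacity in decimal GB.

    Assumption: the commonly quoted IDEMA formula for 512-byte
    sectors, intended for capacities above 50 GB.
    """
    return 97_696_368 + 1_953_504 * (capacity_gb - 50)


def replacement_fits(old_lba: int, new_lba: int) -> bool:
    """A replacement works if it has the same number of LBAs or more."""
    return new_lba >= old_lba


if __name__ == "__main__":
    lbas = idema_lba_count(5000)              # nominal 5TB drive
    mib = lbas * SECTOR_BYTES // (1024 ** 2)  # whole MiB
    print(f"5TB: {lbas} LBAs, about {mib} MiB")
    print("same-size swap fits:", replacement_fits(lbas, lbas))
```

For a nominal 5TB drive this gives 9,767,541,168 LBAs, or about 4,769,307 MiB, which is within a couple of MByte of the 4769305 MByte total that h2testw reported earlier in the thread.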
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
Well finally a disk has worked, the replacement server was worth it. It's up and running.
So now to very very sheepishly (sigh) replace 2 more disks and then re-add the 6th, and when that's finally done, replace the 1st (disk 0)
I'm still not safe, I have "4.5" disks right now, but it's better than the 3.5 I had previously.

Still super blown away at disks getting perm SMART bad sectors from a bloody power supply, BLOWN AWAY
Entirely destroyed disk? Sure
Read surface error, ok fine
Actual SMART level, diagnosed bad blocks? what?!
(It could've been the motherboard too)

Just to clarify, I did some googling - but to my knowledge NOTHING will do a 100% total scan of the disk and un-mark bad sectors if they prove to be good, correct?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Still super blown away at disks getting perm SMART bad sectors from a bloody power supply, BLOWN AWAY
Entirely destroyed disk? Sure
Read surface error, ok fine
Actual SMART level, diagnosed bad blocks? what?!

Well, bad power can lead to bad writes, which lead to bad reads when you try to get the data back. A bad read gets the sector added to the pending sector list. If the drive can then read it by retrying, the sector is healthy again; if it can't (the more likely option here, because the write was bad in the first place), it's reallocated. That's how you get reallocated sectors because of a crappy PSU. There are probably other ways to get there because of the PSU, but this is the one I think is most likely to happen :)

Bad power can also lead to motor controller failure, CRC errors, ...
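
The pending-then-reallocated path described above can be written down as a small state machine. This is a toy model of the description in this post only, not of any drive's real firmware; the state and function names are invented for illustration.

```python
from enum import Enum


class SectorState(Enum):
    HEALTHY = "healthy"
    PENDING = "pending"          # on the pending sector list
    REALLOCATED = "reallocated"  # remapped to a spare; permanent


def after_failed_read(state: SectorState) -> SectorState:
    """A failed read puts a healthy sector on the pending list."""
    if state is SectorState.HEALTHY:
        return SectorState.PENDING
    return state


def after_retry(state: SectorState, retry_succeeded: bool) -> SectorState:
    """Resolve a pending sector: healthy again if the retry works,
    reallocated if it does not. Other states are left unchanged,
    so a reallocated sector never goes back to healthy."""
    if state is not SectorState.PENDING:
        return state
    return SectorState.HEALTHY if retry_succeeded else SectorState.REALLOCATED


if __name__ == "__main__":
    # A bad write caused by bad power: the read fails, and so does the retry.
    state = after_failed_read(SectorState.HEALTHY)
    state = after_retry(state, retry_succeeded=False)
    print(state.value)
```

The reallocated state has no way out in this model, which matches the point that a remapped sector stays remapped for the life of the drive.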

Just to clarify, I did some googling - but to my knowledge NOTHING will do a 100% total scan of the disk and un-mark bad sectors if they prove to be good, correct?

Yep, unless you're the manufacturer, once a sector is remapped it's for life.
 

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
Well finally a disk has worked, the replacement server was worth it. It's up and running.
So now to very very sheepishly (sigh) replace 2 more disks and then re-add the 6th, and when that's finally done, replace the 1st (disk 0)
I'm still not safe, I have "4.5" disks right now, but it's better than the 3.5 I had previously.

Still super blown away at disks getting perm SMART bad sectors from a bloody power supply, BLOWN AWAY
Entirely destroyed disk? Sure
Read surface error, ok fine
Actual SMART level, diagnosed bad blocks? what?!
(It could've been the motherboard too)
If your disks were running hot due to inadequate cooling, it is entirely possible that other components were running hot as well. Power supplies can run hot, even with decent cooling, and motherboard components can be damaged by excessive heat without failing entirely. A couple of years ago, I had a power supply that tested OK with testing equipment, but had intermittent failures when installed. It caused all sorts of untraceable headaches and was only discovered via trial and error after replacing the CPU and motherboard failed to fix the problem.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Interesting. I'm having trouble figuring out the real meaning of that document.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
Never ending nightmares with this thing........
Any idea why this happened?
[root@freenas] ~# zpool status
  pool: ARRAY
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Mar 29 17:32:59 2016
        466G scanned out of 24.2T at 169M/s, 41h4m to go
        155G resilvered, 1.88% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        ARRAY                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/7a3a09f4-10cb-11e4-bc4d-441ea13cac6b  ONLINE       0     0     0
            gptid/7aaa3c1a-10cb-11e4-bc4d-441ea13cac6b  ONLINE       0     0     0
            gptid/2a07e64b-9fc5-11e4-8140-28924a2d5aca  ONLINE       0     0     0
            gptid/7b8d77f7-10cb-11e4-bc4d-441ea13cac6b  ONLINE       0     0     0
            gptid/765fc016-f1a5-11e5-b50b-9cb6540b10fe  ONLINE       0     0     0  (resilvering)
            gptid/02a66f4c-f578-11e5-9fd0-9cb6540b10fe  ONLINE       0     0     0  (resilvering)

errors: No known data errors
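
To pull the interesting parts out of a status dump like this one, a small parser is enough. The sketch below is not a ZFS tool, just text matching on the output format shown in this post, and it assumes the K/M/G/T suffixes are powers of 1024. It lists the devices flagged `(resilvering)` and recomputes the progress and remaining time from the scan line.

```python
import re

# Assumption: zpool's size suffixes are binary (powers of 1024).
UNIT = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}

# The relevant lines, copied from the status output in this post.
SAMPLE = """\
scan: resilver in progress since Tue Mar 29 17:32:59 2016
466G scanned out of 24.2T at 169M/s, 41h4m to go
155G resilvered, 1.88% done
gptid/7b8d77f7-10cb-11e4-bc4d-441ea13cac6b ONLINE 0 0 0
gptid/765fc016-f1a5-11e5-b50b-9cb6540b10fe ONLINE 0 0 0 (resilvering)
gptid/02a66f4c-f578-11e5-9fd0-9cb6540b10fe ONLINE 0 0 0 (resilvering)
"""


def to_bytes(text: str) -> float:
    """Convert a size such as '24.2T' or '169M' to bytes."""
    match = re.fullmatch(r"([\d.]+)([KMGT])", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNIT[match.group(2)]


def resilvering_devices(status_text: str) -> list:
    """Names of the devices whose line ends with '(resilvering)'."""
    devices = []
    for line in status_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "(resilvering)":
            devices.append(fields[0])
    return devices


def scan_progress(status_text: str):
    """Return (fraction scanned, estimated seconds left), or None
    if the text has no 'X scanned out of Y at Z/s' line."""
    match = re.search(
        r"([\d.]+[KMGT]) scanned out of ([\d.]+[KMGT]) at ([\d.]+[KMGT])/s",
        status_text,
    )
    if match is None:
        return None
    scanned, total, rate = (to_bytes(group) for group in match.groups())
    return scanned / total, (total - scanned) / rate


if __name__ == "__main__":
    print(resilvering_devices(SAMPLE))
    fraction, seconds_left = scan_progress(SAMPLE)
    print(f"{fraction:.2%} scanned, about {seconds_left / 3600:.1f} h left")
```

On the numbers above this gives 1.88% scanned and roughly 41 hours remaining, in line with what zpool itself reported, and it confirms that two devices in the vdev are marked as resilvering.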


?
I had a working system before shutdown, 5 disks (out of 6) installed and working.
I added disk #6 and resilvered, and now it feels compelled to resilver disk 5 as well? Is that worrying, or normal?
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
Seriously, does anyone know why 2 disks would resilver when replacing 1? It feels worrying to me.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
Is there some kind of advanced log I can check to see what initiated the resilver on 2 disks, or why? (Thus far it seems to be working fine, with 7 hours to go.) But still, an extra disk is thrashing now. Makes 0 sense to me.
I couldn't have accidentally done this by clicking something, could I?
 