Replacing dead drive

Status
Not open for further replies.

Iandoug

Dabbler
Joined
Mar 5, 2015
Messages
20
Hi

I did search but could not find anything; if this has already been dealt with, please point me in the right direction ;-)

HP Proliant server, 4 x 2TB drives, in Raid5 configuration AFAIK. Freenas 9.2.1.5

The box started beeping 11 times on startup, and I eventually got around to investigating. Turns out one of the drives was dead ... BIOS detects it as SMART capable but DEAD.

It does not show up in the Freenas GUI under View Disks. So I can't do things like 'take it offline'.

I have replaced it with a new drive (also 2TB) which does show up under View Disks, but how do I tell Freenas/ZFS to start using it? I will admit that the process I followed was based on the video demo of ZFS where the presenter smashed a disk with a hammer, unplugged it, plugged in a new one, and ZFS sorted itself out ... I did not think that there was some software-based approach that I needed to follow.

I went to the Volume manager, which offers me the option of adding the drive, but under Volume Layout it offers Stripe, Log, Cache or Spare, which does not seem correct ... also there are buttons for "Extend volume" and "Add extra device" and I'm not sure what to click ... the manual does not seem to have this dialog.

Advice gratefully received :smile:

thanks, Ian
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
DO NOT USE THE VOLUME MANAGER TO REPLACE DISKS.

The only possible outcomes from the Volume Manager are creating a new pool, or adding a new disk to your existing pool (destroying redundancy in the process).

I don't know where people get this idea from; the manual has click-by-click instructions on how to replace a failed drive, and nowhere does it say to go anywhere near the Volume Manager. The manual is at http://web.freenas.org/images/resources/freenas9.2.1/freenas9.2.1_guide.pdf, and the instructions for replacing a failed disk are in section 6.3.12, starting on page 133: Storage -> Volumes -> View Volumes -> select your pool and click Volume Status (the button at the bottom that looks like a blank sheet of notebook paper). If the failed drive is already offline (which it probably is) and your new disk is already installed, select the failed drive, click the Replace button, select your new disk in the pop-up window, and click Replace Disk.
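For the curious, the GUI steps map roughly onto this CLI sequence (a sketch only; the pool name and device names below are placeholders, and the GUI is the supported path on FreeNAS because it also handles partitioning and the swap slice for you):

```shell
# See which member is faulted ("tank" here is a placeholder pool name)
zpool status tank

# If the dead disk is still listed, take it offline first
zpool offline tank <guid-or-gptid-of-dead-disk>

# Resilver onto the new disk (device name is a placeholder)
zpool replace tank <guid-or-gptid-of-dead-disk> /dev/ada4

# Watch the rebuild progress
zpool status tank
```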
 

Iandoug

Dabbler
Joined
Mar 5, 2015
Messages
20
Hi
Thanks for your reply. I was in section 6.3.12. I didn't follow the
"Before physically removing the failed device, go to Storage → Volumes → View Volumes → Volume
Status and locate the failed disk. Once you have located the failed device in the GUI, perform the
following steps:" because the dead drive didn't show up in View Disks and I assumed that doing the above step would not work either.

Doing the above produced a screen similar to the manual's: the long-number drive was listed as Unavailable, I clicked it, which produced a Replace button at the bottom, and that let me select the new drive as the replacement. It's now busy doing its thing, presumably formatting the drive etc.

I went to the Volume manager out of ignorance ... much of ZFS and FreeNas is black magic to me, and my precise situation wasn't covered in the manual (well, that's how it seemed to me). Freenas is something I use without fully understanding all the terminology which is not a good thing to admit, but there's only so many hours in a day and much to do ... :smile:

Anyway after doing ANOTHER volume status screen, it now shows four drives online and the long-number one still as Unavailable.

Am now watching progress via ssh and zpool status ... so far looks okay.

thanks, Ian
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Anyway after doing ANOTHER volume status screen, it now shows four drives online and the long-number one still as Unavailable.

That's fine. It's unavailable, so not unexpected. The pool will continue to reflect a degraded status until you jettison the failed component from the ZFS configuration. Follow the manual and it'll take you through it, step by step, and at the end all should be fine. It's important to do the right things in the right sequence so follow the manual. You should be just fine.
 

Robert Smith

Patron
Joined
May 4, 2014
Messages
270
Ian, stop. It sounds like you are in integrated RAID mode. Tread carefully. Back up your important data immediately.
 

Iandoug

Dabbler
Joined
Mar 5, 2015
Messages
20
Ian, stop. It sounds like you are in integrated RAID mode. Tread carefully. Back up your important data immediately.

Um, not quite sure what you mean by integrated RAID mode, AFAIK the setup is RAID5. Current situation looks like this:
Code:
[ian@freenas ~]$ zpool status
  pool: zti
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
  continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Mar  5 14:47:54 2015
  2.03T scanned out of 5.92T at 113M/s, 9h59m to go
  520G resilvered, 34.29% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	zti                                             DEGRADED     0     0     0
	  raidz1-0                                      DEGRADED     0     0     0
	    replacing-0                                 UNAVAIL      0     0     0
	      13791693195237792015                      UNAVAIL      0     0     0  was /dev/gpt/disk0
	      gptid/d5f3948f-c335-11e4-b7af-78acc0f79a09  ONLINE     0     0     0  block size: 512B configured, 4096B native  (resilvering)
	  gpt/disk1                                     ONLINE       0     0     0  block size: 512B configured, 4096B native
	  gpt/disk2                                     ONLINE       0     0     0  block size: 512B configured, 4096B native
	  gpt/disk3                                     ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: No known data errors
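As a sanity check on the ETA in that output, the remaining data divided by the scan rate should land near what zpool reports (a quick sketch; zpool's "T" and "M" are binary units):

```python
# Sanity-check zpool's resilver ETA from the figures it prints:
# "2.03T scanned out of 5.92T at 113M/s, 9h59m to go"
TIB = 2**40          # zpool's T is tebibytes
MIB = 2**20          # and M is mebibytes

remaining = (5.92 - 2.03) * TIB        # bytes still to scan
rate = 113 * MIB                       # bytes per second
hours = remaining / rate / 3600
print(f"{hours:.1f} hours to go")      # close to the 9h59m zpool reports
```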


Data on the drive is still accessible over the LAN via Dolphin file browser (am on a Linux desktop).

So I hope ZFS/Freenas is doing what it is supposed to do, nicely... :smile:

thanks, Ian
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
"So I hope ZFS/Freenas is doing what it is supposed to do, nicely..." yes, no problem here ;)

But I can see you use RAID-Z1 with drives bigger than 1TB, it's not recommended because of the likelihood of an uncorrectable error during the resilvering. But right now you can't do much except pray that the resilver ends without errors...
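To put rough numbers on that risk: assuming the common consumer-drive spec of one unrecoverable read error (URE) per 1e14 bits read (an assumption; check the drives' datasheets, and real drives often do much better), reading the three surviving 2TB disks end to end gives something like:

```python
import math

# Rough odds of at least one unrecoverable read error while
# resilvering a 4x2TB RAID-Z1 with one disk dead.
# Assumes the common consumer spec of 1 URE per 1e14 bits;
# treat this as a pessimistic back-of-the-envelope figure.
ure_per_bit = 1e-14
bits_read = 3 * 2e12 * 8          # three surviving 2 TB drives, read fully

p_error = 1 - math.exp(-ure_per_bit * bits_read)
print(f"~{p_error:.0%} chance of at least one URE")
```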

I can also see that this pool is configured with 512-byte sectors while all the drives use 4k sectors. It's not that important, however.
 

Robert Smith

Patron
Joined
May 4, 2014
Messages
270
Um, not quite sure what you mean by integrated RAID mode, AFAIK the setup is RAID5.

The way you described the issue, it sounds like you have hard drives connected to an integrated RAID controller, which in turn exports them to the system.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The way you described the issue, it sounds like you have hard drives connected to an integrated RAID controller, which in turn exports them to the system.

I didn't get that out of it, though I assumed when he said "RAID5" he actually meant "RAIDZ." The perpetual terminology failure thing makes translating problems hard.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The OP should say what he means. RAIDZ1 is NOT RAID5. If you do RAID5 with ZFS you are an idiot. If you do RAIDZ1 with ZFS you are just taking major risks of losing all of your data.
 

Iandoug

Dabbler
Joined
Mar 5, 2015
Messages
20
The OP should say what he means. RAIDZ1 is NOT RAID5. If you do RAID5 with ZFS you are an idiot. If you do RAIDZ1 with ZFS you are just taking major risks of losing all of your data.

The OP set up this NAS box a few years ago and made what he thought was the best decision .... I'm not a RAID expert, and RAID5 (okay, Z1) seemed like the best choice at the time. I will read your guide, because I see comments that I am at risk and that I should not be using Z1 with drives bigger than 1TB, but in all fairness I do not recall any warnings from FreeNAS about things like that when I set it up. Possibly the version I used didn't have such warnings.

Anyway I am now done resilvering but given the above, and the warnings here:
Code:
[ian@freenas ~]$ zpool status
  pool: zti
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
  continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Mar  5 14:47:54 2015
  5.92T scanned out of 5.92T at 107M/s, 0h0m to go
  1.48T resilvered, 100.00% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	zti                                             DEGRADED     0     0     0
	  raidz1-0                                      DEGRADED     0     0     0
	    replacing-0                                 UNAVAIL      0     0     0
	      13791693195237792015                      UNAVAIL      0     0     0  was /dev/gpt/disk0
	      gptid/d5f3948f-c335-11e4-b7af-78acc0f79a09  ONLINE     0     0     0  block size: 512B configured, 4096B native  (resilvering)
	  gpt/disk1                                     ONLINE       0     0     0  block size: 512B configured, 4096B native
	  gpt/disk2                                     ONLINE       0     0     0  block size: 512B configured, 4096B native
	  gpt/disk3                                     ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: No known data errors
[ian@freenas ~]$ zpool status
  pool: zti
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
  Expect reduced performance.
action: Replace affected devices with devices that support the
  configured block size, or migrate data to a properly configured
  pool.
  scan: resilvered 1.48T in 16h6m with 0 errors on Fri Mar  6 06:54:16 2015
config:

	NAME                                            STATE     READ WRITE CKSUM
	zti                                             ONLINE       0     0     0
	  raidz1-0                                      ONLINE       0     0     0
	    gptid/d5f3948f-c335-11e4-b7af-78acc0f79a09  ONLINE       0     0     0
	    gpt/disk1                                   ONLINE       0     0     0  block size: 512B configured, 4096B native
	    gpt/disk2                                   ONLINE       0     0     0  block size: 512B configured, 4096B native
	    gpt/disk3                                   ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: No known data errors


what is the best way of following the Action step suggested above?

The drives are 3x Seagate ST2000DL series (old) and 1x Seagate ST2000DM series.

When I set up the box I don't recall having to decide on block size and if I did I would have followed whatever was recommended.

I have another FreeNAS box, which I think has 4x3TB or 4x4TB drives, that I could use to back up everything on the first box if necessary. I think this is also Z1, which likely has the same issues, but I don't know:

Code:
[ian@freenas2 ~]$ zpool  status -v
  pool: freenas2
 state: ONLINE
  scan: scrub repaired 0 in 5h36m with 0 errors on Sun Feb 15 05:36:07 2015
config:

	NAME                                          STATE     READ WRITE CKSUM
	freenas2                                      ONLINE       0     0     0
	  gptid/56c481e7-d169-11e3-ab96-9cb65404609d  ONLINE       0     0     0
	  gptid/57a45521-d169-11e3-ab96-9cb65404609d  ONLINE       0     0     0
	  gptid/588266a6-d169-11e3-ab96-9cb65404609d  ONLINE       0     0     0
	  gptid/59643a99-d169-11e3-ab96-9cb65404609d  ONLINE       0     0     0

errors: No known data errors


Let me read your guide, since I could not get the Perfect Answer out of the manual ... :smile:

thanks, Ian
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The OP set up this NAS box a few years ago and made what he thought was the best decision .... I'm not a RAID expert, and RAID5 (okay, Z1) seemed like the best choice at the time. I will read your guide, because I see comments that I am at risk and that I should not be using Z1 with drives bigger than 1TB, but in all fairness I do not recall any warnings from FreeNAS about things like that when I set it up. Possibly the version I used didn't have such warnings.

There is no such warning. It's expected that the server administrator will weigh the pros and cons when deciding what to use and choose appropriately for the workload. If it's a backup server, you might be okay with RAIDZ1 (and many do use RAIDZ1 for backup servers because it is a backup).

The way to resolve the "action" is to replace the other 3 disks in that pool with newer disks that have a 4k sector size. If you are okay with the performance of that pool then ignore the "action".

Uh, I have bad news for you about freenas2. That pool is a striped 4-disk set. If any disk in that pool starts having *any* kind of unrecoverable error, you will likely lose *all* of the data in the entire pool. I would *definitely* make it a priority to redo freenas2 ASAP! When I say ASAP I mean "before the end of the weekend wouldn't be soon enough".
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Good, no errors on the resilver ;)

You can't correct the block size problem without deleting the current pool, so if you decide to fix it you need to copy the data elsewhere first. It's not a big problem since it's performance related and your network is likely the bottleneck, but you can take the opportunity to also change the RAID-Z1 to a RAID-Z2 (note that you'll have one disk less of space if you don't add one) ;)

To fix the block size problem just delete the pool and recreate it; FreeNAS uses 4k by default now. To change from RAID-Z1 to RAID-Z2, simply choose RAID-Z2 when you recreate the pool.
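The space trade-off is simple to sketch: with four equal drives, RAID-Z1 leaves roughly three drives' worth of usable space and RAID-Z2 two, before ZFS metadata/padding overhead:

```python
# Approximate usable space of an n-drive RAID-Z vdev, ignoring
# ZFS metadata/padding overhead: usable = (drives - parity) * drive size.
def raidz_usable_tb(n_drives: int, drive_tb: float, parity: int) -> float:
    return (n_drives - parity) * drive_tb

print(raidz_usable_tb(4, 2, 1))  # RAID-Z1 on 4x2TB -> 6 TB usable
print(raidz_usable_tb(4, 2, 2))  # RAID-Z2 on 4x2TB -> 4 TB usable
```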

Edit: cyberjock beat me to it...

"Uh, I have bad news for you about freenas2. That pool is a striped 4-disk set." Arf, I was about to ask why the type of vdev wasn't displayed on that pool... now I know.

So, yeah: back up all data from freenas2 to zti, delete freenas2, recreate freenas2 using RAID-Z1 (RAID-Z2 if you can and want), then copy the data back from zti to freenas2. Only then should you change zti, if you want.
 
Last edited:

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Correction to my last post. I had it wrong. Bidule0hm has it right. Those drives are 4k drives, but you are using a 512 byte sector size in ZFS. The only way to fix it is to destroy the pool and recreate it. If you are okay with the performance as it is (and you probably are fine) I'd leave it like it is. But when you decide you want to expand the pool someday instead of adding a new vdev make a new pool so that it uses 4k blocks and 4k drives. ;)
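If you want to confirm what a pool is actually using, the ashift value tells you (2^9 = 512-byte sectors, 2^12 = 4k). A sketch, assuming shell access; the pool name is the one from this thread and the cache file path is an assumption that may differ on your build:

```shell
# ashift is set per vdev at pool creation: 9 = 512-byte sectors, 12 = 4k
zdb -C zti | grep ashift

# If zdb can't locate the pool config, point it at the cache file
# (path is an assumption; adjust for your install):
zdb -U /data/zfs/zpool.cache -C zti | grep ashift

# Newer FreeBSD/FreeNAS also expose the minimum ashift for new pools:
sysctl vfs.zfs.min_auto_ashift
```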
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The OP should say what he means. RAIDZ1 is NOT RAID5. If you do RAID5 with ZFS you are an idiot. If you do RAIDZ1 with ZFS you are just taking major risks of losing all of your data.

I'm tired of the terminology fails. I wrote this, and I intend to start pointing people at it and asking them to fix their posts when they make a major terminology fail like RAID5 instead of RAIDZ1. Communicating technical problems with approximate terminology is a confusing fail, as you note. I've been doing far too much forgiving-autocorrect-in-my-head for too long, and that just enables the badness to continue.
 

Iandoug

Dabbler
Joined
Mar 5, 2015
Messages
20
Hi Gentlemen (Ian assumes you are all gentlemen)

Thanks for all the replies and advice, gratefully received.

Some things I want to query and 'explain' others....

1. Re RAID5 vs RAIDZ1: please remember I'm a web developer (as opposed to a storage specialist) who set up the first NAS box a few years ago. What I remember from then is that I chose RAID5-style *as opposed to* mirroring or striping; I had forgotten ZFS has its own variant. Apologies for any blood pressures raised .. :smile:

2. Re "If you do RAIDZ1 with ZFS you are just taking major risks of losing all of your data.", what then SHOULD I be using? I have two NAS boxes, both in HP Proliant boxes which only take 4 drives. The one is 4x2 TB (Seagates!) and the other is 4x4 TB (WD Red).
According to the manual I should be using Z2? I would prefer to maximise the available disk space. At present Nas1 has 4.4TB and Nas2 has 7.5TB, but it includes some stuff from Nas1 which I mirrored before upgrading Freenas on that box. I kept it there when Nas1 started beeping. The files are assorted media files collected over the years. It's not business critical but would be a pain to replace. I am also supposed to be backing up this PC with my work to one of the boxes...

3. Re "Uh, I have bad news for you about freenas2. That pool is a striped 4-disk set." Arf, I was about to ask why the type of vdev wasn't displayed on that pool... now I know.
I was shocked when I read that ... since I thought I had set up Nas2 the same way as Nas1 (ie Z1) and can't for the life of me think why I would have chosen striping...

Perhaps the devs can be more explicit in the zpool status messages about which storage mechanism is in use? Printing "raidz1-0 ONLINE 0 0 0" means something to the devs and people familiar with the message, but to me it meant nothing at the time, except to make me wonder why Nas2 status didn't have anything similar. More explicit messages would help.

Which brings us to the rebuilds of both boxes... clearly I should be doing Nas2 first, but where do I park 7.5 TB (or 5 TB for that matter)? Online is NOT an option here in South Africa.... we don't have the bandwidth. My ADSL upstream is limited to around 128Kbps IIRC, even if I had TBs of space somewhere to use.

So I started investigating tape drives and was suitably floored by the prices of things like the HP StoreEver LTO-6 Ultrium 6250 SAS External Tape Drive/S-Buy at USD 2500, particularly when the prices here, from online companies I'm not sure I trust, are even more outrageous: R68k for the LTO-6 and R36k for the LTO-5, while the WD Arkeia RA4300 was even more depressing. Looks like we don't get the S-Buy prices from HP here in SA (which I could probably afford, though it would be painful and tricky to justify to myself).

I accept that having a tape backup drive is A Good Thing but the price ...

Then I thought of using another HP Proliant box, which is destined to be an Elastix box (including security cameras), buying 4x4TB WD Purple drives, setting up Nas3, and using that to save things while rebuilding. It will be expensive but cheaper than the tapes, though I don't think I'm ever going to need that much space for video surveillance footage (or even know whether Elastix will handle 4 drives). It also leaves me without proper backups... I originally thought using FreeNAS would be the last step, but given what I've been pointed to read in the last few days I have been forced to change my thinking ...

Thanks, Ian
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
"Raid5 vs RaidZ1" it wasn't against you in particular, it's just the the "straw that broke the camel's back".

At least now you know your data is at risk, but "I would prefer to maximise the available disk space." means you've made the choice to prioritize space over redundancy. OK, no problem (but it's your choice, so don't complain if you lose the pool) ;)

"Printing "raidz1-0 ONLINE 0 0 0" means something to the devs and people familiar" it's not FreeNAS's devs fault, it's ZFS's devs fault. But, I've checked and it's not displayed ine the web GUI so you might want to fill a feature request to ask to add the RAID type in the storage tab in the GUI :)

For me, I looked on the used market for older LTO drives a few weeks ago and LTO-4 prices aren't that bad. With tape backup, the thing to know is that the drive cost is high but the tapes are very cheap. So if you don't back up often and/or don't have several servers to back up, it's less expensive to use HDDs ;)
 
Last edited:

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
Hi Gentlemen (Ian assumes you are all gentlemen)

This is true neither of the project nor the forum. I am disinterested in this respect but it would be more inclusive to avoid that false assumption!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
1. Re RAID5 vs RAIDZ1: please remember I'm a web developer (as opposed to a storage specialist) who set up the first NAS box a few years ago. What I remember from then is that I chose RAID5-style *as opposed to* mirroring or striping; I had forgotten ZFS has its own variant. Apologies for any blood pressures raised .. :)

Right, but the problem is that people ALSO try to do things like layering ZFS on top of a hardware RAID5 (bad idea!) and misusing the terms leads to confusion.

In your specialty, if someone came in and started asking questions about how to fix a PHP problem but they were actually writing in Node, ...

For me, at least, I've been around here about as long as anyone, and the technical accuracy thing has been a problem for the whole time. You just happened to be the guy who misused the terms on a day when I was already arrrrghing and had some free time.

2. Re "If you do RAIDZ1 with ZFS you are just taking major risks of losing all of your data.", what then SHOULD I be using? I have two NAS boxes, both in HP Proliant boxes which only take 4 drives. The one is 4x2 TB (Seagates!) and the other is 4x4 TB (WD Red).
According to the manual I should be using Z2? I would prefer to maximise the available disk space. At present Nas1 has 4.4TB and Nas2 has 7.5TB, but it includes some stuff from Nas1 which I mirrored before upgrading Freenas on that box. I kept it there when Nas1 started beeping. The files are assorted media files collected over the years. It's not business critical but would be a pain to replace. I am also supposed to be backing up this PC with my work to one of the boxes...

I feel your pain there. We have some 1U 4 drive storage units as well. If you have "pain to replace" or "business critical" data, the thing you need to contemplate is that RAIDZ1 means that if a disk fails, you have actually lost redundancy, so any further problems - even a single sector failing to read - means that you might be losing data. So you have to contemplate whether your backup strategy would be sufficient for recovery purposes, or if the annoyance of losing files or losing a pool warrants RAIDZ2.
 

Iandoug

Dabbler
Joined
Mar 5, 2015
Messages
20
Correction to my last post. I had it wrong. Bidule0hm has it right. Those drives are 4k drives, but you are using a 512 byte sector size in ZFS. The only way to fix it is to destroy the pool and recreate it. If you are okay with the performance as it is (and you probably are fine) I'd leave it like it is. But when you decide you want to expand the pool someday instead of adding a new vdev make a new pool so that it uses 4k blocks and 4k drives. ;)

Sorry to resuscitate such an old thread, but I've finally gotten around to attending to the NAS box that has 512-byte sectors instead of 4k.

After transferring everything elsewhere, I installed the latest FreeNAS on the internal flash drive, deleted the pool and created a new one, but it still appears to be 512.

Code:
[ian@freenas] /mnt/nas1/ian# smartctl -q noserial -a /dev/ada0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Seagate Barracuda 7200.14 (AF)
Device Model:  ST2000DM001-1ER164
Firmware Version: CC25
User Capacity:  2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical

Code:
[ian@freenas] /mnt/nas1/ian# smartctl -q noserial -a /dev/ada1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Seagate Barracuda Green (AF)
Device Model:  ST2000DL003-9VT166
Firmware Version: CC32
User Capacity:  2,000,398,934,016 bytes [2.00 TB]
Sector Size:  512 bytes logical/physical
Rotation Rate:  5900 rpm


The remaining 2 drives are same as second one above.

So is this normal? I.e., do the drives at a hardware level only support 512?

The second drive above is now giving errors, and I need to replace it, but neither existing model number is still in production, so I need a "compatible" drive.

The (relevant, AFAIK) errors are:
Code:
  5 Reallocated_Sector_Ct  0x0033  100  100  036  Pre-fail  Always  -  160
197 Current_Pending_Sector  0x0012  100  099  000  Old_age  Always  -  16
198 Offline_Uncorrectable  0x0010  100  099  000  Old_age  Offline  -  16

But if I read this correctly, the error appeared 39 days after First Life, and for all I know it could have been me messing about with what was then a new box:

Code:
Error 11 occurred at disk power-on lifetime: 36763 hours (1531 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 ab ff ff ff 4f 00  39d+09:48:48.802  READ FPDMA QUEUED
  60 00 ab ff ff ff 4f 00  39d+09:48:48.802  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  39d+09:48:48.693  READ LOG EXT
  60 00 ab ff ff ff 4f 00  39d+09:48:44.913  READ FPDMA QUEUED
  60 00 ab ff ff ff 4f 00  39d+09:48:44.913  READ FPDMA QUEUED
So the question is: is this a problem (i.e. not a transient thing that can be "cleared"), meaning I must replace the drive, which has been working fine forever and did not cause problems when another drive died?

Also, given that new drives are 4k, will they still accept logical 512 sectors like the first drive above? I'm probably going to replace the drive with a Seagate NAS drive (just to maintain some 'compatibility' in the box).

Thanks for any advice :-)

Cheers, Ian
 