Unable to identify drive with sas2ircu

Status
Not open for further replies.

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
That is unfortunate.
Can you schedule down-time for this system?
I had asked what the purpose was, partly to find out how big the impact of taking it offline would be.
Are the enclosures you have situated so that you can see the drive serial numbers relatively easily?
I have a top-loaded system where I can slide the unit out, open the lid and look at all the serial numbers directly.
This is probably going to take a manual hunt for the missing serial number, but you have a list of all the working serial numbers, so when you find a serial number that isn't in your list, you know that is the bad drive.
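A minimal sketch of that serial-number hunt, assuming FreeBSD-style /dev/da* device names, that smartctl is installed, and that you keep the known-good serials in a file (all of those are assumptions about this system):

```shell
# Sketch only: the device glob, file paths, and smartctl availability
# are assumptions; adjust for the actual chassis.
for dev in /dev/da*; do
  # Pull the serial number from each drive's SMART identity page.
  smartctl -i "$dev" | awk -F': *' '/Serial Number/ {print $2}'
done | sort > /tmp/current_serials.txt

sort known_good_serials.txt > /tmp/known_serials.txt

# Serials visible in the chassis but absent from the known-good list
# point at the drive you are hunting for.
comm -23 /tmp/current_serials.txt /tmp/known_serials.txt
```

The `comm -23` step prints only lines unique to the first file, which is exactly the "serial that isn't in your list" test described above.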
This is a rough situation to be in. It sure looks like someone set you up for failure. Sorry.
The pool with the failed drive, named 'cRaid', has 8TB drives and, from what you showed us, should have a capacity of around 599 TB. You should keep usage below 475 TB if you go by the 80% rule, though you can push it closer to 90% without significant performance loss.
The real problem with both of these pools is the span of disks.
If it were possible, all the data needs to be moved from this system so the pools can be reconfigured. All of it could be in one large pool, but it needs to be broken into multiple vdevs (virtual devices) which is where the redundancy comes from.
You could also fill those 60 bay enclosures with more disks to add additional capacity.
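To make the "multiple vdevs" idea concrete, here is a hypothetical layout sketch: one pool built from two 10-wide RAIDZ2 vdevs instead of a single huge vdev. The pool name `tank` and the `da0`..`da19` device names are assumptions, and a real 60-bay build would repeat the pattern with more vdevs:

```shell
# Sketch only: pool name and device names are placeholders.
# Each 10-disk RAIDZ2 vdev survives two drive failures on its own,
# which is where the per-vdev redundancy comes from.
zpool create tank \
  raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 \
  raidz2 da10 da11 da12 da13 da14 da15 da16 da17 da18 da19
```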
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Rough estimate, in today's dollars: around $55,584 just in drives, for over 710TB of usable capacity. Data that valuable really needs better redundancy to keep it safe.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
If it were possible, all the data needs to be moved from this system so the pools can be reconfigured.
I'd say that it needs to be possible. This is an imminent disaster: a pool that could fail if someone sneezed next to it (I'm only half-joking about the sneeze part).

You'll want to use this calculator to get a good idea of how many and what size disks you'll need for a proper setup - choose RAIDZ2 and set it to 10 disks, which is the recommended maximum vdev width.
https://forums.freenas.org/index.php?resources/zfs-raid-size-and-reliability-calculator.49/
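A back-of-the-envelope version of what that calculator computes, ignoring ZFS metadata and slop-space overhead (so real usable numbers come in somewhat lower). The 60-bay/8 TB figures are taken from this thread; the 10-wide RAIDZ2 split is the recommendation above:

```shell
# 60 bays of 8 TB drives split into 10-wide RAIDZ2 vdevs.
total_disks=60
vdev_width=10
disk_tb=8

vdevs=$((total_disks / vdev_width))
# Each RAIDZ2 vdev loses 2 disks' worth of space to parity.
usable=$((vdevs * (vdev_width - 2) * disk_tb))
echo "$vdevs vdevs, roughly $usable TB usable before overhead"
# prints: 6 vdevs, roughly 384 TB usable before overhead
```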
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I see why the previous guy was fired. For that kind of money there needs to be a support contract and a company backing the product.
 

jinno

Dabbler
Joined
Nov 28, 2017
Messages
13
I'd say that it needs to be possible. This is an imminent disaster: a pool that could fail if someone sneezed next to it (I'm only half-joking about the sneeze part).

You'll want to use this calculator to get a good idea of how many and what size disks you'll need for a proper setup - choose RAIDZ2 and set it to 10 disks, which is the recommended maximum vdev width.
https://forums.freenas.org/index.php?resources/zfs-raid-size-and-reliability-calculator.49/


Agreed...I will attempt to replace the drive this weekend (short term) and set action plans to redo the whole configuration as soon as possible. Thanks everyone for your help.

Will keep you updated.

Cheers,
Jinno
 

jinno

Dabbler
Joined
Nov 28, 2017
Messages
13
I see why the previous guy was fired. For that kind of money there needs to be a support contract and a company backing the product.

Once again, you are correct, given that kind of money and the amount of mission-critical data involved. The person was not fired; he was a consultant. I am their first full-time IT person, and I am looking into a support contract.

Best,
Jinno
 

jinno

Dabbler
Joined
Nov 28, 2017
Messages
13
Question... As I mentioned, the chassis are not fully populated with drives. Theoretically, can I not add additional drives and designate them as hot spares?

Thanks in advance,
Jinno
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Theoretically, can I not add additional drives and designate them as hot spares?
You can, and that will somewhat mitigate your risk. The problem with that is that the drives still need to resilver into the pool, and with the enormous vdevs you have, that will take quite a while. It's definitely better than nothing, but it's still likely to leave you exposed for a week or two while resilvering completes.
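For reference, attaching a hot spare is a one-liner. This is a hypothetical sketch: `da42` is an assumed device name for a disk in one of the empty bays, and the pool name comes from earlier in the thread:

```shell
# Sketch only: 'da42' is a placeholder device name.
zpool add cRaid spare da42

# The new disk appears under a 'spares' section in the status output.
zpool status cRaid
```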
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
How is this going along?
 

jinno

Dabbler
Joined
Nov 28, 2017
Messages
13
Well, our backup system consisted of a 2-drive LTO6 loader and AWS. Not ideal for backing up that much data. We upgraded to a 50-slot, 3-drive tape library. Unfortunately, there were some setbacks (I was hoping to get this done during the holiday) and it's still not fully functional. The idea was to use this to do a full backup, since we don't really have one, and then tackle the RAID issue.
 

jinno

Dabbler
Joined
Nov 28, 2017
Messages
13
That is unfortunate.
Can you schedule down-time for this system?
I had asked what the purpose was, partly to find out how big the impact of taking it offline would be.
Are the enclosures you have situated so that you can see the drive serial numbers relatively easily?
I have a top-loaded system where I can slide the unit out, open the lid and look at all the serial numbers directly.
This is probably going to take a manual hunt for the missing serial number, but you have a list of all the working serial numbers, so when you find a serial number that isn't in your list, you know that is the bad drive.
This is a rough situation to be in. It sure looks like someone set you up for failure. Sorry.
The pool with the failed drive, named 'cRaid', has 8TB drives and, from what you showed us, should have a capacity of around 599 TB. You should keep usage below 475 TB if you go by the 80% rule, though you can push it closer to 90% without significant performance loss.
The real problem with both of these pools is the span of disks.
If it were possible, all the data needs to be moved from this system so the pools can be reconfigured. All of it could be in one large pool, but it needs to be broken into multiple vdevs (virtual devices) which is where the redundancy comes from.
You could also fill those 60 bay enclosures with more disks to add additional capacity.


Sorry, I did not answer some of your previous questions.
No, I would need to remove the drives to see the serial numbers.
The scheduled downtime was the holiday, but I couldn't get all the pieces together in time.
Yes, it is not an ideal situation, but we decided to tackle the backup situation first.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I sure wish you had responded earlier. If you can shut down the system, you can pull all the drives and map out the serial number / slot number relationship. I have maps like that for the big systems I am tending.

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The scheduled downtime was the holiday, but I couldn't get all the pieces together in time.
Yes, it is not an ideal situation, but we decided to tackle the backup situation first.
Out of curiosity, what ever happened with this?
 