Chris Moore (Hall of Famer)
That is unfortunate.
Can you schedule down-time for this system?
I had asked what the purpose of the system was, partly to find out how big the impact of taking it offline would be.
Are the enclosures you have situated so that you can see the drive serial numbers relatively easily?
I have a top-loaded system where I can slide the unit out, open the lid and look at all the serial numbers directly.
This is probably going to take a manual hunt for the missing serial number. Since you have a list of all the working serial numbers, any serial you find that isn't on that list belongs to the bad drive.
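If you would rather not eyeball dozens of labels by hand, a small script can do the comparison for you. This is a minimal sketch, assuming smartctl (from smartmontools) is installed and your drives appear as /dev/da0, /dev/da1, ... (FreeBSD/TrueNAS CORE naming; use /dev/sd* on Linux). The file name good_serials.txt is a hypothetical stand-in for your list of working serials.

```python
#!/usr/bin/env python3
"""Report attached drives whose serial is NOT on a known-good list."""
import glob
import re
import subprocess

def drive_serial(dev):
    """Return the serial number reported by 'smartctl -i', or None."""
    out = subprocess.run(["smartctl", "-i", dev],
                         capture_output=True, text=True).stdout
    m = re.search(r"Serial Number:\s*(\S+)", out)
    return m.group(1) if m else None

# Load the serials you know are healthy (one per line).
with open("good_serials.txt") as f:
    good = {line.strip() for line in f if line.strip()}

for dev in sorted(glob.glob("/dev/da*")):
    if not re.fullmatch(r"/dev/da\d+", dev):
        continue  # skip partition nodes like /dev/da0p1
    serial = drive_serial(dev)
    if serial and serial not in good:
        print(f"{dev}: serial {serial} is not on the known-good list")
```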
This is a rough situation to be in. It sure looks like someone set you up for failure. Sorry.
The pool with the failed drive, named 'cRaid', has 8 TB drives from what you showed us and should have a capacity of around 599 TB. If you go by the 80% rule you should keep usage below roughly 479 TB, but you can push it closer to 90% without significant performance loss.
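For reference, the arithmetic behind those thresholds (a quick sketch, using the ~599 TB figure above):

```python
# Usable-space thresholds for the ~599 TB 'cRaid' pool.
# ZFS performance degrades as a pool fills; 80% is the
# conservative rule of thumb, ~90% the practical ceiling.
capacity_tb = 599

print(f"80% rule:  keep usage below ~{capacity_tb * 0.8:.0f} TB")  # ~479 TB
print(f"90% limit: absolute ceiling ~{capacity_tb * 0.9:.0f} TB")  # ~539 TB
```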
The real problem with both of these pools is the number of disks a single vdev spans.
If it were at all possible, all the data should be moved off this system so the pools can be reconfigured. It could all go into one large pool, but that pool needs to be broken into multiple vdevs (virtual devices), which is where the redundancy comes from.
You could also fill those 60-bay enclosures with more disks to add capacity.
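To make that tradeoff concrete, here is a rough sketch comparing usable capacity for a few ways of carving a full 60-bay shelf of 8 TB disks into equal-width RAIDZ2 vdevs. The layouts are illustrative assumptions only, not a tuned recommendation for your workload, and real usable space will come in somewhat lower after ZFS overhead.

```python
# Rough usable-capacity comparison for splitting 60 x 8 TB disks
# into equal-width RAIDZ2 vdevs (2 parity disks per vdev).
# Illustrative only: ignores ZFS metadata/allocation overhead.
disks, size_tb, parity = 60, 8, 2

for vdevs in (1, 4, 6, 10):       # 1 = the "one wide vdev" anti-pattern
    width = disks // vdevs         # disks per vdev
    usable = vdevs * (width - parity) * size_tb
    print(f"{vdevs:>2} x {width}-wide RAIDZ2: "
          f"~{usable} TB usable, survives {parity} failures per vdev")
```

Wider vdevs squeeze out more raw space, but resilver times grow and you still only tolerate two failures per vdev, which is why something in the 10- to 12-wide RAIDZ2 range is the commonly cited sweet spot.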