Insufficient Replicas Error

og-soulja

Cadet
Joined
Oct 17, 2021
Messages
5
Dear Team,
First, I'm not familiar with FreeNAS and need your assistance ASAP, as there is a big issue in our system.
It seems that the pools are not active, and when I try to import them with the zpool import command, I see that some mirrors are unavailable with an "insufficient replicas" error.
I really need your assistance. I have attached screenshots of the shell. Sorry, I don't even know how to copy the output from the shell. :oops:

Ver: FreeNAS-11.2-U7

Please ask me if you need to have further information.
 

Attachments

  • 3.JPG
  • 1.JPG
  • 2.JPG

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
First - do you have a backup?
Other than that I can't help you as I have no idea what hardware you have (other than a boatload of disks)
However as you have lost several disks at once I would be looking at cabling / power / backplanes rather than disk failure (not that several disks can't fail at once)
 

og-soulja

Cadet
Joined
Oct 17, 2021
Messages
5
Unfortunately I don't have a backup.
If I remember correctly, we assigned spare disks to each pool during installation.

*There is a 12G SAS cable, and I have also checked the cabling.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
Clue: Have a look at my signature
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
The action required here is simple... you must reconnect at least one drive each from mirrors 24 and 25 or you won't be getting anything back without a recovery tool (and even with a tool, only some of it would be recoverable).

Your pool is a very odd structure. One very wide RAIDZ3 VDEV and 7 mirrors in the same pool... are those mirrors special VDEVs?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
My suspicion, based on very little evidence, is that the pool started as a wide pool and has been expanded with mirrors over time. This has now potentially come back to bite the OP in a big way (I still suspect the issue is his non-existent hardware, i.e. the hardware we have been told nothing about).

Not much we can help with though as I suspect that the info he has given us is incomplete anyway.
 

og-soulja

Cadet
Joined
Oct 17, 2021
Messages
5
I agree with you, because I don't think it's possible to lose 7 hard disks at the same time.
My biggest challenge is that I don't know much about either FreeNAS or the hardware.
The worst thing is that this place is a laboratory and there is approximately 400TB of patient data.
Guess how much pressure I am under from management now. :(

Please tell me what more information you need and how to proceed?

Thanks in advance.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
170
og-soulja said: "Please tell me what more information you need and how to proceed?"

Actual hardware configuration would be most useful. Which controller(s) are used and how the drives are connected. The first thing will be to look at the single item which is in the path connecting to the missing disks, but not in the path connecting to the still available disks. This item will likely be at fault.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
So - based in Ankara
As @AlexGG says, you need to locate the "failed" disks. What are they plugged into? Trace the cabling back.

We will need the full hardware specs of the device, including the motherboard, any HBAs / disk controllers, disk drive models, and even the details of the case / shelves the disks are in.

Without unplugging anything (yet), you will need to identify the failed disks physically and then trace back how they are connected to the computer. Then compare where the failed drives are attached and try to find a common part: SAS cable, HBA, backplane, expander, whatever is in there.

Once you have that info you can move drives around to see if the fault moves with them (it's safe to move drives around with TrueNAS because the pool refers to disks by logical identifiers rather than device names such as /dev/ada1). Just be careful that you can revert to where everything was placed initially.

Use
Code:
glabel status
to cross reference the gptid (logical identifier) with the device - it might help you work out which drive is which (at least for the working drives)
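
A minimal sketch of that cross-reference, assuming the usual three-column `glabel status` layout (Name, Status, Components). The helper name `gptid_to_disk` is made up for illustration and is not part of FreeNAS:

```shell
# Hypothetical helper: reads `glabel status` output on stdin and prints
# "<gptid label> <disk>" pairs, stripping the partition suffix (da12p2 -> da12).
gptid_to_disk() {
    awk '$1 ~ /^gptid\// {
        disk = $3
        sub(/p[0-9]+$/, "", disk)
        print $1, disk
    }'
}

# Usage on the NAS itself:
#   glabel status | gptid_to_disk
```

Only labels for disks the system can currently see will show up, so any gptid that appears in the pool layout but not in this list points at a missing drive.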
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
To be honest given the nature of the data and your apparent lack of knowledge I would look at getting some help on site (sort of thing I would love to do - but UK to Ankara is not exactly local). Who supplied the system / built the system? Can they help?

Incidentally if the data is recovered then your Management will need to put their hand in their pockets and produce a fair amount of cash as backing up 400+TB is not going to be done with an off the shelf cheap NAS.
RAID is NOT a backup and if this is critical data it needs to be backed up - just in case this exact situation happens.
 

og-soulja

Cadet
Joined
Oct 17, 2021
Messages
5
I absolutely agree with you.
The system was built by a Field Service Engineer, who set everything up with limited knowledge of networking. That's why we are in trouble now :smile:
I also have a Service Desk background and am not very familiar with FreeNAS, as I said before, but first I need to recover all the data and then propose a better solution to the management team, such as getting consultancy services from a company or a partnership.

I have never seen the server physically, as I work at the head office, but I will be at the site today.
Until then, this is all the hardware information I have (please see the attachment).
I will give more information when I arrive.

@NugentS,
Do you want me to bring the system to the UK? :smile:
 

Attachments

  • Capture.JPG

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
LOL - whilst the concept is amusing I don't think shipping a broken system from Ankara (or thereabouts) to the UK comes under the heading of a good idea.

As @sretalla says, you need to recover a minimum of two disks; it's just that locating the specific disks may be interesting. A lot will depend, I suspect, on whether you have spare slots to move drives around. Possibly even spare PCIe slots to install another HBA to plug disks into, or even a spare machine with an HBA to test whether disks appear on it.

I am of course assuming that the hardware has been specified correctly and that the HBAs are running in IT mode.

You also need to consider how you are going to back up 400TB of data to an alternative system, given that HDDs are limited to 18-20TB. You would need 22 or so drives, and that's without resiliency. So you would actually need something like (I suggest):
vdev 1 8 drives+3 (Z3)
vdev 2 8 drives+3 (Z3)
vdev 3 8 drives+3 (Z3)
vdev 4 8 drives+3 (Z3)
That's 44 drives giving roughly 581TB of usable space.
I think I would probably design that as a server with lots of PCIe slots for external HBAs, with 4 external shelves (one vdev per shelf) for the drives themselves. Any drive slots in the main chassis would be used for emergency / expansion space. Or a Supermicro 60-bay top loader, as long as the DC it's in is seriously chilled.
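
The arithmetic behind that capacity figure can be sketched as follows, assuming 20 TB drives (the top of the 18-20TB range mentioned above) and counting only the data drives in each RAIDZ3 vdev. It ignores ZFS metadata, padding and free-space headroom, so real usable space will be lower. The function name is illustrative only.

```shell
# Rough data capacity in TiB:
#   vdevs * data drives per vdev * drive size in decimal TB, converted to TiB.
# Parity drives are left out by the caller; ZFS overhead is NOT modelled.
raidz_data_tib() {
    awk -v vdevs="$1" -v data="$2" -v tb="$3" \
        'BEGIN { printf "%.0f\n", vdevs * data * tb * 1e12 / (1024 * 1024 * 1024 * 1024) }'
}

raidz_data_tib 4 8 20   # 4 vdevs of 8 data + 3 parity, 20 TB drives -> 582
```

That lands within a whisker of the 581 quoted, which suggests the estimate was made with 20 TB drives and reported in binary units.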

Or you could use
vdev 1: 6 drives+3 (z3)
vdev 2: 6 drives+3 (z3)
using 50TB SSDs from Nimbus at $12,500 each (approximately), giving roughly 590TB of space

or

vdev 1: 6 drives+3 (z3) using 100TB SSDs from Nimbus at $40K each

or, the sensible option: talk to iXsystems and ask them for a 500+TB system design, with resilient hardware to boot

Hmm - I wonder how long 400TB would take to copy
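
As a back-of-envelope answer, here is a small sketch that converts a data size and a sustained link speed into days. The 10 Gbit/s and 1 Gbit/s figures are assumptions for illustration, not details from this system, and real-world throughput (disk speed, protocol overhead, small files) will be lower than line rate.

```shell
# Days needed to move <tb> decimal terabytes at a sustained <gbps> gigabits per second.
copy_days() {
    awk -v tb="$1" -v gbps="$2" \
        'BEGIN { printf "%.1f\n", (tb * 1e12 * 8) / (gbps * 1e9) / 86400 }'
}

copy_days 400 10   # 10 GbE at full line rate -> 3.7
copy_days 400 1    # 1 GbE at full line rate  -> 37.0
```

So even in the best case the first full copy is a matter of days, and over a 1 Gbit/s link it is more than a month.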
 

og-soulja

Cadet
Joined
Oct 17, 2021
Messages
5
Please see the attachments which give more information about the system.

Server: HP ProLiant ML350 Gen9
Disks: 14 TB SAS 12Gb/s each, Western Digital
Controller: LSI SAS3 MPT
All 60 of the 60 bays are populated, so there is no free slot to move a disk into.
 

Attachments

  • MicrosoftTeams-image (8).jpg
  • MicrosoftTeams-image (5).jpg
  • MicrosoftTeams-image (3).jpg
  • MicrosoftTeams-image (7).jpg
  • MicrosoftTeams-image (4).jpg

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
That's running MPT SAS3 version 6 firmware? Yikes. Try updating it to version 16, except you probably need some directions on how to do that, hmm. This won't fix any truly broken drives, but maybe we'll luck out and it hasn't actually failed unrecoverably, and maybe it is even just a controller communication issue. There could be value in powering off the host, letting it rest a minute or two, and then powering it back on. On the other hand, sometimes that causes old drives to die... so, well, hard to say which is the right call.

If you can boot into FreeNAS, we need to be able to identify the failed disks and then query them to check their status using smartctl.
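
One way to do that query in bulk is sketched below. It assumes `smartctl` is on the path (it ships with FreeNAS) and uses only the `-i` (identity) and `-H` (overall health) options; the function name and the `SMARTCTL` override are made up for illustration. Disks the controller cannot see at all will not be listed by the kernel, so a drive missing from the output is itself a clue.

```shell
# Print identity and overall health for each disk name given as an argument.
# SMARTCTL can be overridden (e.g. SMARTCTL=echo) for a dry run.
check_disks() {
    for disk in "$@"; do
        echo "=== /dev/$disk ==="
        "${SMARTCTL:-smartctl}" -i -H "/dev/$disk"
    done
}

# Usage on the NAS itself (kern.disks lists the disks FreeBSD currently sees):
#   check_disks $(sysctl -n kern.disks)
```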

There is also the option to run something like solnet-array-test v2, which, if the NAS has Internet connectivity, you can get from the command line with

Code:
# fetch ftp://snarchive.sol.net/incoming/solnet-array-test-v2.sh


which is a nondestructive read test shell script designed to work on a large number of drives.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
The drives, based on a sample of one, aren't that old (2019).

It's difficult to know what to do here, given the scale of the issue and the scale of the damage if the data is borked or gets borked during troubleshooting.

OP - if you click on Storage / Disks, how many disks can you see compared with how many are in the machine? Hopefully the numbers are the same. Also, can you see the serial numbers against the disks? You may have to note each serial number against its physical location, which should let you locate the "failed" disks and see if there is anything common to the failures.
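
If the GUI is awkward to work from, the same serial-number list can be pulled from the shell. This is a sketch only: it assumes `smartctl -i` prints a line labelled "Serial number" (in either capitalisation), and the helper name `extract_serial` is hypothetical.

```shell
# Hypothetical helper: pull the serial number out of `smartctl -i` output on stdin.
# Matches the label case-insensitively ("Serial Number:" or "Serial number:").
extract_serial() {
    awk -F': *' 'tolower($1) == "serial number" { print $2 }'
}

# Usage on the NAS itself, one line per visible disk:
#   for d in $(sysctl -n kern.disks); do
#       printf '%s %s\n' "$d" "$(smartctl -i "/dev/$d" | extract_serial)"
#   done
```

Comparing that list against the serial numbers printed on the drive labels in each bay shows which physical slots hold the disks the system cannot see.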
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
BTW - I am not familiar with the hardware. Is that just a disk shelf with a SAS expander, cabled to the server above that supplies CPU, memory and NICs, or is that the whole thing? If it is just a shelf, how many cables run from the HBA to the shelf?

Until I saw that last picture I hadn't got my head around the scale of the problem. I suspect that tracing cabling is going to be difficult.
Is the ML350 under maintenance?
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
170
I don't think we are looking at seven failed disks. Two mirrors must fail at the same time to arrive where we are, even if each mirror already had a pre-existing failure. Two simultaneous failures at these specific positions, on top of the quite specific coincidence already required, are not very likely. I think we are looking for a failure in some common component.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,946
I agree, but finding it is going to be a ballache as (for example) I don't see any stickies on that box saying which serial numbers are where.
 