SOLVED Fear of losing my pool

ECC

Hello,
before I describe the problem itself, here is the background:
I had an 8x 18TB RAIDZ2 pool, which had been running fine for quite some time. The pool was 80% full, so I decided to expand it with a second 8x 18TB RAIDZ2 vdev, to double my storage and, I hoped, double the performance.

Before the upgrade, those 8 drives were inside a Fractal case with 8 bays, connected to an HBA. Since there was no room for another 8 bays, I put the 8 new drives into 2x external 4-bay enclosures, each with one fan in front. I extended the SATA power cables from the PSU inside the case and connected the new drives to a new LSI SAS 9200-16e using 2x SFF-8088 to SATA breakout cables.

I was lazy and didn't run a full surface scan or SMART test on the new 18TB drives. I just added them to the pool and started rebalancing it (moving data between datasets).
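For anyone reading this later: a burn-in before adding drives to a pool is the safer route. Below is a minimal Python sketch, assuming smartmontools is installed and using hypothetical device names (`/dev/sdi` to `/dev/sdp`), that starts an extended SMART self-test on each new drive. By default it is a dry run that only prints the commands.

```python
import subprocess

# Hypothetical device names for the eight new drives; list the real ones
# with `smartctl --scan` or `lsblk` before using this.
NEW_DRIVES = [f"/dev/sd{letter}" for letter in "ijklmnop"]


def long_test_command(device: str) -> list[str]:
    """Build the smartctl call that starts an extended (long) self-test."""
    return ["smartctl", "-t", "long", device]


def start_long_tests(devices: list[str], dry_run: bool = True) -> list[list[str]]:
    """Start a long self-test on every device and return the commands used.

    The test runs inside the drive firmware, so smartctl returns right away.
    smartctl's exit status is a bitmask, so a non-zero code is reported
    instead of being treated as an exception.
    """
    commands = [long_test_command(dev) for dev in devices]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
            continue
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"{cmd[-1]}: smartctl exit status {result.returncode}")
    return commands


if __name__ == "__main__":
    start_long_tests(NEW_DRIVES)
```

The results can be read afterwards with `smartctl -l selftest /dev/sdX`. On an 18TB drive the long test can take the better part of a day; `smartctl -c` shows the drive's own time estimate.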

This huge transfer ran for about 2 days straight without any problems, until this afternoon: 2 of the new drives showed many read and write errors (about 50-100), and the pool status was DEGRADED.
I stopped the transfer, restarted the server and tried to run a short SMART test on all drives simultaneously. During the test I noticed that at least 1 drive was spinning up over and over again, so I aborted the test.
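The read/write counters that pushed the pool into DEGRADED are listed per device in `zpool status`. Below is a small Python sketch that prints every device with a non-zero counter. It assumes the usual `config:` table layout (`NAME STATE READ WRITE CKSUM`) and uses `zpool status -p` so counters appear as exact integers instead of abbreviations like `1.2K`. The regex is written for that layout only and is not an official parser.

```python
import re
import subprocess

# One device row of the "config:" table, e.g.
# "      sdk     FAULTED     52    87     0  too many errors"
ROW = re.compile(
    r"^\s+(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>\d+)\s+(?P<write>\d+)\s+(?P<cksum>\d+)"
)


def zpool_status_text(pool: str) -> str:
    """Return `zpool status -p <pool>` output (-p prints exact numbers)."""
    return subprocess.run(["zpool", "status", "-p", pool],
                          capture_output=True, text=True, check=True).stdout


def erroring_devices(status_text: str) -> list[tuple[str, str, int, int, int]]:
    """Return (name, state, read, write, cksum) for every row with a non-zero counter."""
    problems = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, blank or non-table line
        counts = [int(m[key]) for key in ("read", "write", "cksum")]
        if any(counts):
            problems.append((m["name"], m["state"], *counts))
    return problems
```

Listing which devices carry errors makes patterns easier to spot, for example when all affected drives share one enclosure, data cable or power lead.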

Then I pulled each of the new 18TB HDDs, connected it to my home PC and ran a short SMART test and a short read-performance test. Everything looked fine, and none of the drives kept spinning up again.

Then I swapped out all SATA cables, moved the drives to different SATA power cables and connected them to a different HBA. None of these changes solved the problem with the disks repeatedly spinning up.
On top of that, after the HBA change I thought the re-spinning no longer happened every time, so I started a scrub. At first it ran fine, but then the re-spinning began again and 1 disk didn't show up at all, so my pool was shown as unavailable.
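The repeated spin-ups can also be confirmed from SMART counters instead of by ear. Below is a Python sketch that snapshots attributes 4 (Start_Stop_Count), 12 (Power_Cycle_Count) and 192 (Power-Off_Retract_Count) and reports how much they grew. It assumes smartmontools 7.x, which supports `--json`, and that its JSON output lists attributes under `ata_smart_attributes.table` with a `raw.value` field. If Power_Cycle_Count or Power-Off_Retract_Count typically climbs while the server stays on, the drive is losing power, which points at power delivery rather than at the disk itself.

```python
import json
import subprocess

# SMART attribute IDs that grow when a drive spins up again or loses power.
WATCHED = {4: "Start_Stop_Count", 12: "Power_Cycle_Count", 192: "Power-Off_Retract_Count"}


def raw_counters(smart_json: dict) -> dict[int, int]:
    """Pull the raw values of the watched attributes out of smartctl's JSON."""
    table = smart_json.get("ata_smart_attributes", {}).get("table", [])
    return {row["id"]: row["raw"]["value"] for row in table if row["id"] in WATCHED}


def read_counters(device: str) -> dict[int, int]:
    """Run `smartctl --json -A` on one drive and return its watched counters."""
    out = subprocess.run(["smartctl", "--json", "-A", device],
                         capture_output=True, text=True).stdout
    return raw_counters(json.loads(out))


def counter_deltas(before: dict[int, int], after: dict[int, int]) -> dict[str, int]:
    """Report only the watched counters that changed between two snapshots."""
    return {WATCHED[i]: after[i] - before.get(i, 0)
            for i in after if after[i] != before.get(i, 0)}
```

Usage idea: call `read_counters` on every drive, run the scrub or load test for a while, call it again and feed both snapshots to `counter_deltas`.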

I'm very frustrated, because I don't have a full backup of my pool and much of the data on it isn't backed up anywhere else.
How would you try to rescue this data? I'm panicking a bit, so before I do anything stupid, I would like to ask you first. Thank you in advance.

NugentS

What are the two external bay enclosures?

NugentS

Well, since this is a RAIDZ pool, you cannot remove the extra vdev.
@Heracles: I believe the drives are attached to an LSI 9200-16e.
The 1000W PSU should be enough. However, how many drives are you running off a single SATA connector? Are you perhaps starving the drives of power?

ECC

NugentS said:
How many drives are you running off a single SATA connector? Are you perhaps starving the drives of power?
Most of the time, 8 drives. After the errors, I reduced it to 4 drives. The PSU's SATA cables connect up to 4 drives to one SATA power connector on the PSU.

Morris

ECC said:
Most of the time, 8 drives. After the errors, I reduced it to 4 drives. The PSU's SATA cables connect up to 4 drives to one SATA power connector on the PSU.
SATA backplanes should have multiple SATA power and/or Molex connectors. At least two, and preferably three, should be connected, and not all from the same cable.
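The case for spreading the load across several connectors can be checked with simple arithmetic. Below is a Python sketch that compares the worst case (all drives on one plug spinning up together) against a single plug's 12 V rating. It assumes the commonly cited SATA power rating of 1.5 A per contact, with three 12 V contacts per plug (4.5 A at 12 V), and an assumed peak spin-up draw of 2.0 A at 12 V per 3.5" drive; the real figure is in each drive's datasheet.

```python
# Assumed figures, not measurements: check the drive datasheet and cable labels.
PINS_12V_PER_PLUG = 3      # a SATA power plug carries 12 V on three contacts
AMPS_PER_PIN = 1.5         # commonly cited per-contact rating
SPINUP_AMPS_12V = 2.0      # assumed peak 12 V draw of one 3.5" HDD at spin-up


def plug_load(drives: int, spinup_amps: float = SPINUP_AMPS_12V) -> tuple[float, float]:
    """Return (worst-case 12 V draw, 12 V rating) for drives fed through one SATA plug."""
    rating = PINS_12V_PER_PLUG * AMPS_PER_PIN
    return drives * spinup_amps, rating


def report(drive_counts: list[int]) -> list[str]:
    """One line per scenario: peak draw versus the plug's rating."""
    lines = []
    for n in drive_counts:
        draw, rating = plug_load(n)
        verdict = "within rating" if draw <= rating else "OVER rating"
        lines.append(f"{n:>2} drive(s): {draw:4.1f} A peak vs {rating:.1f} A -> {verdict}")
    return lines


if __name__ == "__main__":
    print("\n".join(report([1, 2, 4, 8, 16])))
```

Under these assumptions, two drives per plug stay within the rating, while four or more exceed it at spin-up. Staggered spin-up, where the hardware supports it, lowers the peak.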

Provide proper power, put all your drives back in and let the resilver complete.

Good luck!

MrGuvernment

What model is your PSU? Its specs should usually tell you which rails are shared and which aren't.

ECC said:
I'm very frustrated, because I don't have a full backup of my pool and much of the data on it isn't backed up anywhere else.

Let's hope this isn't a tough lesson for you and it all works out fine, but either way, let this be a stark reminder: ALWAYS have backups before you start working on systems that contain data you'd prefer not to lose.

NugentS

So you aren't using SATA extension cables, just normal PSU-to-SATA cables, in which case we can probably say that power isn't the issue.

To be honest, the problem you are describing DOES sound like a power issue. But 1000W should be enough for 16 HDDs. As @MrGuvernment asked: what model is the PSU? Is it a Straight Power 11 or 12? Are you perhaps overloading one rail? Can you run one PSU cable to each of the external disk cages?

Davvo

Refer to this regarding PSU sizing.
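A rough version of the sizing arithmetic behind "1000W should be enough for 16 HDDs" can be sketched in Python. This assumes the same hypothetical 2.0 A per-drive spin-up draw at 12 V, a hypothetical 200 W allowance for the rest of the system and a hypothetical 25 % margin. Substitute real datasheet values, since these numbers are placeholders, not measurements.

```python
# Hypothetical inputs; replace with datasheet values for your drives and board.
SPINUP_AMPS_12V = 2.0      # assumed per-drive peak at 12 V during spin-up
REST_OF_SYSTEM_W = 200.0   # hypothetical allowance for CPU, board, HBAs and fans
HEADROOM = 1.25            # hypothetical 25 % safety margin


def spinup_watts(drives: int, amps_12v: float = SPINUP_AMPS_12V) -> float:
    """Worst-case 12 V power if every drive spins up at the same moment."""
    return drives * amps_12v * 12.0


def recommended_psu_watts(drives: int) -> float:
    """Drive spin-up load plus the rest of the system, with a safety margin."""
    return (spinup_watts(drives) + REST_OF_SYSTEM_W) * HEADROOM


if __name__ == "__main__":
    for n in (8, 16):
        print(f"{n} drives: {spinup_watts(n):.0f} W at spin-up, "
              f"~{recommended_psu_watts(n):.0f} W PSU suggested")
```

With 16 drives this gives 384 W of spin-up load and about 730 W with margin, so a 1000 W unit has capacity to spare. The total alone does not show whether any single rail, cable or plug is overloaded.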

ECC

NugentS said:
Is it a Straight Power 11 or 12?
It's a Straight Power 11.

MrGuvernment said:
ALWAYS have backups before you start working on systems that contain data you'd prefer not to lose.
You're absolutely right. I was lazy…

NugentS said:
Are you perhaps overloading one rail? Can you run one PSU cable to each of the external disk cages?


I found the solution: I had a cheap SATA power extension cable for 16 drives. Somehow it worked for 2 days and then caused all the problems described above.

I replaced it, and now everything seems fine. I ran a scrub and a full SMART test without any errors.
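The "scrub without any errors" check can be automated for future scrubs. Below is a Python sketch that reads `zpool status` text and returns True only for a clean, completed scrub. It assumes the `scan:` line format recent OpenZFS prints (`scrub repaired 0B in ... with 0 errors on ...`; older releases print `0` without the unit) and the `errors: No known data errors` summary line. The default pool name `tank` is hypothetical.

```python
import re
import subprocess

# Completed-scrub summary, e.g.
# "  scan: scrub repaired 0B in 10:23:45 with 0 errors on <date>"
SCRUB_LINE = re.compile(r"scrub repaired (\S+) in .* with (\d+) errors")


def zpool_status_text(pool: str = "tank") -> str:
    """Return `zpool status <pool>` output; "tank" is a hypothetical pool name."""
    return subprocess.run(["zpool", "status", pool],
                          capture_output=True, text=True, check=True).stdout


def scrub_clean(status_text: str) -> bool:
    """True if the last completed scrub repaired nothing, found no errors,
    and the pool reports no known data errors."""
    m = SCRUB_LINE.search(status_text)
    if m is None:
        return False  # no completed scrub in the output (never run or still running)
    repaired, errors = m.group(1), int(m.group(2))
    return (repaired in ("0", "0B")
            and errors == 0
            and "No known data errors" in status_text)
```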

So hopefully this will hold until I can make my backup. Thanks for your help!