Help with degraded pool

tenknas

Dabbler
Joined
Mar 6, 2021
Messages
21
I just recently moved all my data from a storage server I had running a Windows storage pool on hardware RAID5.

This is a datacenter server with 24 x 10TB drives, and my config was a single RAIDZ1. After a few days of moving files I finished copying everything and redirected all the services that need to read this data. Less than a week later, this is my pool status.
[attached screenshot: pool status]


What should I do? Two drives are "DEGRADED" and another is "FAULTED". Although the data is still intact, I don't know what to do with this. I still have the old Windows server, so the data can be moved again, but it would take days to move the files again and redo the configuration on all the other services that read the data.

Running TrueNAS SCALE 22.02.0.1 right now.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
1st - Z1 for that many 10TB drives is insane. You should be using multiple vdevs, 12 wide at most, in Z2 minimum. So as a minimum, 2 vdevs of Z2, 12 wide, would be a start. Resilver time will likely be horrible. Just because you can do something doesn't make it a good idea.

Look at the SMART stats for sdx - can you see why it's faulted? Replace that drive and hope you can resilver. However, make a backup first!
Once (and if) sdx is replaced successfully you will probably need to do the same for sdf and sdd, which means you have to resilver 3 times.
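As a rough sketch of that flow (assuming shell access; `tank` is the pool name visible in the dmesg output later in the thread, the partuuid is the one dmesg associates with sdx there, and /dev/sdy is a placeholder for the physical replacement disk - on SCALE you would normally do the replacement from the web UI):

```shell
# Check why sdx faulted (reallocated/pending sectors, read errors):
smartctl -a /dev/sdx

# Identify the faulted member, then replace it with the new disk.
# The partuuid and /dev/sdy here stand in for your actual devices:
zpool status -v tank
zpool replace tank /dev/disk/by-partuuid/f678be3b-bb5d-44d2-8ecd-04c4378b5286 /dev/sdy

# Watch the resilver progress:
zpool status tank
```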

BTW, how full is the pool? If it's fairly empty, the resilver will go quicker.

Question - are you running regular SMART tests?

Honestly, I would go back to the working Windows server, trash this pool, replace the three disks, rebuild the pool in a sensible manner, then recopy the data and redo the configurations.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Big danger there... I hope you have a backup for anything you're not wanting to lose on that pool.

You'll need to start by replacing the faulted drive, then deal with the other two, but there's some likelihood you'll have some corrupt/impacted files unable to be recovered with that situation.
 

tenknas

Dabbler
Joined
Mar 6, 2021
Messages
21
I might not have highlighted this well. This is a test for us; the data here is all backed up on the previous server it came from, plus another offsite backup. We're only moving stuff to try out SCALE, and we'll probably move into SCALE + TrueCommand and go clustered storage with GlusterFS once that's out of beta. This data makes up about 10% of our total data, which our services can work without; losing it is annoying for the users but not a deal breaker, and reverting to the old Windows server is a minor nuisance.

I'm just confused how this all went south so quickly. All three drives went bad really fast. I don't know if it's a SCALE thing, or if the DC gave me 3 bad drives from the start and I should have run tests on them.

Yeah, we'll probably just reconfigure the whole system for better redundancy.

Also, I'm asking so I know what to do in case this happens once we fully commit to TrueNAS for everything production.

Anyway, I might have missed it in the documentation, but does "degraded" as a pool status mean the drive is still working but should be replaced soon? Conversely, does "faulted" mean it's already dead and should be removed ASAP as well? The data still seems to be working, though, so maybe it can still be fixed by replacing the drives one by one?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Replace them one by one, since you only have 1 parity (and that isn't working).
Most here say you should stress test any new drive before putting it into service.
Personally, with any HDD (I haven't figured out what to do with SSDs yet) I do the following:
1. Open a tmux / screen session with a window for each disk
2. Run a long smart test - look at the results
3. Run badblocks destructively (takes a bit less than a week on a 10TB) - look at the results
4. Run a long smart test - look at the results

If the drive passes that, then it should be good.
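As a sketch, steps 2-4 for one drive might look like this (destructive - it wipes the disk; /dev/sdX is a placeholder, and you would run one copy per disk, each in its own tmux window):

```shell
smartctl -t long /dev/sdX        # step 2: long SMART self-test (several hours)
smartctl -a /dev/sdX             # review the results once the test completes
badblocks -b 4096 -wsv /dev/sdX  # step 3: destructive write/verify pass; days on a 10TB
smartctl -t long /dev/sdX        # step 4: second long test after the stress run
smartctl -a /dev/sdX
```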

Degraded means still working (with errors) but with serious issues. Faulted means ejected from the array. You are now running with no parity, so ZFS has no way of fixing bit rot - although that is the least of your issues at this point. Given that you also have two degraded disks, as @sretalla says you may have some corruption already, and resilvering that much disk space with no parity, three times over, is mathematically quite likely to throw an error and corrupt more data. Effectively you are running RAID0 with two dodgy disks at this point (sort of). If any disk fails, you can say goodbye to the pool.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Perhaps if you share more details on your hardware and specifically how you have attached those 24 drives, we might be able to see something that would explain the errors.

Can you also have a look at (and share with us if you like) the output from dmesg?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
In case you were unaware, you're currently running 24 disks with the IOPS of only one... so 24 disks giving you something like 100-300 IOPS.

As soon as one disk fails, you have no more redundancy, so that's 23 disks running as RAID0 (but still the IOPS of one disk) until resilvering can finish.

Anyway, to recommend what you should be doing (without really understanding your workload at all): at a minimum you should change to 2 x 12-disk RAIDZ2, which means you can lose at least one disk in each VDEV and still have redundancy during resilver.

Yes, that means 3 additional disks of capacity lost to parity, but your chances of a 2-disk failure killing your pool go from 100% to 0%, and you double your pool's IOPS at the same time.
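A back-of-envelope raw-capacity comparison of the two layouts (data disks only, ignoring ZFS overhead) can be sketched in shell arithmetic:

```shell
disks=24; size_tb=10
z1=$(( (disks - 1) * size_tb ))      # single 24-wide RAIDZ1: 1 parity disk
z2=$(( (disks - 2*2) * size_tb ))    # 2 x 12-wide RAIDZ2: 2 parity disks per vdev
echo "24-wide RAIDZ1: ${z1} TB raw usable"        # -> 230 TB
echo "2 x 12 RAIDZ2:  ${z2} TB raw usable"        # -> 200 TB
echo "parity cost difference: $(( z1 - z2 )) TB ($(( (z1 - z2) / size_tb )) disks)"
```

That difference is the "3 additional disks" mentioned above.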
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
I'm just confused how this all went south so quickly. All three drives went bad really fast. I don't know if it's a SCALE thing, or if the DC gave me 3 bad drives from the start and I should have run tests on them.
Errors could be a bad cable. Anyway, with an excessively large vdev (10-12 is the recommended practical maximum) and an unsafe geometry, things are expected to go south VERY quickly.

Anyway, I might have missed it in the documentation, but does "degraded" as a pool status mean the drive is still working but should be replaced soon? Conversely, does "faulted" mean it's already dead and should be removed ASAP as well? The data still seems to be working, though, so maybe it can still be fixed by replacing the drives one by one?
Correct, except that it's more like "degraded" should be fixed asap while "faulted" should have been fixed yesterday (if you got any early warning…).

If it's just test data that is already backed up elsewhere, delete the pool, test the drives, and recreate a safer pool.
 

tenknas

Dabbler
Joined
Mar 6, 2021
Messages
21
Thanks for the tips, guys. Yeah, we'll actually just trash this pool and start over after the DC replaces the drives.

I would appreciate your help and opinions on the re-setup, though.

On the initial testing of the drives: I don't think we have a week to test all the disks before using them, because if a disk turns out bad and the DC replaces it, that's another week of testing the new drive. Maybe there is a faster way to test them all, and we can also follow your recommendation of 2 vdevs in RAIDZ2?

Initially we also didn't want to sacrifice a lot of disks, since we already have an off-site mirror of the files, and again, this is a test of the TrueNAS SCALE system and its feature set, not really about redundancy at the moment. Also, we expect to move into clustered storage with 2 x replicated data, which will use up a lot more of the usable space.

Assuming our data spans about 5 storage servers, all with 24 x 10TB drives and 2 x 1TB NVMe drives as cache drives, what would be the recommended pool configuration? We prefer cost savings (more free space) over bulletproof redundancy, since we already mirror this to an off-site backup and are actually planning a gdrive copy of the whole data set. Data is about 500TB and growing, and we might add more storage servers as needed.

Generally, this is the config of each storage server:

Dual E5-2630L v2
128GB ram
24 x 10 TB Enterprise Drives
2 x 1TB NVMe (Cache Drives)
120GB SSD OS drive
10gbps nic for public networking
10gbps nic for private networking

Note: would it be better to just open a new thread for this? Let me know.

Here's the dmesg if you're interested in checking, @sretalla


Code:
[161534.286976] sd 0:0:73:0: [sdx] tag#832 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=13s
[161534.298357] sd 0:0:73:0: [sdx] tag#832 Sense Key : Medium Error [current] [descriptor]
[161534.307351] sd 0:0:73:0: [sdx] tag#832 Add. Sense: Unrecovered read error
[161534.315135] sd 0:0:73:0: [sdx] tag#832 CDB: Read(16) 88 00 00 00 00 03 0d e2 d3 40 00 00 02 00 00 00
[161534.325304] blk_update_request: critical medium error, dev sdx, sector 13117871120 op 0x0:(READ) flags 0x4700 phys_seg 28 prio class 0
[161534.339530] zio pool=tank vdev=/dev/disk/by-partuuid/f678be3b-bb5d-44d2-8ecd-04c4378b5286 error=61 type=1 offset=6714202357760 size=352256 flags=40080c80
[310738.427283] sd 0:0:53:0: [sdd] tag#195 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=13s
[310738.438717] sd 0:0:53:0: [sdd] tag#195 Sense Key : Medium Error [current] [descriptor]
[310738.447268] sd 0:0:53:0: [sdd] tag#195 Add. Sense: Unrecovered read error
[310738.454737] sd 0:0:53:0: [sdd] tag#195 CDB: Read(16) 88 00 00 00 00 03 bd eb 00 18 00 00 02 00 00 00
[310738.464842] blk_update_request: critical medium error, dev sdd, sector 16071197128 op 0x0:(READ) flags 0x4700 phys_seg 7 prio class 0
[310740.193421] zio pool=tank vdev=/dev/disk/by-partuuid/ed91247a-a624-4ae2-8ade-c7e937cf5a45 error=61 type=1 offset=8226305159168 size=311296 flags=40080c80
[379550.691796] md123: detected capacity change from 2147418624 to 0
[379550.698402] md: md123 stopped.
[379551.166287] md124: detected capacity change from 2147418624 to 0
[379551.172950] md: md124 stopped.
[379551.615351] md125: detected capacity change from 2147418624 to 0
[379551.621980] md: md125 stopped.
[379552.060557] md126: detected capacity change from 2147418624 to 0
[379552.067213] md: md126 stopped.
[379552.746509] md127: detected capacity change from 2147418624 to 0
[379552.753195] md: md127 stopped.
[379555.506865] md/raid1:md127: not clean -- starting background reconstruction
[379555.514487] md/raid1:md127: active with 2 out of 2 mirrors
[379555.520601] md127: detected capacity change from 0 to 2147418624
[379555.815657] md/raid1:md126: not clean -- starting background reconstruction
[379555.823376] md/raid1:md126: active with 2 out of 2 mirrors
[379555.829541] md126: detected capacity change from 0 to 2147418624
[379555.928954] md/raid1:md125: not clean -- starting background reconstruction
[379555.939840] md/raid1:md125: active with 2 out of 2 mirrors
[379555.948466] md125: detected capacity change from 0 to 2147418624
[379556.047103] md/raid1:md124: not clean -- starting background reconstruction
[379556.054930] md/raid1:md124: active with 2 out of 2 mirrors
[379556.061195] md124: detected capacity change from 0 to 2147418624
[379556.161253] md/raid1:md123: not clean -- starting background reconstruction
[379556.172304] md/raid1:md123: active with 2 out of 2 mirrors
[379556.181160] md123: detected capacity change from 0 to 2147418624
[379556.354483] md: resync of RAID array md127
[379556.463994] Adding 2097084k swap on /dev/mapper/md127.  Priority:-2 extents:1 across:2097084k FS
[379556.570400] md: resync of RAID array md126
[379556.671998] Adding 2097084k swap on /dev/mapper/md126.  Priority:-3 extents:1 across:2097084k FS
[379556.783309] md: resync of RAID array md125
[379556.892062] Adding 2097084k swap on /dev/mapper/md125.  Priority:-4 extents:1 across:2097084k FS
[379557.004856] md: resync of RAID array md124
[379557.104041] Adding 2097084k swap on /dev/mapper/md124.  Priority:-5 extents:1 across:2097084k FS
[379557.222903] md: resync of RAID array md123
[379557.332837] Adding 2097084k swap on /dev/mapper/md123.  Priority:-6 extents:1 across:2097084k FS
[379567.283256] md: md127: resync done.
[379567.838208] md: md125: resync done.
[379568.187649] md: md124: resync done.
[379568.257195] md: md126: resync done.
[379568.836915] md: md123: resync done.
[379576.078101] md123: detected capacity change from 2147418624 to 0
[379576.084839] md: md123 stopped.
[379576.552911] md124: detected capacity change from 2147418624 to 0
[379576.559629] md: md124 stopped.
[379577.040500] md125: detected capacity change from 2147418624 to 0
[379577.047255] md: md125 stopped.
[379577.760770] md126: detected capacity change from 2147418624 to 0
[379577.767525] md: md126 stopped.
[379578.456204] md127: detected capacity change from 2147418624 to 0
[379578.462993] md: md127 stopped.
[379581.003481] md/raid1:md127: not clean -- starting background reconstruction
[379581.011264] md/raid1:md127: active with 2 out of 2 mirrors
[379581.017489] md127: detected capacity change from 0 to 2147418624
[379581.569128] md/raid1:md126: not clean -- starting background reconstruction
[379581.576927] md/raid1:md126: active with 2 out of 2 mirrors
[379581.584635] md126: detected capacity change from 0 to 2147418624
[379581.881545] md/raid1:md125: not clean -- starting background reconstruction
[379581.889399] md/raid1:md125: active with 2 out of 2 mirrors
[379581.895818] md125: detected capacity change from 0 to 2147418624
[379582.187164] md/raid1:md124: not clean -- starting background reconstruction
[379582.195007] md/raid1:md124: active with 2 out of 2 mirrors
[379582.201243] md124: detected capacity change from 0 to 2147418624
[379582.492981] md/raid1:md123: not clean -- starting background reconstruction
[379582.501234] md/raid1:md123: active with 2 out of 2 mirrors
[379582.508645] md123: detected capacity change from 0 to 2147418624
[379582.880752] md: resync of RAID array md127
[379583.028642] Adding 2097084k swap on /dev/mapper/md127.  Priority:-2 extents:1 across:2097084k FS
[379583.138962] md: resync of RAID array md126
[379583.224673] Adding 2097084k swap on /dev/mapper/md126.  Priority:-3 extents:1 across:2097084k FS
[379583.338454] md: resync of RAID array md125
[379583.448670] Adding 2097084k swap on /dev/mapper/md125.  Priority:-4 extents:1 across:2097084k FS
[379583.702060] md: resync of RAID array md124
[379583.812667] Adding 2097084k swap on /dev/mapper/md124.  Priority:-5 extents:1 across:2097084k FS
[379584.033285] md: resync of RAID array md123
[379584.136679] Adding 2097084k swap on /dev/mapper/md123.  Priority:-6 extents:1 across:2097084k FS
[379593.724045] md: md127: resync done.
[379593.731681] md: md126: resync done.
[379594.255358] md: md125: resync done.
[379594.523045] md: md124: resync done.
[379594.880755] md: md123: resync done.
[379611.734348] md123: detected capacity change from 2147418624 to 0
[379611.741121] md: md123 stopped.
[379612.216098] md124: detected capacity change from 2147418624 to 0
[379612.222972] md: md124 stopped.
[379612.683063] md125: detected capacity change from 2147418624 to 0
[379612.689916] md: md125 stopped.
[379613.164807] md126: detected capacity change from 2147418624 to 0
[379613.171706] md: md126 stopped.
[379613.866358] md127: detected capacity change from 2147418624 to 0
[379613.873218] md: md127 stopped.
[379616.899465] md/raid1:md127: not clean -- starting background reconstruction
[379616.907292] md/raid1:md127: active with 2 out of 2 mirrors
[379616.913619] md127: detected capacity change from 0 to 2147418624
[379616.984532] md/raid1:md126: not clean -- starting background reconstruction
[379616.995462] md/raid1:md126: active with 2 out of 2 mirrors
[379617.004114] md126: detected capacity change from 0 to 2147418624
[379617.108184] md/raid1:md125: not clean -- starting background reconstruction
[379617.115992] md/raid1:md125: active with 2 out of 2 mirrors
[379617.122264] md125: detected capacity change from 0 to 2147418624
[379617.227256] md/raid1:md124: not clean -- starting background reconstruction
[379617.238289] md/raid1:md124: active with 2 out of 2 mirrors
[379617.247091] md124: detected capacity change from 0 to 2147418624
[379617.317660] md/raid1:md123: not clean -- starting background reconstruction
[379617.328742] md/raid1:md123: active with 2 out of 2 mirrors
[379617.337601] md123: detected capacity change from 0 to 2147418624
[379617.513388] md: resync of RAID array md127
[379617.621509] Adding 2097084k swap on /dev/mapper/md127.  Priority:-2 extents:1 across:2097084k FS
[379617.737435] md: resync of RAID array md126
[379617.849526] Adding 2097084k swap on /dev/mapper/md126.  Priority:-3 extents:1 across:2097084k FS
[379617.961397] md: resync of RAID array md125
[379618.109536] Adding 2097084k swap on /dev/mapper/md125.  Priority:-4 extents:1 across:2097084k FS
[379618.229319] md: resync of RAID array md124
[379618.405551] Adding 2097084k swap on /dev/mapper/md124.  Priority:-5 extents:1 across:2097084k FS
[379618.529836] md: resync of RAID array md123
[379618.621534] Adding 2097084k swap on /dev/mapper/md123.  Priority:-6 extents:1 across:2097084k FS
[379628.363900] md: md126: resync done.
[379628.458096] md: md127: resync done.
[379628.823365] md: md124: resync done.
[379628.913891] md: md125: resync done.
[379629.423250] md: md123: resync done.
[379655.362652] md123: detected capacity change from 2147418624 to 0
[379655.369508] md: md123 stopped.
[379655.865787] md124: detected capacity change from 2147418624 to 0
[379655.872650] md: md124 stopped.
[379656.343379] md125: detected capacity change from 2147418624 to 0
[379656.350238] md: md125 stopped.
[379656.818719] md126: detected capacity change from 2147418624 to 0
[379656.825643] md: md126 stopped.
[379657.488865] md127: detected capacity change from 2147418624 to 0
[379657.495632] md: md127 stopped.
[380144.372437] md/raid1:md127: not clean -- starting background reconstruction
[380144.380082] md/raid1:md127: active with 2 out of 2 mirrors
[380144.386322] md127: detected capacity change from 0 to 2147418624
[380144.934894] md/raid1:md126: not clean -- starting background reconstruction
[380144.942536] md/raid1:md126: active with 2 out of 2 mirrors
[380144.948599] md126: detected capacity change from 0 to 2147418624
[380145.114607] md/raid1:md125: not clean -- starting background reconstruction
[380145.122216] md/raid1:md125: active with 2 out of 2 mirrors
[380145.128304] md125: detected capacity change from 0 to 2147418624
[380145.420385] md/raid1:md124: not clean -- starting background reconstruction
[380145.428060] md/raid1:md124: active with 2 out of 2 mirrors
[380145.434259] md124: detected capacity change from 0 to 2147418624
[380145.728300] md/raid1:md123: not clean -- starting background reconstruction
[380145.735947] md/raid1:md123: active with 2 out of 2 mirrors
[380145.742202] md123: detected capacity change from 0 to 2147418624
[380146.130490] md: resync of RAID array md127
[380146.278346] Adding 2097084k swap on /dev/mapper/md127.  Priority:-2 extents:1 across:2097084k FS
[380146.412501] md: resync of RAID array md126
[380146.770372] Adding 2097084k swap on /dev/mapper/md126.  Priority:-3 extents:1 across:2097084k FS
[380146.886622] md: resync of RAID array md125
[380146.998374] Adding 2097084k swap on /dev/mapper/md125.  Priority:-4 extents:1 across:2097084k FS
[380147.107160] md: resync of RAID array md124
[380147.210401] Adding 2097084k swap on /dev/mapper/md124.  Priority:-5 extents:1 across:2097084k FS
[380147.349351] md: resync of RAID array md123
[380147.462389] Adding 2097084k swap on /dev/mapper/md123.  Priority:-6 extents:1 across:2097084k FS
[380165.171607] md: md127: resync done.
[380165.182884] md: md125: resync done.
[380165.976358] md: md123: resync done.
[380167.088499] md: md124: resync done.
[380167.556502] md: md126: resync done.
[385592.175996] sd 0:0:55:0: [sdf] tag#867 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=13s
[385592.188049] sd 0:0:55:0: [sdf] tag#867 Sense Key : Medium Error [current]
[385592.196320] sd 0:0:55:0: [sdf] tag#867 Add. Sense: Unrecovered read error
[385592.204449] sd 0:0:55:0: [sdf] tag#867 CDB: Read(16) 88 00 00 00 00 00 49 1d d5 c8 00 00 02 00 00 00
[385592.214583] blk_update_request: critical medium error, dev sdf, sector 1226692416 op 0x0:(READ) flags 0x4700 phys_seg 12 prio class 0
[385594.517609] zio pool=tank vdev=/dev/disk/by-partuuid/5ffab4f3-30ad-4cae-892a-833043636ec7 error=61 type=1 offset=625918251008 size=1044480 flags=40080cb0
[388300.309894] md123: detected capacity change from 2147418624 to 0
[388300.316461] md: md123 stopped.
[388300.962261] md124: detected capacity change from 2147418624 to 0
[388300.968830] md: md124 stopped.
[388301.729484] md125: detected capacity change from 2147418624 to 0
[388301.736155] md: md125 stopped.
[388302.349097] md126: detected capacity change from 2147418624 to 0
[388302.355722] md: md126 stopped.
[388303.118015] md127: detected capacity change from 2147418624 to 0
[388303.124647] md: md127 stopped.
[388309.875477] md/raid1:md127: not clean -- starting background reconstruction
[388309.882971] md/raid1:md127: active with 2 out of 2 mirrors
[388309.889132] md127: detected capacity change from 0 to 2147418624
[388310.437294] md/raid1:md126: not clean -- starting background reconstruction
[388310.444866] md/raid1:md126: active with 2 out of 2 mirrors
[388310.450934] md126: detected capacity change from 0 to 2147418624
[388310.741720] md/raid1:md125: not clean -- starting background reconstruction
[388310.749275] md/raid1:md125: active with 2 out of 2 mirrors
[388310.755326] md125: detected capacity change from 0 to 2147418624
[388310.918135] md/raid1:md124: not clean -- starting background reconstruction
[388310.925707] md/raid1:md124: active with 2 out of 2 mirrors
[388310.931798] md124: detected capacity change from 0 to 2147418624
[388311.226759] md/raid1:md123: not clean -- starting background reconstruction
[388311.234360] md/raid1:md123: active with 2 out of 2 mirrors
[388311.240487] md123: detected capacity change from 0 to 2147418624
[388311.637219] md: resync of RAID array md127
[388311.753768] Adding 2097084k swap on /dev/mapper/md127.  Priority:-2 extents:1 across:2097084k FS
[388311.886121] md: resync of RAID array md126
[388312.093673] Adding 2097084k swap on /dev/mapper/md126.  Priority:-3 extents:1 across:2097084k FS
[388312.246239] md: resync of RAID array md125
[388312.385662] Adding 2097084k swap on /dev/mapper/md125.  Priority:-4 extents:1 across:2097084k FS
[388312.524179] md: resync of RAID array md124
[388312.689688] Adding 2097084k swap on /dev/mapper/md124.  Priority:-5 extents:1 across:2097084k FS
[388312.816937] md: resync of RAID array md123
[388312.921718] Adding 2097084k swap on /dev/mapper/md123.  Priority:-6 extents:1 across:2097084k FS
[388327.882393] md: md124: resync done.
[388327.995602] md: md127: resync done.
[388328.854257] md: md126: resync done.
[388329.324601] md: md125: resync done.
[388329.695389] md: md123: resync done.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
OK, that output is concerning on a number of levels... a little more detail on the hardware would be good, particularly about how the disks are connected (what RAID/HBA card(s) are you using? what power supply?)
 

tenknas

Dabbler
Joined
Mar 6, 2021
Messages
21
OK, that output is concerning on a number of levels... a little more detail on the hardware would be good, particularly about how the disks are connected (what RAID/HBA card(s) are you using? what power supply?)
I don't have info on the power supply. Here's the RAID card, though:

Code:
lspci | grep -i raid
03:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Is that all you have?

I'm not sure if it's possible for that model, but if you want to use it properly with ZFS, you need to get it onto IT firmware (or change it for a proper HBA that can use IT firmware):
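In the meantime, while the disks sit behind the MegaRAID firmware, SMART data may have to be read through the controller rather than from the disk directly. A sketch (the device ID after `megaraid,` varies per disk and setup, so `0` here is a placeholder):

```shell
# Plain "smartctl -a /dev/sdX" may work if the card passes disks through;
# if not, address the disk via its MegaRAID device ID:
smartctl -a -d megaraid,0 /dev/sda
```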

 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
As @sretalla is about to tell you, that is NOT a recommended HBA card for ZFS.

edit: he got there first
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
The reason I ask on the power supply is you seem to be dropping disks... which may imply brownouts (an insufficiently strong power source).
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Can we please have a full hardware spec? It seems that you might have some serious hardware/system design issues causing these problems.

As for pool layout: the wider the vdev, the longer the resilver time in the event of an issue. Thus mirrors resilver very quickly (although they "cost" 50% of your available capacity).

Without knowing your full use case / design criteria, I would do a minimum of 2 vdevs, 12 wide, with Z2. Another option is 4 vdevs, 6 wide, with Z2 - but depending on use case that's a lot of resiliency. A maximum width of 12 in a vdev is considered sensible.
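For illustration, the raw ZFS equivalent of the 2 x 12-wide Z2 layout would look something like the sketch below (on SCALE you'd normally build this in the web UI; the disk names are placeholders, and `/dev/disk/by-id/` paths are preferable in practice):

```shell
# Hypothetical pool "tank": two 12-disk RAIDZ2 vdevs from 24 drives.
zpool create tank \
  raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl \
  raidz2 sdm sdn sdo sdp sdq sdr sds sdt sdu sdv sdw sdx
```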

Question - what are these cache drives, and what do you think they do? "Cache" in ZFS is a much more nuanced scenario than with some other file systems / OSes. Some cache functions do not work in some use cases, or may even slow things down - although your hardware seems sufficient.

Question - what is your use case? I am assuming SMB shares, as you have implied - but are these used for anything else?
 

tenknas

Dabbler
Joined
Mar 6, 2021
Messages
21
So do I have to ask them to replace the card with something else, or can I just do some configuration on it at the BIOS level?

I'll inquire about the exact details of the PSU. But here's an initial look into it; I don't know how accurate it is, though:

Code:
sudo dmidecode --type 39
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.

Handle 0x0091, DMI type 39, 22 bytes
System Power Supply
        Power Unit Group: 1
        Location: To Be Filled By O.E.M.
        Name: To Be Filled By O.E.M.
        Manufacturer: To Be Filled By O.E.M.
        Serial Number: To Be Filled By O.E.M.
        Asset Tag: To Be Filled By O.E.M.
        Model Part Number: To Be Filled By O.E.M.
        Revision: To Be Filled By O.E.M.
        Max Power Capacity: Unknown
        Status: Not Present
        Type: Switching
        Input Voltage Range Switching: Auto-switch
        Plugged: Yes
        Hot Replaceable: No
        Input Voltage Probe Handle: 0x008D
        Cooling Device Handle: 0x008F
        Input Current Probe Handle: 0x0090

Handle 0x0092, DMI type 39, 22 bytes
System Power Supply
        Power Unit Group: 2
        Location: To Be Filled By O.E.M.
        Name: PWS-741P-1R
        Manufacturer: SUPERMICRO
        Serial Number: P741PCC47WN0191
        Asset Tag: N/A
        Model Part Number: PWS-741P-1R
        Revision: 1.1
        Max Power Capacity: 740 W
        Status: Present, OK
        Type: Switching
        Input Voltage Range Switching: Auto-switch
        Plugged: Yes
        Hot Replaceable: No
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
My reading of that is that you have a single 740W PSU and not a dual - but I am unfamiliar with the reporting method, so I could be way out. For 24 HDDs I would prefer to see a dual PSU of approximately the same rating (each), or perhaps even a bit more.

I think it's a replace job - I often use the following link for decisions about HBAs.
But the @sretalla link to an on-forum resource is just as relevant.

I am guessing that you have a SAS expander backplane running from a single 8-lane LSI card. In which case I would be looking for an LSI SAS 93xx or 92xx (the 93xx is more expensive, but better in the long term). Flash the card to IT mode with the right firmware and proceed from there.
 
Last edited:

tenknas

Dabbler
Joined
Mar 6, 2021
Messages
21
My reading of that is that you have a single 740W PSU and not a dual - but I am unfamiliar with the reporting method, so I could be way out. For 24 HDDs I would prefer to see a dual PSU of approximately the same rating, or perhaps even a bit more.

I think it's a replace job - I often use the following link for decisions about HBAs.

I am guessing that you have a SAS expander backplane running from a single 8-lane LSI card. In which case I would be looking for an LSI SAS 93xx or 92xx (the 93xx is more expensive, but better in the long term). Flash the card to IT mode with the right firmware and proceed from there.
I'll see what the datacenter can do about the PSU and the card.

Regarding the LSI card, though: since we're just renting, I'm not sure it's possible to get the parts unless I send them one myself. I'll ask.
 