Drive Recognition Issue? - Seagate Exos 2x14

Hazer

Cadet
Joined
Jan 11, 2024
Messages
9
Environment: TrueNAS-SCALE-23.10.1
HW Config:
AMD EPYC 7551P CPU
Supermicro H11SSL-I Motherboard (348 GB RAM slotted, 1x 4TB TeamData NVMe in M.2)
Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
Supermicro NVMe carrier board (4x TeamData 1TB NVMe drives)


I've ordered a 45Drives HL15 case and plan to populate it with 12x Seagate Exos 2x14 Mach.2 drives. I have the hardware and the drives, and since they were eBay purchases I figured I'd test and verify them first. I bought an SFF-8643 to 4x SFF-8482 breakout cable with power, expecting to test 4 drives at a time. I connected the first 4 drives and ran into some oddities I didn't expect.

I expected to see 8 new drives in the system, sda through sdh. In the GUI, after several restarts trying to troubleshoot things, the best I see is below:
[Screenshot: GUI disk list showing only one device per physical drive]


It appears to show only the "bottom" half of each drive. When I check lsblk, I do see all the drives as I would expect.
[Screenshot: lsblk output listing all eight block devices]


From the CLI I was able to build them into a RAIDZ2 pool, then export it so I could import it in the GUI... and as shown below, the results are a bit odd with respect to the members of the pool...
[Screenshot: imported pool in the GUI with oddly listed member disks]


Have I missed or misconfigured something that I'm not seeing? I *REALLY* do not want to have to build and manage the array from the CLI and only be able to effectively SMART test half the spindles in the array. I've searched for various solutions about the GUI not showing all the expected drives, and most of them trace back to questionable HW choices (RAID vs HBA, or USB enclosures, usually). As far as I can tell there should be no reason the GUI isn't picking up these drives unless there's something funky with the breakout cable, but then why does the OS see the drives, and why is ZFS happy with an array of them? I even ran a test writing 1TB of data across that array (honestly mostly to test speed) without any issues...
 

PhilD13

Patron
Joined
Sep 18, 2020
Messages
203
Did you try doing a sudo wipefs --all [drive path] on the new (used) disks? If not, why don't you do that to each one of them and then see if they show up in the GUI properly? There are things that can be left on used drives that can cause issues with reuse, especially when buying from eBay. Use lsblk to make sure you have the correct drive, as you don't want to accidentally wipe any existing boot or data drives. The Quick Wipe in TrueNAS does not remove partition info from a drive, or the array info if the drive was a member of a previous RAID array. I find it good practice to do this to all drives I buy used, to make sure any array superblock, info, or partitions are wiped, leaving a clean slate for TrueNAS.
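Something along these lines should do it; sdX is just a placeholder here, so check the device name against lsblk first:

Code:
# confirm which device is which (and which serial) before wiping anything
lsblk -o NAME,SIZE,MODEL,SERIAL

# erase all filesystem, RAID, and partition-table signatures from the drive
sudo wipefs --all /dev/sdX

# wipefs prints a line for each signature it erases and stays silent if it
# finds nothing; a non-zero exit status means something went wrong
echo $?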
 

Hazer

Cadet
Joined
Jan 11, 2024
Messages
9
After exporting/destroying the pool from the GUI, I ran wipefs for sda through sdh (with an accidental attempt on sdv due to a typo). There was no output from the commands, so I'm assuming that's an exit code 0 and everything was good. I rebooted the server to see if it would pick the drives up. Unfortunately there is no change... the top spindle for each drive still does not show up in the GUI, while the bottom spindle for each drive is present.

I did attempt to add the 4TB NVMe that was unassigned into the pool as a cache device via the GUI last night while testing things, and the GUI didn't have any issues manipulating the pool; it's just that I have no ability to interact with the top half of the drives via the GUI.... :|
 

Hazer

Cadet
Joined
Jan 11, 2024
Messages
9
Looking at data for the disks in the GUI... does anyone know if the serial number is being used as an identifier in any way? The top and bottom spindles of these disks would have the same SN, which would kind of explain why only the bottom half shows up.... "He who writes last writes best"...
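For anyone following along, this is roughly how I'm checking it; /dev/sda and /dev/sde are just placeholders for what I suspect are the two halves of the same physical drive:

Code:
# lsblk shows the serial it picked up for each block device
lsblk -o NAME,SIZE,MODEL,SERIAL

# for SAS devices, smartctl reports both the serial number and the logical unit id
sudo smartctl -i /dev/sda | grep -iE 'serial|logical unit'
sudo smartctl -i /dev/sde | grep -iE 'serial|logical unit'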
 

PhilD13

Patron
Joined
Sep 18, 2020
Messages
203
TrueNAS uses the serial number as the identifier for the disks, and each disk has to have a unique serial number to be identified properly. You are going to have to provide more detailed info on the specific drive type. The Seagate Exos 2x14 Mach.2 drives are dual-spindle drives. Are the drives you bought SAS or SATA? They present differently to systems depending on the type of drive. From the Seagate drives FAQ:
For the SAS configuration, each actuator is assigned to a logical unit number (LUN 0 and LUN 1). For example, one 18TB SAS drive will present itself to the operating system as two 9TB devices that the operating system can address independently, as it would with any other HDD.
In your case your 14TB drives would present as 2x7TB drives if they are SAS drives. The x2 SAS drives are not compatible with any SAS failover setups.

SATA configuration will present itself to the operating system as one logical device since SATA does not support the concept of LUNs. The user must be aware that the first 50% of the logical block addresses (LBAs) on the device correspond to one actuator and the second 50% of the LBAs correspond to the other actuator.

In Linux Kernel 5.19, the independent ranges can be found in /sys/block/<device>/queue/independent_access_range. There is one sub-directory per actuator, starting with the primary at "0." The "nr_sectors" field reports how many sectors are managed by this actuator, and "sector" is the offset for the first sector. Sectors then run contiguously to the start of the next actuator's range.

The dual-spindle drives, especially the SATA type, also just may not be compatible with TrueNAS CORE or SCALE at this point, especially since TrueNAS does not like hardware tricks or multiple drives with the same serial number. I have seen other TN users try dual-spindle drives where TrueNAS only sees half of each drive.

You may have to manually set the sector offsets to be able to use the drives, as mentioned in the quote above, if the LUNs are not recognized or if they are SATA drives. A true SAS backplane with SAS drives should be able to see the drive LUNs, and then you should be able to set up each LUN in the vdevs as desired. I have no experience with dual-spindle drives so I can't really help any further.
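For the SATA version, peeking at those ranges could look something like the sketch below; it assumes kernel 5.19+ and uses sdX as a placeholder (note that the sysfs directory is named independent_access_ranges on current kernels):

Code:
# one sub-directory per actuator (0, 1), each with 'sector' (start) and 'nr_sectors' (length)
for r in /sys/block/sdX/queue/independent_access_ranges/[0-9]*; do
    echo "$r: start=$(cat "$r/sector") sectors=$(cat "$r/nr_sectors")"
done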
 

Hazer

Cadet
Joined
Jan 11, 2024
Messages
9
TrueNAS uses the serial number as the identifier for the disks, and each disk has to have a unique serial number to be identified properly. You are going to have to provide more detailed info on the specific drive type. The Seagate Exos 2x14 Mach.2 drives are dual-spindle drives. Are the drives you bought SAS or SATA? They present differently to systems depending on the type of drive. From the Seagate drives FAQ:

In your case your 14TB drives would present as 2x7TB drives if they are SAS drives. The x2 SAS drives are not compatible with any SAS failover setups.

Yeah, these are the SAS drives. Everything I'm seeing now makes sense if the GUI is using the SN as part of the unique identification for each 'drive'... It explains why the bottom half of each is visible, given that it's going to query in sequence and the bottom spindle would clobber the top spindle's data...
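In case anyone else wants to confirm the dual-LUN presentation, something like this should do it (lsscsi and sg3_utils may need to be installed; /dev/sda is a placeholder):

Code:
# lsscsi lists devices as [host:channel:target:lun]; a dual-actuator SAS drive
# shows up twice on the same target, as LUN 0 and LUN 1
lsscsi

# sg_luns sends a REPORT LUNS to one device and lists the LUNs it exposes
sudo sg_luns /dev/sda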

The dual-spindle drives, especially the SATA type, also just may not be compatible with TrueNAS CORE or SCALE at this point, especially since TrueNAS does not like hardware tricks or multiple drives with the same serial number. I have seen other TN users try dual-spindle drives where TrueNAS only sees half of each drive.

You may have to manually set the sector offsets to be able to use the drives, as mentioned in the quote above, if the LUNs are not recognized or if they are SATA drives. A true SAS backplane with SAS drives should be able to see the drive LUNs, and then you should be able to set up each LUN in the vdevs as desired. I have no experience with dual-spindle drives so I can't really help any further.

The OS sees the LUNs, so it looks like it's just down to how the GUI tracks these things... a little disappointing, but an understandable problem/edge case in this scenario. Darn. I was kinda wanting to see TN report back 24 available disks when I got the HL15 in and did final assembly... Looks like I'll have to manually provision the zpool and figure out an OS-level SMART monitor....

I will go ahead and submit a PFR about the serial number collision in the GUI... It would be nice if it worked with the drives, but it does seem like a pretty fringe failure in the grand scheme of things.
 

PhilD13

Patron
Joined
Sep 18, 2020
Messages
203
Looking at some more info: the drives are usually only used in large cloud data centers or with very specific applications, and on hardware designed for them. According to Seagate, only a few Broadcom HBAs (the 9600, 9500, and 9400 series) fully support the drives, and they have to be on the most recent firmware version. The 9300 cards are supposed to be compatible (whatever that means), and RAID is not currently supported; RAID-in-HBA mode is to be supported in the 9600 generation cards only. I don't think you will get the drives to work properly, or at all, with any other cards, interface devices, or onboard RAID controllers.

Other manufacturers with supported cards are Microchip and ATTO. See This document.

If you do happen to get something to work in TrueNAS SCALE, then each spindle of each drive (each LUN) also has to be on a different vdev to avoid conflicts.
 

Hazer

Cadet
Joined
Jan 11, 2024
Messages
9
If you do happen to get something to work in TrueNAS SCALE, then each spindle of each drive (each LUN) also has to be on a different vdev to avoid conflicts.
Why would they need to be separated? That would cause replacement issues if I'm understanding how this is supposed to function. I created the below config with my test rig:

Code:
  
  pool: tank
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          raidz2-1  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0


Each raidz2 chunk is 2 drives, the top and bottom spindle of each. If I lose one of the sdX devices (which are individual spindles) then I can start a resilver to a new Mach.2 drive, then remove the bad disk (which removes the bad spindle AND the good one next to it) and then resilver again to the new disk.... Bit of a pain, but I get redundancy, and with more spindles I get better throughput....
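Roughly the sequence I have in mind, with made-up device names purely for illustration (sdb/sda being the two spindles of the failing drive, sdy/sdz the two LUNs of its replacement):

Code:
# resilver the failed spindle onto the first LUN of the new Mach.2 drive
zpool replace tank sdb /dev/sdy

# once that finishes, migrate the still-good spindle of the same physical
# drive onto the new drive's second LUN, then pull the old disk
zpool replace tank sda /dev/sdz
zpool status tank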

I'm still waffling a bit on whether it's going to be 3 vdevs by 4 drives (8 spindles each) or 4 vdevs by 3 drives (6 spindles each) for the final configuration.... I'm leaning toward the 4x3 idea... I'd 'lose' 25% of my overall storage capacity to redundancy, but the smaller vdevs would resilver faster and I'd effectively be able to lose up to 4 drives in the array (with a little luck in location) without losing data...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
According to Seagate, only a few Broadcom HBAs (the 9600, 9500, and 9400 series) fully support the drives, and they have to be on the most recent firmware version. The 9300 cards are supposed to be compatible (whatever that means), and RAID is not currently supported; RAID-in-HBA mode is to be supported in the 9600 generation cards only.
RAID aside, they should really not pose any trouble to an HBA. SCSI devices have been doing this sort of thing for almost four decades now, so the software support is all still there.

The only difficult technical hurdle is that TrueNAS really likes to track serial numbers to do its thing, and the disks report the same serial number on both LUNs. General availability of these things seems to have outpaced any work iX may have done to support them - frankly, I'm somewhat surprised, because several people have shown up with these things recently, despite them being announced as a specialty item for specific customers, at least for now.

Why would they need to be separated?
Well, if you have the same disk serving both LUNs to a single vdev, that's two pieces of the stripe per disk instead of one. That breaks the reliability model: the disk fails and you lose two pieces instead of one. It's also wasteful in terms of IOPS, because the whole point is that these disks can do double the IOPS, but if the two LUNs are in the same vdev, they are effectively synchronized and the disk cannot seek to two things independently.
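As a purely hypothetical illustration with four physical drives, assuming sda/sde, sdb/sdf, sdc/sdg, and sdd/sdh are the two LUNs of each drive, you'd want something like:

Code:
# each vdev gets one LUN from each physical drive, so a whole-drive
# failure costs each vdev only one member instead of two
zpool create tank \
    raidz2 sda sdb sdc sdd \
    raidz2 sde sdf sdg sdh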
 

Hazer

Cadet
Joined
Jan 11, 2024
Messages
9
Well, if you have the same disk serving both LUNs to a single vdev, that's two pieces of the stripe per disk, instead of one. That breaks the reliability model - disk fails, you lose two pieces instead of one. It's also wasteful in terms of IOPS, because the whole point is that these disks can do double the IOPS, but if the two LUNs are in the same vdev, they are effectively synchronized and thus the whole disk cannot seek two things independently.

That does make sense... essentially what I said about the "replace, resilver, remove, resilver" still happens, it's just that the resilvers are to 2 different vdevs instead of the same vdev... The downside I could point to in this case would be if you lose the whole disk instead of just a spindle, then you've degraded 2 vdevs in the pool at the same time.... Guess the question is which is more likely: spindle motor/head failure vs drive motor/electronics failure?

This will make mapping out the spindle assignment a bit more interesting if you're doing more than 2 vdevs in a pool like I'm looking at doing... Probably should have paid more attention when the infra guys at work were complaining about the wiring of the NetApp shelves, because I get the feeling my vdev wrapping is gonna look similar... heheh
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
downside I could point to in this case would be if you lose the whole disk instead of just a spindle, then you've degraded 2 vdevs in the pool at the same time.... Guess the question is which is more likely: spindle motor/head failure vs drive motor/electronics failure?
That's the sort of micromanagement I avoid at all costs - it's nearly impossible to quantify and ultimately has no impact. Even more so in this case, because you have to replace a whole disk.
The standard mitigation options apply:
  • Replace the disk without first removing it, to not actually lose redundancy (obviously does not apply to dead disks)
  • Move to a higher RAIDZ level (on a properly-maintained server, it would take a crazy unreliable set of disks to bring down a RAIDZ3 vdev)
  • Reduce vulnerability windows with hot spares, if RAIDZ3 isn't good enough or you have multiple vdevs to serve with a single spare
  • Always keep cold spares on hand
 

PhilD13

Patron
Joined
Sep 18, 2020
Messages
203
RAID aside, they should really not pose any trouble to an HBA. SCSI devices have been doing this sort of thing for almost four decades now, so the software support is all still there
That came from Seagate. See This document. Maybe that is old news now and things have changed. I gathered from their own documentation that there is massive trickery going on in the drive.

I also see, and agree, that there is no advantage in most applications to using dual-spindle drives, and I see a lot of real disadvantages: complexity, possible configuration issues, easier loss of data, and so on. Kind of sounds like an engineering intern was given a summer project to accomplish and someone in marketing thought it was a great idea. These drives have been commented on in these community forums in the past, but I did not see results from any actual use.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Properly implemented, dual actuator drives could give a performance boost. For example, if the LBAs were interleaved such that contiguous reads or writes hit both actuators in sequence, perhaps on cylinder boundaries, the drive could seek ahead on the next actuator and avoid next-cylinder seek delays.

Of course, the dual actuator roll-over points would be mostly hidden from the OS, (aka Drive-Managed Dual Actuator). But, that is more or less what the SATA version does now: you have to "guess" based on 1/2 of the available storage and assume it is correct.

But, hey, no one asked me, The Performance Expert, to implement dual actuators :smile:.
 

Hazer

Cadet
Joined
Jan 11, 2024
Messages
9
It seems like I have some work to do once my case arrives and I can build out the server before I can migrate my data over to it and retire the old box.... And all because I was wanting to saturate a 10gbit link with spinning rust... hehe
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
came from Seagate. See This document. Maybe that is old news now and things have changed. I gathered from their own documentation that there is massive trickery going on in the drive.
Hmm... I think that's because they lumped in the SATA drives, which would indeed be terrible and incompatible. Several people have shown up with the SAS models and easily gotten as far as having the OS recognize everything - the only issue seems to be the middleware.
 

PhilD13

Patron
Joined
Sep 18, 2020
Messages
203
@Arwen said
Properly implemented, dual actuator drives could give a performance boost. For example, if the LBAs were interleaved such that contiguous reads or writes hit both actuators in sequence, perhaps on cylinder boundary, that can allow the drive to seek in advance on the next actuator and prevent the next cylinder seek delays.

I don't think they work that way, from what I'm reading about them. The drive has 2 spindles and the blocks are contiguous: roughly the first 50% of the LBAs is LUN 0 and the second 50% is LUN 1. Each LUN is considered a separate drive by the electronics and uses a separate actuator. A 14TB drive is thus seen as two 7TB drives, not as one 14TB drive. Think of it as 2 physical drives within one drive frame. Though the blocks are contiguous, they are only contiguous within each LUN, and there is no crossover between LUNs, as that would be crossing drive boundaries onto another independent actuator. That would defeat the purpose of having 2 spindles.

The performance gain is that both 7TB LUNs can be accessed at, or nearly at, the same time. So while a read/write is happening to blocks in LUN 0, a different unrelated read/write can be happening to blocks in LUN 1, increasing overall performance. The tradeoff is pretty much everything already listed. That's okay if you're a cloud or maybe a database provider and you have designed systems around the drives, but most others won't see the gains over the trouble and risk of keeping it running. It would be simpler, easier, more data-secure, and maybe cheaper in the long run to use SSDs for performance gains.

In Linux Kernel 5.19, the independent ranges can be found in /sys/block/<device>/queue/independent_access_range. There is one sub-directory per actuator, starting with the primary at "0." The "nr_sectors" field reports how many sectors are managed by this actuator, and "sector" is the offset for the first sector. Sectors then run contiguously to the start of the next actuator's range.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@PhilD13 - I doubt there are 2 spindles. Just one spinning spindle, with a stack of platters. Now if you mean 2 separate spindles for the ACTUATORS that move the heads, yes, there are 2.

But, yes, allowing separate access via different LUNs can improve performance in a different way.

It all depends on what is desired. If they show up as 2 LUNs and people use them improperly, (like same RAID-Zx vDev or Mirror set), then it's a recipe for disaster.


Some time back, we had a person who wanted redundancy AND a high degree of space efficiency out of a single disk or a few disks. So that person partitioned a disk, (or disks), into several partitions. Then used those partitions to make a RAID-Zx vDev. If they had used "copies=2" on a striped pool, space efficiency would have been less than 50%. But with the partitions and RAID-Zx vDev they got higher than 80%. Of course, loss of a disk was catastrophic. Yet loss of a block was recoverable even with a single disk.

The dual actuator disks introduce similar issues, because if you use both halves in a single 2-way Mirror vDev, you get block redundancy but not disk redundancy.
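For reference, the property in question, (dataset name is just an example):

Code:
# keep two copies of every block within the pool; this protects against
# bad blocks/sectors, not against losing the whole disk
zfs set copies=2 tank/important
zfs get copies tank/important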
 

Hazer

Cadet
Joined
Jan 11, 2024
Messages
9
Properly implemented, dual actuator drives could give a performance boost. For example, if the LBAs were interleaved such that contiguous reads or writes hit both actuators in sequence, perhaps on cylinder boundaries, the drive could seek ahead on the next actuator and avoid next-cylinder seek delays.

Of course, the dual actuator roll-over points would be mostly hidden from the OS, (aka Drive-Managed Dual Actuator). But, that is more or less what the SATA version does now: you have to "guess" based on 1/2 of the available storage and assume it is correct.

But, hey, no one asked me, The Performance Expert, to implement dual actuators :smile:.

So, speaking of performance, now that I have the HL15 in and built things out... here's some data. ;)

The first run is a copy from the NVMe SSD stripe to the HDs; the second is HDs back to the SSDs, along with the pool layout.... There was no size difference between 3 vdevs and 4, so I opted for 4 vdevs....
Code:
root@mikoshi:/mnt/Storage/tmp# dd if=/mnt/Apps/testingdata.out of=./testing bs=256k
968842+1 records in
968842+1 records out
253976125440 bytes (254 GB, 237 GiB) copied, 194.523 s, 1.3 GB/s
root@mikoshi:/mnt/Storage/tmp# dd if=./testdata.tar of=/mnt/Apps/testing.out bs=256k
968842+1 records in
968842+1 records out
253976125440 bytes (254 GB, 237 GiB) copied, 249.131 s, 1.0 GB/s
root@mikoshi:/mnt/Storage/tmp#

  pool: Storage
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        Storage     ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdo     ONLINE       0     0     0
            sdm     ONLINE       0     0     0
            sdq     ONLINE       0     0     0
            sdu     ONLINE       0     0     0
          raidz2-1  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sds     ONLINE       0     0     0
            sdx     ONLINE       0     0     0
          raidz2-2  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdl     ONLINE       0     0     0
            sdr     ONLINE       0     0     0
            sdw     ONLINE       0     0     0
          raidz2-3  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdk     ONLINE       0     0     0
            sdn     ONLINE       0     0     0
            sdp     ONLINE       0     0     0
            sdt     ONLINE       0     0     0
            sdv     ONLINE       0     0     0
 