How to find real serial numbers on MDD SAS drives

fredbourdelier

Dabbler
Joined
Sep 11, 2022
Messages
27
I've got a Dell R720 with a PERC H330 flashed to IT mode, running SCALE 22.12.3, with twelve 10TB MDD Enterprise SAS 12Gb/s 7200 RPM drives (model MDD10TSAS25672E) in a single RAIDZ3 pool. A separate boot pool sits on the rear drive bays: two mirrored Dell 1.2TB 10k SAS drives.

One of the drives in the Z3 pool logged 78 errors, so I (being a newbie) assumed that the drives were numbered sequentially in TNAS and proceeded to replace the wrong unit. Fortunately it is RAIDZ3, so it calmly resilvered the new drive without losing any data (score 1 for ZFS). I turned the system off and took pictures of all the drives' SNs. Now I'm trying to identify the drive by serial number, and finding that no serial number I can get out of TNAS matches the SN printed on the drive.

I bought 13 drives directly from MDD, all at the same time, so the printed serial numbers look something like QV1ZDLR3, QV1ZDLR4, etc. TNAS reports serial number 0000EVL70000C0422424 for sdf, which doesn't map to the drive SN at all. The next drive (sdg) reports 00004QEE0000C021P2UM, the one after that 00000B8Q0000C019750B, and so on. If the reported SNs mapped to any form of reality, they should also be sequential, just like the actual drives.

The drive model number resembles a Seagate part number, and looking at Seagate drives from the same vintage, their SN format matches the MDD units, not the TNAS report. Unlike the Seagate units, the MDD units don't have QR codes, just a standard barcode with a serial number.
[Image 1709499778336.png: Seagate]

[Image 1709500585722.png: MDD]

Disk Info from GUI:
Disk Size: 9.1 TiB
Transfer Mode: Auto
Serial: 0000EVL70000C0422424

Model: OOS10000G
Rotation Rate: 7200 RPM
Type: HDD
HDD Standby: 60
Description: N/A
(I ran SMART tests on the drive, but I can't get it to fail on cue):
Completed S.M.A.R.T. Tests: 2
Last Long Test: SUCCESS


Result from smartctl:
root@ZZZ-SITE-F-1[~]# smartctl -a /dev/sdf | grep -i serial
Serial number: 0000EVL70000C0422424

Result of zpool status:
root@ZZZ-SITE-F-1[~]# zpool status
pool: POOL_MAIN
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: resilvered 51.0G in 00:08:03 with 0 errors on Sun Mar 3 08:11:55 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        POOL_MAIN                                 ONLINE       0     0     0
          raidz3-0                                ONLINE       0     0     0
            a1629d7d-e14e-497f-9b62-46cc28545aac  ONLINE       0     0     0
            d9b2fb63-118d-4c51-9db1-16a2b65eccef  ONLINE       0     0     0
            0750d55d-bab9-432c-b794-8bbdc5d14092  ONLINE       0     0     0
            aa181e19-ad87-4a89-bae5-74dc62b41c5f  ONLINE       0     0     0
            2db70b84-d4f8-4120-8f87-1142c0f6ecce  ONLINE       0     0     0
            7bd5fab7-74aa-42d8-90c7-795e141a9c8d  ONLINE       0     0     1
            0663faa0-92ad-4438-8a50-756a78bd9d9a  ONLINE       0     0     0
            b26ca909-5568-4cf8-a95a-c54189d564e8  ONLINE       0     0     0
            83dfad59-d2c5-4614-a71a-8219ced0b44d  ONLINE       0     0     0
            b5bb52fd-dcef-4c36-9058-17af72d72546  ONLINE       0     0     0
            d29319dd-6242-4ee5-9c2a-de9476bfa1d3  ONLINE       0     0     0
            bd9fd809-76af-448e-93a6-fea504746ef4  ONLINE       0     0     0

errors: No known data errors

pool: boot-pool
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: scrub repaired 0B in 00:00:21 with 0 errors on Thu Jan 25 03:45:23 2024
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdj3    ONLINE       0     0     0
            sdk3    ONLINE       0     0     0

errors: No known data errors
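
A minimal sketch of one way to tie the partition UUIDs shown in `zpool status` back to a device name and the serial the OS reports. It assumes the output of `lsblk -J -o NAME,SERIAL,WWN,PARTUUID` has been captured as text. The function name `map_partuuid_to_disk` and the sample values are made up for illustration and do not come from this system.

```python
import json


def map_partuuid_to_disk(lsblk_json: str) -> dict:
    """Map each partition UUID to its parent disk's name, serial and WWN.

    Expects the JSON text produced by:
        lsblk -J -o NAME,SERIAL,WWN,PARTUUID
    """
    tree = json.loads(lsblk_json)
    mapping = {}
    for disk in tree.get("blockdevices", []):
        # Disks without partitions have no "children" key (or a null one).
        for part in disk.get("children") or []:
            partuuid = part.get("partuuid")
            if partuuid:
                mapping[partuuid.lower()] = {
                    "disk": disk.get("name"),
                    "partition": part.get("name"),
                    "serial": disk.get("serial"),
                    "wwn": disk.get("wwn"),
                }
    return mapping


# Hypothetical sample shaped like lsblk's JSON output (not real data).
SAMPLE = """
{
  "blockdevices": [
    {"name": "sdx", "serial": "EXAMPLE0001", "wwn": "0x5000000000000001",
     "partuuid": null,
     "children": [
       {"name": "sdx1", "serial": null, "wwn": "0x5000000000000001",
        "partuuid": "11111111-2222-3333-4444-555555555555"}
     ]},
    {"name": "sdy", "serial": "EXAMPLE0002", "wwn": null, "partuuid": null}
  ]
}
"""

if __name__ == "__main__":
    for uuid, info in map_partuuid_to_disk(SAMPLE).items():
        print(uuid, info["disk"], info["serial"], info["wwn"])
```

Feeding it the real `lsblk` output and looking up `7bd5fab7-74aa-42d8-90c7-795e141a9c8d` (the member showing a checksum error above) should give the `sdX` name and reported serial for that member. This only narrows things down to what the OS reports; matching that to the printed label is still the open question.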

Has anyone successfully mapped MDD SNs to TNAS reports? TIA!
 

fredbourdelier

The drive is logging more errors than before. The pool is now degraded, and although no data has been lost, the drive needs to be replaced. If I can't find which drive matches which SN, will I just have to replace each drive in turn? What will TNAS think if I suddenly put another copy of drive #1 into slot #2 after pulling drive #2? Is it going to get confused?


root@ZZZ-SITE-F-1[~]# smartctl -x /dev/sdf
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.107+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:
Product: OOS10000G
Revision: OOS1
Compliance: SPC-5
User Capacity: 10,000,831,348,736 bytes [10.0 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c500ca523f23
Serial number: 0000EVL70000C0422424
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Mon Mar 11 03:02:27 2024 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
Read Cache is: Enabled
Writeback Cache is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature: 18 C
Drive Trip Temperature: 60 C

Manufactured in week 18 of year 2022
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 92
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 1657
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0        144.212           0
write:         0        0       348       348        348        256.628           0

Non-medium error count: 0


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -     979                 - [-   -    -]
# 2  Background long   Completed                   -      16                 - [-   -    -]

Long (extended) Self-test duration: 58813 seconds [980.2 minutes]

Background scan results log
Status: no scans active
Accumulated power on time, hours:minutes 1947:42 [116862 minutes]
Number of background scans performed: 0, scan progress: 0.00%
Number of background medium scans performed: 0

Protocol Specific port log page for SAS SSP
relative target port id = 1
generation code = 0
number of phys = 1
phy identifier = 0
attached device type: expander device
attached reason: SMP phy control function
reason: power on
negotiated logical link rate: phy enabled; 6 Gbps
attached initiator port: ssp=0 stp=0 smp=0
attached target port: ssp=0 stp=0 smp=1
SAS address = 0x5000c500ca523f21
attached SAS address = 0x500056b36789abff
attached phy identifier = 6
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 0
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 0
Running disparity error count: 0
Loss of dword synchronization count: 0
Phy reset problem count: 0
relative target port id = 2
generation code = 0
number of phys = 1
phy identifier = 1
attached device type: no device attached
attached reason: unknown
reason: unknown
negotiated logical link rate: phy enabled; unknown
attached initiator port: ssp=0 stp=0 smp=0
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x5000c500ca523f22
attached SAS address = 0x0
attached phy identifier = 0
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 0
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 0
Running disparity error count: 0
Loss of dword synchronization count: 0
Phy reset problem count: 0
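
Besides the serial, the Logical Unit id in the output above may be worth comparing against the label. As a sketch under assumptions: the value looks like an NAA-5 identifier, in which the hex digit 5 is followed by a 24-bit IEEE OUI and a vendor-assigned part. The helper names below (`parse_smartctl_info`, `split_naa5`) are hypothetical, and the sample text is just three lines copied from the output above.

```python
def parse_smartctl_info(text: str) -> dict:
    """Collect 'Key: value' lines from smartctl output into a dict.

    Keys are lower-cased; lines with an empty value are skipped.
    If a key repeats, the last occurrence wins.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip().lower()] = value.strip()
    return fields


def split_naa5(lu_id: str):
    """Split an NAA-5 identifier into (oui, vendor_part) hex strings.

    Returns None if the value is not 16 hex digits starting with 5.
    """
    digits = lu_id.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    if len(digits) != 16 or digits[0] != "5":
        return None
    try:
        int(digits, 16)  # reject non-hex input
    except ValueError:
        return None
    return digits[1:7], digits[7:]


# Three lines copied from the smartctl output in this post.
SAMPLE = """\
Product: OOS10000G
Logical Unit id: 0x5000c500ca523f23
Serial number: 0000EVL70000C0422424
"""

if __name__ == "__main__":
    info = parse_smartctl_info(SAMPLE)
    print(info["serial number"], split_naa5(info["logical unit id"]))
```

For this drive the OUI part comes out as `000c50`. OUI 00:0C:50 should be Seagate's registration, which would fit the resemblance to Seagate part numbers noted earlier. Whether the MDD label prints this WWN is not confirmed here; it is only a second identifier to look for.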
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
May I suggest wiping the drive before you attempt to put it back in the pool? That should deal with any replacement issues.
 