Troubleshooting disk format warnings in TrueNAS SCALE Bluefin

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
Synopsis
Reliable data storage is your most valuable asset. Treat it accordingly and always purchase your CMR hard drives from reputable sellers. If you cannot successfully complete the formatting procedure detailed below, it usually means your hard drive has custom firmware applied or is simply defective. You are welcome to share the fixes you find in this thread; I will add them as a reference for other users.

This guide is part of my Bluefin Recommended Settings and Optimizations guide. I recommend you go through all of its sections to check that you are not affected by other issues.

ZFS and Advanced Format Hard Drives
I'm going to look at the differences from a ZFS usage perspective, since we are dealing with Scale. Each hard drive has two types of sectors:
  • physical - 512, 520, or 528-byte for common drives or 4096, 4112, 4160, and 4224-byte for Advanced Format drives
  • logical - 512, 520, or 528-byte for common drives or 512 emulation (512e) and 4K native (4Kn) for Advanced Format drives
The logical sector is the smallest unit of write that the hard drive can accept. On a 512e drive it is obtained by logically dividing a 4K physical sector into 8 parts, and it can be changed if the hard drive firmware allows it. The physical sector is the unit in which the hard drive actually completes read and write operations, and it cannot be changed.
Most hardware and software components were designed around hard drives with 512-byte sectors. To stay compatible, a hard drive with 4K physical sectors is paired with 512-byte emulation firmware, so that each 4K physical sector used in Advanced Format is presented as 8 traditional 512-byte logical sectors. As a result, when a write does not cover a whole 4K physical sector, the hard drive must perform extra steps: read the 4K sector, modify its contents and then write the data back. This process is called a read-modify-write (RMW) cycle. A 4K native hard drive is configured with 4K physical sectors and 4K logical sectors, so there is no emulation layer in place.
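
To make the RMW condition concrete, here is a minimal Python sketch. The function name needs_rmw is my own and it only models the alignment arithmetic, assuming a 512e drive with 4096-byte physical sectors. It is not how the firmware is actually implemented.

```python
PHYSICAL = 4096  # bytes per physical sector on an Advanced Format drive
LOGICAL = 512    # bytes per logical sector exposed by the 512e firmware


def needs_rmw(start_lba: int, sector_count: int) -> bool:
    """Return True if a write of sector_count logical sectors starting at
    start_lba only partially covers at least one physical sector."""
    start_byte = start_lba * LOGICAL
    length = sector_count * LOGICAL
    # A write avoids RMW only if it starts on a physical sector boundary
    # and its length is a whole number of physical sectors.
    return start_byte % PHYSICAL != 0 or length % PHYSICAL != 0


print(needs_rmw(0, 8))  # aligned 4K write -> False
print(needs_rmw(0, 1))  # single 512-byte write -> True
print(needs_rmw(4, 8))  # 4K write straddling two physical sectors -> True
```

With ashift=12, ZFS issues writes in 4K multiples, which is why the emulation layer rarely hurts in practice.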

Example of logical and physical sectors present in an AF 512 emulation (512e) hard drive:
Code:
# fdisk -l /dev/sda
Disk /dev/sda: 7.28 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: HUH728080ALE601
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

TrueNAS Scale uses ashift=12 by default to define 4096-byte sectors when a new pool vdev is created. Basically, ashift tells ZFS the underlying physical block size used by your hard drives. It is expressed as a power-of-two exponent, so the sector size is 2^ashift bytes:
  • ashift=9 - 512B sectors, used by common drives
  • ashift=12 - 4K sectors, used by AF hard drives
  • ashift=13 - 8K sectors used by some modern SSDs
Example of ashift value set on default pool:
Code:
# zpool get ashift default
NAME     PROPERTY  VALUE   SOURCE
default  ashift    12      local
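
As a quick sanity check of the values listed above, the relation between sector size and ashift can be sketched in Python. The helper name ashift_for is hypothetical; it simply computes the base-2 logarithm.

```python
def ashift_for(sector_bytes: int) -> int:
    """Return the ashift value (log2 of the sector size in bytes)."""
    # Reject sizes that are not a power of two, such as 520 or 528.
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a power of two")
    return sector_bytes.bit_length() - 1


for size in (512, 4096, 8192):
    print(size, ashift_for(size))  # 512 9, 4096 12, 8192 13
```

Note that 520-byte and 528-byte sectors have no valid ashift, which fits with the branded disk issues described further down.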

Based on my own tests, also confirmed by @NugentS, there are no real performance benefits to reformatting a hard drive's 512e logical sectors to 4Kn. However, as @HoneyBadger mentioned in this post, 4Kn is preferred because it avoids the read-modify-write cycle, if your disk can be formatted to 4Kn logical sectors. Both 512e and 4Kn logical sector hard drives can co-exist in a pool vdev.

Known Issues
Some users get this warning message after upgrading to Bluefin:
Code:
Disk(s): sda ... are formatted with Data Integrity Feature (DIF) which is unsupported.

Another case is disks not being seen by the pool, even though the pool reports no errors. You might be dealing with one or both of the issues listed below. Bluefin's newer kernel/ZFS version makes these feature checks available and throws a warning, which explains why you did not see it before.
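
If the warning lists many disks, you can pull the device names out of the alert text before checking them one by one. This is a small Python sketch. The function name dif_disks and the sample disk names are hypothetical, and the pattern assumes the alert wording shown above.

```python
import re
from typing import List


def dif_disks(alert: str) -> List[str]:
    """Extract the disk names from a DIF warning message."""
    match = re.search(
        r"Disk\(s\): (.+?) are formatted with Data Integrity Feature", alert
    )
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",")]


# Hypothetical alert text, using the same wording as the warning above.
sample = ("Disk(s): sda, sdb, sdc are formatted with Data Integrity "
          "Feature (DIF) which is unsupported.")
print(dif_disks(sample))  # ['sda', 'sdb', 'sdc']
```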

T10 Protection Information
T10-PI is an extension of the existing T10 SCSI Block Commands specification, which covers communication between SCSI controllers and storage devices. Protection Information (PI) adds an extra 8 bytes of information to the 512-byte sectors typical of enterprise hard drives.
Example of T10-PI Type 1 protected disk:
Code:
# sg_readcap -l /dev/sda
Read Capacity results:
   Protection: prot_en=1, p_type=0, p_i_exponent=0 [type 1 protection]

Example of T10-PI Type 2 protected disk:
Code:
# sg_readcap -l /dev/sda
Read Capacity results:
   Protection: prot_en=1, p_type=1, p_i_exponent=0 [type 2 protection]

Example of non T10-PI protected disk:
Code:
# sg_scan -i /dev/sda
/dev/sda: scsi0 channel=0 id=0 lun=0 [em]
    ATA       HUH728080ALE601   0003 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
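
The Protection line printed by sg_readcap can be decoded with a few lines of Python. This is only a sketch: the function name protection_type is mine, and it assumes the output format shown in the examples above, where the type is p_type + 1 when prot_en=1.

```python
import re


def protection_type(readcap_output: str) -> int:
    """Return 0 for an unprotected disk, otherwise the T10-PI type."""
    match = re.search(r"prot_en=(\d+), p_type=(\d+)", readcap_output)
    if match is None:
        raise ValueError("no Protection line found")
    prot_en, p_type = int(match.group(1)), int(match.group(2))
    # p_type is only meaningful when protection is enabled.
    return p_type + 1 if prot_en else 0


line = "   Protection: prot_en=1, p_type=0, p_i_exponent=0 [type 1 protection]"
print(protection_type(line))  # 1
```

A result of 0 means the disk has no protection information enabled, so only the sector size remains to be checked.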

Branded Disks
Linux does not support 520-byte sectors, unless the drive is formatted with DIF and installed into a DIF-capable HBA. This explains why OEM branded HDDs or SSDs from major storage vendors are "enhanced" with 520-byte sectors. In rare cases, you might also find disk sectors extended to 528-byte, with custom proprietary firmware.
Example of 520-byte sectors disk:
Code:
# sg_format -v /dev/sda
    NETAPP    X287_S15K5288A15  3P01   peripheral_type: disk [0x0]
      PROTECT=0
      Unit serial number: 00028BF3
      LU name: 5000cca254d9cd21
    mode sense(10) cdb: [5a 00 01 00 00 00 00 00 fc 00]
Mode Sense (block descriptor) data, prior to changes:
  Number of blocks=0 [0x0]
  Block size=520 [0x208]

Formatting Procedure
  • You are responsible for any destructive actions. If you're not sure of something, ask before you execute any formatting commands
  • All commands must be executed as the root user
  • For related inquiries or questions, please post the sg_format output, using [CODE]command[/CODE]

Disk is Part of a Pool
In the UI, go to Storage > Pool Name > Manage Devices and take the disk offline.

The safest way is to run the sg_format commands directly from the Scale server console, but you can also use the tmux procedure detailed below to avoid losing connectivity to the server.
Start a tmux session and execute your format command. Enter is part of the command, do not remove it:
Code:
# tmux new-session -ds format
# tmux send-keys -t format 'time sg_format -v [your format flag] /dev/sda' Enter

Print the current format status:
Code:
# tmux capture-pane -pt format

To attach to format session, run:
Code:
# tmux attach -t format

To detach from the current session, press Ctrl+b, then d.

Initial Troubleshooting
Run the sg_format command without formatting options to check the T10-PI and 520-byte sector details:
Code:
# sg_format -v /dev/sda

T10 Protection Information
To remove the T10-PI protection, run the sg_format command with the following formatting options (time measures the command execution time):
Code:
# time sg_format -v -F /dev/sda

Branded Disks
To format a disk with 520-byte sectors, run the sg_format command with the following formatting options:
Code:
# time sg_format -v -F -s 512 /dev/sda

Advanced Format Disks
To convert a disk from 512e to 4Kn sectors, run the sg_format command with the following formatting options:
Code:
# time sg_format -v -F -s 4096 /dev/sda

T10 Protection Information and Branded Disks
If sg_format reports that you have both 520-byte sectors and T10-PI protection enabled, you first need to disable the T10-PI protection with sg_format -v -F, then run sg_format -v -F -s 512 to format the sectors. Example where both issues are present:
Code:
# sg_format -v /dev/sda
    NETAPP    X287_S15K5288A15  3P01   peripheral_type: disk [0x0]
      PROTECT=2
      Unit serial number: 00028BF3
      LU name: 5000cca254d9cd21
    mode sense(10) cdb: [5a 00 01 00 00 00 00 00 fc 00]
Mode Sense (block descriptor) data, prior to changes:
  Number of blocks=0 [0x0]
  Block size=520 [0x208]

# time sg_format -v -F /dev/sda
# time sg_format -v -F -s 512 /dev/sda

You cannot combine all flags into one command; sg_format will prohibit this operation.
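
The decision logic of this section can be summarized in a short Python sketch. The function name format_plan is hypothetical, it only builds the command strings from this guide without running them, and treating 528-byte sectors the same way as 520-byte sectors is my assumption.

```python
from typing import List


def format_plan(block_size: int, pi_enabled: bool,
                device: str = "/dev/sda") -> List[str]:
    """Return the sg_format commands to run, in order, for one disk."""
    plan = []
    if pi_enabled:
        # Step 1: remove the T10-PI protection first.
        plan.append(f"sg_format -v -F {device}")
    if block_size in (520, 528):
        # Step 2: reformat the branded sectors to 512 bytes.
        plan.append(f"sg_format -v -F -s 512 {device}")
    return plan


for command in format_plan(block_size=520, pi_enabled=True):
    print(command)
```

An empty list means the disk needs neither fix. Always review the printed commands and the target device before executing anything, since each format destroys all data on the disk.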

Disk Replacement
Back in the UI, forcibly Replace the disk.

The replacement will trigger an automatic resilvering, which will take several hours.

HGST Ultrastar Firmware (for Advanced Users)
HGST uses proprietary firmware on their Ultrastar models, and Western Digital provides, upon request, a proprietary tool for performing low-level maintenance on compatible disk drives, such as conversion to 4K native sectoring. I created a resource with the software I used to format my HGST Helium hard drives from 512e to 4Kn logical sectors.

It is strongly recommended to contact Western Digital and provide your hard drive serial number, to make sure you obtain a compatible low-level maintenance tool. Using the wrong tool could permanently brick your device. The steps listed below apply to any of their proprietary formatting tools, but always check the documentation to make sure the Hugo-related commands are still valid.

Since my disks are OEM, WD did not provide any support, so I had to find a solution by myself.
With the procedure listed below, I successfully formatted all devices, one disk at a time, directly on my Scale server.

Refer to the included PDF file or run hugo format -h for all command flags; for some reason, man hugo is missing some of them:
Code:
USAGE:

   format  {-m <model number> ... |-s <serial number> ... |-g <device path>
           ... } [--danger-zone] [--simple-progress] [--hide-progress] [-p
           <protection type>] [-c] [--fastformat] [--merge] [-n <number of
           blocks>] [-b <block size>] [--no-sg] [--no-ad] [--no-mr]
           [--no-serial] [-h]

Where:
   -m <model number>,  --model <model number>  (accepted multiple times)
     (OR required)  Format all devices of specified model number
         -- OR --
   -s <serial number>,  --serial <serial number>  (accepted multiple times)
     (OR required)  Format device specified by serial number
         -- OR --
   -g <device path>,  --target <device path>  (accepted multiple times)
     (OR required)  Operate on targets with device handles specified by
     this option. See README for more detail.
   --danger-zone
     Flag tells the application that you know you are going to destroy your
     data with this command and will not prompt the user.
   --simple-progress
     Prevent the display of the progress bar screen, useful when running
     commands from a script. (same as hide-progress flag)
   --hide-progress
     Prevent the display of the progress bar screen, useful when running
     commands from a script.
   -p <protection type>,  --protection <protection type>
     Specify a type of Protection Information (0,1,2,3)
   -c,  --media-compatibility-check
     Perform media compatibility check
   --fastformat
     Set Fast Format
   --merge
     Merge G-List and P-List
   -n <number of blocks>,  --numblocks <number of blocks>
     Specify number of blocks to Format. Default: Current size. Specifying
     'max' will format to maximum number of blocks supported by the
     device.
   -b <block size>,  --blocksize <block size>
     Format the device to a specified Block size (512, 4096). Additional
     special case sizes for SAS drives only include: 520, 528, 4112, 4160,
     and 4224. Special cases may not be supported by your specific
     firmware
   --no-sg
     Do not use the SG driver
   --no-ad
     Do not use the AD driver
   --no-mr
     Do not use the MR driver
   --no-serial
     Do not use the Serial driver
   -h,  --help
     Displays usage information and exits.


Once Hugo is installed as detailed in the resource, take the disk offline in the UI.

Wipe the hard drive signatures; the process is instant:
Code:
# wipefs -af /dev/sda

Start a tmux session and format the hard drive with Hugo; the process takes many hours:
Code:
# tmux new-session -ds sda
# tmux send-keys -t sda 'hugo format --danger-zone --simple-progress -b 4096 -p 0 -g /dev/sda' Enter
# tmux capture-pane -pt sda

Using the --fastformat option will result in corrupted sectors, which may surface a little after the disk is back online. The safest way is to proceed with the standard long format. Once the format has finished, kill the tmux session:
Code:
# tmux kill-session -t sda

Verify the 4Kn logical sectors with fdisk:
Code:
# fdisk -l /dev/sda
Disk /dev/sda: 7.28 TiB, 8001563222016 bytes, 1953506646 sectors
Disk model: HUH728080ALE601
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
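
You can cross-check the sector counts reported by fdisk before and after the conversion: the capacity in bytes stays the same, only the logical sector size changes. Here is a small Python check using the numbers from the fdisk outputs in this guide. The helper name sector_count is my own.

```python
CAPACITY = 8001563222016  # bytes, as reported by fdisk for HUH728080ALE601


def sector_count(capacity_bytes: int, logical_bytes: int) -> int:
    """Return the number of logical sectors for a given sector size."""
    sectors, remainder = divmod(capacity_bytes, logical_bytes)
    if remainder:
        raise ValueError("capacity is not a multiple of the sector size")
    return sectors


print(sector_count(CAPACITY, 512))   # 15628053168 (512e)
print(sector_count(CAPACITY, 4096))  # 1953506646 (4Kn)
```

The 4Kn count is exactly one eighth of the 512e count, as expected.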

Back in the UI, forcibly Replace the disk.

The replacement will trigger an automatic resilvering, which will take several hours.

Firmware Update
Newer HGST firmware might be available at HDDGuru; search the page for your disk model number (e.g. HUH728080ALE601). The files are zipped in a format that is incompatible with Linux, so I created a simple Python script that lets you download and unzip the firmware on your TrueNAS server.
Paste the following code into your terminal and press Enter; it will create the /tmp/extract.py file:
Code:
cat > /tmp/extract.py <<'EOF'
import shutil
import sys
import urllib.request as request
import zipfile

# Firmware archive name passed on the command line, e.g. A4GNVD05.zip
zip_file = sys.argv[1]
request_url = '/'.join(['https://files.hddguru.com/download/Firmware%20updates/Hitachi', zip_file])

print(' '.join(['Downloading', zip_file, '...']))
try:
    # Stream the archive into the current directory
    with request.urlopen(request_url) as response, open(zip_file, 'wb') as file:
        shutil.copyfileobj(response, file)
except Exception:
    # Stop here, there is nothing to extract if the download failed
    sys.exit('Download error')
else:
    print('OK')

print(' '.join(['Extracting', zip_file, '...']))
try:
    with zipfile.ZipFile(zip_file, 'r') as zip_ref:
        zip_ref.extractall('.')
except Exception:
    sys.exit('File extraction error')
else:
    print('OK')
EOF

Execute the following commands to download and extract the firmware (A4GNVD05.zip used as example for HUH728080ALE601 model):
Code:
# cd /tmp
# python extract.py A4GNVD05.zip
Downloading A4GNVD05.zip ...
OK
Extracting A4GNVD05.zip ...
OK
# ls -lah A4GNVD05*
-rw-r--r-- 1 root root 1.9M Dec 13 03:09 A4GNVD05.bin
-rw-r--r-- 1 root root 1.1M Dec 13 03:09 A4GNVD05.zip

I will update the flashing procedure with Hugo when I have some time.
 

bonfire62

Dabbler
Joined
Nov 27, 2022
Messages
21
fdisk shows the following, which does not indicate the problem:
Code:
➜  ~ fdisk -l /dev/sdy
Disk /dev/sdy: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: H7240AS60SUN4.0T
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: B62224CF-3DC5-4BC8-A142-476BE12125F3


After reading https://talesinit.blogspot.com/2015/11/formatted-with-type-2-protection-huh.html on the reformatting process, I found that the extra byte (or bytes) show up in the SMART data:
Code:
➜  ~ smartctl -i /dev/sdy
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.142+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              H7240AS60SUN4.0T
Revision:             A3A0
Compliance:           SPC-4
User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
Formatted with type 1 protection           <<<<<HERE>>>>>
8 bytes of protection information per logical block
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca073425558
Serial number:        001533E5GX8X        PEH5GX8X
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Fri Dec 16 11:55:44 2022 MST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
 

bonfire62

Dabbler
Joined
Nov 27, 2022
Messages
21
Code:
Read Capacity results:
   Protection: prot_en=1, p_type=0, p_i_exponent=0 [type 1 protection]
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=7814037167 (0x1d1c0beaf), Number of logical blocks=7814037168
   Logical block length=512 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned LBA=0
Hence:
   Device size: 4000787030016 bytes, 3815447.8 MiB, 4000.79 GB, 4.00 TB
 

bonfire62

Dabbler
Joined
Nov 27, 2022
Messages
21
I'm currently reverted back to Angelfish, in which Kubernetes, and subsequently Docker, is broken. Based on what others in the forum have reported, I would guess that this is the host permission setting not being available in Angelfish, but it looks like it has persisted through the downgrade somehow. The following setting:
Code:
truenas29# cli
[truenas29]> app kubernetes update validate_host_path=false
is not settable in Angelfish, but I suspect that is the problem. I'm guessing that it would be better to remove/re-silver the drives in Angelfish and get the pool working correctly, rather than try to re-upgrade and then try to solve it?
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
Thanks for the useful info @bonfire62. You cannot properly revert back to Angelfish. Do you have the time to invest in fixing the disks? You can do it in either Angelfish or Bluefin, but since you are already in Angelfish, let's proceed.

I'm still waiting for @rollee, @vroger and @MR.B to report which case applies to them. Guys, please post your zpool status -v poolname, grep --color 'sector size' /var/log/messages, sg_scan -i /dev/sgX and sg_readcap -l /dev/sgX output for one of the DIF-affected disks. I want to see the differences between DIF and T10.
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
I forgot to mention, can you please run:
Code:
# zpool status -v main_pool
# grep --color 'sector size' /var/log/messages

I want to see if you find anything in Angelfish, in Bluefin you will definitely find traces.

Linux cannot read disks formatted with 520-byte sectors, therefore you need to perform both procedures just to be on the safe side; it will not hurt to run the two sg_format commands:
Code:
# sg_format -F -s 512 /dev/sgX

Detailed command:
Code:
# sg_format --format --size=512 /dev/sgX

The goal is to see how the sg_format command behaves in Scale, with a pool disk unmounted/offline. I guess you are the guinea pig for this procedure, please post all details! :cool:
 

jporrata

Cadet
Joined
Oct 30, 2022
Messages
1
Daisuke said:
I forgot to mention, can you please run:
Code:
# zpool status -v main_pool
# grep --color 'sector size' /var/log/messages

I want to see if you find anything in Angelfish, in Bluefin you will definitely find traces.

Linux cannot read disks formatted with 520-byte sectors, therefore you need to perform both procedures just to be on safe side, it will not hurt to run the two sg_format commands:
Code:
# sg_format -Ff 0 -s 512 /dev/sg0

Detailed command:
Code:
# sg_format --format --fmtpinfo=0 --size=512 /dev/sg0

The goal is to see how sg_format command acts in Scale, with a pool disk unmounted/offline. I guess you are the guinea pig for this procedure, please post all details! :cool:

zpool status -v main_pool
root@saturn[~]# zpool status -v Pool00
  pool: Pool00
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub canceled on Fri Dec 16 05:06:47 2022
config:

        NAME                                      STATE     READ WRITE CKSUM
        Pool00                                    DEGRADED     0     0     0
          mirror-0                                ONLINE       0     0     0
            243a5b1b-df86-4dfe-92f9-f82ebcf8f9d3  ONLINE       0     0     0
            3aa850d7-3d30-4708-a33b-45fc3b781c95  ONLINE       0     0     0
          mirror-1                                DEGRADED     0     0     0
            92f4016c-ac65-482e-94b4-6f2917b2c6ee  ONLINE       0     0     0
            5c26b062-44cc-4a86-b318-bd2febf67d05  FAULTED      0    27     0  too many errors
          mirror-2                                ONLINE       0     0     0
            c71506c6-0d2a-4679-9998-da67418af9a0  ONLINE       0     0     0
            93175346-1b3e-40d6-a87b-1e5d98081b99  ONLINE       0     0     0

errors: No known data errors
grep --color 'sector size' /var/log/messages
root@saturn[~]# grep --color 'sector size' /var/log/messages
root@saturn[~]#
sg_format --format --fmtpinfo=0 --size=512 /dev/sdc
root@saturn[~]# sg_format --format --fmtpinfo=0 --size=512 /dev/sdc
IBM-XIV HUS724020ALS61A4 NYL4 peripheral_type: disk [0x0]
<< supports protection information>>
Unit serial number: P6JT413V
LU name: 5000cca0289c9c40
Mode Sense (block descriptor) data, prior to changes:
Number of blocks=3907029168 [0xe8e088b0]
Block size=512 [0x200]

A FORMAT UNIT will commence in 15 seconds
ALL data on /dev/sdc will be DESTROYED
Press control-C to abort

A FORMAT UNIT will commence in 10 seconds
ALL data on /dev/sdc will be DESTROYED
Press control-C to abort

A FORMAT UNIT will commence in 5 seconds
ALL data on /dev/sdc will be DESTROYED
Press control-C to abort

Format unit has started

The format is taking a really long time. If anything interesting comes out of it, I can update you. Do you need any more info? This is being done on some IBM-branded HGST drives that I can't make an array from.
 

bonfire62

Dabbler
Joined
Nov 27, 2022
Messages
21
zpool status -v main_pool
Code:
➜  ~ zpool status -v main_pool
  pool: main_pool
 state: ONLINE
  scan: resilvered 44.4G in 00:06:12 with 0 errors on Thu Dec  8 06:55:08 2022
config:

        NAME                                      STATE     READ WRITE CKSUM
        main_pool                                 ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            e247cb3d-c112-4512-b7e6-42d9042ec50e  ONLINE       0     0     0
            80b988a2-51f4-4d1e-bed0-e2da6f927a1e  ONLINE       0     0     0
            2dd83464-4d07-4120-9e52-a36297da750b  ONLINE       0     0     0
            b97b7869-82fa-427b-ba0b-e963782594bb  ONLINE       0     0     0
            72841d02-3430-4625-8257-21240750c47c  ONLINE       0     0     0
          raidz1-1                                ONLINE       0     0     0
            5227000f-c83c-4a71-ac02-6427f2d8f480  ONLINE       0     0     0
            1d1f4ff1-b820-4212-9977-cb7c1b3e7d5b  ONLINE       0     0     0
            3426d6ae-4e32-4435-a3cd-886e4acd2dbd  ONLINE       0     0     0
            9c9a84f7-05df-429b-9d92-9fdbe430f41c  ONLINE       0     0     0
            e450d04f-4b57-4369-b3f7-15718d6f0014  ONLINE       0     0     0
          raidz1-2                                ONLINE       0     0     0
            378675d2-b2c3-4b79-9cb2-7cdaafe13477  ONLINE       0     0     0
            ee2df7fd-2323-47b6-8c4a-85e432d04891  ONLINE       0     0     0
            9f77da00-49d4-4573-af4b-6e5e174c7d69  ONLINE       0     0     0
            a0dbcf49-4838-4edb-9f7d-229b7c367d63  ONLINE       0     0     0
            e6036a5f-7cfe-403b-9d95-8936b1e90811  ONLINE       0     0     0
          raidz1-3                                ONLINE       0     0     0
            f3e9e7d0-107a-4571-ad0e-f4b65f0e34b1  ONLINE       0     0     0
            9ad3ecc7-d0dc-4958-ab88-3d745708a1c3  ONLINE       0     0     0
            84ef0063-c6fa-4b36-a7ca-639aef512d61  ONLINE       0     0     0
            2719606c-b2a0-485c-acd3-72933a3ccf62  ONLINE       0     0     0
            d247e45c-07e6-426c-ba7a-4f0d36a0a11f  ONLINE       0     0     0


grep --color 'sector size' /var/log/messages

yields nothing for me

Edit: I apologize, I stated before that one I did not create, that was incorrect. The four above are correct. Previously on upgrade it was showing 17 drives unallocated to a pool. Now it is showing the pool mounted correctly. To be clear, I did not change any of the drive's formatting yet. Going to look to see if I can find anything else that is incorrect but the pool now shows correctly.
 

bonfire62

Dabbler
Joined
Nov 27, 2022
Messages
21
I still show this error
Code:

CRITICAL
Disk(s): sdy, sdab, sdz, sdw, sdv, sdu, sds, sdo are formatted with Data Integrity Feature (DIF) which is unsupported.
2022-12-15 23:00:04 (America/Los_Angeles)
 

bonfire62

Dabbler
Joined
Nov 27, 2022
Messages
21
sg_map
Code:
/dev/sg0
/dev/sg1  /dev/sda
/dev/sg2  /dev/sdb
/dev/sg3  /dev/sr0
/dev/sg4  /dev/sdc
/dev/sg5  /dev/sdd
/dev/sg6  /dev/sde
/dev/sg7  /dev/sdf
/dev/sg8  /dev/sdg
/dev/sg14
/dev/sg15  /dev/sdm
/dev/sg16  /dev/sdn
/dev/sg17  /dev/sdo
/dev/sg18  /dev/sdp
/dev/sg19  /dev/sdq
/dev/sg20  /dev/sdr
/dev/sg21  /dev/sds
/dev/sg22  /dev/sdt
/dev/sg23  /dev/sdu
/dev/sg24  /dev/sdv
/dev/sg25  /dev/sdw
/dev/sg26  /dev/sdx
/dev/sg27  /dev/sdy
/dev/sg28  /dev/sdz
/dev/sg29  /dev/sdaa
/dev/sg30  /dev/sdab
/dev/sg31


Code:
➜  media lsblk -o NAME,PARTUUID,PATH,FSTYPE /dev/sdy
NAME   PARTUUID                             PATH      FSTYPE
sdy                                         /dev/sdy
├─sdy1 a26b9a79-198f-4cb0-aee4-8b9650010498 /dev/sdy1 linux_raid_member
└─sdy2 ee2df7fd-2323-47b6-8c4a-85e432d04891 /dev/sdy2 zfs_member


Code:
/dev/sdy: scsi11 channel=0 id=11 lun=0 [em]
    HGST      H7240AS60SUN4.0T  A3A0 [rmb=0 cmdq=1 pqual=0 pdev=0x0]


Code:
sg_readcap -l /dev/sdy
Read Capacity results:
   Protection: prot_en=1, p_type=0, p_i_exponent=0 [type 1 protection]
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=7814037167 (0x1d1c0beaf), Number of logical blocks=7814037168
   Logical block length=512 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned LBA=0
Hence:
   Device size: 4000787030016 bytes, 3815447.8 MiB, 4000.79 GB, 4.00 TB
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
Your disk partuuid is ee2df7fd-2323-47b6-8c4a-85e432d04891, part of:
Code:
          raidz1-2                                ONLINE       0     0     0
            378675d2-b2c3-4b79-9cb2-7cdaafe13477  ONLINE       0     0     0
            ee2df7fd-2323-47b6-8c4a-85e432d04891  ONLINE       0     0     0   <<<<< affected disk
            9f77da00-49d4-4573-af4b-6e5e174c7d69  ONLINE       0     0     0
            a0dbcf49-4838-4edb-9f7d-229b7c367d63  ONLINE       0     0     0
            e6036a5f-7cfe-403b-9d95-8936b1e90811  ONLINE       0     0     0

and mapped as /dev/sg27 /dev/sdy. Take the disk offline from pool and run:
Code:
# sg_format --format --fmtpinfo=0 --size=512 /dev/sg27

Please report the output and completion result. We will put the disk back into pool, once we know the result.
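
The lookup from a block device such as /dev/sdy to its sg device can also be scripted. This is a minimal Python sketch that parses sg_map output saved as text. The function name sg_device_for is hypothetical, and the sample lines are copied from the sg_map listing posted above.

```python
from typing import Optional


def sg_device_for(sg_map_output: str, block_device: str) -> Optional[str]:
    """Return the /dev/sgN device mapped to block_device, if any."""
    for line in sg_map_output.splitlines():
        fields = line.split()
        # Lines with a single field are sg devices without a block device.
        if len(fields) == 2 and fields[1] == block_device:
            return fields[0]
    return None


sample = """/dev/sg26  /dev/sdx
/dev/sg27  /dev/sdy
/dev/sg28  /dev/sdz
/dev/sg31"""
print(sg_device_for(sample, "/dev/sdy"))  # /dev/sg27
```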

The disk format takes a long time, so use tmux to avoid losing connectivity to the server:
Code:
# tmux
# sg_format --format --fmtpinfo=0 --size=512 /dev/sg27

To detach from current session while formatting the disk, press control+b then d.

To attach back to current session, run:
Code:
# tmux attach

It will show the screen, then you can detach again. The format should take an hour or more. Once the format is finished, run:
Code:
# tmux capture-pane -pS -10000 > format.log
# tmux kill-session
# cat format.log

format.log will contain the format command output, useful for further troubleshooting.
 

bonfire62

Dabbler
Joined
Nov 27, 2022
Messages
21
nohup & and disown worked on the second try. I accidentally ctrl+c'd the first format, but it's reporting format in progress and appending output to nohup.out still, plus activity led is blinking away. I'll report back tomorrow morning.
Code:
➜  media cat nohup.out
    mode sense(10) cdb: [5a 00 01 00 00 00 00 00 fc 00]
mode sense(10):
Descriptor format, current; Sense key: Not Ready
Additional sense: Logical unit not ready, format in progress
  Descriptor type: Information: Valid=0 (-> vendor specific) 0x0000000000000000
  Descriptor type: Command specific: 0x0000000000000000
  Descriptor type: Sense key specific: Progress indication: 0.99%
  Descriptor type: Field replaceable unit code: 0x4
  Descriptor type: Block commands: Incorrect Length Indicator (ILI) clear
  Descriptor type: Vendor specific [0x80]
    f5 04
  Descriptor type: VMODE SENSE (10) command: Device not ready, type: sense key
    HGST      H7240AS60SUN4.0T  A3A0   peripheral_type: disk [0x0]
      PROTECT=1
      << supports protection information>>
      Unit serial number: 001533E5GX8X        PEH5GX8X
      LU name: 5000cca073425558
    mode sense(10) cdb: [5a 00 01 00 00 00 00 00 fc 00]
block count maxed out, set <<longlba>>
    mode sense(10) cdb: [5a 10 01 00 00 00 00 00 fc 00]
    Format unit cdb: [04 18 00 00 00 00]
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
bonfire62 said:
nohup & and disown worked on the second try.

nohup will not prevent a process from being stopped or killed when the terminal disconnects, so please use tmux. Once the format is completed, run these commands to compare with the previous output:
Code:
# sg_scan -i /dev/sg27
# sg_readcap -l /dev/sg27

For the next disk format, use time to get a real estimate of how long it takes to format a disk:
Code:
# time sg_format --format --fmtpinfo=0 --size=512 /dev/sg30

Example, it takes 0.002s total for sg_scan command:
Code:
# time sg_scan -i /dev/sg0
/dev/sg0: scsi0 channel=0 id=0 lun=0
    ATA       HUH728080ALE601   0003 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
sg_scan -i /dev/sg0  0.00s user 0.00s system 73% cpu 0.002 total
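
While a format is still running, the "Progress indication" value in the sense data gives a rough idea of the total duration. The Python sketch below assumes that progress is linear, which may not hold on every drive. The function name estimate_total_hours and the elapsed time in the usage line are hypothetical.

```python
import re
from typing import Optional


def estimate_total_hours(sense_text: str,
                         elapsed_minutes: float) -> Optional[float]:
    """Estimate the total format duration in hours from the progress value."""
    match = re.search(r"Progress indication: ([\d.]+)%", sense_text)
    if match is None:
        return None
    percent = float(match.group(1))
    if percent <= 0:
        return None
    # Linear extrapolation: total = elapsed / fraction completed.
    return elapsed_minutes / (percent / 100) / 60


sense = "Descriptor type: Sense key specific: Progress indication: 0.99%"
# Hypothetical example: 6 minutes elapsed when the value above was printed.
print(estimate_total_hours(sense, elapsed_minutes=6))
```

The time command remains the reliable measurement; this only helps you decide whether to wait or come back the next day.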

I'll reply back tomorrow, once we get the info and see if we can proceed with bringing the disk online.
 

bonfire62

Dabbler
Joined
Nov 27, 2022
Messages
21
Code:
➜  media sg_map
/dev/sg0
/dev/sg1  /dev/sda
/dev/sg2  /dev/sdb
/dev/sg3  /dev/sr0
/dev/sg4  /dev/sdc
/dev/sg5  /dev/sdd
/dev/sg6  /dev/sde
/dev/sg7  /dev/sdf
/dev/sg8  /dev/sdg
/dev/sg14
/dev/sg15  /dev/sdm
/dev/sg16  /dev/sdn
/dev/sg17  /dev/sdo
/dev/sg18  /dev/sdp
/dev/sg19  /dev/sdq
/dev/sg20  /dev/sdr
/dev/sg21  /dev/sds
/dev/sg22  /dev/sdt
/dev/sg23  /dev/sdu
/dev/sg24  /dev/sdv
/dev/sg25  /dev/sdw
/dev/sg26  /dev/sdx
/dev/sg27  /dev/sdy
/dev/sg28  /dev/sdz
/dev/sg29  /dev/sdaa
/dev/sg30  /dev/sdab
/dev/sg31

➜  media lsblk -o NAME,PARTUUID,PATH,FSTYPE /dev/sdy
NAME   PARTUUID                             PATH      FSTYPE
sdy                                         /dev/sdy
├─sdy1 a26b9a79-198f-4cb0-aee4-8b9650010498 /dev/sdy1 linux_raid_member
└─sdy2 ee2df7fd-2323-47b6-8c4a-85e432d04891 /dev/sdy2 zfs_member

➜  media sg_scan -i /dev/sg27
/dev/sg27: scsi11 channel=0 id=11 lun=0
    HGST      H7240AS60SUN4.0T  A3A0 [rmb=0 cmdq=1 pqual=0 pdev=0x0]

➜  media sg_readcap -l /dev/sg27
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=7814037167 (0x1d1c0beaf), Number of logical blocks=7814037168
   Logical block length=512 bytes
   Logical blocks per physical block exponent=0
   Lowest aligned LBA=0
Hence:
   Device size: 4000787030016 bytes, 3815447.8 MiB, 4000.79 GB, 4.00 TB



Looks like the PI is gone. I'm going to make sure TrueNAS isn't throwing the DIF error, then reattach and resilver.

Online, but no option to resilver. I think I'll need to "replace" the drive instead
 