Local storage SR failed with "Currently unreadable (pending) sectors"

uberwebguru

Explorer
Joined
Jul 23, 2013
Messages
97
I was migrating all my VMs from an old XCP-ng host to a new XCP-ng host when the old host started having issues. I could not migrate any more VMs, so I restarted the host.

Now I get the error `Host Unavailable`, and on checking, there is an issue with the local storage SR. The local SR consists of 2x 3TB SATA drives in RAID 0 and has worked fine for 6 years.

The error is:

Code:
248 Currently unreadable (pending) sectors on /dev/sda


When I checked the SR in XOA and XCP-ng Center, the SR shows as disconnected.

Here is what `df -h` shows:

Code:
[21:26 xenserver102 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G   56K  3.9G   1% /dev/shm
tmpfs           3.9G  9.3M  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1        18G  1.9G   15G  12% /
xenstore        3.9G     0  3.9G   0% /var/lib/xenstored
/dev/sda5       3.9G  1.2G  2.5G  33% /var/log
tmpfs           786M     0  786M   0% /run/user/0


And `fdisk -l`:

Code:
[21:26 xenserver102 ~]# fdisk -l
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sda: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: gpt
Disk identifier: D9D5790A-64EC-4285-B239-8C40C100C4E6


#         Start          End    Size  Type            Name
 1     46139392     83888127     18G  Microsoft basic
 2      8390656     46139391     18G  Microsoft basic
 3     87033856   5860533134    2.7T  Linux LVM
 4     83888128     84936703    512M  BIOS boot
 5         2048      8390655      4G  Microsoft basic
 6     84936704     87033855      1G  Linux swap

Disk /dev/sdb: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/mapper/VG_XenStorage--25765a4e--a243--8580--5579--c039e886c339-VHD--536253f4--b012--4679--a88f--1718821630c7: 343 MB, 343932928 bytes, 671744 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


I have just a few very important VMs left to migrate off the server. How do I fix this error so I can migrate what is left?

Thanks
 

uberwebguru

>
Code:
smartctl -x /dev/sda


Code:
[21:26 xenserver102 ~]# smartctl -x /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.19.0+1] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1ER166
Serial Number:    Z50028WT
LU WWN Device Id: 5 000c50 066803d9b
Firmware Version: CC43
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Jul  4 21:27:24 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is:   Disabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (   97) seconds.
Offline data collection
capabilities:              (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 325) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x1085)    SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   118   095   006    -    198974816
  3 Spin_Up_Time            PO----   093   093   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    74
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    16
  7 Seek_Error_Rate         POSR--   088   060   030    -    687700625
  9 Power_On_Hours          -O--CK   038   038   000    -    54613
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    74
183 Runtime_Bad_Block       -O--CK   099   099   000    -    1
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   001   001   000    -    478
188 Command_Timeout         -O--CK   100   099   000    -    0 0 3
189 High_Fly_Writes         -O-RCK   090   090   000    -    10
190 Airflow_Temperature_Cel -O---K   066   037   045    Past 34 (0 2 38 20 0)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    49
193 Load_Cycle_Count        -O--CK   100   100   000    -    1433
194 Temperature_Celsius     -O---K   034   063   000    -    34 (0 11 0 0 0)
197 Current_Pending_Sector  -O--C-   099   098   000    -    248
198 Offline_Uncorrectable   ----C-   099   098   000    -    248
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    54018h+11m+46.014s
241 Total_LBAs_Written      ------   100   253   000    -    270712761119
242 Total_LBAs_Read         ------   100   253   000    -    70969713068650
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      20  Device vendor specific log
0xa2       GPL     VS    4496  Device vendor specific log
0xa8       GPL,SL  VS     129  Device vendor specific log
0xa9       GPL,SL  VS       1  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    5176  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL,SL  VS      10  Device vendor specific log
0xc3       GPL,SL  VS       8  Device vendor specific log
0xc4       GPL,SL  VS       5  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 478 (device log contains only the most recent 20 errors)
    CR     = Command Register
    FEATR  = Features Register
    COUNT  = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    LH     = LBA High (was: Cylinder High) Register    ]   LBA
    LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    LL     = LBA Low (was: Sector Number) Register     ]
    DV     = Device (was: Device/Head) Register
    DC     = Device Control Register
    ER     = Error register
    ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 478 [17] occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 53 00 00 00 00 8d 55 71 90 00 00  Error: UNC at LBA = 0x8d557190 = 2371187088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 02 c1 00 00 8d 55 7b cf 40 00  1d+23:55:31.250  READ FPDMA QUEUED
  60 00 00 05 40 00 00 8d 55 76 8f 40 00  1d+23:55:31.250  READ FPDMA QUEUED
  60 00 00 02 c0 00 00 8d 55 73 cf 40 00  1d+23:55:31.250  READ FPDMA QUEUED
  60 00 00 05 3f 00 00 8d 55 6e 90 40 00  1d+23:55:31.240  READ FPDMA QUEUED
  60 00 00 00 01 00 00 8d 55 6e 8f 40 00  1d+23:55:31.221  READ FPDMA QUEUED

Error 477 [16] occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 53 00 00 00 00 8d 55 71 90 00 00  Error: UNC at LBA = 0x8d557190 = 2371187088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 02 c1 00 00 8d 55 7b cf 40 00  1d+23:54:58.888  READ FPDMA QUEUED
  60 00 00 05 40 00 00 8d 55 76 8f 40 00  1d+23:54:58.888  READ FPDMA QUEUED
  60 00 00 02 c0 00 00 8d 55 73 cf 40 00  1d+23:54:58.888  READ FPDMA QUEUED
  60 00 00 05 3f 00 00 8d 55 6e 90 40 00  1d+23:54:58.887  READ FPDMA QUEUED
  61 00 00 00 38 00 00 ab bf 98 90 40 00  1d+23:54:58.871  WRITE FPDMA QUEUED

Error 476 [15] occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 53 00 00 00 00 8d 55 71 90 00 00  Error: UNC at LBA = 0x8d557190 = 2371187088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 02 c1 00 00 8d 55 7b cf 40 00  1d+23:54:22.779  READ FPDMA QUEUED
  60 00 00 05 40 00 00 8d 55 76 8f 40 00  1d+23:54:22.779  READ FPDMA QUEUED
  60 00 00 02 c0 00 00 8d 55 73 cf 40 00  1d+23:54:22.779  READ FPDMA QUEUED
  60 00 00 05 3f 00 00 8d 55 6e 90 40 00  1d+23:54:22.777  READ FPDMA QUEUED
  60 00 00 00 01 00 00 8d 55 6e 8f 40 00  1d+23:54:22.764  READ FPDMA QUEUED

Error 475 [14] occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 53 00 00 00 00 8d 55 71 90 00 00  Error: UNC at LBA = 0x8d557190 = 2371187088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 05 40 00 00 47 27 36 08 40 00  1d+23:53:43.911  READ FPDMA QUEUED
  60 00 00 05 40 00 00 a0 8f ec 40 40 00  1d+23:53:43.909  READ FPDMA QUEUED
  60 00 00 05 80 00 00 a0 8f e6 c0 40 00  1d+23:53:43.908  READ FPDMA QUEUED
  61 00 00 00 08 00 00 ab c2 bf 78 40 00  1d+23:53:43.880  WRITE FPDMA QUEUED
  61 00 00 00 50 00 00 00 3c 81 b8 40 00  1d+23:53:43.876  WRITE FPDMA QUEUED

Error 474 [13] occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 53 00 00 00 00 8d 55 71 90 00 00  Error: UNC at LBA = 0x8d557190 = 2371187088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 02 c1 00 00 8d 55 7b cf 40 00  1d+23:52:52.111  READ FPDMA QUEUED
  60 00 00 05 40 00 00 8d 55 76 8f 40 00  1d+23:52:52.111  READ FPDMA QUEUED
  60 00 00 02 c0 00 00 8d 55 73 cf 40 00  1d+23:52:52.111  READ FPDMA QUEUED
  60 00 00 05 3f 00 00 8d 55 6e 90 40 00  1d+23:52:52.110  READ FPDMA QUEUED
  60 00 00 00 03 00 00 11 80 e0 00 40 00  1d+23:52:52.094  READ FPDMA QUEUED

Error 473 [12] occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 53 00 00 00 00 8d 55 71 90 00 00  Error: UNC at LBA = 0x8d557190 = 2371187088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 02 c1 00 00 8d 55 7b cf 40 00  1d+23:51:45.766  READ FPDMA QUEUED
  60 00 00 05 40 00 00 8d 55 76 8f 40 00  1d+23:51:45.766  READ FPDMA QUEUED
  60 00 00 02 c0 00 00 8d 55 73 cf 40 00  1d+23:51:45.766  READ FPDMA QUEUED
  60 00 00 05 3f 00 00 8d 55 6e 90 40 00  1d+23:51:45.764  READ FPDMA QUEUED
  60 00 00 00 01 00 00 8d 55 6e 8f 40 00  1d+23:51:45.751  READ FPDMA QUEUED

Error 472 [11] occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 53 00 00 00 00 8d 55 71 90 00 00  Error: WP at LBA = 0x8d557190 = 2371187088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 00 00 08 00 00 ab c0 10 60 40 00  1d+23:50:47.612  WRITE FPDMA QUEUED
  61 00 00 00 20 00 00 ab be 59 78 40 00  1d+23:50:47.612  WRITE FPDMA QUEUED
  61 00 00 00 08 00 00 ab be 09 f0 40 00  1d+23:50:47.612  WRITE FPDMA QUEUED
  61 00 00 00 08 00 00 ab bd 7c 68 40 00  1d+23:50:47.612  WRITE FPDMA QUEUED
  61 00 00 00 08 00 00 04 e0 09 b0 40 00  1d+23:50:47.564  WRITE FPDMA QUEUED

Error 471 [10] occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 53 00 00 00 00 8d 55 71 90 00 00  Error: UNC at LBA = 0x8d557190 = 2371187088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 00 02 c1 00 00 8d 55 7b cf 40 00  1d+23:50:12.814  READ FPDMA QUEUED
  60 00 00 05 40 00 00 8d 55 76 8f 40 00  1d+23:50:12.814  READ FPDMA QUEUED
  60 00 00 02 c0 00 00 8d 55 73 cf 40 00  1d+23:50:12.814  READ FPDMA QUEUED
  60 00 00 05 3f 00 00 8d 55 6e 90 40 00  1d+23:50:12.814  READ FPDMA QUEUED
  60 00 00 00 01 00 00 8d 55 6e 8f 40 00  1d+23:50:12.765  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    34 Celsius
Power Cycle Min/Max Temperature:     20/38 Celsius
Lifetime    Min/Max Temperature:     12/63 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Data Table command not supported

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2           37  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS




Is there any way to recover this, or at least copy the files over?

Code:
[06:27 xenserver102 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G   56K  3.9G   1% /dev/shm
tmpfs           3.9G  9.3M  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1        18G  1.9G   15G  12% /
xenstore        3.9G     0  3.9G   0% /var/lib/xenstored
/dev/sda5       3.9G  899M  2.8G  25% /var/log
tmpfs           786M     0  786M   0% /run/user/0
[06:27 xenserver102 ~]# fdisk -l
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sda: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: gpt
Disk identifier: D9D5790A-64EC-4285-B239-8C40C100C4E6


#         Start          End    Size  Type            Name
 1     46139392     83888127     18G  Microsoft basic
 2      8390656     46139391     18G  Microsoft basic
 3     87033856   5860533134    2.7T  Linux LVM
 4     83888128     84936703    512M  BIOS boot
 5         2048      8390655      4G  Microsoft basic
 6     84936704     87033855      1G  Linux swap

Disk /dev/sdb: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/mapper/VG_XenStorage--25765a4e--a243--8580--5579--c039e886c339-VHD--536253f4--b012--4679--a88f--1718821630c7: 343 MB, 343932928 bytes, 671744 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

[06:27 xenserver102 ~]# ls -lha /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root 400 Jul  5 06:19 .
drwxr-xr-x 9 root root 180 Jul  5 06:18 ..
lrwxrwxrwx 1 root root   9 Jul  5 06:18 ata-ST3000DM001-1ER166_Z50028PK -> ../../sdb
lrwxrwxrwx 1 root root   9 Jul  5 06:18 ata-ST3000DM001-1ER166_Z50028WT -> ../../sda
lrwxrwxrwx 1 root root  10 Jul  5 06:18 ata-ST3000DM001-1ER166_Z50028WT-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Jul  5 06:18 ata-ST3000DM001-1ER166_Z50028WT-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Jul  5 06:23 ata-ST3000DM001-1ER166_Z50028WT-part3 -> ../../sda3
lrwxrwxrwx 1 root root  10 Jul  5 06:18 ata-ST3000DM001-1ER166_Z50028WT-part4 -> ../../sda4
lrwxrwxrwx 1 root root  10 Jul  5 06:18 ata-ST3000DM001-1ER166_Z50028WT-part5 -> ../../sda5
lrwxrwxrwx 1 root root  10 Jul  5 06:18 ata-ST3000DM001-1ER166_Z50028WT-part6 -> ../../sda6
lrwxrwxrwx 1 root root  10 Jul  5 06:19 dm-name-VG_XenStorage--25765a4e--a243--8580--5579--c039e886c339-VHD--536253f4--b012--4679--a88f--1718821630c7 -> ../../dm-0
lrwxrwxrwx 1 root root  10 Jul  5 06:19 dm-uuid-LVM-FPI1stmLpq1L8z14BeRbiFjPIX6NGR4dcx0isbrMXEw4nuhBJBPf3SkxOiP3aU3f -> ../../dm-0
lrwxrwxrwx 1 root root   9 Jul  5 06:18 wwn-0x5000c50066803b7d -> ../../sdb
lrwxrwxrwx 1 root root   9 Jul  5 06:18 wwn-0x5000c50066803d9b -> ../../sda
lrwxrwxrwx 1 root root  10 Jul  5 06:18 wwn-0x5000c50066803d9b-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Jul  5 06:18 wwn-0x5000c50066803d9b-part2 -> ../../sda2
lrwxrwxrwx 1 root root  10 Jul  5 06:23 wwn-0x5000c50066803d9b-part3 -> ../../sda3
lrwxrwxrwx 1 root root  10 Jul  5 06:18 wwn-0x5000c50066803d9b-part4 -> ../../sda4
lrwxrwxrwx 1 root root  10 Jul  5 06:18 wwn-0x5000c50066803d9b-part5 -> ../../sda5
lrwxrwxrwx 1 root root  10 Jul  5 06:18 wwn-0x5000c50066803d9b-part6 -> ../../sda6

[06:27 xenserver102 ~]# smartctl -A /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.19.0+1] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   095   006    Pre-fail  Always       -       223108216
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       77
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       16
  7 Seek_Error_Rate         0x000f   088   060   030    Pre-fail  Always       -       687758515
  9 Power_On_Hours          0x0032   038   038   000    Old_age   Always       -       54621
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       77
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       478
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       0 0 3
189 High_Fly_Writes         0x003a   090   090   000    Old_age   Always       -       10
190 Airflow_Temperature_Cel 0x0022   072   037   045    Old_age   Always   In_the_past 28 (0 2 28 28 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       52
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1439
194 Temperature_Celsius     0x0022   028   063   000    Old_age   Always       -       28 (0 11 0 0 0)
197 Current_Pending_Sector  0x0012   099   098   000    Old_age   Always       -       248
198 Offline_Uncorrectable   0x0010   099   098   000    Old_age   Offline      -       248
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       54026h+19m+29.250s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       270716439183
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       70969716310068
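
The 248 in the original error is the raw value of attribute 197 (Current_Pending_Sector) in the table above. A minimal sketch for pulling the three sector-health counters out of `smartctl -A` output follows. The function name `smart_sector_counts` is made up for illustration, and it assumes the ten-column `smartctl -A` layout where the raw value is the tenth field (the brief `-x` layout has fewer columns and would need a different field number).

```shell
#!/bin/sh
# Print NAME=RAW for attributes 5, 197 and 198 from `smartctl -A` text on stdin.
# Assumption: 10-column `smartctl -A` layout, raw value in field 10.
smart_sector_counts() {
  awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Sample rows copied from the output above. On the live host this would be:
#   smartctl -A /dev/sda | smart_sector_counts
smart_sector_counts <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       16
197 Current_Pending_Sector  0x0012   099   098   000    Old_age   Always       -       248
198 Offline_Uncorrectable   0x0010   099   098   000    Old_age   Offline      -       248
EOF
```

Watching whether these three numbers climb between runs is probably more telling than the overall "PASSED" line.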
 
Joined
Jun 2, 2019
Messages
591
Hate to be the bearer of bad news, but your data is likely toast. Why?

1. RAID 0 <--- no redundancy
2. 6+ years in service (6.23 years to be exact)
3. Desktop class drive in a server application
4. Never ran a single drive self test during the drive service life
5. I'm guessing you have no backups

You could try running an extended self test to see what results you get

Code:
smartctl -t long /dev/sda
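
The test runs in the background inside the drive (this one reports a 325-minute polling time for the extended routine), and the result ends up in the self-test log. A small sketch for pulling the first failing LBA out of `smartctl -l selftest` output, assuming the usual layout where the LBA is the last column and a `-` there means no error was found. `first_error_lba` is a hypothetical helper name, not part of smartmontools.

```shell
#!/bin/sh
# Read `smartctl -l selftest` text on stdin and print the LBA of the first
# logged read failure. Prints nothing if no entry carries a numeric LBA.
# Assumption: log entries start with "# <n>" and the LBA is the last field.
first_error_lba() {
  awk '/^# *[0-9]+ / && $NF ~ /^[0-9]+$/ { print $NF; exit }'
}

# Intended use on the host, once the test has finished or aborted:
#   smartctl -l selftest /dev/sda | first_error_lba
```

An empty result only means the log has no failing entry yet, not that the drive is healthy.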
 

uberwebguru

Hate to be the bearer of bad news, but your data is likely toast. Why?

1. RAID 0 <--- no redundancy
2. 6+ years in service (6.23 years to be exact)
3. Desktop class drive in a server application
4. Never ran a single drive self test during the drive service life
5. I'm guessing you have no backups
What can I do to "attempt" a recovery?

I hear one can overwrite the bad sectors, etc.

I'm here looking for a solution, not to be told the obvious.

The machine boots the OS, and I have shown the output of the commands above.

It was running fine until I tried to reboot.
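
On overwriting: writing to a pending sector does let the drive rewrite or remap it, but whatever was stored there is lost, so a common first step is to image everything that still reads. A rough sketch of a two-pass GNU ddrescue plan, printed rather than executed. It assumes ddrescue is installed (it may not be on a stock XCP-ng host), `rescue_plan` is an illustrative name, and `/mnt/rescue` is a placeholder for a mount on a different, healthy disk with enough free space.

```shell
#!/bin/sh
# Print (do not run) a two-pass GNU ddrescue plan for one source device.
# $1 = source device, $2 = destination directory on a separate healthy disk.
rescue_plan() {
  src=$1
  dest_dir=$2
  name=$(basename "$src")
  # Pass 1: copy everything that reads cleanly and skip the slow scraping phase.
  echo "ddrescue -n $src $dest_dir/$name.img $dest_dir/$name.map"
  # Pass 2: go back to the bad areas with direct I/O and up to 3 retry passes.
  echo "ddrescue -d -r3 $src $dest_dir/$name.img $dest_dir/$name.map"
}

# /dev/sda3 is the Linux LVM partition from the fdisk output; the path is a placeholder.
rescue_plan /dev/sda3 /mnt/rescue
```

The map file is what lets the second pass resume where the first stopped, so both commands need to point at the same one.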
 

uberwebguru

Code:
smartctl -t long /dev/sda


Here is the result:


Code:
[07:56 xenserver102 ~]# smartctl -t long /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.19.0+1] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 325 minutes for test to complete.
Test will complete after Tue Jul  5 13:21:32 2022

Use smartctl -X to abort test.

[08:24 xenserver102 ~]# smartctl -l selftest /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.19.0+1] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     54622         2371187088
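
The failing LBA can be matched against the partition table from `fdisk -l` to see where it lands. A small sketch, with the start and end sectors copied from the /dev/sda table earlier in the thread. `locate_lba` is an illustrative name, and the 512-byte multiplier is the logical sector size fdisk reports.

```shell
#!/bin/sh
# Partition table for /dev/sda as "number start end", in 512-byte sectors,
# copied from the fdisk -l output above.
PARTS="1 46139392 83888127
2 8390656 46139391
3 87033856 5860533134
4 83888128 84936703
5 2048 8390655
6 84936704 87033855"

# Print which partition contains the given LBA and the offset inside it.
# Prints nothing if the LBA falls outside every partition.
locate_lba() {
  lba=$1
  echo "$PARTS" | while read -r num start end; do
    if [ "$lba" -ge "$start" ] && [ "$lba" -le "$end" ]; then
      echo "partition=$num offset_sectors=$((lba - start)) offset_bytes=$(( (lba - start) * 512 ))"
    fi
  done
}

locate_lba 2371187088
```

For LBA 2371187088 this prints `partition=3 offset_sectors=2284153232 offset_bytes=1169486454784`, so the bad area sits inside the Linux LVM partition (which presumably holds the SR's volume group), roughly 1.06 TiB in.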
 
Joined
Jun 2, 2019
Messages
591
Code:
smartctl -a /dev/sda
 

uberwebguru

Code:
[08:27 xenserver102 ~]# smartctl -a /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.19.0+1] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1ER166
Serial Number:    Z50028WT
LU WWN Device Id: 5 000c50 066803d9b
Firmware Version: CC43
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Jul  5 08:33:23 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121)    The previous self-test completed having
                    the read element of the test failed.
Total time to complete Offline
data collection:         (   97) seconds.
Offline data collection
capabilities:              (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 325) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x1085)    SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   095   006    Pre-fail  Always       -       223229904
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       77
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       16
  7 Seek_Error_Rate         0x000f   088   060   030    Pre-fail  Always       -       687767865
  9 Power_On_Hours          0x0032   038   038   000    Old_age   Always       -       54623
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       77
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       478
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       0 0 3
189 High_Fly_Writes         0x003a   090   090   000    Old_age   Always       -       10
190 Airflow_Temperature_Cel 0x0022   072   037   045    Old_age   Always   In_the_past 28 (0 2 28 27 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       52
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1439
194 Temperature_Celsius     0x0022   028   063   000    Old_age   Always       -       28 (0 11 0 0 0)
197 Current_Pending_Sector  0x0012   099   098   000    Old_age   Always       -       248
198 Offline_Uncorrectable   0x0010   099   098   000    Old_age   Offline      -       248
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       54028h+20m+39.123s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       270717375167
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       70969716312989

SMART Error Log Version: 1
ATA Error Count: 478 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 478 occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 c1 ff ff ff 4f 00   1d+23:55:31.250  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00   1d+23:55:31.250  READ FPDMA QUEUED
  60 00 c0 ff ff ff 4f 00   1d+23:55:31.250  READ FPDMA QUEUED
  60 00 3f ff ff ff 4f 00   1d+23:55:31.240  READ FPDMA QUEUED
  60 00 01 ff ff ff 4f 00   1d+23:55:31.221  READ FPDMA QUEUED

Error 477 occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 c1 ff ff ff 4f 00   1d+23:54:58.888  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00   1d+23:54:58.888  READ FPDMA QUEUED
  60 00 c0 ff ff ff 4f 00   1d+23:54:58.888  READ FPDMA QUEUED
  60 00 3f ff ff ff 4f 00   1d+23:54:58.887  READ FPDMA QUEUED
  61 00 38 ff ff ff 4f 00   1d+23:54:58.871  WRITE FPDMA QUEUED

Error 476 occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 c1 ff ff ff 4f 00   1d+23:54:22.779  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00   1d+23:54:22.779  READ FPDMA QUEUED
  60 00 c0 ff ff ff 4f 00   1d+23:54:22.779  READ FPDMA QUEUED
  60 00 3f ff ff ff 4f 00   1d+23:54:22.777  READ FPDMA QUEUED
  60 00 01 ff ff ff 4f 00   1d+23:54:22.764  READ FPDMA QUEUED

Error 475 occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 ff ff ff 4f 00   1d+23:53:43.911  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00   1d+23:53:43.909  READ FPDMA QUEUED
  60 00 80 ff ff ff 4f 00   1d+23:53:43.908  READ FPDMA QUEUED
  61 00 08 ff ff ff 4f 00   1d+23:53:43.880  WRITE FPDMA QUEUED
  61 00 50 b8 81 3c 40 00   1d+23:53:43.876  WRITE FPDMA QUEUED

Error 474 occurred at disk power-on lifetime: 54608 hours (2275 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 c1 ff ff ff 4f 00   1d+23:52:52.111  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00   1d+23:52:52.111  READ FPDMA QUEUED
  60 00 c0 ff ff ff 4f 00   1d+23:52:52.111  READ FPDMA QUEUED
  60 00 3f ff ff ff 4f 00   1d+23:52:52.110  READ FPDMA QUEUED
  60 00 03 ff ff ff 4f 00   1d+23:52:52.094  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     54622         2371187088

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
Joined
Jun 2, 2019
Messages
591
Based on the LBA of the first error in your self-test log (2371187088) and your partition table, it looks like the bad block is in partition 3:
 3     87033856   5860533134    2.7T  Linux LVM
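
As a quick check of that lookup, here is a minimal Python sketch. The partition ranges are copied from the `fdisk -l` output for /dev/sda and the LBA comes from the self-test log. The name `find_partition` is made up for illustration, and the sketch assumes the self-test log belongs to /dev/sda.

```python
# Partition table of /dev/sda as (number, first_sector, last_sector),
# copied from the fdisk -l output earlier in the thread.
PARTITIONS = [
    (1, 46139392, 83888127),
    (2, 8390656, 46139391),
    (3, 87033856, 5860533134),
    (4, 83888128, 84936703),
    (5, 2048, 8390655),
    (6, 84936704, 87033855),
]


def find_partition(lba, partitions=PARTITIONS):
    """Return the number of the partition containing lba, or None."""
    for number, first, last in partitions:
        if first <= lba <= last:
            return number
    return None


first_error_lba = 2371187088  # LBA_of_first_error from the SMART self-test log
print(find_partition(first_error_lba))  # prints 3
```

The self-test log only reports the first failing LBA, so this says where the first unreadable sector is, not where all 248 pending sectors are.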

If you have another drive of equal or larger size, you might be able to do a bare-metal (sector-by-sector) copy of the bad disk to the new disk and hope that it can read the bad blocks. If the bad region is localized and the copy fails at the same location every time, you can start the copy from the opposite end of the drive and hope the unreadable sectors don't hold anything critical. I have done this on drives where the failing area was in swap space. There are many rescue CDs that can do this. I have used G4L in the past, but it's been ages since I have had to do it. Just make sure you are copying in the right direction, from the bad disk to the new one.
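
To make the idea concrete, here is a rough Python sketch of a chunked copy that skips unreadable regions and can run from the end of the disk backwards. It is only an illustration of the approach, not a replacement for a real rescue tool. The names `copy_with_skips` and `copy_device` are hypothetical, and the 1 MiB chunk size is an arbitrary choice.

```python
import os


def copy_with_skips(read_at, write_at, total_size, chunk_size=1024 * 1024,
                    reverse=False):
    """Copy total_size bytes chunk by chunk, skipping chunks that fail to read.

    read_at(offset, length) returns bytes and may raise OSError on a bad region.
    write_at(offset, data) writes data at the same offset on the target.
    Returns the list of (offset, length) ranges that could not be read.
    """
    offsets = range(0, total_size, chunk_size)
    if reverse:
        offsets = reversed(offsets)  # start at the end of the disk, work backwards
    bad_ranges = []
    for offset in offsets:
        length = min(chunk_size, total_size - offset)
        try:
            data = read_at(offset, length)
        except OSError:
            bad_ranges.append((offset, length))  # leave a hole and keep going
            continue
        write_at(offset, data)
    return bad_ranges


def copy_device(src_path, dst_path, total_size, reverse=False):
    """Hypothetical wrapper for real device paths (needs root, overwrites dst)."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY)
    try:
        return copy_with_skips(
            lambda off, n: os.pread(src, n, off),
            lambda off, data: os.pwrite(dst, data, off),
            total_size,
            reverse=reverse,
        )
    finally:
        os.close(src)
        os.close(dst)
```

Because every chunk is written at the same offset it was read from, a forward pass and a reverse pass fill in the same image, and the returned list tells you which ranges are still missing.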

 
Joined
Jun 2, 2019
Messages
591
can i try to overwrite the bad sectors/blocks?
There are posts on various forums on how to calculate the sector and try to get it reallocated, but step one is to make a bare-metal disk image in case it goes wrong.
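
For the calculation part, a small Python sketch of the arithmetic follows. The sector sizes and the partition start are taken from the `fdisk -l` output and the LBA from the self-test log. The function name `locate` and the dictionary keys are made up for illustration. It only computes numbers and does not touch the disk.

```python
LOGICAL_SECTOR = 512        # bytes, from fdisk "Sector size (logical/physical)"
PHYSICAL_SECTOR = 4096      # bytes
PARTITION_START = 87033856  # first sector of partition 3 (Linux LVM)


def locate(lba):
    """Translate a logical LBA into the offsets a repair attempt would need."""
    per_physical = PHYSICAL_SECTOR // LOGICAL_SECTOR  # 8 logical sectors each
    first_in_block = lba - (lba % per_physical)
    return {
        "byte_offset_on_disk": lba * LOGICAL_SECTOR,
        "sector_in_partition": lba - PARTITION_START,
        "physical_block_lbas": (first_in_block, first_in_block + per_physical - 1),
    }


info = locate(2371187088)  # LBA_of_first_error from the SMART self-test log
for key, value in info.items():
    print(key, value)
# byte_offset_on_disk 1214047789056
# sector_in_partition 2284153232
# physical_block_lbas (2371187088, 2371187095)
```

Since the drive uses 4096-byte physical sectors, one bad physical sector covers eight consecutive logical LBAs, which is why the whole range is reported rather than a single number.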
 
Joined
Jun 2, 2019
Messages
591
any link on how to do the bare metal disk image?
Find a bootable disk-cloning CD image and clone the bad drive to a new one of equal or larger size. There are simply too many disk cloning/rescue CD images to give you a cookbook tutorial.
 