Rebuilding a pool - Snapshots?

Shrdlu

Dabbler
Joined
Mar 23, 2019
Messages
20
Hello all,

I have a pool with 2 x SSD in mirror. It is called Apps and holds the dataset Iocage. The only jail is for Plex which has 1 mountpoint into another pool.

I want to completely destroy and rebuild the pool, but I am unsure how best to back up the data. I think I need to do the following:

1) Snapshot the pool
2) Disconnect and wipe the drives
3) Re-add the drives in a new mirror
4) Use the snapshot from (1) to clone to a new dataset
5) Remake the mounting point

Is this correct? A snapshot of a wiped disk seems like it should not work?


The reason for doing this is that only 1 of the 2 disks is giving a SMART test. Both disks are the same brand, model and size. Originally the pool was a single SSD, and at a later date I added a second disk as a mirror using a guide on here. The pool works fine, but the lack of a SMART test on one of them has me worried that there is an underlying problem that has not shown itself yet. Hence the rebuild from scratch.
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
Step 2 will destroy all the data !!!
 

Shrdlu

Dabbler
Joined
Mar 23, 2019
Messages
20
So what is the alternative method to back up the data to rebuild the pool/mirror?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Is this correct? A snapshot of a wiped disk seems like it should not work?
You're right, it won't work at all.

The reason for doing this is that only 1 of the 2 disks is giving a SMART test.
If you mean that you are seeing SMART tests reporting errors on one of the disks, you don't need to wipe the pool; just replace that disk. Rebuilding the pool on the two disks you have, one of them being bad, isn't a solution.
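As a rough illustration of what replacing one mirror member involves at the ZFS level, here is a minimal sketch. It only prints the commands rather than running them. The pool name `Apps` comes from this thread; the function name `replace_member` and the device names passed to it are hypothetical placeholders. On FreeNAS the replacement is normally done through the GUI, which handles partitioning and labels for you.

```shell
#!/bin/sh
# Sketch only: prints the ZFS commands for swapping one mirror member.
# Nothing is executed. Device names are placeholders supplied by the caller.
replace_member() {
    pool="$1"      # pool that contains the mirror, e.g. Apps
    old_dev="$2"   # member to be replaced, as shown by zpool status
    new_dev="$3"   # replacement device

    # 1. Check the current state and note the exact name of the old member.
    echo "zpool status -v ${pool}"
    # 2. Swap the member; ZFS resilvers the new device from the healthy one.
    echo "zpool replace ${pool} ${old_dev} ${new_dev}"
    # 3. Watch the resilver until the pool reports ONLINE again.
    echo "zpool status ${pool}"
}
```

Calling `replace_member Apps OLD_DEVICE NEW_DEVICE` prints the three steps in order. The pool stays available throughout, because the remaining mirror member keeps serving the data while the new one resilvers.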
 

Dan Tudora

Patron
Joined
Jul 6, 2017
Messages
276
Hello,
1) Snapshot the pool
Good, but you must, repeat MUST, send the snapshot somewhere else: another pool, a USB-connected disk, another FreeNAS server, anywhere off the pool you are about to wipe.
By the way, the forum rules ask you to list your hardware and software so that people can help you.
Try replication from the GUI to another destination, and after the rebuild replicate it back.
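The send-it-elsewhere-and-back idea can be sketched on the command line roughly as follows. This is a sketch under assumptions, not a tested procedure: `Apps` is the pool from this thread, while the destination pool `Backup`, the dataset name `Apps-copy`, the snapshot name `pre-rebuild` and the `DRY_RUN` switch are all hypothetical. By default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch only. "Backup", "Apps-copy" and "pre-rebuild" are made-up names.
SRC_POOL="Apps"
DST_POOL="Backup"
SNAP="pre-rebuild"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = really run them

run() {
    # Always show the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$*"
    fi
}

backup_pool() {
    # Recursive snapshot of every dataset in the source pool.
    run "zfs snapshot -r ${SRC_POOL}@${SNAP}"
    # Replicate the whole tree (datasets, properties, snapshots) to the other pool.
    run "zfs send -R ${SRC_POOL}@${SNAP} | zfs receive -u -F ${DST_POOL}/${SRC_POOL}-copy"
}

restore_pool() {
    # After the new mirror exists, send the copy back into it.
    run "zfs send -R ${DST_POOL}/${SRC_POOL}-copy@${SNAP} | zfs receive -u -F ${SRC_POOL}"
}
```

Run `backup_pool` before wiping anything and check that the copy is readable on the destination. Run `restore_pool` only after the new mirror has been created. The snapshot on its own is not a backup, because it lives on the same disks as the data it refers to.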
 

Shrdlu

Dabbler
Joined
Mar 23, 2019
Messages
20
@sretalla it is more that one of the drives is not running a SMART test at all, from what I can figure out.

I use the script ZPool & SMART status report with FreeNAS config backup to produce a daily report on the system. For the two drives in question (ada2 and ada3) the report looks like this:
Screenshot 2020-12-18 144327.jpg


Running smartctl -a on the drives gives the following outputs:

ada2
Code:
root@XXXX[~]# smartctl -a /dev/ada2
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Phison Driven SSDs
Device Model:     KINGSTON SA400S37480G
Serial Number:    50026B76837E6E8C
LU WWN Device Id: 5 0026b7 6837e6e8c
Firmware Version: SBFKK1B3
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Dec 18 13:40:33 2020 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Total time to complete Offline
data collection:                (65535) seconds.
Offline data collection
capabilities:                    (0x00)         Offline data collection not supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2701
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Write_Protect_Mode      0x0000   100   100   000    Old_age   Offline      -       0
168 SATA_Phy_Error_Count    0x0012   100   100   000    Old_age   Always       -       0
169 Bad_Block_Rate          0x0000   100   100   000    Old_age   Offline      -       8
170 Bad_Blk_Ct_Erl/Lat      0x0000   100   100   010    Old_age   Offline      -       0/8
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 MaxAvgErase_Ct          0x0000   100   100   000    Old_age   Offline      -       35 (Average 21)
181 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0012   100   100   000    Old_age   Always       -       6
194 Temperature_Celsius     0x0022   029   037   000    Old_age   Always       -       29 (Min/Max 20/37)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
218 CRC_Error_Count         0x0032   100   100   000    Old_age   Always       -       0
231 SSD_Life_Left           0x0000   097   097   000    Old_age   Offline      -       97
233 Flash_Writes_GiB        0x0032   100   100   000    Old_age   Always       -       2868
241 Lifetime_Writes_GiB     0x0032   100   100   000    Old_age   Always       -       2631
242 Lifetime_Reads_GiB      0x0032   100   100   000    Old_age   Always       -       934
244 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       21
245 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       35
246 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       237504


ada3
Code:
root@XXXX[~]# smartctl -a /dev/ada3
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Phison Driven SSDs
Device Model:     KINGSTON SA400S37480G
Serial Number:    50026B7683A8DCFB
LU WWN Device Id: 5 0026b7 683a8dcfb
Firmware Version: SBFKB1E1
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Dec 18 13:42:42 2020 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (65535) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   000   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2817
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       1
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Write_Protect_Mode      0x0000   100   100   000    Old_age   Offline      -       0
168 SATA_Phy_Error_Count    0x0012   100   100   000    Old_age   Always       -       1
169 Bad_Block_Rate          0x0000   100   100   000    Old_age   Offline      -       13
170 Bad_Blk_Ct_Erl/Lat      0x0000   100   100   010    Old_age   Offline      -       0/9
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 MaxAvgErase_Ct          0x0000   100   100   000    Old_age   Offline      -       22 (Average 9)
181 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0000   100   100   000    Old_age   Offline      -       1
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0012   100   100   000    Old_age   Always       -       6
194 Temperature_Celsius     0x0022   071   061   000    Old_age   Always       -       29 (Min/Max 20/39)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       1
218 CRC_Error_Count         0x0032   100   100   000    Old_age   Always       -       1
231 SSD_Life_Left           0x0000   001   001   000    Old_age   Offline      -       99
233 Flash_Writes_GiB        0x0032   100   100   000    Old_age   Always       -       3718
241 Lifetime_Writes_GiB     0x0032   100   100   000    Old_age   Always       -       2712
242 Lifetime_Reads_GiB      0x0032   100   100   000    Old_age   Always       -       870
244 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       9
245 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       22
246 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       152672

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      2733         -
# 2  Short offline       Completed without error       00%      2565         -
# 3  Extended offline    Completed without error       00%      2398         -
# 4  Short offline       Completed without error       00%      2397         -
# 5  Short offline       Completed without error       00%      2229         -
# 6  Short offline       Completed without error       00%      2061         -
# 7  Short offline       Completed without error       00%      1893         -
# 8  Short offline       Completed without error       00%      1725         -
# 9  Extended offline    Completed without error       00%      1678         -
#10  Short offline       Completed without error       00%      1557         -
#11  Short offline       Completed without error       00%      1388         -
#12  Short offline       Completed without error       00%      1220         -
#13  Short offline       Completed without error       00%      1147         -
#14  Short offline       Completed without error       00%      1123         -
#15  Short offline       Completed without error       00%      1099         -
#16  Short offline       Completed without error       00%      1075         -
#17  Short offline       Completed without error
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
OK, so I see that you're showing errors on both, although they should not be fatal and ZFS will be protecting your data from corruption even with those errors.

You should certainly run a manual SMART test (or fix the scheduled ones) for ada2 (smartctl -t long /dev/ada2) and check the results again after a while.
zpool status -v should confirm whether you have any errors in your data (I suspect you will see none).
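Once a test has had time to finish, the self-test log can be checked by eye or with a small filter like the one below. This is a minimal sketch: the function name `count_unclean_selftests` is made up, and it assumes the log layout shown earlier in this thread, where each entry starts with `#` followed by a number.

```shell
#!/bin/sh
# Sketch only: reads `smartctl -l selftest` output on stdin and counts the
# log entries whose status is anything other than "Completed without error".
count_unclean_selftests() {
    awk '/^# *[0-9]+ / && !/Completed without error/ { n++ } END { print n + 0 }'
}
```

Typical use would be `smartctl -l selftest /dev/ada3 | count_unclean_selftests`, alongside `zpool status -v Apps` for the pool itself. Note that a test still in progress, or one that was aborted, also counts as "unclean" here, so a non-zero result means "look at the log", not necessarily "the drive is failing".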

The right process would be to just replace the drives one at a time if you decide they are both bad.
 

Shrdlu

Dabbler
Joined
Mar 23, 2019
Messages
20
Ok this is getting odd.

When running smartctl -t long /dev/ada2 I get the following:

Code:
root@xxxx[~]# smartctl -t long /dev/ada2
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Self-test functions not supported

Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Command "Execute SMART Extended self-test routine immediately in off-line mod


But smartctl -t long /dev/ada3 works fine. They are both the same make and model of SSD :/
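The two smartctl -a outputs above do differ in two places: ada2 reports firmware SBFKK1B3 and lists no self-test capability, while ada3 reports firmware SBFKB1E1 and "Self-test supported." A small filter like the following makes that comparison quick to repeat. It is a sketch only: the function name `summarize_smart` is made up, and whether the firmware difference is the cause of the missing self-test support is an assumption that nothing in this thread confirms.

```shell
#!/bin/sh
# Sketch only: reads `smartctl -a` output on stdin and prints the firmware
# version plus whether the drive advertises self-test support.
summarize_smart() {
    awk '
        /^Firmware Version:/ { fw = $3 }
        # Match "Self-test supported." but skip the negated capability lines
        # such as "No Conveyance Self-test supported."
        /Self-test supported\./ && !/No / { st = "yes" }
        END { printf "firmware=%s self_test=%s\n", fw, (st == "" ? "no" : st) }
    '
}
```

Running `smartctl -a /dev/ada2 | summarize_smart` and the same for `/dev/ada3` puts the two drives side by side on one line each.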
 