Hard Drive Degrade: Upgrading to ZFS2 - Strategy

Gamer0126

Dabbler
Joined
Mar 24, 2015
Messages
25
So this is a continuation of my following issue: link

I haven't found any issues with the hard drives after another round of long SMART tests. The drives are still in a degraded state and I have no clue why. I am now stuck with either buying similar replacement hard drives or, especially after this experience, investing in a new pool with ZFS2 built from larger CMR drives. If I go this route, I have decided to buy WD Red Plus 8TB drives.

First, I am posting in this section hoping more eyes will see my current issue and that someone has a solution for it.

Second, here is my plan to move the data over (below). Are there any better ways to move the data than my current plan?


  1. My motherboard has 6 spare SATA ports, where I can connect three 8TB drives.
  2. Create a temporary pool (TempPool).
  3. Move the data temporarily to that pool (TempPool).
  4. Destroy my current pool (Degraded Pool).
  5. Replace the current drives with the new drives.
  6. Create a new pool (NewPoolZFS2) with the new drives (6 x WD 8TB, CMR).
  7. Transfer the data to that pool (NewPoolZFS2).
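
A rough capacity check for the plan above can be sketched in a few lines of Python. This is only an estimate under stated assumptions: the temporary pool is assumed to be a plain three-drive stripe, the new pool a six-drive RAIDZ2, and `data_tb` is a placeholder for the amount of data to move (about 20TB is the figure given later in the thread). The function names are made up for illustration, and ZFS overhead and free-space headroom are ignored, so real usable space will be lower.

```python
def usable_tb(drives: int, size_tb: float, parity: int = 0) -> float:
    """Rough usable capacity in TB: data drives times drive size.

    Ignores ZFS metadata, padding, and free-space headroom, so real
    usable space will be lower than this estimate.
    """
    if not 0 <= parity < drives:
        raise ValueError("parity must be between 0 and drives - 1")
    return (drives - parity) * size_tb


def fits(data_tb: float, capacity_tb: float) -> bool:
    """True if the data is no larger than the estimated capacity."""
    return data_tb <= capacity_tb


# Assumed layouts (not spelled out in the plan itself):
temp_pool_tb = usable_tb(drives=3, size_tb=8)            # 3-wide stripe, no parity
new_pool_tb = usable_tb(drives=6, size_tb=8, parity=2)   # 6-wide RAIDZ2

data_tb = 20.0  # placeholder; replace with the real amount of data
print("TempPool estimate (TB):", temp_pool_tb, "fits:", fits(data_tb, temp_pool_tb))
print("NewPool estimate (TB):", new_pool_tb, "fits:", fits(data_tb, new_pool_tb))
```

Note that a stripe has no redundancy, so the temporary pool would hold the only copy of the data between steps 4 and 7.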
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
What's ZFS2?
WD Red Plus drives are CMR, so they should work a lot better than your current ancient SMR drives.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
I am asking If there are any better ways to move the data than my current plan?
I could give you a few answers; however, you have not provided enough data for us to provide a single good answer.

1. Assumption you still have a RAIDZ1.
2. Have you corrected the data errors in your pool yet? (delete corrupt files)
errors: Permanent errors have been detected in the following files:
/mnt/NAS2/iocage/jails/qbittorrent/root/Downloads/PS2 Pack-4/Godof War (USA).7z
<0x1ba>:<0x18a044>

3. And then clear the degraded pool status?
4. How much data is stored that must be transferred? And is all of it mandatory to transfer, can you thin it out some if it's a huge amount of data?
5. I would stop writing data to your NAS with the SMR drives installed in order to limit further corruption. Reading is fine, writing is not.

Possible options:
1. If you wanted to just replace the SMR drives with CMR drives, well that is a simple resilver process but you would still have a RAIDZ1 setup.
2. I would not use any of the SMR drives for any data storage to move data around.
3. In my opinion, the best way to move this data, IF you want to retain all the setup and paths, would be to build a new system (even if temporary) and then replicate the data to the new server. You can use Rsync (probably the easiest) or Replication. This would allow you to create that RAIDZ2 and then migrate all that data. Creating a temporary system is not a good option if you do not have the means to do it; however, I would go this route. You could use any old computer that has six SATA ports, at least 8GB RAM (ECC or non-ECC), and an Ethernet port. A Realtek NIC is not ideal, as it might be slower or it could crash, which is why we don't like Realtek NICs. Once the data has been replicated, export your pool, move the drives to your NAS, power it up, and import the pool. Hopefully all is working. I may have some of those steps wrong, so you should research it.

Cheers
 

Gamer0126

1. Assumption you still have a RAIDZ1.
Yes it is still up, degraded, but still running. I am able to access the data.
2. Have you corrected the data errors in your pool yet? (delete corrupt files)
errors: Permanent errors have been detected in the following files:
/mnt/NAS2/iocage/jails/qbittorrent/root/Downloads/PS2 Pack-4/Godof War (USA).7z
<0x1ba>:<0x18a044>
I have not. I have no problem deleting the file, but I wasn't sure whether that would solve the issue, or how to correct the data errors.
3. And then clear the degraded pool status?
How can I do that? I thought scrubbing the pool would fix it. I ran a scrub once I had replaced the drives that failed SMART testing, but all the other drives are still degraded even though they have all passed long SMART tests without errors.
4. How much data is stored that must be transferred? And is all of it mandatory to transfer, can you thin it out some if it's a huge amount of data?
It is about 20TB of data. It was around 30TB and I did thin it out.
5. I would stop writing data to your NAS with the SMR drives installed in order to limit further corruption. Reading is fine, writing is not.
I have done this. The server is limited to only being read.


Thank you for the response. I have begun moving the data to a new pool. I created a new pool in a stripe config just for the time being. I started the replication process; however, I received the following error:

Code:
* Replication " NAS2/repfolder,...,NAS2/aftfolder - Temp/temp"
failed: resume token contents: nvlist version: 0 object = 0x18a044 offset =
0x99ec0000 bytes = 0xbcd5b305e60 toguid = 0xc31ddbb83be0c178 toname =
NAS2/nas2@auto-2024-02-02_15-47 compressok = 1 warning: cannot
send 'NAS2/nas2@auto-2024-02-02_15-47': Input/output error
cannot receive resume stream: checksum mismatch or incomplete stream.
Partially received snapshot is saved. A resuming stream can be generated on
the sending system by running: zfs send -t
1-1096b97e59-f8-789c636064000310a500c4ec50360710e72765a526973030b82c9000abc1904f4b2b4e2d01c9bc9909936743924faa2c492d06d209c946d167b931f597e4a79766a63030541c7c60bde3b6ec610f24794eb07c5e626e2a03839b7f90bf9f8ba39f63b0917e5a7e517e5e4a625e62b19143626949beae91819189ae811110c51b9aea9a9843ddc1cd80f057727e6e41516a71717e36031c0000bd362844



OK, I am really at a loss as to how to fix this issue, because now I cannot move the data over. Any help or suggestions would be greatly appreciated.
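
For anyone trying to read the error above, the `key = value` pairs in the resume token dump can be pulled apart with a short Python sketch. The sample text is copied from the error message; the function name `token_fields` and the regular expression are illustrative assumptions, not part of any ZFS tool. The `object` value in the token appears to be the same number as the second half of the `<0x1ba>:<0x18a044>` entry quoted earlier, which would suggest the send stopped at the damaged object, but that reading is a guess rather than something the thread confirms.

```python
import re

# Text copied from the replication error, including its line wrapping.
ERROR_TEXT = """failed: resume token contents: nvlist version: 0 object = 0x18a044 offset =
0x99ec0000 bytes = 0xbcd5b305e60 toguid = 0xc31ddbb83be0c178 toname =
NAS2/nas2@auto-2024-02-02_15-47 compressok = 1 warning: cannot
send 'NAS2/nas2@auto-2024-02-02_15-47': Input/output error"""


def token_fields(text: str) -> dict[str, str]:
    """Collect 'key = value' pairs, tolerating line breaks around '='."""
    return dict(re.findall(r"(\w+)\s*=\s*(\S+)", text))


fields = token_fields(ERROR_TEXT)
for key in ("object", "offset", "toname"):
    print(key, "->", fields[key])

# Hex fields can be converted to integers for comparison with other output.
object_id = int(fields["object"], 16)
print(hex(object_id))
```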
 

joeschmuck

I have never used replication, so I am not the correct person to help you out here. Either do some more internet surfing on how to replicate or wait until someone can help you.

I would recommend you back up your data, which you are already trying to do, before trying to fix any data errors. You may need to delete those few corrupt files to complete this.

Once you have deleted the corrupt files, run the command zpool clear poolname and the problem should be cleared. You can run a scrub after that to ensure all is good if you desire.
 

Gamer0126

Thanks for your guidance, it has helped. All my drives are now online; however, the pool status in the GUI is still unhealthy and I am not sure why. I ran the clear command and the scrub. I noticed two new files popped up, and one of them doesn't even exist, so I am again at a loss. I am going to do one more scrub and hope it clears the issues below. There are no metadata errors listed at this point, only file errors.


Code:
      pool: NAS2
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption.  Applications may be affected.
    action: Restore the file in question if possible.  Otherwise restore the
            entire pool from backup.
       see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
      scan: scrub repaired 0B in 08:51:21 with 9 errors on Sun Feb  4 01:55:33 2024
    config:
    
            NAME                                            STATE     READ WRITE CKSUM
            NAS2                                     ONLINE       0     0 0
              raidz1-0                                      ONLINE       0     0 0
                gptid/40e0d18e-b87c-11ee-9d62-0cc47a6c7ce8  ONLINE       0     0 0
                gptid/a137c8b6-902a-11e6-9369-0cc47a6c7ce8  ONLINE       0     0 0
                gptid/23e26fbe-ff02-11ec-80d3-0cc47a6c7ce8  ONLINE       0     0 0
                gptid/72f00759-79c5-11ed-8f2f-0cc47a6c7ce8  ONLINE       0     0 0
                gptid/a3608b0f-902a-11e6-9369-0cc47a6c7ce8  ONLINE       0     0 0
                gptid/7cc276f3-b376-11ee-a269-0cc47a6c7ce8  ONLINE       0     0 0
                gptid/b25e6353-b447-11ee-ac0b-0cc47a6c7ce8  ONLINE       0     0 0
                gptid/a5bb4a90-902a-11e6-9369-0cc47a6c7ce8  ONLINE       0     0 0
    
    errors: Permanent errors have been detected in the following files:
    
            NAS2/nas2@auto-2024-02-02_15-47:/Movies/Home (2015)/Home.2015.1080p.BluRay.DTS-HD.MA.7.1.x264-LEGi0N.mkv
            NAS2/iocage/jails/qbittorrent/root@auto-2024-02-02_15-47:/Downloads/PS2 Pack-4/God of War (USA).7z

 

joeschmuck

You have two corrupt files as listed above:
Home.2015.1080p.BluRay.DTS-HD.MA.7.1.x264-LEGi0N.mkv
God of War (USA).7z

You can try to copy these two files, then delete them and run zpool clear again.
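
As a small aid for the step above, the list of affected entries can be pulled out of saved `zpool status -v` output with a Python sketch like the following. It assumes the output has been saved to a string and that the entries follow the "Permanent errors" line, as in the output posted earlier; the function name `permanent_errors` is made up for this example, and the sample text is copied from that post.

```python
MARKER = "Permanent errors have been detected in the following files:"


def permanent_errors(status_text: str) -> list[str]:
    """Return the non-empty lines that follow the last 'Permanent errors' marker."""
    _, found, tail = status_text.rpartition(MARKER)
    if not found:
        return []  # no permanent errors section in this output
    return [line.strip() for line in tail.splitlines() if line.strip()]


SAMPLE = """
    errors: Permanent errors have been detected in the following files:

            NAS2/nas2@auto-2024-02-02_15-47:/Movies/Home (2015)/Home.2015.1080p.BluRay.DTS-HD.MA.7.1.x264-LEGi0N.mkv
            NAS2/iocage/jails/qbittorrent/root@auto-2024-02-02_15-47:/Downloads/PS2 Pack-4/God of War (USA).7z
"""

entries = permanent_errors(SAMPLE)
for entry in entries:
    print(entry)
```

Using the last marker means a pasted output with the header repeated still yields only the file entries.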
 

Gamer0126

Thanks for all the help. My pool has started the replication process to my new temporary pool.

The file issues listed below:
Permanent errors have been detected in the following files:
/mnt/NAS2/iocage/jails/qbittorrent/root/Downloads/PS2 Pack-4/Godof War (USA).7z
along with other files


This was fixed by deleting the listed files with errors and then following this procedure:
I have no idea why completing the full scrub did not clear the error, after clearing the zpool.
 

joeschmuck

I have no idea why completing the full scrub did not clear the error, after clearing the zpool.
Because it doesn't. I've never seen that happen automatically.
 

Gamer0126

I was replicating late last night and it errored out, reporting this as the file that failed:

errors: Permanent errors have been detected in the following files:

NAS2/nas2@auto-2024-02-02_15-47:/Movies/Home (2015)/Home.2015.1080p.BluRay.DTS-HD.MA.7.1.x264-LEGi0N.mkv



I was wondering how to get access to the location listed above. I deleted the original file, but it looks like the bad copy is in a snapshot, and honestly I have no idea how to get to this location. Any help would be appreciated.

Thanks,
 

NugentS

You have two corrupt files as listed above:
Home.2015.1080p.BluRay.DTS-HD.MA.7.1.x264-LEGi0N.mkv
God of War (USA).7z

You can try to copy these two files, then delete them and run zpool clear again.
I think those files are both in snapshots, which means they cannot be deleted. The snapshot needs to be deleted.
 

joeschmuck

I think those files are both in snapshots
Since I don't take snapshots (I probably should), what is the telltale sign? The date string?
nas2@auto-2024-02-02_15-47:


I'll try to remember these things.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
ZFS names snapshots by appending the snapshot name to the dataset name with an @ separating them.
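
To make the naming rule concrete, here is a minimal Python sketch that splits one of the error entries from this thread into dataset, snapshot, and file path. The names `ErrorEntry`, `parse_entry`, and `snapshot_browse_path` are invented for illustration, and the mountpoint `/mnt/NAS2/nas2` is an assumption based on the `/mnt/NAS2/...` paths quoted earlier. ZFS exposes snapshot contents read-only under the hidden `.zfs/snapshot/<snapshot name>` directory of the dataset's mountpoint, which is what the second helper builds.

```python
from typing import NamedTuple, Optional


class ErrorEntry(NamedTuple):
    dataset: str              # e.g. "NAS2/nas2"; empty for a plain file path
    snapshot: Optional[str]   # e.g. "auto-2024-02-02_15-47"; None if not a snapshot
    path: str                 # file path inside the dataset or snapshot


def parse_entry(entry: str) -> ErrorEntry:
    """Split 'dataset@snapshot:/path' into its parts.

    Entries that start with '/' or contain no '@' are treated as live files.
    """
    head, at, rest = entry.partition("@")
    if entry.startswith("/") or not at:
        return ErrorEntry(dataset="", snapshot=None, path=entry)
    # Snapshot names cannot contain '/', so the first ':/' ends the name.
    snapshot, sep, path = rest.partition(":/")
    return ErrorEntry(head, snapshot, "/" + path if sep else "")


def snapshot_browse_path(mountpoint: str, entry: ErrorEntry) -> str:
    """Read-only location of the file inside the snapshot directory."""
    if entry.snapshot is None:
        raise ValueError("entry does not refer to a snapshot")
    return f"{mountpoint.rstrip('/')}/.zfs/snapshot/{entry.snapshot}{entry.path}"


e = parse_entry(
    "NAS2/nas2@auto-2024-02-02_15-47:/Movies/Home (2015)/"
    "Home.2015.1080p.BluRay.DTS-HD.MA.7.1.x264-LEGi0N.mkv"
)
print(e.dataset, "|", e.snapshot, "|", e.path)
print(snapshot_browse_path("/mnt/NAS2/nas2", e))  # mountpoint is an assumption
```

Because snapshot contents are read-only, a file found this way cannot be deleted from the snapshot; as noted above, the snapshot itself has to be removed.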
 