Force delete files/directories on the destination of a Cloud Sync Task when there are IO errors

nasterd (Cadet, joined Mar 15, 2024)
Hello all you fine people.

This is my first thread on this forum, so apologies in advance if anything is not formatted as it should be.

For context, I have been running TrueNAS Scale on my old HP ProLiant MicroServer G7 N54L (16GB ECC memory, 3x 6TB HDD), and it has been working very well for basic apps such as Plex and Immich, albeit slowly.

One problem I have had from the start, and have not been able to fix, is with the Cloud Sync task I set up to my AWS S3 bucket (default storage class). There are always errors, caused either by files being corrupted on transfer or by files being altered by the system while the sync task is running.
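The "altered while the sync task is running" failures come down to a file changing between the moment it is measured and the moment it is uploaded. The Python sketch below illustrates that kind of check in simplified form: stat the file, hash it, stat it again, and give up if the size or modification time moved. It only illustrates the idea behind messages like "can't copy - source file is being updated"; it is not rclone's implementation, and `hash_if_stable` is a hypothetical name.

```python
import hashlib
import os
import tempfile
from typing import Optional


def hash_if_stable(path: str, chunk_size: int = 1 << 20) -> Optional[str]:
    """Return the MD5 hex digest of `path`, or None if it changed while being read.

    Hypothetical helper: a simplified "source changed during transfer" check,
    not rclone's actual code.
    """
    before = os.stat(path)
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large media files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    after = os.stat(path)
    # A writer (e.g. a database appending to its WAL) may have touched the file.
    if (before.st_size, before.st_mtime_ns) != (after.st_size, after.st_mtime_ns):
        return None
    return md5.hexdigest()


if __name__ == "__main__":
    fd, tmp = tempfile.mkstemp()
    os.write(fd, b"hello")
    os.close(fd)
    print(hash_if_stable(tmp))  # 5d41402abc4b2a76b9719d911017c592
    os.remove(tmp)
```

A file that is written to continuously (the `state.db-wal` and PostgreSQL files in the logs below are examples) may never give a stable read, which would be consistent with those paths failing night after night.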

Here is a breakdown:
  • Cloud sync task runs every night at midnight
  • Takes 4-6 hours, then finishes in a failed state due to errors (e.g. md5 hash differ)
    • See the logs below for detailed errors
  • Although the task ends in a failed state, all the data is uploaded to S3, and I can access/download it
  • Because of the IO errors, the job does not delete files or directories on the destination ("not deleting files/directories as there were IO errors")
  • This causes my S3 bucket to keep growing, well beyond what is currently on my TrueNAS device
    • 1.47TB on TrueNAS
    • ~2TB+ in S3 bucket
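The deletion behaviour in the list above can be modelled with a short Python sketch. It is a simplified model under stated assumptions, not rclone's code: files are compared by relative path only, `plan_deletions` is a hypothetical name, and its `ignore_errors` parameter stands in for an --ignore-errors style override. The prefix `Homelab/BigBoi/` is taken from the log output.

```python
from typing import Set


def plan_deletions(local_paths: Set[str], remote_keys: Set[str], prefix: str,
                   io_errors: int, ignore_errors: bool = False) -> Set[str]:
    """Return the remote keys a sync would delete (simplified, hypothetical model)."""
    # Remote objects under `prefix` whose relative path no longer exists locally.
    orphans = {
        key for key in remote_keys
        if key.startswith(prefix) and key[len(prefix):] not in local_paths
    }
    # Mirrors "not deleting files as there were IO errors":
    # any error during the run skips the delete phase unless overridden.
    if io_errors and not ignore_errors:
        return set()
    return orphans


if __name__ == "__main__":
    local = {"Share/current.txt"}
    remote = {"Homelab/BigBoi/Share/current.txt", "Homelab/BigBoi/Share/removed.txt"}
    print(plan_deletions(local, remote, "Homelab/BigBoi/", io_errors=5))
    # set()
    print(plan_deletions(local, remote, "Homelab/BigBoi/", io_errors=5, ignore_errors=True))
    # {'Homelab/BigBoi/Share/removed.txt'}
```

Since every nightly run in the logs ends with at least one error, the model takes the first branch every time: objects for files deleted locally are never removed, so the bucket only grows.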
Although I am happy that all of my files are being uploaded, the result is not really a backup that I could simply pull down to be back up and running if something happens to my TrueNAS system.

Snippet of the error output:
<3>ERROR : S3 bucket homelab-backup1998 path Homelab/BigBoi: not deleting files as there were IO errors
<3>ERROR : S3 bucket homelab-backup1998 path Homelab/BigBoi: not deleting directories as there were IO errors
<3>ERROR : Attempt 3/3 failed with 5 errors and: corrupted on transfer: md5 hash differ "ec64612388c9539d91030c4fa0f4779f" vs "9866f31e1a737f7109bec42c3ab33943"
<6>INFO :
Transferred: 9.202 GiB / 9.202 GiB, 100%, 2.275 MiB/s, ETA 0s
Errors: 5 (retrying may help)
Checks: 1352301 / 1352301, 100%
Transferred: 1684 / 1684, 100%
Elapsed time: 5h50m25.3s
Failed to sync with 5 errors: last error was: corrupted on transfer: md5 hash differ "ec64612388c9539d91030c4fa0f4779f" vs "9866f31e1a737f7109bec42c3ab33943"
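To see which failure modes dominate across nights, the ERROR lines can be tallied by message. A minimal Python sketch, assuming the log is saved as plain text with one rclone message per line; the category names are made up for illustration, and the patterns are copied from messages that appear in this post.

```python
import re
from collections import Counter

# Illustrative categories; the patterns come from error messages quoted in this post.
PATTERNS = {
    "md5 mismatch": re.compile(r"md5 hash differ"),
    "source changed during upload": re.compile(r"source file is being updated"),
    "source file vanished": re.compile(r"no such file or directory"),
    "deletions skipped": re.compile(r"not deleting (files|directories) as there were IO errors"),
}


def categorize(log_text: str) -> Counter:
    """Count <3>ERROR lines per category (first matching pattern wins)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        if not line.startswith("<3>ERROR"):
            continue  # skip INFO/NOTICE lines and the final "Failed to sync" summary
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "<3>ERROR : S3 bucket homelab-backup1998 path Homelab/BigBoi: not deleting files as there were IO errors",
        "<3>ERROR : S3 bucket homelab-backup1998 path Homelab/BigBoi: not deleting directories as there were IO errors",
        '<3>ERROR : Attempt 3/3 failed with 5 errors and: corrupted on transfer: md5 hash differ '
        '"ec64612388c9539d91030c4fa0f4779f" vs "9866f31e1a737f7109bec42c3ab33943"',
    ])
    print(categorize(sample))
    # Counter({'deletions skipped': 2, 'md5 mismatch': 1})
```

The "Attempt 3/3 failed" line only names the last error of a run, so this tally undercounts runs that hit several different kinds of error.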

Errors from my recent sync jobs:

<6>INFO : ix-applications/app_migrations.json: Updated modification time in destination
<6>INFO : Share/.DS_Store: Copied (replaced existing)
<5>NOTICE: ix-applications/k3s/data/current: Can't follow symlink without -L/--copy-links
<5>NOTICE: ix-applications/k3s/server/kine.sock: Can't transfer non file/directory
<5>NOTICE: ix-applications/k3s/server/node-token: Can't follow symlink without -L/--copy-links
<5>NOTICE: ix-applications/k3s/server/agent-token: Can't follow symlink without -L/--copy-links
<6>INFO :
Transferred: 18.008 KiB / 1.833 GiB, 0%, 0 B/s, ETA -
Checks: 8 / 28, 29%
Transferred: 1 / 17, 6%
... 653790 more lines ...
* ix-applications/k3s/ag…data_set_0/output_3.pb: checking
* ix-applications/k3s/ag…_data_set_0/input_1.pb: checking
* ix-applications/k3s/ag…ong_example/model.onnx: checking
* ix-applications/k3s/ag…st_wrap_pad/model.onnx: checking
* ix-applications/k3s/ag…data_set_0/output_0.pb: checking

<3>ERROR : S3 bucket homelab-backup1998 path Homelab/BigBoi: not deleting files as there were IO errors
<3>ERROR : S3 bucket homelab-backup1998 path Homelab/BigBoi: not deleting directories as there were IO errors
<3>ERROR : Attempt 3/3 failed with 3 errors and: failed to open source object: open /mnt/BigBoi/ix-applications/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/643/fs/run/postgresql/.s.PGSQL.5432.lock: no such file or directory
Failed to sync with 3 errors: last error was: failed to open source object: open /mnt/BigBoi/ix-applications/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/643/fs/run/postgresql/.s.PGSQL.5432.lock: no such file or directory
_____________________

<6>INFO :
Transferred: 2.172 MiB / 33.938 GiB, 0%, 0 B/s, ETA -
Checks: 7 / 15, 47%
Transferred: 0 / 6200, 0%
Elapsed time: 3.0s
Checking:
* ix-applications/app_migrations.json: checking
* Immich/data/postgresql.conf: checking
* Immich/data/postmaster.opts: checking
* Immich/data/postmaster.pid: checking
... 448059 more lines ...
* ix-applications/k3s/ag…/test_xor3d/model.onnx: checking
* ix-applications/k3s/ag…data_set_0/output_0.pb: checking
* ix-applications/k3s/ag…_data_set_0/input_1.pb: checking
* ix-applications/k3s/ag…_data_set_0/input_0.pb: checking
* ix-applications/k3s/ag…/test_xor4d/model.onnx: checking

<3>ERROR : S3 bucket homelab-backup1998 path Homelab/BigBoi: not deleting files as there were IO errors
<3>ERROR : S3 bucket homelab-backup1998 path Homelab/BigBoi: not deleting directories as there were IO errors
<3>ERROR : Attempt 3/3 failed with 4 errors and: Put "https://s3.us-east-1.amazonaws.com/...6/5b/c65b7311-ac43-4875-8522-76c8b5b38cfb.mp4": can't copy - source file is being updated (size changed from 90177584 to 90439728)
Failed to sync with 4 errors: last error was: Put "https://s3.us-east-1.amazonaws.com/...6/5b/c65b7311-ac43-4875-8522-76c8b5b38cfb.mp4": can't copy - source file is being updated (size changed from 90177584 to 90439728)

____________________

<6>INFO : ix-applications/app_migrations.json: Updated modification time in destination
<6>INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Checks: 12 / 16, 75%
Elapsed time: 3.1s
Checking:
* Immich/databackup/immi…024-03-11_21-16-31.sql: checking
* Share/App Data/.DS_Store: checking
* Share/General Backup/.DS_Store: checking
* Share/Plex/.DS_Store: checking
... 384620 more lines ...
* ix-applications/k3s/ag…data_set_0/output_0.pb: checking
* ix-applications/k3s/ag…_data_set_0/input_1.pb: checking
* ix-applications/k3s/ag…_data_set_0/input_0.pb: checking
* ix-applications/k3s/ag…_data_set_0/input_1.pb: checking
* ix-applications/k3s/ag…_data_set_0/input_0.pb: checking

<3>ERROR : S3 bucket homelab-backup1998 path Homelab/BigBoi: not deleting files as there were IO errors
<3>ERROR : S3 bucket homelab-backup1998 path Homelab/BigBoi: not deleting directories as there were IO errors
<3>ERROR : Attempt 3/3 failed with 1 errors and: Put "https://s3.us-east-1.amazonaws.com/...oi/ix-applications/k3s/server/db/state.db-wal": can't copy - source file is being updated (mod time changed from 2024-03-13 03:03:57.459827421 +0100 CET to 2024-03-13 03:03:58.395797475 +0100 CET)
Failed to sync: Put "https://s3.us-east-1.amazonaws.com/...oi/ix-applications/k3s/server/db/state.db-wal": can't copy - source file is being updated (mod time changed from 2024-03-13 03:03:57.459827421 +0100 CET to 2024-03-13 03:03:58.395797475 +0100 CET)

_______________________

<6>INFO : ix-applications/app_migrations.json: Updated modification time in destination
<5>NOTICE: ix-applications/k3s/server/kine.sock: Can't transfer non file/directory
<5>NOTICE: ix-applications/k3s/server/node-token: Can't follow symlink without -L/--copy-links
<5>NOTICE: ix-applications/k3s/server/agent-token: Can't follow symlink without -L/--copy-links
<5>NOTICE: ix-applications/k3s/data/current: Can't follow symlink without -L/--copy-links
<6>INFO :
Transferred: 0 B / 194.259 MiB, 0%, 0 B/s, ETA -
Checks: 11 / 49, 22%
Transferred: 0 / 2, 0%
Elapsed time: 3.3s
... 435406 more lines ...
* ix-applications/k3s/ag…_data_set_0/input_0.pb: checking
* ix-applications/k3s/ag…data_set_0/output_0.pb: checking
* ix-applications/k3s/ag…data_set_0/output_0.pb: checking
* ix-applications/k3s/ag…data_set_0/output_0.pb: checking
* ix-applications/k3s/ag…_data_set_0/input_1.pb: checking

<3>ERROR : S3 bucket homelab-backup1998 path Homelab/BigBoi: not deleting files as there were IO errors
<3>ERROR : S3 bucket homelab-backup1998 path Homelab/BigBoi: not deleting directories as there were IO errors
<3>ERROR : Attempt 3/3 failed with 2 errors and: corrupted on transfer: md5 hash differ "b29c0be7c639d6ececce6f71162ce29e" vs "fc2165fdc36431b36edc02b8df69aa38"
Failed to sync with 2 errors: last error was: corrupted on transfer: md5 hash differ "b29c0be7c639d6ececce6f71162ce29e" vs "fc2165fdc36431b36edc02b8df69aa38"




Now, I know that Cloud Sync jobs run rclone under the hood (the log format above is rclone's), so I have been trying to address the problem by researching its options. One that I believe would help is the --ignore-errors flag; however, I have not found anything in the documentation or on the forums about specifying extra options for a Cloud Sync task in the TrueNAS Scale GUI.
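rclone's own --ignore-errors flag is documented as "delete even if there are I/O errors". The Python sketch below only assembles and prints the equivalent `rclone sync` command line for a manual test from a shell. `build_rclone_sync_cmd` is a hypothetical helper, the remote name `remote` is an assumption (it would have to match a remote set up with `rclone config`), and none of this reflects how cloud_sync.py builds its command.

```python
import shlex
from typing import List, Sequence


def build_rclone_sync_cmd(src: str, dst: str, extra_flags: Sequence[str] = ()) -> List[str]:
    """Assemble argv for `rclone sync SRC DST [flags...]` (hypothetical helper)."""
    return ["rclone", "sync", src, dst, *extra_flags]


if __name__ == "__main__":
    cmd = build_rclone_sync_cmd(
        "/mnt/BigBoi",
        # "remote" is an assumed rclone remote name pointing at the S3 account.
        "remote:homelab-backup1998/Homelab/BigBoi",
        ["--ignore-errors"],  # proceed with deletions even if there were I/O errors
    )
    # Print a shell-safe command line rather than running it.
    print(shlex.join(cmd))
    # rclone sync /mnt/BigBoi remote:homelab-backup1998/Homelab/BigBoi --ignore-errors
```

Adding rclone's --dry-run flag to `extra_flags` would report what would be deleted from the bucket without actually changing it.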

The one forum post I found with this issue, where the poster was able to resolve it by adding options to the rsync tasks, was this post. I accessed and amended the cloud_sync.py file to include the --ignore-errors flag, but had no luck.

I am feeling a bit stuck here, so any expertise or input would be greatly appreciated, either to A: help me address the IO errors, or B: help me ignore the IO errors so that files in the destination bucket that are no longer on the host machine get deleted. That would bring the bucket back in sync, instead of it being one large dump of every file that was (at least at one point in time) on my TrueNAS machine.

Cheers!
 