GlusterFS in TrueNAS Scale Bluefin - Problem in Bitrot/scrubbing/healing

SRP · Jun 7, 2023

I have a problem with Bitrot/scrubbing/healing. Could someone please help?

Description of Problem:

1. When a file is corrupted,

a) The scrub sometimes identifies the file as corrupted: I can see the entry in the scrub status and a matching corrupt entry in the log, but the file does not self-heal over time.
b) The scrub sometimes writes no entry to the scrub status, yet the file has healed automatically. No self-heal entry appears in the log.

Why does scrubbing handle corrupted files differently in these two cases, and how are scrubbing and healing queued?

2. Our Gluster volume is set with scrub frequency - Daily

a) Sometimes the number of skipped files is very high. For example, if we have 100 files, it skips 88 of them.
b) The count of skipped files varies from day to day: sometimes 88, sometimes 30, and so on. Why?
c) Suppose the next scrub cycle becomes due or is triggered before the current one has finished. Will the current scrub skip the remaining files and start the new cycle, or will it run to completion before the new trigger is served?
Please clarify.

Kris Moore · Jun 8, 2023

@SRP - What kind of cluster layout did you create? Can you post output of `gluster volume info all` so we can take a look?

SRP · Jun 8, 2023

@Kris Moore

Volume Name: Test
Type: Disperse
Volume ID: abcde123-4567-3456-abc4-qwerty146azsx
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: Test-1:/mnt/gluster1
Brick2: Test-2:/mnt/gluster2
Brick3: Test-3:/mnt/gluster3
Options Reconfigured:
geo-replication.indexing: On
storage.build-pgfid: on
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
features.bitrot: on
features.scrub: Active
features.scrub-throttle: normal
features.scrub-freq: daily
storage.health-check-interval: 30
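
For reference, the `1 x (2 + 1) = 3` line describes one dispersed subvolume with 2 data bricks and 1 redundancy brick. The arithmetic below is a small sketch of what that implies; the variable names are made up for illustration, and the numbers are taken only from the output above.

```shell
# Values read from "Number of Bricks: 1 x (2 + 1) = 3" in the output above.
data_bricks=2
redundancy_bricks=1
total_bricks=$((data_bricks + redundancy_bricks))

# A dispersed subvolume can rebuild a file only while at least
# `data_bricks` fragments of it are intact, so the number of bad
# fragments it can absorb per file equals the redundancy count.
max_bad_fragments=$redundancy_bricks

# Usable share of raw capacity, as a whole-number percentage.
usable_percent=$((100 * data_bricks / total_bricks))

echo "bricks=$total_bricks tolerated_bad_fragments=$max_bad_fragments usable=${usable_percent}%"
```

With this layout a file should stay recoverable only while no more than one of its three fragments is damaged at the same time.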

Kris Moore · Jun 9, 2023

Thanks, layout seems sane. Do you see a list of files that are unhealthy in this output?

# gluster volume heal Test info summary

It may also be good to check /var/log/glusterfs/ for any heal-related log information.
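
A minimal sketch of that log check, assuming the logs live in /var/log/glusterfs as `*.log` files (on the installs I have seen, the self-heal daemon, bitrot daemon and scrubber write to glustershd.log, bitd.log and scrub.log, but treat those names as an assumption). The function name `scan_gluster_logs` and the keyword list are hypothetical; adjust both to taste.

```shell
# Print the most recent log lines that mention healing, scrubbing or
# corruption. The keyword list is a guess, not an official message format.
scan_gluster_logs() {
  log_dir=$1
  # -i: ignore case, -E: extended regex, -H: prefix each hit with its file name.
  grep -iEH 'heal|scrub|corrupt' "$log_dir"/*.log 2>/dev/null | tail -n 20
}

# LOG_DIR can be overridden; it defaults to the directory named above.
scan_gluster_logs "${LOG_DIR:-/var/log/glusterfs}" || true
```

Run it on each node, since every node keeps its own logs.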

SRP · Jun 10, 2023

@Kris Moore Yes, I can see entries in the scrub status, but the self-heal is not being triggered.

In Brick 1, two files were corrupted, and I can see entries for them.
In Brick 3, one file was corrupted, and I can see an entry for it.

In Brick 2, one file was corrupted, but there is no entry in the scrub status. When I checked, it had self-healed.

Note: I deliberately corrupted all of these files to test and understand the healing process.

# gluster volume heal Test info summary

Brick Test-1:/mnt/gluster1
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick Test-2:/mnt/gluster2
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick Test-3:/mnt/gluster3
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
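
When there are many bricks, the summary can be reduced to two totals. This is only a sketch: the helper name `heal_totals` is made up, and the sample input is the first brick's section copied from the output above. On a live node you would pipe `gluster volume heal Test info summary` into the function instead.

```shell
# Sum the "heal pending" and "split-brain" counters across all bricks.
heal_totals() {
  awk -F': ' '
    /^Number of entries in heal pending/ { pending += $2 }
    /^Number of entries in split-brain/  { split_brain += $2 }
    END { printf "pending=%d split_brain=%d\n", pending, split_brain }
  '
}

# Sample input taken from the summary above (Brick Test-1 only).
summary='Brick Test-1:/mnt/gluster1
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0'

result=$(printf '%s\n' "$summary" | heal_totals)
echo "$result"
```

Note that these counters come from the heal queue, not from the scrubber, so zeros here do not contradict corrupt entries reported by the scrub status.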

In the scrub status I can see skipped files. Why is the scrub process skipping files?

Kris Moore · Jun 12, 2023

Just getting back to this now. Where are you seeing entries for corrupt files? And how are you testing / simulating the corruption? From the data provided so far I don't really have anything to point to that looks like a bug / error. Do you have anything in /var/log/glusterfs on the nodes to provide evidence that nodes aren't scanning / scrubbing / healing automatically?

SRP · Jun 13, 2023

@Kris Moore

Where are you seeing entries for corrupt files?
In the scrub status output: `gluster volume bitrot Test scrub status`

how are you testing / simulating the corruption?
By running dd against a part file on any one of the nodes: `dd if=/dev/urandom of=<file name> bs=1024 seek=$((RANDOM%10)) count=1`

Do you have anything in /var/log/glusterfs on the nodes to provide evidence that nodes aren't scanning / scrubbing / healing automatically?
No

There are a few things that I want to understand about the scrubbing/healing process:

1. If a file is corrupted and there is a corrupt entry in the scrub status, will the file self-heal, or does it need to be restored manually?
2. If a file is corrupted and it self-heals, there will be no entry in the scrub status (is my understanding correct?)

I have encountered both of the above scenarios, and I have no idea on what basis some files self-heal (no entry in the scrub status) while others do not (there is a corrupt entry in the scrub status, as in point 1). Why does the scrubbing/healing process differ from one corrupted file to another?

I also want to understand why I am getting so many skipped-file entries in the scrub status.
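
One detail of the dd command above is worth checking when repeating the test: without `conv=notrunc`, dd cuts the output file off at the seek offset before it writes, so the file's length changes as well as its content. The sketch below shows the difference on scratch files; the paths are hypothetical stand-ins and nothing here touches a brick.

```shell
# Work on scratch files only, never on a real brick.
workdir=$(mktemp -d)
in_place="$workdir/in_place.bin"     # hypothetical stand-ins for a brick file
truncated="$workdir/truncated.bin"

# Two identical 10 KiB files of zeros.
dd if=/dev/zero of="$in_place" bs=1024 count=10 2>/dev/null
cp "$in_place" "$truncated"

# conv=notrunc overwrites block 3 and leaves the rest of the file alone.
dd if=/dev/urandom of="$in_place" bs=1024 seek=3 count=1 conv=notrunc 2>/dev/null

# Same command without conv=notrunc: dd truncates the file at the seek
# offset before writing, so everything after block 3 is lost.
dd if=/dev/urandom of="$truncated" bs=1024 seek=3 count=1 2>/dev/null

# $(( ... )) strips any padding that wc may add around the number.
in_place_size=$(($(wc -c < "$in_place")))
truncated_size=$(($(wc -c < "$truncated")))
echo "with conv=notrunc: $in_place_size bytes"
echo "without conv=notrunc: $truncated_size bytes"
```

The first file should keep its 10240 bytes while the second shrinks to 4096. Whether a length change affects how the scrubber or the heal treats the file is not established in this thread, but adding `conv=notrunc` keeps the test closer to silent bitrot.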

Kris Moore · Jun 14, 2023

SRP said:
@Kris Moore

There are a few things that I want to understand about the scrubbing/healing process

1. If a file is corrupted and there is a corrupt entry in the scrub status, will the file self-heal, or does it need to be restored manually?

We have self-heal enabled out of the box, so I believe the heal will trigger automatically if scrub detects some silent bitrot (which is what you are simulating). This is why your heal summary shows things as healthy again.

SRP said:
2. If a file is corrupted and it self-heals, there will be no entry in the scrub status (is my understanding correct?)

That is my understanding as well. Scrub just detects the bitrot; heal would only show errors / notices if it was unable to restore the file to healthy from the redundancy of the cluster, and in that case you would see errors in the heal summary.

SRP said:
I have encountered both of the above scenarios, and I have no idea on what basis some files self-heal (no entry in the scrub status) while others do not (there is a corrupt entry in the scrub status, as in point 1). Why does the scrubbing/healing process differ from one corrupted file to another?

I also want to understand why I am getting so many skipped-file entries in the scrub status.

I don't understand the skipped portion either, and I'm not finding good information on it online. However, to do some further testing, I'd suggest you repeat the experiment with this workflow:

1. Create files on cluster from client
2. Calculate sha256 checksums of files on client side
3. Run your 'dd's to silently corrupt the source data on one node
4. Kick off scrub and see if the heal output still shows 0 bad files after it runs
5. Compare checksums client side
6. Run the 'dd's to silently corrupt the source data again on a 2nd node
7. Kick off scrub and see if heal still takes place
8. Lastly check client side checksums again.

If, after corrupting 2/3 of the nodes in that sequence, the client data doesn't show any checksum changes, then it would appear the scrub / heal is taking place properly as expected.
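
The client-side part of that workflow (steps 1, 2, 5 and 8) can be sketched as follows. This is an illustration, not a tested procedure for this cluster: `MOUNT` is a hypothetical stand-in for the client mount point of the volume, and a scratch directory is used so the sketch runs anywhere. It also assumes GNU `sha256sum` is available on the client.

```shell
# MOUNT stands in for the client-side mount point of the volume.
MOUNT=$(mktemp -d)
MANIFEST=$(mktemp)

# Step 1: create test files from the client.
for i in 1 2 3; do
  printf 'test file %s\n' "$i" > "$MOUNT/file$i.txt"
done

# Step 2: record sha256 checksums, using relative paths in the manifest.
( cd "$MOUNT" && sha256sum file*.txt ) > "$MANIFEST"

# Steps 3-4 and 6-7 happen on the storage nodes: corrupt a brick file
# with dd, then let the scrub run before coming back to the client.

# Steps 5 and 8: re-verify from the client. sha256sum -c exits non-zero
# if any file no longer matches the manifest.
if ( cd "$MOUNT" && sha256sum -c "$MANIFEST" >/dev/null 2>&1 ); then
  verify_result="unchanged"
else
  verify_result="changed"
fi
echo "client-side checksums: $verify_result"
```

Keeping the manifest outside the volume means the reference checksums cannot be affected by whatever happens on the bricks.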

Note, this is a somewhat unrealistic scenario, because the gluster bricks are running on top of ZFS, which has its own bitrot detection in place and will self-heal any bad blocks coming from physical disks automatically. Gluster would never see the bitrot in that case. The only way this scenario could happen is the one you are creating, i.e. somebody or some process injecting bad data in between gluster and ZFS.

SRP · Jun 14, 2023

@Kris Moore

Thank you for your input.

We have self-heal enabled out of the box, so I believe the heal will trigger automatically if scrub detects some silent bitrot (which is what you are simulating). This is why your heal summary shows things as healthy again.
At least for me, no. If there is a corrupt entry in the scrub status, the file does not self-heal; it seems to want me to restore the correct file. I wanted to confirm with you whether self-heal occurs. Going by your input, it seems self-heal does not happen for corrupt entries.

The skipped portion I don't understand either, not finding good information on it online. However to do some further testing, I'd suggest you repeat the experiment, but with this workflow.....
I have done this with different scenarios and the result is the same: no improvement and no clue about the skipping. In fact, when I corrupt a file on any brick, the client sometimes still shows the correct checksum for the file. On rare occasions, the checksum on the client changes when a file is corrupted on any brick.

SRP · Oct 24, 2023

@Kris Moore

I have been testing the same Gluster bitrot feature on Ubuntu 22, and there the scrub does not detect the corrupt entries at all. Please suggest some remedies.

Kris Moore · Oct 30, 2023

SRP - So the same issue exists in Gluster on Ubuntu 22? Which version of gluster?

SRP · Oct 31, 2023

@Kris Moore The versions are Ubuntu 22.04.2 LTS + glusterfs 10.1
