winnielinnie
A thought just occurred to me, and perhaps I'm approaching it the wrong way.
Does SSD "garbage collection" conflict with the principle of ZFS "copy-on-write"?
One of the reasons for data integrity with ZFS is the "copy-on-write" feature. Once a record is written, it stays exactly as written: never modified in place, never relocated. This is true for user data and metadata. Each record also has its own checksum value.
However, SSD controllers do a lot of their own internal housekeeping, notably "garbage collection". From what I understand, they are constantly reading, rearranging, and re-copying physical pages and blocks behind the scenes, without the operating system or filesystem knowing about it.
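To picture the relocation described above, here is a minimal toy sketch of a flash translation layer. It is an illustration only: `ToyFTL` and its methods are hypothetical names, real controllers work on erase blocks with far more involved logic, and the immediate freeing of stale pages is a simplification. It shows a logical block address staying fixed from the host's point of view while the physical page behind it changes.

```python
class ToyFTL:
    """Toy flash translation layer: logical block address -> physical page.

    Hypothetical model for illustration; not how any real controller is built.
    """

    def __init__(self, num_pages):
        self.pages = [None] * num_pages  # physical flash pages (None = free)
        self.l2p = {}                    # logical address -> physical page index

    def _free_page(self):
        # Raises ValueError if no free page is left (fine for a toy model).
        return self.pages.index(None)

    def write(self, lba, data):
        # Flash cannot be overwritten in place, so every write lands on a fresh page.
        new = self._free_page()
        self.pages[new] = data
        old = self.l2p.get(lba)
        self.l2p[lba] = new
        if old is not None:
            self.pages[old] = None  # simplification: stale page is freed immediately

    def read(self, lba):
        return self.pages[self.l2p[lba]]

    def garbage_collect(self):
        # Move every live page to a different physical page and update the map.
        for lba, old in list(self.l2p.items()):
            new = self._free_page()
            self.pages[new] = self.pages[old]
            self.pages[old] = None
            self.l2p[lba] = new


ftl = ToyFTL(num_pages=8)
ftl.write(0, b"record A")
ftl.write(1, b"record B")
before = dict(ftl.l2p)
ftl.garbage_collect()
print(before != ftl.l2p)             # physical placement changed
print(ftl.read(0), ftl.read(1))      # logical contents unchanged
```

In this model the filesystem only ever sees logical addresses, so the relocation is invisible to it unless the copy itself goes wrong.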
Doesn't this negate the assurances of "copy-on-write"?
What if there is a mistake during this internal housekeeping? (I'd assume ZFS will detect a mismatch against the checksum in a future scrub or read.)
However, what if the SSD's controller runs its "garbage collection" against the areas of the disk where ZFS stores the checksums themselves? In other words, the user data records are not touched, but the controller makes a mistake while consolidating/rearranging/re-copying the pages that contain the checksums. Wouldn't ZFS interpret this as "data corruption", even though the user data is still intact and perfectly fine?
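The scenario in the question can be sketched with a toy model. This is a sketch under stated assumptions, not ZFS code: `write_block`, `verify`, and `flip_bit` are hypothetical helpers, and SHA-256 from Python's `hashlib` stands in for whichever checksum algorithm the pool uses. The one ZFS detail it relies on is that a block's checksum is kept apart from the block itself (in the parent block pointer), so either copy can be damaged independently.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in checksum; assumption for illustration only.
    return hashlib.sha256(data).digest()


def write_block(data: bytes):
    """Return (stored_data, stored_checksum), modelling two separate on-disk locations."""
    return bytearray(data), bytearray(checksum(data))


def verify(stored_data, stored_checksum) -> bool:
    # Recompute the checksum from the data as read and compare with the stored one.
    return checksum(bytes(stored_data)) == bytes(stored_checksum)


def flip_bit(buf, index=0):
    # Simulate a mistake made while the device re-copies a page.
    buf[index] ^= 0x01


# Case 1: the data page is damaged, the checksum is intact.
data, csum = write_block(b"user data record")
flip_bit(data)
print(verify(data, csum))   # False

# Case 2: the data page is intact, the checksum is damaged.
data, csum = write_block(b"user data record")
flip_bit(csum)
print(verify(data, csum))   # False
```

In this toy model both cases fail verification in the same way: the comparison alone only says that data and checksum disagree, not which of the two was damaged.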