Transferring file via share results in a corrupted file with different checksum each time

ma_er233

Cadet
Joined
Apr 21, 2023
Messages
6
I use an SMB share to access my files over Ethernet on my local network. Recently, when I tried to copy some files, they all ended up corrupted.
For example, this image is obviously corrupted
Screenshot 2023-06-17 194606.png

If I use 7zip to get the SHA256 value of this file, it will return a different result each time.
Name: IMG_9652.HEIC
Size: 1369825 bytes (1337 KiB)
SHA256: ea8eb7070eb8d2eee8ee50669d1867d836abbb9f9d8b4a44cec4ca8e78759837

Name: IMG_9652.HEIC
Size: 1369825 bytes (1337 KiB)
SHA256: c986d2be092d822ed7c09eadd7b305ed3e3924f4ccf3edb61b14d31545320008

Name: IMG_9652.HEIC
Size: 1369825 bytes (1337 KiB)
SHA256: b6942f3cc019ebfb4342d83966e8133dcf9cae3c14e7884622079c111eaf1d79
If I close and open this image again, it shows a different pattern of corruption, corresponding to the changing checksum. Sometimes it straight up crashes Explorer.
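
The repeated-hash check described above can be scripted. This is only a minimal sketch, assuming Python 3 is available on the client; the function names are made up for illustration, and the path you pass in would be the file as seen through the SMB share. Note that the client may serve repeated reads from its own cache, so identical digests here do not prove the link is healthy, while differing digests for an unchanged file do point at a problem somewhere in the read path.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_digests(path: Path, rounds: int = 3) -> list:
    """Hash the same file several times, re-opening it on every round."""
    return [sha256_of(path) for _ in range(rounds)]


def is_stable(digests: list) -> bool:
    """True when every round produced the same digest."""
    return len(set(digests)) == 1
```

For example, `is_stable(repeated_digests(Path(r"Z:\photos\IMG_9652.HEIC")))` would be expected to return `False` for a file behaving like the one above (the drive letter and folder are hypothetical).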

However, the WebUI told me everything was going just fine. I then scrubbed twice and ran an extended offline SMART test on all drives. Still, everything passed without any error.
The only recent error message was a failed sync to the TrueCharts GitHub repository. I think that was just my internet dropping again and has nothing to do with my problem. There were no Apps or VMs running before the problem occurred.
Screenshot 2023-06-17 195640.png

Also, when I'm using the WebUI, sometimes the connections would drop and "Waiting for Active TrueNAS controller to come up..." would appear.

So... Why is it doing this?
- Could be a malfunctioning Ethernet card on my NAS? Otherwise, why is it giving me a different checksum each time but reporting no errors? However, I upgraded an App after the problem occurred and it went just fine.
- Could be a corrupted OS? I had a blackout a few months ago. However, this problem didn't occur right after. I recently upgraded the OS, that could be the reason?
- Could be a corrupted pool? Sorry, I'm just a noob. But I guess it shouldn't behave like this if the pool was corrupted.

This NAS was more or less an experiment and all my important files have backups. So I'm not worrying a lot about data loss. Just, why on earth is it doing this? Where should I start to tackle the problem?

The debug file is also corrupted... Is there any additional info I should put here?
CPU: i5-4690K
Motherboard: ASUS Z97-A
RAM: Kingston DDR3 8G 1600 x4
Boot Drive: Kingston 120G SSD x2 (Mirror)
Storage: Toshiba 4T x4 (2 vdevs, each mirrored)

Thanks in advance
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Welcome to the forums.

Sorry to hear you're having trouble. Please take a few moments to review the Forum Rules, conveniently linked at the top of every page in red, and pay particular attention to the section on how to formulate a useful problem report, especially including a detailed description of your hardware. Corruptions are not expected with a properly designed and burned in ZFS system, but are certainly possible in other cases. Please outline what you've got for storage, how the storage is attached, mainboard, CPU, memory type, etc.

You've basically given no one anything to work with, so the responses will tend to be random guesses rather than anything useful.
 
Joined
Oct 22, 2019
Messages
3,641
If I use 7zip to get the SHA256 value of this file, it will return a different result each time.
Over the network, or directly on the server itself?

You can log in via SSH and run the terminal command sha256sum against the file directly. (Rather than using 7zip to check its SHA256 hash over the network via SMB.)

Going by your "Show: Setup" button, it appears to be 4 Toshiba SATA drives connected to SATA ports on your motherboard. Non-ECC RAM. What is the network device? (Among other things that can be expanded upon that @jgreco explained.)

The hints you provided do sound like a network / network card issue. (Could also be RAM.)

This might also be a situation where you're saving files to the NAS server, and upon being written to storage a Fletcher4 checksum is calculated (server-side). Hence, any scrubs will turn out "fine". Yet, you ended up saving a broken file. (ZFS doesn't know what the original file is on your client: only what you send to the server, which it receives, and then generates a hash in RAM before being saved to permanent storage on your pool.)
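
To make that point concrete, here is a small simulation. It is only an illustration under stated assumptions: SHA-256 stands in for ZFS's Fletcher4 checksum, the single flipped bit stands in for whatever a bad network path might do, and `store` and `scrub_ok` are hypothetical helpers, not ZFS functions.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated transit damage)."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def store(received: bytes):
    """Server side: checksum whatever arrived, then keep both together."""
    return received, hashlib.sha256(received).hexdigest()


def scrub_ok(stored: bytes, stored_checksum: str) -> bool:
    """A scrub only checks stored data against the checksum stored with it."""
    return hashlib.sha256(stored).hexdigest() == stored_checksum


original = b"photo bytes as they exist on the client"
received = flip_bit(original, 13)        # damaged before the server sees it
stored, checksum = store(received)

scrub_passes = scrub_ok(stored, checksum)                            # True
matches_client = hashlib.sha256(original).hexdigest() == checksum    # False
print(scrub_passes, matches_client)
```

The scrub passes because the checksum was computed over the already-damaged bytes; only a comparison against the client's copy reveals the difference.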

Another thing to consider is the ethernet cable. (Replace it, or check to see if the connection is truly tight and flush.)
 

ma_er233

Cadet
Joined
Apr 21, 2023
Messages
6
Over the network, or directly on the server itself?

You can log in via SSH and run the terminal command sha256sum against the file directly. (Rather than using 7zip to check its SHA256 hash over the network via SMB.)
It was over the network.
I tried your suggestion and checked the SHA256 value directly via SSH. Now the values match each other and my backup every time. So the file system and the files themselves should be intact.
The hints you provided do sound like a network / network card issue. (Could also be RAM.)
I'm considering that as well. Since my entire setup except the HDDs is made out of second-hand components, it's not a surprise to have some finicky parts. I reseated all my RAM, cleaned the contacts of that PCIe network card and put it into a different slot.
This sort of worked. Now my WebUI is no longer dropping out, and one in three files transferred over Ethernet is not corrupted. I always thought computer hardware could be either working or broken, not in such a lingering state. Apparently I'm wrong.

Router: ASUS RT-AX56U, wired on both ends. CAT6 cable, seems of decent quality.
Network card:
IMG_0608 1.jpg
IMG_0609 1.jpg

4 SATA hard drives from Hitachi (Not Toshiba, I remembered it wrong. Sorry), plugged into motherboard SATA3 ports. 4 sticks of non-ECC DDR3 Memory running in dual channel 1600 MT/s

I think I can conclude that the problem is most likely on the network. I'll try to investigate further on that.
Thanks for your advice, it helped me greatly.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I am curious why you have a separate network card in your system? The motherboard comes with a built in Intel NIC. If the NIC works, why not remove your add-on card and use the motherboard connection?

Also, if you haven't done so, run a few burn-in tests, specifically the RAM and CPU. This can rule out a few items as the problem. Also you might remate the motherboard power connectors.

If you didn't state that the GUI works better after reseating your RAM and NIC, I'd have told you to use another computer to download your files to and then check the files.

CAT6 cable, seems of decent quality.
Yea but cables go bad from what appears to be no reason. For example a SATA cable goes bad that has been installed in a computer for a year and there has been no access to the SATA cable but suddenly it fails.

Also, you can directly connect an Ethernet cable between the NAS and your computer to see if the transfers are better, basically removing the LAN infrastructure for the test.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Network card:
IMG_0608 1.jpg

My feeling is: that's probably a fake card. The lack of an Intel Yottamark sticker is highly suspicious, the soldering of the QFP looks uneven and unprofessional, and the jack looks like something that came out of a Shenzhen penny bin.

You might try to find a CT from an alternative manufacturer like 10Gtek. I am not aware of there being fakes of these; the honesty of it "not being a true Intel card" plus that they are a known Intel manufacturer might bode well. Their cards are like

 
Joined
Oct 22, 2019
Messages
3,641
Honestly, @ma_er233, just like the others, I don't think it's worth trying to salvage your PCI network card. You even said it's a "second-hand" component. Network cards aren't expensive, so it shouldn't be a deep cost to purchase a new (and quality) one, as an example in the previous post by @jgreco. Plus, they're a very, very critical part of your NAS.

Even without the issue you're having, you don't want to risk your NAS and data to rely on a cheap or fake network card. (So if replacing the network card doesn't solve this issue, it's still a worthwhile upgrade.)

Like @joeschmuck asked, how come you're not using the motherboard's integrated network port? (Even if it's a "dreaded" RealTek, it doesn't hurt to try. Better to use a RealTek than a supposed "Intel" that corrupts over-the-wire.)

But it doesn't end there. On top of doing RAM and CPU stress tests (to rule them out, as well), you may in fact need to check the files and images that you already saved to the NAS. (You might have inadvertently saved "iffy" versions of such files, which ZFS will report as "healthy" since they will always pass their Fletcher4 checksum.)

There's the option to use "rsync" with its "-c" flag, if you still have all the original files and want to transfer them all over again, after you've definitively resolved the current issue.
 

ma_er233

Cadet
Joined
Apr 21, 2023
Messages
6
Sorry for my late reply. I've been busy like crazy recently.

I am curious why you have a separate network card in your system? The motherboard comes with a built in Intel NIC. If the NIC works, why not remove your add-on card and use the motherboard connection?
Like @joeschmuck asked, how come you're not using the motherboard's integrated network port? (Even if it's a "dreaded" RealTek, it doesn't hurt to try. Better to use a RealTek than a supposed "Intel" that corrupts over-the-wire.)
It was broken when I got it. The seller told me as much, and he just used a USB WiFi dongle. It indeed wasn't working when I tested it. So I just grabbed a cheap PCIe network card and didn't bother trying to fix anything.

I think the issue is unlikely to be caused by the CPU. Before installing TrueNAS Scale, this PC was running Pop!_OS and I gamed on it for over a month. It worked just fine. All the RAM sticks were a later upgrade, though. I ran MemTest86+ for a day when I first upgraded the system. After all this happened, I ran the default MemTest86+ test twice. Both runs passed without any error. So I think my CPU and RAM are more or less in the clear.

My feeling is: that's probably a fake card. The lack of an Intel Yottamark sticker is highly suspicious, the soldering of the QFP looks uneven and unprofessional, and the jack looks like something that came out of a Shenzhen penny bin.
I think it's quite possible now that you've pointed it out. It might be a fake, or a defective one that was supposed to be disposed of but somehow entered the second-hand market.

Network cards aren't expensive, so it shouldn't be a deep cost to purchase a new (and quality) one, as an example in the previous post by @jgreco. Plus, they're a very, very critical part of your NAS.
True... I probably should buy something better and with a warranty this time. Perhaps upgrade to 2.5G at the same time? Though I'm still reading reviews and haven't decided on anything yet.
 
Joined
Oct 22, 2019
Messages
3,641
I probably should buy something better and with a warranty this time. Perhaps upgrade to 2.5G at the same time?
Not worth it, in my opinion. You won't even be getting 2.5 Gbps speeds, since all devices must support 2.5 GbE. Might as well stick with a quality Intel 1 GbE network card. (You've been using gigabit speeds this entire time, anyways.)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Not worth it, in my opinion. You won't even be getting 2.5 Gbps speeds, since all devices must support 2.5 GbE. Might as well stick with a quality Intel 1 GbE network card. (You've been using gigabit speeds this entire time, anyways.)

But a server running at 2.5GbE may be able to do a really decent job of simultaneously serving two or three 1GbE clients.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Could be. Or maybe the Grinch just took a copy of Chatterbot Julia (90's ftw!) and made a few TrueNAS-specific lecturebot tweaks.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So I think we've determined earlier that this was probably a knockoff card. Sorry you had to go through the time and annoyance. I'm going to tag this thread into my "Fake server cards" resource because we occasionally get folks who just won't believe it. I agree it doesn't happen super often but it does happen.
 
Joined
Oct 22, 2019
Messages
3,641
@ma_er233 You should inspect a random sampling of photos saved on your NAS, to check for signs of corruption. Not because of ZFS or RAM, but because of the possibility that you saved corrupted files (over-the-wire) when you were using the cheapo fake network card.

This might also be a situation where you're saving files to the NAS server, and upon being written to storage a Fletcher4 checksum is calculated (server-side). Hence, any scrubs will turn out "fine". Yet, you ended up saving a broken file. (ZFS doesn't know what the original file is on your client: only what you send to the server, which it receives, and then generates a hash in RAM before being saved to permanent storage on your pool.)

If time permits, you could also use rsync with the "-n" (dry-run) and "-c" (checksum) flags. This will clue you in if there are multiple files on your NAS whose integrity does not match the source (client, PC, laptop, etc.). Such a task should be run overnight or during a time when the server will not see much activity. It can be quite lengthy.
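
For anyone without rsync on the client, roughly the same read-only comparison can be sketched in Python. This is not rsync and makes no claim to match its behaviour; `find_mismatches` and both root paths are hypothetical names, and SHA-256 is simply a convenient digest. As with the rsync approach, it only makes sense once the network path is trusted again, because the copy is read over that same path.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(1 << 20):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_root: Path, copy_root: Path) -> list:
    """Report files that are missing from, or differ in, copy_root.

    Nothing is written or deleted; this only reads both trees.
    """
    problems = []
    for source_file in sorted(source_root.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_root)
        copy_file = copy_root / relative
        if not copy_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(source_file) != file_digest(copy_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every file under the source was found in the copy with an identical digest.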

EDIT: If you don't feel it's worth it, and would be too time-consuming, you could just check a random sample of 100 different files. If all of them check out fine with no corruption, then you might feel that this is good enough for your own sanity. If even just one of them is corrupt? It would be wise to check everything with rsync (as explained above).
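
The spot check could look something like the sketch below. It assumes the originals still exist on the client and that the NAS copy is reachable as a directory; `pick_sample` and `check_sample` are hypothetical names, and the comparison is a plain byte-for-byte one via the standard library's `filecmp`.

```python
import filecmp
import random
from pathlib import Path
from typing import Optional


def pick_sample(root: Path, sample_size: int = 100,
                seed: Optional[int] = None) -> list:
    """Choose up to sample_size files under root, as paths relative to root."""
    files = sorted(p.relative_to(root) for p in root.rglob("*") if p.is_file())
    rng = random.Random(seed)
    return rng.sample(files, min(sample_size, len(files)))


def check_sample(source_root: Path, copy_root: Path, sample: list) -> list:
    """Return the sampled paths whose copy is missing or not byte-identical."""
    bad = []
    for relative in sample:
        copy_file = copy_root / relative
        same = copy_file.is_file() and filecmp.cmp(
            source_root / relative, copy_file, shallow=False
        )
        if not same:
            bad.append(relative)
    return bad
```

If `check_sample` returns anything at all, that is the cue to fall back to the full comparison.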

My Hippo Theesus:
Everything was working normally, until you personally saw with your own eyes a corrupted photo. Up until before this time, the network card might not have had any issues. But around the time you personally saw garbled photos (and kept getting different checksums using 7zip over the SMB share) is when you might have been saving corrupted photos with subsequent backups and writes over SMB, since the network card is now borked. (Within the span of a week? A few days? A month? These are the files that might be in jeopardy.)

In other words: Of all your files, your newest files are the most likely to have been saved with corruption, which even ZFS won't safeguard. (ZFS thinks they're fine.)
 
Joined
Jun 15, 2022
Messages
674
The network card is most likely a fake (see that link) and known for early failure. The card you have appears to be a more recent clone that looks more accurate than previous ones. The network card clones, like LSI HBA clones, appear to be manufactured from inferior (cheap) components and "live a productive, though short, life."
 

ma_er233

Cadet
Joined
Apr 21, 2023
Messages
6
Thanks for the advice. I always use TeraCopy and do integrity verification for each file transfer. So I think all files already on my NAS should be fine. Although that reminds me: at first, TeraCopy would occasionally report a failed verification. It was pretty rare and easily resolved by simply copying again, so I didn't mind it too much. Now that I think about it, it was probably the first sign of a failing network card. I should have been alerted by that.
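
For reference, the copy-then-verify idea can be sketched as follows. This is not how TeraCopy is implemented, just an assumption-laden illustration: `copy_and_verify` is a hypothetical helper, SHA-256 is an arbitrary choice of digest, and re-reading the destination may hit the client's cache rather than the NAS.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(1 << 20):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path, max_attempts: int = 3) -> int:
    """Copy source to destination and re-read it to confirm the digest.

    Returns the attempt number that verified; raises OSError if none did.
    """
    expected = sha256_file(source)
    for attempt in range(1, max_attempts + 1):
        shutil.copyfile(source, destination)
        if sha256_file(destination) == expected:
            return attempt
    raise OSError(f"{destination} failed verification {max_attempts} times")
```

A return value above 1 corresponds to the "copy again and it works" pattern described above, which is worth logging rather than ignoring.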
 