jengle
Dabbler
- Joined
- Jan 4, 2023
- Messages
- 26
Greetings,
I think I have the sequence of events pretty close to accurate. Thanks in advance for any help to another noob. System config is in my signature.
Two days ago I finished the build-out of my first TrueNAS SCALE system and let some family members know that they could check it out. To be safe, I decided to replicate my primary pool (Main) to an external drive after loading all the apps that I wanted and about 9,000 photographs. The primary pool also has an SMB share with just under 600,000 files, plus the ix-applications datasets; everything together uses 2 TiB of space.
During the replication I got 14 read errors. When I checked the status of the replication the next morning, the system was frozen: powered on, but not responsive. I rebooted and the Boot pool was unavailable from Main. I re-installed TrueNAS, put the pool on Main (as it was before), imported my two pools, ran a scrub (no errors), and ran a short S.M.A.R.T. test on the drive that had the errors (a brand-new Seagate IronWolf 4 TB drive). One issue was that I now had 2 checksum errors.
My system has 4 SATA ports on the motherboard and a PCIe card with 4 SATA ports. Since the checksum errors could be caused by a bad cable or controller, I moved the faulty disk from the PCIe SATA card to the motherboard (now all of pool Main is on the motherboard SATA ports, and the PCIe card has two 6 TB drives for the 2nd pool plus the boot drive). I also changed the cable for the problematic drive.
Since I didn't know the status of my replication, I decided to back up my most critical files on the SMB share (using Backup4all on my desktop). During that process I got errors on the same 4 TB drive that I had moved from the PCIe SATA card to the motherboard SATA port:
- 9 Read Errors
- 1 Write Errors
- 2 Checksum Errors (no change)
- 1 Read Errors
- 5 Write Errors
- 1 Checksum Errors
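For anyone who wants to compare numbers: as far as I understand, the three counters above correspond to the READ, WRITE and CKSUM columns that `zpool status` prints per device. Below is a minimal Python sketch for pulling out any device with a nonzero counter. The sample text, the pool layout and the device names (`sda`, `sdb`, `sdc`) are made up for illustration and are not my actual output, and the function name `devices_with_errors` is just a hypothetical helper.

```python
import re

# Hypothetical sample in the usual `zpool status` column layout.
# The layout and device names are invented for illustration only.
SAMPLE = """\
  pool: Main
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        Main        DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            sda     ONLINE       0     0     0
            sdb     FAULTED      9     1     2
            sdc     ONLINE       0     0     0

errors: No known data errors
"""

# A config row: indented name, state, then three integer counters.
# Assumption: counters are plain integers; abbreviated values such as
# "1.2K" would not match this simple pattern.
ROW = re.compile(r"^\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)")


def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with any nonzero counter."""
    found = {}
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, blank, or non-device line
        name, _state, read, write, cksum = m.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            found[name] = counts
    return found


if __name__ == "__main__":
    for name, (r, w, c) in devices_with_errors(SAMPLE).items():
        print(f"{name}: read={r} write={w} cksum={c}")
```

To use it on a real system, you would paste the text of `zpool status` into the function in place of `SAMPLE`; saving the result after each test run makes it easier to see whether the counters are still climbing after a cable or port change.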
I am using non-ECC memory - is there a memory test I can do? I am not getting any ZFS errors.
CPU runs at about 30 °C with moderate activity, and when I was doing some photo processing along with copying files it got up into the mid-40s °C. CPU usage never went above about 30 to 35%.
Again, thanks in advance for the advice.
jengle