TrueNAS throwing syslog-ng errors in console every 2 minutes.

Lothyde

Cadet
Joined
May 22, 2023
Messages
7
Hello, I'm new to TrueNAS. My current setup is:

CPU - Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz
RAM - 8GB
1x 120GB SSD for the OS
1x 120GB SSD for cache
1 pool with 2 vdevs:
- MIRROR: 2x 2TB HDDs
- MIRROR: 2x 1TB HDDs

I connected a VGA monitor to the TrueNAS machine, and these errors show up every 2 minutes:
IMG_20230522_185431.jpg

"May 22 18:53:19 truenas syslog-ng(1290): Last message 'I/O error occurred w' repeated 1 times, suppressed by syslog-ng on truenas.local
May 22 18:53:20 truenas syslog-ng(1290): I/O error occurred while writing; fd='23', error='Integrity check failed (97)'"

The machine seems to be working fine, with SMB access and the Syncthing jail both working. However, this error worries me.

What is it?

Appreciate any and all help. Thank you.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I would say it's a disk- or controller-related issue. For starters:
  1. Remove your cache SSD; with 8GB of RAM it does way more harm than good (you can start using L2ARC at 64GB of RAM).
  2. Show the output of zpool status.
How are your HDDs connected? USB?
 

Lothyde

I have removed my cache SSD.

Here is zpool status:

zpool status.png


The 4 HDDs are connected through the 4 motherboard SATA interfaces, the 2 SSDs (boot+former cache) are connected to a PCIe SATA extender.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The 4 HDDs are connected through the 4 motherboard SATA interfaces, the 2 SSDs (boot+former cache) are connected to a PCIe SATA extender.

What, pray tell, is a "PCIe SATA extender"? Some add-on SATA controller? What kind? What chipset?
 

Davvo

You likely will need a backup. Run a scrub on your data pool, then zpool status -v to identify the corrupted data.

Check the data and power cables of your drives (they should be properly connected and show no obvious signs of damage) and please tell us their model.

Finally, run long SMART tests on each drive and show us the output once you get it.
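For readers following along, the scrub, the error listing and the long SMART tests requested above could be run from the TrueNAS shell roughly as sketched below. This is only an illustration: the pool name `tank` and the `/dev/adaN` device names are placeholders, and the three helper functions are hypothetical wrappers around the standard `zpool scrub`, `zpool status -v` and `smartctl -t long` commands.

```shell
# Sketch only: "tank" and the /dev/adaN names are placeholders, not taken from this system.
# ZPOOL and SMARTCTL can be overridden to preview the commands without running them.
ZPOOL=${ZPOOL:-zpool}
SMARTCTL=${SMARTCTL:-smartctl}

# Start a scrub of the given pool.
scrub_pool() {
    "$ZPOOL" scrub "$1"
}

# Once the scrub has finished, show pool status including any files with errors.
list_damaged_files() {
    "$ZPOOL" status -v "$1"
}

# Start an extended (long) SMART self-test on every device given as an argument.
start_long_tests() {
    for dev in "$@"; do
        "$SMARTCTL" -t long "$dev"
    done
}

# Example:
#   scrub_pool tank
#   list_damaged_files tank
#   start_long_tests /dev/ada1 /dev/ada2 /dev/ada3 /dev/ada4
```

The long tests run in the background on the drives themselves; their results show up later in the self-test log printed by `smartctl -a`.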
 

Lothyde

What, pray tell, is a "PCIe SATA extender"? Some add-on SATA controller? What kind? What chipset?
My apologies, I meant to say PCIe extension card. It's a generic Conceptronic card, P/N: EMRICK03G. I'm not sure where to find the chipset, but I did find this information on the product page: "This 4-Port SATA PCIe Adapter uses the high performance chip (Marvell 88se9215) to extend 4 SATA Ports." At the moment only the boot SSD is running through it, since I have physically removed the other SSD.
You likely will need a backup. Run a scrub on your data pool, then zpool status -v to identify the corrupted data.
In an attempt to fix this issue I ran a scrub on the pool yesterday before making this post.

Here is the output of zpool status -v

zpool status v.png


Check the data and power cables of your drives (they should be properly connected and show no obvious signs of damage) and please tell us their model.
I removed all drives and inserted them back in the desktop, reseated all cables and took a picture of each drive (I saved the S/N of each adaX before shutting down to identify them here):

all disks.jpg


However, upon boot-up the pool was offline and the 2x 1TB disks were missing. These two disks are powered by a Molex to SATA power adapter (since my PSU only has 3 SATA power connectors). I then shut the system down and used a different Molex connector to power the 2x 1TB drives. The pool is now back online but "unhealthy".

Perhaps relevant to mention that these 2x 1TB disks are the disks throwing checksum errors:

pool status.JPG


The checksum errors seem to increase every time the syslog-ng I/O error appears on the VGA output of TrueNAS.

Finally, run long SMART tests on each drive and show us the output once you get it.

I have started the long SMART tests on each drive and will post the results once they are finished.
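To watch which pool members are accumulating errors without reading screenshots, the READ/WRITE/CKSUM columns of `zpool status` can be filtered from the shell. The sketch below is a hypothetical helper, not part of TrueNAS; it assumes the usual `NAME STATE READ WRITE CKSUM` layout of the config section, and `tank` in the usage line is a placeholder pool name.

```shell
# Reads "zpool status" output on stdin and prints the name and the
# READ WRITE CKSUM counters of every pool member whose counters are not all zero.
devices_with_errors() {
    awk 'NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ && ($3 != "0" || $4 != "0" || $5 != "0") { print $1, $3, $4, $5 }'
}

# Example: zpool status tank | devices_with_errors
```

Running it before and after each syslog-ng message would show whether the counters really move in step with the console errors.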
 

Davvo

Looks like the Molex adapter was the issue. Try zpool clear and then run another scrub to see if you get more checksum errors.
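As a rough illustration of that suggestion, the two commands can be chained so the scrub only starts if the counters were reset successfully. `tank` is a placeholder pool name and `clear_and_rescrub` is a hypothetical helper name; only `zpool clear` and `zpool scrub` are real commands here.

```shell
# Sketch only: "tank" is a placeholder pool name.
# ZPOOL can be overridden to preview the commands without running them.
ZPOOL=${ZPOOL:-zpool}

# Reset the pool's error counters, then start a fresh scrub so that
# any new checksum errors are counted from zero.
clear_and_rescrub() {
    "$ZPOOL" clear "$1" && "$ZPOOL" scrub "$1"
}

# Example: clear_and_rescrub tank
# Afterwards, check "zpool status tank" once the scrub has finished.
```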
 

Lothyde

Looks like the Molex adapter was the issue. Try zpool clear and then run another scrub to see if you get more checksum errors.
Running zpool clear sets the checksum error count to 0 but the errors still show up every 2 minutes and checksum error count keeps increasing.

I am waiting for the 2TB disks to finish the long SMART test to run another scrub.

Here are the results of the long SMART tests of the 1TB disks:

smart long.png


(I'm not sure if this is where you properly check the results)
 

Davvo

(I'm not sure if this is where you properly check the results)
smartctl -a /dev/ada3 and smartctl -a /dev/ada4

Running zpool clear sets the checksum error count to 0 but the errors still show up every 2 minutes and checksum error count keeps increasing.
Not fixed then. I would try dropping the Molex adapter and daisy-chaining the SATA power connectors; you won't have issues with 2 drives per slot.
Since you will be messing with cables, I would replace the data cables too.

What is your motherboard model?

If anyone else wants to step in please do so, I don't see any other possible cause except a cable issue.
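When reading `smartctl -a` output with a cable or disk fault in mind, a few attributes are usually checked first: reallocated, pending and uncorrectable sector counts point at the disk surface, while `UDMA_CRC_Error_Count` generally counts transfer errors on the SATA link. The sketch below is a hypothetical helper that pulls just those raw values out of the report; it assumes the standard ten-column attribute table that smartctl prints for ATA drives.

```shell
# Reads "smartctl -A" (or "smartctl -a") output on stdin and prints
# NAME=RAW_VALUE for the attributes most often checked first.
smart_key_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ { print $2 "=" $10 }'
}

# Example: smartctl -A /dev/ada3 | smart_key_attrs
```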
 

Lothyde

smartctl -a /dev/ada3 and smartctl -a /dev/ada4
smartctl -a /dev/ada3:
root@truenas[~]# smartctl -a /dev/ada3
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: TOSHIBA HDWV110
Serial Number: 42CUUT1NS
LU WWN Device Id: 5 000039 fc5cbbe09
Firmware Version: MU2OA9A0
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5700 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Tue May 23 18:30:28 2023 WEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 9006) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 150) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   054    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   130   130   024    Pre-fail  Always       -       165 (Average 164)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       134
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       574
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       36
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       96
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       186
194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 22/44)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 571 -
# 2 Short offline Completed without error 00% 559 -
# 3 Short offline Completed without error 00% 535 -
# 4 Short offline Completed without error 00% 485 -
# 5 Short offline Completed without error 00% 461 -
# 6 Short offline Completed without error 00% 413 -
# 7 Short offline Completed without error 00% 389 -
# 8 Short offline Completed without error 00% 365 -
# 9 Extended offline Completed without error 00% 344 -
#10 Short offline Completed without error 00% 317 -
#11 Short offline Completed without error 00% 293 -
#12 Short offline Completed without error 00% 245 -
#13 Short offline Completed without error 00% 221 -
#14 Short offline Completed without error 00% 197 -
#15 Short offline Completed without error 00% 164 -
#16 Short offline Completed without error 00% 152 -
#17 Extended offline Completed without error 00% 130 -
#18 Short offline Completed without error 00% 104 -
#19 Short offline Completed without error 00% 80 -
#20 Short offline Completed without error 00% 32 -
#21 Short offline Completed without error 00% 8 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@truenas[~]#

smartctl -a /dev/ada4:
root@truenas[~]# smartctl -a /dev/ada4
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Blue
Device Model: WDC WD10EZEX-08M2NA0
Serial Number: WD-WCC3F1YK5R3V
LU WWN Device Id: 5 0014ee 2b637f385
Firmware Version: 01.01A01
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Tue May 23 18:36:17 2023 WEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (11280) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 117) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   174   165   021    Pre-fail  Always       -       2300
  4 Start_Stop_Count        0x0032   096   096   000    Old_age   Always       -       4533
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   054   054   000    Old_age   Always       -       33686
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1537
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       342
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       4448
194 Temperature_Celsius     0x0022   105   100   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 33683 -
# 2 Short offline Completed without error 00% 33672 -
# 3 Short offline Completed without error 00% 33648 -
# 4 Short offline Completed without error 00% 33595 -
# 5 Short offline Completed without error 00% 33571 -
# 6 Short offline Completed without error 00% 33523 -
# 7 Short offline Completed without error 00% 33499 -
# 8 Short offline Completed without error 00% 33476 -
# 9 Extended offline Completed without error 00% 33453 -
#10 Short offline Completed without error 00% 33398 -
#11 Short offline Completed without error 00% 33374 -
#12 Short offline Completed without error 00% 33350 -
#13 Short offline Completed without error 00% 33302 -
#14 Extended offline Completed without error 00% 33280 -
#15 Short offline Completed without error 00% 33254 -
#16 Short offline Completed without error 00% 33230 -
#17 Short offline Completed without error 00% 33182 -
#18 Short offline Completed without error 00% 33158 -
#19 Short offline Completed without error 00% 33137 -
#20 Extended offline Completed without error 00% 33115 -
#21 Short offline Completed without error 00% 33089 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@truenas[~]#

Not fixed then. I would try dropping the Molex adapter and daisy-chaining the SATA power connectors; you won't have issues with 2 drives per slot.
The Molex adapter was being used to power the 2x 1TB HDDs that are throwing checksum errors. To check whether the Molex adapter is the issue, I switched it to power the 2x 2TB HDDs instead. However, the 2x 1TB HDDs are still throwing checksum errors, and the syslog-ng I/O errors are still showing up on the TrueNAS VGA output every 2 minutes.
Following this test, I find it hard to believe that the Molex adapter could be the culprit here.
Since you will be messing with cables, I would replace the data cables too.
At the moment I have no spare SATA cables. I might be able to get some tomorrow.
What is your motherboard model?
Asus H61M-A
 

Davvo

If you are able to, try changing the motherboard SATA ports the drives are connected to.
If it doesn't work, try a clean install of the OS (you can save the configuration and then import it on the new install).
 

Lothyde

I fixed it by using "rm -r" to delete the directory where the error was located and then running a scrub.

Not sure if what I did is the correct solution for this, but it worked.
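For anyone trying to find the affected location the same way: when a scrub finds data it cannot repair, `zpool status -v` prints the paths under a line starting with "errors: Permanent errors have been detected in the following files:". The sketch below is a hypothetical helper that extracts just those paths; `tank` in the usage line is a placeholder pool name, and the helper only lists paths, it does not delete anything.

```shell
# Reads "zpool status -v" output on stdin and prints one affected path per line,
# i.e. every non-empty line that follows the "Permanent errors" heading.
permanent_error_paths() {
    awk 'found && NF { sub(/^[ \t]+/, ""); print }
         /^errors: Permanent errors/ { found = 1 }'
}

# Example: zpool status -v tank | permanent_error_paths
```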
 
