Multi-Report Critical Error

daschmidt

Explorer
Joined
Feb 12, 2023
Messages
61
I hope this is the right place to post; if not, please move it.

Unfortunately I have to contact you again with a question/problem. I've been using the multi-report v2.0.9 tool for a few days.

Since then, every email I receive from the tool reports a critical error.

CRITICAL LOG FILE
Drive 50026B7784EF7EBE CRC Errors -65614
Drive 50026B7784EF7702 CRC Errors -44

Unfortunately, I couldn't find anything about it.

I hope you have a solution for me here.
 

daschmidt

Explorer
Joined
Feb 12, 2023
Messages
61
Today I checked the S.M.A.R.T. results in the GUI; the last short and long tests both show Success.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Please post your hardware as per forum rules - so we know what we are dealing with

Then

can you please post the results of...
"zpool status"
"smartctl -a /dev/...." where .... are the names of your disks in sda/ada/whatever format.
 

daschmidt

Explorer
Joined
Feb 12, 2023
Messages
61
Thanks, these are the outputs:

Code:
root@truenas:~# zpool status
  pool: backups
 state: ONLINE
  scan: scrub repaired 0B in 00:18:54 with 0 errors on Sun Apr  2 00:18:56 2023
config:

        NAME                                    STATE     READ WRITE CKSUM
        backups                                 ONLINE       0     0     0
          73b5f297-e7f3-4c17-970e-428320bb8158  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:00:05 with 0 errors on Fri Apr  7 03:45:07 2023
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p3  ONLINE       0     0     0

errors: No known data errors

  pool: daten
 state: ONLINE
  scan: scrub repaired 0B in 00:39:48 with 0 errors on Sun Apr  2 00:39:51 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        daten                                     ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            eaa83887-562f-4836-b401-bc76abd9383a  ONLINE       0     0     0
            35a234b6-0736-4ddb-b2ea-7803b3bc59ec  ONLINE       0     0     0

errors: No known data errors

  pool: storj
 state: ONLINE
  scan: scrub repaired 0B in 02:14:27 with 0 errors on Sun Apr  9 02:14:34 2023
config:

        NAME                                    STATE     READ WRITE CKSUM
        storj                                   ONLINE       0     0     0
          afbc0b5e-4a44-49a8-ab8a-cc137079d6fe  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 00:01:48 with 0 errors on Sun Mar 12 00:01:50 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank                                      ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            6ada7c65-30d8-42e8-bd8a-09df6049c9eb  ONLINE       0     0     0
            1aae3f5f-f1af-4715-bb18-6842dc2ba183  ONLINE       0     0     0




Code:
root@truenas:~# smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Phison Driven SSDs
Device Model:     KINGSTON SA400S37960G
Serial Number:    50026B7784EF7EBE
LU WWN Device Id: 5 0026b7 784ef7ebe
Firmware Version: SBFKZ1.3
User Capacity:    960,197,124,096 bytes [960 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr  9 11:19:57 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (65535) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       737
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       20
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Write_Protect_Mode      0x0000   100   100   000    Old_age   Offline      -       0
168 SATA_Phy_Error_Count    0x0012   100   100   000    Old_age   Always       -       0
169 Bad_Block_Rate          0x0000   100   100   000    Old_age   Offline      -       20
170 Bad_Blk_Ct_Lat/Erl      0x0000   100   100   010    Old_age   Offline      -       0/17
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 MaxAvgErase_Ct          0x0000   100   100   000    Old_age   Offline      -       42 (Average 21)
181 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0012   100   100   000    Old_age   Always       -       16
194 Temperature_Celsius     0x0022   031   050   000    Old_age   Always       -       31 (Min/Max 13/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       524288
218 CRC_Error_Count         0x0032   100   100   000    Old_age   Always       -       0
231 SSD_Life_Left           0x0000   098   098   000    Old_age   Offline      -       98
233 Flash_Writes_GiB        0x0032   100   100   000    Old_age   Always       -       12449
241 Lifetime_Writes_GiB     0x0032   100   100   000    Old_age   Always       -       6659
242 Lifetime_Reads_GiB      0x0032   100   100   000    Old_age   Always       -       534
244 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       21
245 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       42
246 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       163296

SMART Error Log not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       646         -
# 2  Short offline       Completed without error       00%        16         -

Selective Self-tests/Logging not supported

Code:
root@truenas:~# smartctl -a /dev/sdf
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Phison Driven SSDs
Device Model:     KINGSTON SA400S37960G
Serial Number:    50026B7784EF7702
LU WWN Device Id: 5 0026b7 784ef7702
Firmware Version: SBFKZ1.3
User Capacity:    960,197,124,096 bytes [960 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr  9 11:24:48 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (65535) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1019
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
167 Write_Protect_Mode      0x0000   100   100   000    Old_age   Offline      -       0
168 SATA_Phy_Error_Count    0x0012   100   100   000    Old_age   Always       -       0
169 Bad_Block_Rate          0x0000   100   100   000    Old_age   Offline      -       18
170 Bad_Blk_Ct_Lat/Erl      0x0000   100   100   010    Old_age   Offline      -       0/15
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 MaxAvgErase_Ct          0x0000   100   100   000    Old_age   Offline      -       48 (Average 20)
181 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0000   100   100   000    Old_age   Offline      -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0012   100   100   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   034   050   000    Old_age   Always       -       34 (Min/Max 16/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       458752
218 CRC_Error_Count         0x0032   100   100   000    Old_age   Always       -       0
231 SSD_Life_Left           0x0000   098   098   000    Old_age   Offline      -       98
233 Flash_Writes_GiB        0x0032   100   100   000    Old_age   Always       -       13964
241 Lifetime_Writes_GiB     0x0032   100   100   000    Old_age   Always       -       6551
242 Lifetime_Reads_GiB      0x0032   100   100   000    Old_age   Always       -       548
244 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       20
245 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       48
246 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       155152

SMART Error Log not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       929         -
# 2  Short offline       Completed without error       00%        23         -

Selective Self-tests/Logging not supported
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Code:
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       524288
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       458752

Something happened.
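
To spell out what to look at: attribute 199 is the one that matters here, and the last column (RAW_VALUE) is the count, which ideally stays at or near 0. A rough Python sketch for pulling that number out of saved `smartctl -a` output follows; the function name is my own and not part of any tool, and it assumes the standard ten-column attribute table shown above.

```python
def crc_error_count(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


# The two attribute rows quoted from the outputs posted in this thread.
sda = "199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       524288"
sdf = "199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       458752"

for name, text in (("sda", sda), ("sdf", sdf)):
    print(name, crc_error_count(text))
```

Running the same check again after a hardware change shows whether the number is still climbing.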
 

daschmidt

Explorer
Joined
Feb 12, 2023
Messages
61
Code:
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       524288
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       458752

Something happened.
Where can I find the problem? This doesn't tell me anything.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Please post the entire output of the smartctl -x /dev/... (drive) because you have limited us by only posting what you feel we need to see.
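
If it is easier to gather everything in one go, here is a minimal Python sketch that runs `smartctl -x` for each drive and joins the results for pasting. The device names are placeholders (an assumption on my part), the helper names are made up for illustration, and it needs to run as root on the NAS.

```python
import subprocess


def build_commands(devices):
    """Return one 'smartctl -x' argument list per device name."""
    return [["smartctl", "-x", f"/dev/{dev}"] for dev in devices]


def collect(devices):
    """Run each command and return the combined text, ready to paste into a post."""
    parts = []
    for cmd in build_commands(devices):
        # capture_output keeps stdout and stderr so error messages are not lost
        result = subprocess.run(cmd, capture_output=True, text=True)
        parts.append("$ " + " ".join(cmd) + "\n" + result.stdout + result.stderr)
    return "\n".join(parts)
```

Usage would be something like `print(collect(["sda", "sdf"]))`, with your own device names in place of those two.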

Generally, CRC errors are a sign of a data cable issue. Replace the data cable for the affected drive(s); this is the safest thing to do. I prefer locking SATA data cables if you must purchase new ones.
 

daschmidt

Explorer
Joined
Feb 12, 2023
Messages
61
Replace the data cable for the affected drive(s).
I built this server about 2 months ago, and it is my first self-built server. The SSDs are plugged into this PCI card.
I use a Supermicro motherboard; maybe something is misconfigured on the motherboard?
The other 4 HDDs have no problems.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I built this server about 2 months ago, and it is my first self-built server. The SSDs are plugged into this PCI card.
That's likely the issue.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
How many drives are plugged into the PCIe card? What make/model drives? What is this being used for? Boot Drive? Cache?

Do not purchase another PCIe card until we know for certain that is the problem.

If this is only a single SSD and it's your boot drive, then I doubt you need a new PCIe card, but we need to know more. My advice still holds true: replace the SATA data cables with other ones. The ones that may have come with the card could be crap, but if you have more unused ones, install them. Make sure they are on straight and not crooked or under pressure from an angle.
 

daschmidt

Explorer
Joined
Feb 12, 2023
Messages
61
What is this being used for? Boot Drive? Cache?
It's used for VMs and the TrueCharts apps.

The other HDDs are plugged directly into the motherboard.
This evening I will try other cables.

After replacing them, how can I clear the alert or check whether the fix worked? The count doesn't go back to zero, or am I wrong?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
UDMA CRC Errors will never return to zero. They are stuck there forever. I have no idea why SMART does that and there is no way to reset the value once the problem is solved. However you can sort of hide them. Run the multi_report script with the -config option, select Advanced "A" -> "K" -> press Return once, you should be at Automatic Drive Compensation, press 'y' and the script will find the errors and offset them. Press Return several more times until you return to the Advanced Configuration Settings page, Press "W" to write the changes. Next press "X" to exit. Now run the script normally and the CRC Errors will be a zeroed out value. The alarm condition will go away. However if more errors occur the alarm will return. This is a great way to find out if you have fixed the problem or not. Once the problem is fixed, if you have any more errors, just reconduct the above procedure to zero it out again.
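
To illustrate the idea behind the compensation, here is a small Python sketch. It only shows the concept of subtracting a stored offset; it is not how multi_report itself is implemented, and the function names and the later reading are hypothetical.

```python
def compensated(raw_count, offset):
    """CRC errors to report after subtracting the stored offset."""
    return raw_count - offset


def has_new_errors(raw_count, offset):
    """True when the drive has logged errors beyond the stored offset."""
    return compensated(raw_count, offset) > 0


# Offsets recorded once the fix is in place (raw values posted earlier in the thread).
offsets = {"50026B7784EF7EBE": 524288, "50026B7784EF7702": 458752}

# Hypothetical later reading: the first drive has logged 3 more errors.
later = {"50026B7784EF7EBE": 524291, "50026B7784EF7702": 458752}

for serial, raw in later.items():
    print(serial, compensated(raw, offsets[serial]), has_new_errors(raw, offsets[serial]))
```

A drive whose count has not moved reports zero and stays quiet, while any increase past the offset brings the alarm back.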

Hope this helps some.

Again, if you run the script with -dump email, I will get enough data that I may be able to help you more. As I said, it may not be the PCIe controller, but then again, it might be. You have a huge number of errors on one drive and fewer errors on the other. You should identify which drive is the VM drive; if it's the high-count drive, maybe this will narrow down your issue.

Also note, the next version of Scale is coming out in a few days. Maybe it will have a positive impact on you, but I doubt the CRC Errors will be fixed by it.

Good Luck
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Also, if you must purchase a new PCIe SATA card, look at my build. You can purchase a nice HBA; see the links below.

 
Joined
Jun 15, 2022
Messages
674
I built this server about 2 months ago, and it is my first self-built server. The SSDs are plugged into this PCI card.
I use a Supermicro motherboard; maybe something is misconfigured on the motherboard?
The other 4 HDDs have no problems.
As you start putting more data on the drives and working that SATA card harder you're likely to have all sorts of unexpected issues which eventually result in loss of your NAS.

The firmware on those cards is horrific. If there's an error (which of course should be handled)...who cares? Dump it off and keep running like nothing happened. (and let me tell you, there are errors happening all the time in computers when using awesome hardware simply due to the speeds, volume, and complexity we're asking of these systems--so you should see the cow-plop that knock-off Chinese stuff produces) @jgreco nailed it in his article, the firmware is so sketch it's a wonder that stuff even works under Windows.

BUT that brings up a point Joe mentioned, Windows does the bare minimum. Windows works with pretty much any hardware and is "fast" and pretty, and that's exactly what consumers want: cheap, fast, pretty. Who really cares if your game is dropping a frame here and there or crashes once every 4 months, you can't beat the price. Did you get fragged by a lucky shot or bad math? Don't know, don't care. HERE, however, we all care. HERE "bad math" isn't "politically sensitive" new-age-math "close enough" so you feel good bunk where 2+3=23, it's wrong. Flat-out inexcusably wrong.

From reading this (and loads of experience) it's pretty obvious (to me) you're under-valuing your hardware and what it does, and I mean that constructively. I (we) want your system to work, reliably, long-term, and cost you as little time/money/and effort as practical. We (collectively) want you to succeed. But for that to happen 'you gotta' learn more about the hardware you're asking to rip billions of bytes of information around without messing up along the way. (or, set up a Windows Share, that works too. until it doesn't. but hey, it was two-clicks-and-done.)
 

daschmidt

Explorer
Joined
Feb 12, 2023
Messages
61
Again, if you run the script with -dump email, I will get enough data that I may be able to help you more.
Tonight, when the children are asleep, I will change the cables and run the script with -dump email.


You should identify which drive is the VM drive, if it's the high count drive, maybe this will narrow down your issue.
Both of these drives, sda and sdf, are mirrored in the pool tank.
 
Joined
Jun 15, 2022
Messages
674
I'm way-far-remote in the middle of nowhere on the edge of a sketchy relay-relay-relay.... connection and ironically my previous post is a great example of what your hardware inside your system is doing....trying to get it right but with no spell-check, no grammar-check, and few-to-no resources to do so. :eek:

But truly we do want your system to work well for you.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
And post your hardware - don't just trickle feed us with bits of info here and there.
Post your entire server hardware setup. Every drive, every card, memory, CPU, PSU etc
 