Resilvering keeps restarting

UglyBob

Dabbler
Joined
Jul 19, 2022
Messages
17
I had a drive fail and am trying to replace it, but it doesn't work. The resilvering runs to some percentage (the highest I've seen is about 15%), then I lose connection to the web UI for a minute or so, and when it comes back the resilvering starts again from 0%. I can't see anything in the logs; I suspect it might actually be rebooting. Someone in a Facebook group was sure it must be a problem with my PSU, so I replaced it, but of course it didn't help. I really don't know what to do to solve this...
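One way to check the suspected reboot, as a minimal sketch: the log excerpts later in this thread show a `---<<BOOT>>---` line written when syslog starts after a boot, so listing the timestamps of those lines shows when the machine restarted. The function name `boot_times` is made up for illustration, and the marker format is assumed to match those excerpts.

```shell
# Print the syslog timestamp (month, day, time) of every boot marker.
# Reads log lines on stdin; assumes the "---<<BOOT>>---" marker format
# seen in this thread's /var/log/messages excerpts.
boot_times() {
    awk '/---<<BOOT>>---/ { print $1, $2, $3 }'
}
```

For example, `boot_times < /var/log/messages`, or `bzcat /var/log/messages.1.bz2 | boot_times` for a rotated log. A marker that lines up with the moment the web UI dropped would suggest an unplanned restart rather than a network hiccup.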
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
More hardware details please, including the make, model and how the disks are connected. Plus, software version.
 

UglyBob

Dabbler
Joined
Jul 19, 2022
Messages
17
TrueNas Core TrueNAS-12.0-RELEASE

Ryzen 5 5600G
Asrock Fatal1ty B450 Gaming-ITX
2 x Kingston ValueRAM DDR4, 3200MHz, 16GB, Non-ECC, CL22
4 x WD Red 3TB

I'm replacing the disks now though, with Samsung 870 EVO 4TB SSDs. I also upgraded to this RAM recently, and I'm starting to wonder if it is the culprit...
 

UglyBob

Dabbler
Joined
Jul 19, 2022
Messages
17
Switched back to the old RAM and, while not completed yet, it's far beyond where it ever got before (28% and counting). No one told me AMD is shit with RAM before I bought this stuff. I used to have Intel in the machine, but was told AMD works just fine for TrueNAS. I was very skeptical as I had trouble with AMD back in the day; now I'm never buying it again for sure...
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
AMD does work - but depending on the generation you have to turn things off to make it work reliably

Also - if you add new memory - make sure you test it thoroughly with memtest first
 

somethingweird

Contributor
Joined
Jan 27, 2022
Messages
183

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
It may not be an AMD thing, as you are using a gaming system board;
Asrock Fatal1ty B450 Gaming-ITX
While it may, and probably will, work just fine, sometimes the defaults for gaming boards are to enable over-clocking on everything it can. INCLUDING MEMORY. An actual 24x7 server on the other hand not only does not want over-clocking, the extra heat and power usage caused by over-clocking may result in un-reliable operation.

That said, I have no clue if that particular gaming system board defaults to over-clocking. Or previously had over-clocking enabled.

In any case, good luck.
 

UglyBob

Dabbler
Joined
Jul 19, 2022
Messages
17
Yeah, I've heard from other sources as well that overclocking is something to look into. Thanks.
 

UglyBob

Dabbler
Joined
Jul 19, 2022
Messages
17
Exact model? Are they CMR or SMR - if SMR - that might be the problem.
Not sure (I thought WD Red was WD Red, precisely because they are always the same), but I am replacing all of them now anyhow and it really seems to be the RAM that was the problem. I'm at 76% now...
 

UglyBob

Dabbler
Joined
Jul 19, 2022
Messages
17
OK, I'm throwing this NAS in the ocean. Almost all the hardware is at most a year old. This time it got to 80% resilvering (which was never even close to possible with the other RAM), then I think it rebooted and started from 0%... I hate this NAS so much. Spent so much time and money on this crap... I really think it's just AMD that is useless. Time to spend another $500 on an Intel system I guess...
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Turn off all the power saving features. Turn off anything involving overclocking.
Then run memtest for a minimum of 24 hours
WD Red can be SMR or CMR depending on age. But you need to tell us the model numbers (EFAX, EFRX for example)
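A minimal sketch for reading off those model numbers, assuming smartmontools is available and the disks are SATA devices named `ada0` to `ada3` (only `ada3` is confirmed by the logs in this thread). The helper name `model_of` is hypothetical; it just pulls the `Device Model` field out of `smartctl -i` output.

```shell
# Extract the "Device Model" value from `smartctl -i` output on stdin.
# The suffix of the printed model (EFAX, EFRX, ...) is what is being asked for.
model_of() {
    awk -F': *' '/^Device Model/ { print $2 }'
}
```

Usage would look like `for d in /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3; do smartctl -i "$d" | model_of; done`, adjusting the device names to match the system.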
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I hate this NAS so much. Spent so much time and money on this crap... I really think it's just AMD that is useless.
AMD has nothing to do with your issues, we have seen plenty of successful builds: it would be unfitting to blame them.

Time to spend another $500 on an Intel system I guess...
If you do, please make an informed purchase, or you may get stuck with a useless $500 Intel system.

You can start from the following resource.
 

UglyBob

Dabbler
Joined
Jul 19, 2022
Messages
17
I've already spent probably $3000 on crap that never works in this NAS, replacing several disks every year. I switched from Intel to AMD because it kept eating disks. Now I'm switching to SSDs instead of normal HDDs in case they can't take it down here in Mexico. And now I'm looking at a new mobo and CPU again, because I've spent over a week on this crap and I've already replaced the PSU for another $150 after recommendations. I also updated the BIOS after someone recommended that, and last night the resilvering ran to 80% before I went to bed. It's at 50% now this morning, so I give up...
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Do you have a UPS?
 

UglyBob

Dabbler
Joined
Jul 19, 2022
Messages
17
I grepped the messages log just now and found this before it reboots. Does that mean I have multiple failed disks?
Code:
root@truenas[/var/log]# zcat messages.1.bz2 | grep "<BOOT>" -A 10 -b10
90496-Sep  4 01:58:53 truenas 1 2023-09-04T01:58:53.115286-07:00 truenas.uglybob smartd 1790 - - Device: /dev/ada3, 1 Currently unreadable (pending) sectors
90647-Sep  4 02:28:53 truenas 1 2023-09-04T02:28:53.891637-07:00 truenas.uglybob smartd 1790 - - Device: /dev/ada3, 1 Currently unreadable (pending) sectors
90798-Sep  4 02:58:53 truenas 1 2023-09-04T02:58:53.234445-07:00 truenas.uglybob smartd 1790 - - Device: /dev/ada3, 1 Currently unreadable (pending) sectors
90949-Sep  4 03:28:53 truenas 1 2023-09-04T03:28:53.589417-07:00 truenas.uglybob smartd 1790 - - Device: /dev/ada3, 1 Currently unreadable (pending) sectors
91100-Sep  4 03:58:53 truenas 1 2023-09-04T03:58:53.954735-07:00 truenas.uglybob smartd 1790 - - Device: /dev/ada3, 1 Currently unreadable (pending) sectors
91251-Sep  4 04:28:53 truenas 1 2023-09-04T04:28:53.318964-07:00 truenas.uglybob smartd 1790 - - Device: /dev/ada3, 1 Currently unreadable (pending) sectors
91402-Sep  4 04:58:53 truenas 1 2023-09-04T04:58:53.805139-07:00 truenas.uglybob smartd 1790 - - Device: /dev/ada3, 1 Currently unreadable


ada3 is the first SSD I replaced 3 weeks ago and it didn't complain then. I guess I am losing all my data...
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
It's telling you that on September 4th ada3 had a pending sector.
I'd suggest running memtest86+ to validate your RAM first.
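To see at a glance which device smartd is reporting and over what period (which also makes a stale date easy to spot), the reports can be condensed. This is a sketch that assumes the message layout in the excerpt above; `pending_summary` is a hypothetical helper name, and classic syslog timestamps carry no year, so the month and day still need to be read with care.

```shell
# Summarise smartd pending-sector reports read from stdin:
# one line per device with the report count and first/last timestamp.
pending_summary() {
    grep 'Currently unreadable (pending) sectors' |
    awk '{
        # Drop the "NNNN-" byte-offset prefix that grep -b adds, if present.
        sub(/^[0-9]+-/, "")
        stamp = $1 " " $2 " " $3
        dev = ""
        for (i = 1; i < NF; i++)
            if ($i == "Device:") { dev = $(i + 1); sub(/,$/, "", dev) }
        if (dev == "") next
        if (!(dev in count)) { first[dev] = stamp; order[++n] = dev }
        count[dev]++
        last[dev] = stamp
    }
    END {
        for (k = 1; k <= n; k++) {
            d = order[k]
            printf "%s reports=%d first=\"%s\" last=\"%s\"\n", d, count[d], first[d], last[d]
        }
    }'
}
```

For a rotated log this could be run as `bzcat /var/log/messages.1.bz2 | pending_summary`. A single device with a steady stream of identical reports, as in the excerpt, reads as one drive with one pending sector rather than several failing disks.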
 

UglyBob

Dabbler
Joined
Jul 19, 2022
Messages
17
Wow, sorry, I just woke up. Didn't see the date was all wrong. Well, I find it unlikely that both my new RAM and my old RAM are broken. I really think it's this mobo. I'm also so fed up with this NAS that I think it's time to get a commercial one. I have replaced countless drives and computer parts over 3 years. It's just shit. And yes, I have a UPS.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
It is possible to get lemon systems, and even several in a row. Especially if using generic components, like consumer grade SSDs, gaming system boards, or SMR disks.

This is one reason why iXsystems sells pre-assembled and packaged MiniNASes. Now it is not possible for iXsystems to make a pre-assembled and packaged NAS for every use case. Nor can they make them as cheap as DIY (Do It Yourself) NASes can be. But it is an option for someone who is frustrated and can afford it.

Note that I have no business connection to iXsystems, other than having bought a TrueNAS Mini 7 years ago. (And while not a speed daemon, it was reliable for me...)
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Wow, sorry, I just woke up. Didn't see the date was all wrong. Well, I find it unlikely that both my new RAM and my old RAM are broken. I really think it's this mobo.
If it passes the RAM validation then we can exclude it from the probable cause list; if it doesn't, we can look deeper into the CPU/motherboard.

Which BIOS version are you running? Might be worth updating.
 

UglyBob

Dabbler
Joined
Jul 19, 2022
Messages
17
I updated to the latest BIOS yesterday; it still rebooted before finishing. I found this now, could it be a clue?


Code:
root@truenas[/var/log]# cat messages | grep "<BOOT>" -B 10
Nov 21 00:40:16 truenas ahcich5: is 00000000 cs 10000000 ss 1c000000 rs 1c000000 tfd 40 serr 00000000 cmd 0004dc17
Nov 21 00:40:16 truenas (ada3:ahcich5:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 688d 41 40 61 00 00 00 00 00
Nov 21 00:40:16 truenas (ada3:ahcich5:0:0:0): CAM status: Command timeout
Nov 21 00:40:16 truenas (ada3:ahcich5:0:0:0): Retrying command, 3 more tries remain
Nov 21 00:40:46 truenas ahcich5: Timeout on slot 29 port 0
Nov 21 00:40:46 truenas ahcich5: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd 234 serr 00000000 cmd 0004dc17
Nov 21 00:40:46 truenas (aprobe0:ahcich5:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 0000 40 00 00 00 00 00 00
Nov 21 00:40:46 truenas (aprobe0:ahcich5:0:0:0): CAM status: Command timeout
Nov 21 00:40:46 truenas (aprobe0:ahcich5:0:0:0): Retrying command, 0 more tries remain
Nov 21 03:01:23 truenas syslog-ng[1508]: syslog-ng starting up; version='3.25.1'
Nov 21 03:01:23 truenas ---<<BOOT>>---
--
Nov 21 07:14:48 truenas syslog-ng[1508]: Suspending write operation because of an I/O error; fd='31', time_reopen='60'
Nov 21 07:14:50 truenas proftpd[1772]: 127.0.0.1 - ProFTPD killed (signal 15)
Nov 21 07:14:50 truenas proftpd[1772]: 127.0.0.1 - ProFTPD 1.3.6b standalone mode SHUTDOWN
Nov 21 07:14:50 truenas 1 2023-11-21T07:14:50.464911-08:00 truenas.uglybob ntpd 1746 - - ntpd exiting on signal 15 (Terminated)
Nov 21 07:14:51 truenas kernel: vnet0.1: link state changed to DOWN
Nov 21 07:14:51 truenas kernel: epair0b: link state changed to DOWN
Nov 21 07:14:51 truenas 1 2023-11-21T15:14:51.409482+00:00 truenas.uglybob devd 436 - - notify_clients: send() failed;
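The excerpt shows `CAM status: Command timeout` entries shortly before the gap in the log. A rough way to see which device and AHCI channel those timeouts cluster on, assuming the line format above; `cam_timeouts` is a hypothetical helper, not a TrueNAS tool.

```shell
# Count "CAM status: Command timeout" lines per peripheral:channel pair.
# Reads log lines on stdin and prints "<count> <device>:<channel>".
cam_timeouts() {
    grep 'CAM status: Command timeout' |
    sed -n 's/.*(\([a-z0-9]*:[a-z0-9]*\):[0-9:]*).*/\1/p' |
    sort | uniq -c | awk '{ print $1, $2 }'
}
```

Run as `cam_timeouts < /var/log/messages`. In the lines above, both timeouts sit on `ahcich5`, which may make the drive, cable, or port on that channel worth checking; that is only a hint, not a diagnosis.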
 