Seeking disk replacement advice (newbie)

xzdc

Cadet
Joined
Mar 28, 2023
Messages
6
Hello. My home NAS (specs below) had trouble booting the other day. When I first set it up, everything worked as expected. I copied my data onto it, which filled it to 90% of capacity. It ran for several days, and then I turned it off because I was going away for a few months.

When I came back, I noticed it was taking a long time to load TrueNAS. I saw on the monitor that it was processing some jobs. After what seemed like hours, TrueNAS finally booted, only for me to realise that one hard disk had failed. Even the BIOS couldn't detect it. I tried different cables and ports, but nothing worked. As it was still under warranty, I was able to exchange the failed drive for a new one. Due to my lack of knowledge/carelessness, I did not do the "replace disk" process (which I only recently read up on); I simply turned off the machine, took out the faulty drive and sent it for RMA.

Upon receiving the new hard disk, I tested it in a drive dock and it worked as expected. I installed it, made sure that the BIOS could see it and started the TrueNAS boot process. Along the way, it seemed to get stuck in a loop loading a "disk journal" job (I can't recall the exact name). It kept failing and starting again. So I turned off the machine (via Ctrl+Alt+Del, which did start a shutdown process), unplugged the new drive and tried to boot with only the 3 original drives connected. This also got stuck processing some sort of job (not in a loop, though: different lines popped up every once in a while). Not willing to spend any more time, I turned the machine off.

So as of now, I have not been able to successfully boot into TrueNAS with the new hard disk together with my data intact. I would like some advice on how to resolve this. Thank you.
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
Exact details / screen captures would be useful. When you get a chance, take a picture, whatever you have to do.

Also, one problem that sticks out already is that you filled it to 90%; that alone causes large performance issues in general with ZFS.
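
To put a number on that, here is a minimal Python sketch that works out how full a pool is and compares it with a warning threshold. The 80% default is an assumption (a commonly quoted rule of thumb, not a figure from this thread), and the byte counts are hypothetical, since the actual pool size was not posted.

```python
def pool_fill_percent(used_bytes: int, total_bytes: int) -> float:
    """Return how full a pool is, as a percentage of its total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def fill_warning(used_bytes: int, total_bytes: int, threshold: float = 80.0) -> str:
    """Describe the fill level relative to an assumed warning threshold."""
    pct = pool_fill_percent(used_bytes, total_bytes)
    if pct >= threshold:
        return f"{pct:.0f}% full: above the {threshold:.0f}% threshold"
    return f"{pct:.0f}% full: below the {threshold:.0f}% threshold"


if __name__ == "__main__":
    TIB = 1024 ** 4
    # Hypothetical numbers: 9 TiB used out of 10 TiB usable.
    print(fill_warning(9 * TIB, 10 * TIB))
```

On a real system the used and total figures would come from the pool itself (for example the CAP column of `zpool list`) rather than being typed in by hand.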
 

xzdc

Cadet
Joined
Mar 28, 2023
Messages
6
sfatula said: "Exact details / screen captures would be useful. When you get a chance, take a picture, whatever you have to do.

Also, one problem that sticks out already is that you filled it to 90%; that alone causes large performance issues in general with ZFS."

Thanks for the reply.

I've attached some photos of the NAS processing at boot up with 3 hard disks connected. The last one which shows "Starting nfs.config service" is after I pressed Ctrl+Alt+Del. The NAS is still running with this message.

I was thinking to shut it off and connect the new hard disk to get some screenshots.

Should I just turn it off at the power switch, or wait until it shows some sign that it is shutting down?
 

Attachments

  • 20231018_135455.jpg
  • 20231018_135720.jpg
  • 20231018_142238.jpg

xzdc

Cadet
Joined
Mar 28, 2023
Messages
6
UPDATE

There was a power cut, so I decided to connect the new hard disk and turn the system on.

I have attached screenshots of its current progress (20231018_182243.jpg, 20231018_182230.jpg). I'm seeing "Job ix-zfs.service/start running" over and over. Is this expected?

UPDATE 2

TrueNAS eventually booted to the main menu. However, my excitement was short-lived: the system suddenly rebooted itself and is now stuck in the loop shown in the newly added screenshots (20231018_185536.jpg, 20231018_185524.jpg, 20231018_185437.jpg).
 

Attachments

  • 20231018_185542.jpg

xzdc

Cadet
Joined
Mar 28, 2023
Messages
6
UPDATE 3

I did a memory test and found that one stick was causing some errors. I removed it and retested successfully with 2 passes.

I left the machine running overnight to boot and found today that it had reached the main menu. I logged in to the Web GUI and it seemed to be processing something, as the widgets were taking some time to load. Then, out of the blue, the system restarted, and it is now loading some things, as you can see in the attached screenshots.

Should I just reinstall TrueNAS at this point? Looking for some advice on this. Thank you.
 

Attachments

  • 20231020_124831.jpg
  • 20231020_124850.jpg
  • 20231020_125220.jpg

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
xzdc said: "Should I just reinstall TrueNAS at this point? Looking for some advice on this. Thank you."
No, reinstalling wouldn't do any good. You need to figure out what is causing your stability issues first. A properly functioning system should not be rebooting "out of the blue".

FreeBSD is a very stable OS. I've reached uptime of almost a year. Your hardware is likely the culprit, not TrueNAS.

EDIT: I noticed you're actually running SCALE (Linux). Still, the point stands. You shouldn't be rebooting out of the blue willy-nilly.
 

xzdc

Cadet
Joined
Mar 28, 2023
Messages
6
Whattteva said: "No, reinstalling wouldn't do any good. You need to figure out what is causing your stability issues first. A properly functioning system should not be rebooting "out of the blue".

FreeBSD is a very stable OS. I've reached uptime of almost a year. Your hardware is likely the culprit, not TrueNAS.

EDIT: I noticed you're actually running SCALE (Linux). Still, the point stands. You shouldn't be rebooting out of the blue willy-nilly."

Thanks for the reply.

I'm currently doing more tests. As of now, memtest reports more errors with the 3 sticks.

In the worst-case scenario, would it be possible to move the boot drive and the 4 hard disks to a different system (i.e. different motherboard, CPU and RAM type) without reinstalling? I am hoping that moving to a stable, functioning system would allow me to rebuild my data array, if possible.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
xzdc said: "I'm currently doing more tests. As of now, memtest reports more errors with the 3 sticks."
That's why we use ECC memory :D
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
xzdc said: "In the worst-case scenario, would it be possible to move the boot drive and the 4 hard disks to a different system (i.e. different motherboard, CPU and RAM type) without reinstalling?"
Yes. And if you have a suitable system at hand, you may do it right now.
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
If you had bad memory modules, that's not good, as they could well have corrupted data on the disks or, worse, metadata. I would not use the disks as is in any new system. I would try to copy the data off as a backup if you can get some system working, but I wouldn't keep the pool as is after that many memory issues.

In general, when putting a new system together (even with used equipment), run tests on the drives (yes, it may take days) and on the memory before loading the OS and spending time configuring it. It will save you time and effort in the end. It's all waiting anyway, so it takes very little of your own time.
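
As a rough illustration of the drive part of that checklist, the Python sketch below only builds the `smartctl` command lines for a set of disks; it does not run them. The device paths are hypothetical, and using a long SMART self-test as the burn-in is an assumption about what "tests on the drives" means here (other tools exist). Memory testing is left out because it is done from a bootable tester such as memtest, not from a script.

```python
import shlex
from typing import List


def burn_in_commands(devices: List[str]) -> List[List[str]]:
    """Build, but do not execute, smartctl commands for each device.

    For every device this returns two commands:
      1. start a long (extended) SMART self-test
      2. print the full SMART report, to be read after the test has finished
    """
    commands: List[List[str]] = []
    for dev in devices:
        commands.append(["smartctl", "-t", "long", dev])
        commands.append(["smartctl", "-a", dev])
    return commands


if __name__ == "__main__":
    # Hypothetical device names; check the real ones on your own system first.
    for cmd in burn_in_commands(["/dev/sda", "/dev/sdb"]):
        print(shlex.join(cmd))
```

The printed lines can be reviewed and run by hand as root. The long test runs inside the drive and can take many hours on large disks, so the report command is only meaningful once it has completed.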
 