Uncorrectable I/O failure and disaster recovery

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
You just have to stay clear of SMR (basically standard WD RED for their NAS line). Although I am a Seagate guy too.

iirc rebooting the system clears any errors in the pool's status, so the zpool status output we have now might not be truthful... well, a scrub will make concealed errors pop up if there are any.
 

kirkdickinson

Contributor
Joined
Jun 29, 2015
Messages
174
I will say that the board doesn't particularly play well with the RAM - plan is to upgrade MB, CPU and RAM later this year.

I highly suggest that you spend some time reading the recommended hardware thread.


I have three TrueNAS systems between my small office and home PLEX server. All of them have SuperMicro boards and ECC RAM. Both are highly recommended. :)
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Not for me, maybe for joe.
Me either. It's a lot of data to wade through.

If you click on @Davvo's "Show: TrueNAS-13.0-U5.3" button, or my "Show: Systems" and then the top button, it will show you the details we are looking for.

Yes, feature upgrades were made, so no going back
That is too bad. A word of advice, never upgrade the ZFS feature set unless you need a new feature. Most folks do not. This affords you the opportunity to roll back to the previous version. Tough lesson to learn and I've been there before as well.

Take your time if there is any important data on your drives. Do not rush it at all. It's very easy to make a mistake and "Zap", all your data is gone.
Do not get frustrated at TrueNAS; it's a solid piece of software, but it has some hardware requirements. Lots of YouTube videos make it sound like a person can take any old computer and convert it to a NAS; that isn't true. The good thing you have going for you is that the system once worked with TrueNAS 12. Maybe if you do not need your data, you could wipe the drives and start all over with a fresh install of TrueNAS 13.

If you have more issues after the scrub, you might try this.
Here are a few steps you might take; they give you a "clean install" instead of an upgrade, which is what I suspect you did:
1. If you do not already have a backup of your TrueNAS configuration files, make a copy and save it to a local computer (not the NAS).
2. Download a new copy of the TrueNAS 13 image (ISO).
3. If you have an 8GB USB flash drive you can boot from, remove your current boot drive, use the USB flash drive instead, and install TrueNAS 13 to it.
4. Boot the system, answer the questions.
5. Restore your configuration files and if the computer does not reboot, then manually reboot it.
6. Cross your fingers it all works.

Please post the output of each command between [CODE][/CODE] tags.
You used <code> and </code>; those are less-than and greater-than signs, not the square brackets indicated above. Yes, it makes a difference: the formatting of the text is way off and more difficult to read.

Both of the faulty HDDs are WD SMR drives as you guys pointed out. I can tell you one thing now... I'm sticking to Seagate from now on.
ALWAYS look up the model number of any drive to ensure it is CMR. Always. Manufacturers make both CMR and SMR drives. If you can't prove it's CMR, then assume it is not.

I hope you copy all your data and are able to start over with proper drives and hardware.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Seagate also has SMR drives, scattered through their lower end lines of disks. So make sure you get non-SMR drives, (aka CMR or PMR / Perpendicular Magnetic Recording). Enterprise and IronWolf, (or IronWolf Pro), seem to not be SMR and are generally reliable with TrueNAS & ZFS.

If a scrub is in progress at the time of a reboot, after booting, the scrub will simply resume on pool import.
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
I have three TrueNAS systems between my small office and home PLEX server. All of them have SuperMicro boards and ECC RAM. Both are highly recommended. :)

SuperMicro is just a little out of my personal budget range. Would love some ECC though.
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
That is too bad. A word of advice, never upgrade the ZFS feature set unless you need a new feature. Most folks do not. This affords you the opportunity to roll back to the previous version. Tough lesson to learn and I've been there before as well.

I know, I wasn't going to, but the constant notification drove me a little mad so I made the choice. I never thought it was the upgrade. If you guys got the impression I was blaming the upgrade - that's on you
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
SuperMicro is just a little out of my personal budget range. Would love some ECC though.
Strongly consider used hardware, almost all of the systems you see in the signatures are made of it.

Regarding the CMR vs SMR, the following website can help you.
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
Here are a few steps you might take; they give you a "clean install" instead of an upgrade, which is what I suspect you did:

Nope, I switched over to the Samsung with the upgrade and the original drive was a 250GB. So I had to do a fresh install and import the config
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
Me either. It's a lot of data to wade through.
If you click on @Davvo's "Show: TrueNAS-13.0-U5.3" button, or my "Show: Systems" and then the top button, it will show you the details we are looking for.

All this is in the first 5 posts or so. I only reported the CPU incorrectly which is a 5650G.

The only things left out were the PSU, which is a 600W Cooler Master, and the case, which is (of no concern, but) an old Gigabyte desktop box.

I even added links!

And I have lots of cooling. The drives under heavy load don't exceed 45 degrees
 

kirkdickinson

Contributor
Joined
Jun 29, 2015
Messages
174
SuperMicro is just a little out of my personal budget range. Would love some ECC though.
Yeah, buy once, cry once. Many of the gaming boards do not support ECC memory. That is a problem with them.

From speaking to my "TrueNAS" guru, he feels that full ECC support with AMD motherboards of any kind is shaky. The problem is that you don't know if it is working or not... until it doesn't work. All the iXsystems systems use hardware that completely supports ECC. They have been tested to assure that they fully support it. Using motherboards/CPU/RAM from the recommended hardware list *should* fully support ECC.
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
Seagate also has SMR drives, scattered through their lower end lines of disks.

I assumed as much, but over the course of my life I have had far fewer issues with Seagate than WD. I've had significantly more WDs fail on me than Seagates
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
Yeah, buy once, cry once. Many of the gaming boards do not support ECC memory. That is a problem with them.

From speaking to my "TrueNAS" guru, he feels that full ECC support with AMD motherboards of any kind is shaky. The problem is that you don't know if it is working or not... until it doesn't work. All the iXsystems systems use hardware that completely supports ECC. They have been tested to assure that they fully support it. Using motherboards/CPU/RAM from the recommended hardware list *should* fully support ECC.

Yeah, no, I hear you, but ECC is only really used for VERY important data, so for a home NAS/media server it doesn't really make sense to invest that money, IMO anyway.

As for the AMD thing... yes, agreed, AMD might be best for gaming (only my opinion, don't start a war please), but not ideal for servers. I had an extra board and it was a good fit at the time for a home NAS, so work with what you got, not what you wish you had.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Quoting someone in this forum, either you care about your data or you don't; if you take the effort of using ZFS, you want ECC RAM.

It's not really AMD vs INTEL but consumer vs server-grade.
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
The 2TiB failed while copying torrents. Some incomplete, and almost all 8k videos. The scrub is showing 277000+ errors so far with 50% to go

***EDIT:*** That sounded like I was copying the data from the drive when it failed. I already copied the torrents back to local storage with no reported errors. I meant I was transferring the torrents to the drive when it first failed
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
Quoting someone in this forum, either you care about your data or you don't; if you take the effort of using ZFS, you want ECC RAM.

It's not really AMD vs INTEL but consumer vs server-grade.

The thing that put me off AMD the other day, especially for server hardware, was the AMD microcode in firmware... can't remember the name, but it's responsible for booting up the entire multicore CPU. It seems there are serious security issues with it (and have been for years). That has since deterred me from AMD CPUs
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Hold off on doing anything until the scrub completes, then post the output of zpool status NR.2TiB.2. Anyway it's not encouraging... such corruption usually stems from RAID cards/port multipliers... can't imagine the lack of ECC RAM being at fault here.
Let ZFS cook.
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
Hold off on doing anything until the scrub completes, then post the output of zpool status NR.2TiB.2. Anyway it's not encouraging... such corruption usually stems from RAID cards/port multipliers... can't imagine the lack of ECC RAM being at fault here.
Let ZFS cook.

Yeah, no, it's idle. I even switched over to YouTube so as not to use any drives on the NAS.
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
I can't imagine the SATA controller on the board just decided to fail now. It has to be the SMR, or else it's something really sinister.
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
Code:
Oct  9 17:06:48 truenas ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
Oct  9 17:06:48 truenas ada3: <WDC WD20EARX-00PASB0 51.0AB51> s/n WD-WCAZAJ631126 detached
Oct  9 17:06:48 truenas GEOM_MIRROR: Device swap2: provider ada3p1 disconnected.
Oct  9 17:37:03 truenas 1 2023-10-09T15:37:03.039502+00:00 truenas.vs.lan devd 369 - - notify_clients: send() failed; dropping unresponsive client


This is causing high CPU load; it's this service (devd) that is causing the high CPU usage.

What is this?

PS: there are lots of log lines like that send() failed one
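
To put a number on "lots", a minimal Python sketch along these lines could tally the matching lines per hour. It assumes the classic syslog prefix seen in the lines above; the function name count_per_hour and the SAMPLE list are made up for illustration, and the sample input is just the four lines from this post.

```python
import re
from collections import Counter

# Classic syslog prefix as seen in the lines posted above:
#   "Oct  9 17:37:03 truenas <rest of message>"
PREFIX = re.compile(r"^(\w{3})\s+(\d{1,2}) (\d{2}):\d{2}:\d{2} (\S+) (.*)$")


def count_per_hour(lines, needle="send() failed"):
    """Count log lines containing `needle`, grouped by (month, day, hour)."""
    counts = Counter()
    for line in lines:
        m = PREFIX.match(line.rstrip("\n"))
        if m and needle in m.group(5):
            counts[(m.group(1), int(m.group(2)), int(m.group(3)))] += 1
    return counts


# The four lines from the post above, used as sample input.
SAMPLE = [
    "Oct  9 17:06:48 truenas ada3 at ahcich3 bus 0 scbus3 target 0 lun 0",
    "Oct  9 17:06:48 truenas ada3: <WDC WD20EARX-00PASB0 51.0AB51> s/n WD-WCAZAJ631126 detached",
    "Oct  9 17:06:48 truenas GEOM_MIRROR: Device swap2: provider ada3p1 disconnected.",
    "Oct  9 17:37:03 truenas 1 2023-10-09T15:37:03.039502+00:00 truenas.vs.lan devd 369 - - "
    "notify_clients: send() failed; dropping unresponsive client",
]

for (month, day, hour), n in sorted(count_per_hour(SAMPLE).items()):
    print(f"{month} {day:2d} {hour:02d}:00  {n} matching line(s)")
```

To run it against the real log, pass an open file instead of SAMPLE, e.g. count_per_hour(open(path)), where path is wherever the system log lives on your box (assumed here to be /var/log/messages). Changing needle to "ada3" would tally the disk detach messages instead.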
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
*** UPDATE: ***

The 3TiB LONG SMART test just completed. Success.
It seems more and more likely that the problem is either SMR with ZFS or some other glitch on the server (maybe failing hardware, or a pure software failure due to some config or something)

I would just be flabbergasted if WD SMR is causing such serious errors so soon on ZFS. I guess WD really has serious issues with ZFS.

FYI: I don't think I noticed this, but the failing 2TB is also a drive that was in a desktop computer and was moved to TrueNAS once I got the ASM1166. I have since disconnected a single pool to make room on the onboard controller for the 2TB.
 