One drive of a TrueNAS filesystem mirror is failing.

Vega Alpha

Dabbler
Joined
Mar 19, 2016
Messages
17
I knew this day would come.

Six years ago, I purchased a FreeNAS Mini with 32 GB of ECC RAM, one 4 TB data set mirrored across two 7200 RPM Western Digital Red NAS drives, and one 4 TB data set striped on a single refurbished 7200 RPM Western Digital Black enterprise drive from Amazon.

I installed software:

Plugin: Plex-Pass
Jail: Gogs, a Git server <- old FreeNAS warden jail
Jail: PostgreSQL with PostgreSQL Admin <- plan to move to a community plugin
Plugin: GitLab <- currently migrating to this

Samba shares, where I drop movies, music, and pictures into folders so Plex will find them and show them in its GUI.

In December 2020, the striped Western Digital Black drive failed SMART, and I successfully restored its non-critical data from a manual backup.

In December 2021, one of the drives in the mirror began to fail. This is critical data.

In the console:
Dec 19 16:10:48 server (ada1:ahcich10:0:0:0): Error 5, Retries exhausted
Dec 19 16:10:55 server (ada1:ahcich10:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f0 00 8d 40 d5 00 00 05 00 00
Dec 19 16:10:55 server (ada1:ahcich10:0:0:0): CAM status: ATA Status Error
Dec 19 16:10:55 server (ada1:ahcich10:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Dec 19 16:10:55 server (ada1:ahcich10:0:0:0): RES: 41 40 a0 01 8d 40 d5 00 00 00 00
Dec 19 16:10:55 server (ada1:ahcich10:0:0:0): Retrying command, 3 more tries remain
Dec 19 16:11:02 server (ada1:ahcich10:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f0 00 8d 40 d5 00 00 05 00 00
Dec 19 16:11:02 server (ada1:ahcich10:0:0:0): CAM status: ATA Status Error
Dec 19 16:11:02 server (ada1:ahcich10:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Dec 19 16:11:02 server (ada1:ahcich10:0:0:0): RES: 41 40 17 02 8d 40 d5 00 00 00 00
Dec 19 16:11:02 server (ada1:ahcich10:0:0:0): Retrying command, 2 more tries remain
Dec 19 16:11:09 server (ada1:ahcich10:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f0 00 8d 40 d5 00 00 05 00 00
Dec 19 16:11:09 server (ada1:ahcich10:0:0:0): CAM status: ATA Status Error
Dec 19 16:11:09 server (ada1:ahcich10:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Dec 19 16:11:09 server (ada1:ahcich10:0:0:0): RES: 41 40 90 01 8d 40 d5 00 00 00 00
Dec 19 16:11:09 server (ada1:ahcich10:0:0:0): Retrying command, 1 more tries remain
Dec 19 16:11:16 server (ada1:ahcich10:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f0 00 8d 40 d5 00 00 05 00 00
Dec 19 16:11:16 server (ada1:ahcich10:0:0:0): CAM status: ATA Status Error
Dec 19 16:11:16 server (ada1:ahcich10:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Dec 19 16:11:16 server (ada1:ahcich10:0:0:0): RES: 41 40 50 01 8d 40 d5 00 00 00 00
Dec 19 16:11:16 server (ada1:ahcich10:0:0:0): Retrying command, 0 more tries remain
Dec 19 16:11:23 server (ada1:ahcich10:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f0 00 8d 40 d5 00 00 05 00 00
Dec 19 16:11:23 server (ada1:ahcich10:0:0:0): CAM status: ATA Status Error
Dec 19 16:11:23 server (ada1:ahcich10:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Dec 19 16:11:23 server (ada1:ahcich10:0:0:0): RES: 41 40 f7 01 8d 40 d5 00 00 00 00
Dec 19 16:11:23 server (ada1:ahcich10:0:0:0): Error 5, Retries exhausted
Dec 19 16:11:32 server GEOM_MIRROR: Cannot open consumer ada1p1 (error=1).
Dec 19 16:11:32 server GEOM_MIRROR: Device swap1 destroyed.
Dec 20 00:00:00 server syslog-ng[1176]: Configuration reload request received, reloading configuration;
Dec 20 00:00:00 server syslog-ng[1176]: Configuration reload finished;
Dec 21 00:00:00 server syslog-ng[1176]: Configuration reload request received, reloading configuration;
Dec 21 00:00:00 server syslog-ng[1176]: Configuration reload finished;
Dec 22 00:00:00 server syslog-ng[1176]: Configuration reload request received, reloading configuration;
Dec 22 00:00:00 server syslog-ng[1176]: Configuration reload finished;
Dec 23 00:00:00 server syslog-ng[1176]: Configuration reload request received, reloading configuration;
Dec 23 00:00:00 server syslog-ng[1176]: Configuration reload finished;
Dec 23 16:04:58 server 1 2021-12-23T16:04:58.699007-06:00 server.local sshd 56057 - - _secure_path: /mnt/volume-1/users/jerry/.login_conf is group writeable by non-authorised groups
Dec 23 16:04:58 server 1 2021-12-23T16:04:58.699219-06:00 server.local sshd 56057 - - _secure_path: /mnt/volume-1/users/jerry/.login_conf is group writeable by non-authorised groups
Dec 23 16:04:58 server 1 2021-12-23T16:04:58.796591-06:00 server.local sshd 56058 - - _secure_path: /mnt/volume-1/users/jerry/.login_conf is group writeable by non-authorised groups
Dec 23 16:04:58 server 1 2021-12-23T16:04:58.796791-06:00 server.local sshd 56058 - - _secure_path: /mnt/volume-1/users/jerry/.login_conf is group writeable by non-authorised groups
Dec 23 16:51:29 server 1 2021-12-23T16:51:29.032764-06:00 server.local sshd 57912 - - _secure_path: /mnt/volume-1/users/jerry/.login_conf is group writeable by non-authorised groups
Dec 23 16:51:29 server 1 2021-12-23T16:51:29.032962-06:00 server.local sshd 57912 - - _secure_path: /mnt/volume-1/users/jerry/.login_conf is group writeable by non-authorised groups
Dec 23 16:51:29 server 1 2021-12-23T16:51:29.082248-06:00 server.local sshd 57913 - - _secure_path: /mnt/volume-1/users/jerry/.login_conf is group writeable by non-authorised groups
Dec 23 16:51:29 server 1 2021-12-23T16:51:29.082451-06:00 server.local sshd 57913 - - _secure_path: /mnt/volume-1/users/jerry/.login_conf is group writeable by non-authorised groups
Dec 24 00:00:00 server syslog-ng[1176]: Configuration reload request received, reloading configuration;
Dec 24 00:00:00 server syslog-ng[1176]: Configuration reload finished;
Dec 25 00:00:00 server syslog-ng[1176]: Configuration reload request received, reloading configuration;
Dec 25 00:00:00 server syslog-ng[1176]: Configuration reload finished;
Dec 26 00:00:00 server syslog-ng[1176]: Configuration reload request received, reloading configuration;
Dec 26 00:00:00 server syslog-ng[1176]: Configuration reload finished;
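A quick way to read the log above: the same read on ada1 (identical ACB bytes) fails five times with ATA error 40 (UNC), an uncorrectable read, until "Error 5, Retries exhausted"; GEOM_MIRROR then destroys swap1 because it cannot open ada1p1. That fits TrueNAS CORE's default disk layout of a small swap partition (p1) next to the ZFS partition (p2). The Python sketch below tallies those messages per device from pasted log text. The regex is an assumption fitted to the line shapes shown here, not a general syslog parser.

```python
import re
from collections import Counter

# Fitted to FreeBSD CAM lines like:
#   "... server (ada1:ahcich10:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )"
# Captures the device name (ada1) and the message after "): ".
CAM_LINE = re.compile(r"\((?P<dev>[a-z]+\d+):[^)]*\): (?P<msg>.*)$")


def summarize_cam_errors(log_text: str) -> dict:
    """Count uncorrectable-read (UNC) errors and exhausted retries per device."""
    unc = Counter()
    exhausted = Counter()
    for line in log_text.splitlines():
        match = CAM_LINE.search(line)
        if match is None:
            continue  # syslog-ng, sshd, GEOM_MIRROR and other non-CAM lines
        dev, msg = match.group("dev"), match.group("msg")
        if "(UNC" in msg:
            unc[dev] += 1
        if "Retries exhausted" in msg:
            exhausted[dev] += 1
    devices = sorted(set(unc) | set(exhausted))
    return {d: {"unc": unc[d], "retries_exhausted": exhausted[d]} for d in devices}


if __name__ == "__main__":
    import sys
    # Usage: python cam_summary.py < pasted-console-log.txt
    print(summarize_cam_errors(sys.stdin.read()))
```

A device that keeps accumulating UNC errors and exhausted retries, as ada1 does here, is the one to replace.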

I am a Windows SDK developer of Windows desktop applications, a Java web middleware developer, and an HTML, CSS, and JavaScript front-end developer, with very light knowledge of the Linux command line: barely enough to deploy my web applications into Tomcat, WebLogic, WebSphere, and Fuse application servers. When I purchased the FreeNAS server, I thought it would be easy to manage but instead found myself in a nightmare of confusion. I needed more Linux knowledge, so I was willing to put great effort into FreeBSD, which I did for about a year of casual weekends. I found very little translated into Linux for my job, so I stopped applying myself to FreeBSD. My practical Linux skills have stagnated at whatever level my software development career has required. That said, you can give me instructions, and I will likely have the aptitude to follow them and understand the definitions, but I will not know enough of the specifics to get the job done myself. I'm a software developer who knows C/C++, Java, JavaScript, and 6502 assembly. (Yes, that's right: assembly for an 8-bit 6502 on a Commodore 64 back in the '80s. I tell my kids I'm old as dirt.) I know software development kits and application frameworks, but nearly nothing about system management, system administration, hardware configuration, or OS configuration. I'm not far down the DevOps path yet.

Would you mind helping me save 25 years of family photos and home videos?

I purchased a FreeNAS Mini from iXsystems. I think the hardware and software support hot swap. So when the new drive comes in the mail, I pop the old drive out and slide the new drive in, and TrueNAS begins copying the existing drive's data to the new drive block by block. I will see no interruption of service from the outside. A couple of days later, I will receive an email from my server saying the two halves of the mirror are back in sync. No command-line or GUI actions to take. OK, I know better; that's an extreme. I've been reading the manuals for an hour now, but I still don't know what is wrong with the mirror. I only know that something is wrong, and I'm not sure precisely which variation of the documented procedure I should follow. I say it that way because a facade over something fundamentally complicated never goes as planned. I've been in software development for 30+ years. Stuff happens, and today I don't feel prepared.

Where do I look to analyze the situation? I was on the "Pools" page. It says "volume-1" is "degraded," but little more. Then articles around the web have me typing "diskinfo" and "zfs" commands on the command line. I'm following instructions. I know I'll get there. But how many weekends will it take? A process that consumes a lot of my time is not what I promised my wife when I spent $3000 on this machine. I think I oversold myself 6 years ago when I described to my wife the effort involved when a drive eventually failed. Maybe I should just go back to the TrueNAS docs. When I left them, I was reading a lot that didn't seem to apply, and at the same time I was not sure what I can and cannot do when the new drive arrives in the mail.

Where is my mentor from when I was 20? I wonder if he is alive today. Can anyone with experience walk me through this, my first time?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
When I purchased the FreeNAS server, I thought it would be easy to manage but instead found myself in a nightmare of confusion. I needed more Linux knowledge, so I was willing to put great effort into FreeBSD, which I did for about a year of casual weekends. I found very little translated into Linux for my job, so I stopped applying myself to FreeBSD.

Well, right. Linux isn't FreeBSD. But the two have significant similarities, as Linux is a UNIX clone and FreeBSD is a UNIX lineal descendant. However, networking and storage work on these systems is generally an arcane art with little practical use to an application developer. So I'm not really shocked that this didn't work out differently.

I tell my kids I'm old as dirt.

Ok. Well, I still sport a Synertek SYM-1 (~1975) as decoration in my office, so.

Anyways, "DEGRADED" merely means that you've got a developing problem. That's consistent with the quoted stuff above.

So, here's my recommendation.

Would you mind helping me save 25 years of family photos and home videos?

Yes. This is important. Before you do anything, I strongly suggest you go out and make a BACKUP.

ZFS provides redundant storage for your stuff to save you from a disk failure. This is not the same thing as a backup. Go out and get something like an external HDD. Best Buy has the WD EasyStore 8TBs for $159 right now; that seems like it'd be big enough to fit everything you have. Back up all your precious bits to the EasyStore. Set it aside. Breathe a sigh of relief. Be aware that lots of "smaller" HDDs sold these days are "SMR" drives, and they really suck. It's better to buy an 8TB or larger HDD because of this.

Once you've done that, you're going to want to replace what looks to me like ada1 with a new drive. There's more to be discussed down that road, including possible options for reorganizing or redesigning your pool, such as moving to RAIDZ2 or to a three-way mirror design. There's no real reason you should have to live in fear of a single disk failure toasting your 25 years of photos and videos.
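To put rough numbers on the two redesigns mentioned above, here is a back-of-the-envelope Python sketch. It ignores ZFS metadata, RAIDZ padding, and the TB vs. TiB difference, and the disk counts in the usage loop are hypothetical; check how many bays your Mini actually has before planning around them.

```python
def layout_summary(layout: str, disks: int, disk_tb: float) -> tuple:
    """Return (approximate usable TB, number of disks that may fail without data loss).

    Rough arithmetic only: ignores ZFS metadata, RAIDZ padding, and TB vs. TiB.
    """
    if layout == "mirror":
        if disks < 2:
            raise ValueError("a mirror needs at least 2 disks")
        # Every disk holds a full copy, so capacity is one disk's worth.
        return disk_tb, disks - 1
    if layout == "raidz2":
        if disks < 3:
            raise ValueError("RAIDZ2 needs at least 3 disks")
        # Two disks' worth of space goes to parity.
        return (disks - 2) * disk_tb, 2
    raise ValueError(f"unknown layout: {layout!r}")


if __name__ == "__main__":
    for layout, n in [("mirror", 2), ("mirror", 3), ("raidz2", 4)]:
        usable, tolerates = layout_summary(layout, n, 4)
        print(f"{n}-disk {layout}: ~{usable} TB usable, survives {tolerates} failed disk(s)")
```

With 4 TB disks, a three-way mirror still gives about 4 TB but survives two failed disks, and a four-disk RAIDZ2 gives about 8 TB with the same two-disk tolerance; today's two-way mirror survives only one.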

But make a backup first, while your situation is stable (even if degraded).
 

Vega Alpha

First of all, thanks for responding to my request. I'll be paying close attention and doing a reasonable amount of homework.

About the backup:

My existing backup drive has gone bad, and having been laid off Nov. 1st, 2020, I can only afford a replacement drive for the mirror this Christmas season. But I'm not entirely careless. I have a volume-2, which is mirrored and has enough space to hold a copy of the failing volume-1. The future is uncertain for me at this time.

So to implement the backup, today I'm reading about "rsync," and I'm beginning to think it is a better way to back up volume-1 to volume-2, since it can perhaps preserve all file attributes, which will not be the case if I back up using "cp".
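A minimal sketch of the rsync approach, written as a Python helper that only builds the argument list. The paths are hypothetical: /mnt/volume-1/users comes from the sshd messages in the log, and /mnt/volume-2/backup is made up. `-a` keeps permissions and timestamps, but owner and group are only kept when rsync runs as root, and `-a` does not copy ACLs, which TrueNAS SMB datasets commonly use. For a ZFS-to-ZFS copy, a snapshot plus `zfs send`/`zfs recv` sidesteps this, because it replicates the datasets themselves.

```python
import shlex


def rsync_backup_cmd(src: str, dest: str, dry_run: bool = True) -> list:
    """Build an rsync command that copies the *contents* of src into dest.

    -a             archive: recursive; keeps permissions, times, symlinks,
                   owner/group (the latter only when run as root)
    -H             also keep hard links (not part of -a)
    -v             list files as they are processed
    --numeric-ids  keep numeric uid/gid instead of remapping by user name
    -n             dry run: show what would be copied, write nothing
    """
    cmd = ["rsync", "-aHv", "--numeric-ids"]
    if dry_run:
        cmd.append("-n")
    # A trailing slash on the source means "the contents of this directory",
    # so dest does not end up with an extra nested folder.
    cmd += [src.rstrip("/") + "/", dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; review the dry-run listing before using dry_run=False.
    print(shlex.join(rsync_backup_cmd("/mnt/volume-1/users", "/mnt/volume-2/backup/users")))
```

Defaulting to a dry run is deliberate: the listing shows exactly what would be copied before anything is written.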

My backup in the past was just expendable files with ordinary permissions. I remember struggling with exFAT and FreeNAS's lack of support for it. I gave up on that, and then I tried an NTFS filesystem, but Windows and FreeNAS fought over who owned it: only one OS or the other could see the files. The internet at the time said I was mistaken, so I was left confused and without a resolution. Eventually, I was able to handle that backup, but not gracefully. I still worked very hard for the backup I ultimately got, and I don't recall every detail of how I did it. Painfully slowly, I copied all the files over a Samba share, but each file lost its permissions.

Today the volume I need to back up is a system volume. It contains all of my jails, plugins, user directories, and everything else except the actual boot drives. Permissions are likely more complex than before, and of greater importance to the stability and security of the TrueNAS OS. There may also be things I'm not aware of. Like once at work, when I thought I was getting the hang of things, I ran into SELinux security on a Red Hat system joined to a Windows domain. Very embarrassing.

It seems likely that FreeNAS includes a backup and restore solution, but I haven't found it. There are some community plugins related to backups, but reading their docs doesn't give me confidence that they are meant for my case and affordable at the same time. So I will continue studying rsync until I hit a blocker, checking this forum whenever I remember to.
 

Vega Alpha

I'm surprised no one has posted since last I looked. I'll report my progress as though others are watching to step in before I stumble and lose data.

I failed my stated goal from my last report. My task was to learn rsync well enough to back up my entire system volume while retaining permissions and other file attributes. Having failed to do that in a reasonable time, I changed my priority to copying my data from the degraded "volume-1" to the online mirrored "volume-2". I finished this task, including changing all Samba shares to point to the new locations on "volume-2", and I reinstalled my Plex server with mount points on "volume-2". From outside the TrueNAS box, my family doesn't notice anything has changed, even though so much data is now safe. But my business-related data is not secure, and I don't know how to make it safe. Please read on.

I don't fully understand what remains to be preserved. I do know part of what remains.

1. My Source Code
I have source code stored in two jails. One is an iocage jail running a community GitLab plugin. The other is not a plugin; it is an ancient warden jail where I installed Gogs, another Git server. I intend to move all of the sources in Gogs into GitLab. Then I can delete the Gogs jail. For now, I will clone all of the sources to my laptop, and then it will not matter what becomes of the GitLab and Gogs jails. Is there a better way? Can someone help with this?
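One common answer to "is there a better way" is a mirror clone: `git clone --mirror` makes a bare copy holding every branch and tag, and that copy can later be pushed into GitLab with `git push --mirror`. This covers only what lives in the Git repositories; issues and other web-app data in Gogs or GitLab are not included. The sketch below builds the clone commands for a list of repository URLs; the host names and paths are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def mirror_clone_cmds(repo_urls: list, dest_dir: str) -> list:
    """Build one `git clone --mirror` command per repository URL.

    Each mirror goes to dest_dir/<repo>.git; a mirror is a bare repository
    holding all branches, tags, and other refs.
    """
    cmds = []
    for url in repo_urls:
        name = posixpath.basename(urlparse(url).path.rstrip("/"))
        if not name.endswith(".git"):
            name += ".git"
        cmds.append(["git", "clone", "--mirror", url, posixpath.join(dest_dir, name)])
    return cmds


if __name__ == "__main__":
    import subprocess
    repos = [  # hypothetical URLs; copy the real clone URLs from each web UI
        "http://gogs.local:3000/jerry/app.git",
        "http://gitlab.local/jerry/schema",
    ]
    for cmd in mirror_clone_cmds(repos, "git-backup"):
        subprocess.run(cmd, check=True)
```

Running `git remote update` inside an existing mirror later fetches anything new, so the laptop copy can be refreshed without re-cloning.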

2. PostgreSQL Database Schema
I have a database I created. It has no data. It's just a bunch of tables with many types, constraints, and relationships. It is so intricate that it resists being documented well enough to reproduce manually. I've exported SQL scripts, which look complete, but I want to test those scripts at least once before I trust them enough to allow the original database to be destroyed. For now, I want to preserve the database as a record of its design. I don't know how to back up this jail and restore it later. Can someone help with this?
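For testing the exported scripts, a sketch of a schema-only round trip: dump with `pg_dump --schema-only`, create an empty scratch database, and replay the dump with `psql` set to stop at the first error. The database names and file path are hypothetical, and the commands are meant to run inside the PostgreSQL jail as a user allowed to create databases; the Python wrapper only sequences them, and the same three commands can be typed by hand.

```python
import subprocess


def schema_roundtrip_cmds(db: str, dump_file: str, scratch_db: str) -> list:
    """Commands to dump a schema and check that the dump restores into a fresh database."""
    return [
        # 1. Schema only: tables, types, constraints, and so on, but no rows.
        ["pg_dump", "--schema-only", "--file", dump_file, db],
        # 2. An empty scratch database to restore into.
        ["createdb", scratch_db],
        # 3. Replay the dump; ON_ERROR_STOP makes psql exit non-zero on the first error.
        ["psql", "--set", "ON_ERROR_STOP=1", "--dbname", scratch_db, "--file", dump_file],
    ]


def run_all(cmds: list) -> None:
    for cmd in cmds:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Hypothetical names; substitute your real database name.
    run_all(schema_roundtrip_cmds("mydb", "/tmp/mydb-schema.sql", "mydb_restore_test"))
```

If step 3 completes without error, the scratch database should be a faithful rebuild of the design, and the dump file is the artifact worth keeping off the NAS.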

3. User Directories
The OS provides 5 user directories. I need help backing up and restoring these. Can someone help with this?

If TrueNAS keeps more information on the drive that needs to be backed up, I don't know about it. I need help finding out.

I think I'm having all this trouble because I don't know of anything better than a patchwork backup: I'm using a custom backup solution for each kind of data on my degraded volume. This has left me in a bad place, because the user manual doesn't go into the low-level depths of the system where the user shouldn't be going. I see high-end, expensive solutions, or solutions requiring a high level of knowledge before entry. Or perhaps Christmas, financial problems, and visiting relatives who all know better than I do what I should be doing have made me miss the obvious, clear, and concise TrueNAS documentation that an ordinary software developer with no sysadmin experience can follow to painlessly back up and restore his FreeNAS Mini. Sorry for the whining. It's been rough lately, and I really need my NAS to "just work," and I don't seem to have the knowledge to make that so.

If only TrueNAS had a backup and restore that was trivially accessible. I don't fully understand snapshots, but some experts tell me they're another form of redundancy, except that I control the timing by telling the mechanism that now is a good time, because the dataset is currently more valuable than at other times. So what if we had a button on the Windows desktop, or on the FreeNAS dashboard, so that without drilling through layers of UI I could just click it and the NAS would snapshot its current state to an external drive? Backing up the Windows computer is a future project. Right now I'm talking about the NAS.
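The "snapshot the current state somewhere else" idea maps onto two ZFS commands: `zfs snapshot -r` freezes a pool's datasets (jails included, if iocage lives on that pool) almost instantly, and `zfs send -R ... | zfs recv` copies that snapshot tree to another pool, such as volume-2 or a pool created on an external drive. TrueNAS CORE can also schedule this from the GUI with Periodic Snapshot Tasks and Replication Tasks. The sketch below, with hypothetical dataset names, builds the commands and pipes send into receive. `-u` leaves the received copies unmounted; because `-R` also carries dataset properties such as mountpoints, it is worth checking those on the copies before rebooting.

```python
import subprocess
from datetime import datetime


def snapshot_and_copy_cmds(src_pool: str, dest_dataset: str, when: datetime) -> tuple:
    """Build (snapshot, send, receive) commands for a one-off local replication."""
    snap = f"{src_pool}@manual-{when:%Y-%m-%d_%H%M}"
    snapshot = ["zfs", "snapshot", "-r", snap]   # -r: include every child dataset
    send = ["zfs", "send", "-R", snap]           # -R: whole tree, with properties
    recv = ["zfs", "recv", "-u", dest_dataset]   # -u: do not mount what arrives
    return snapshot, send, recv


def run_copy(src_pool: str, dest_dataset: str) -> None:
    snapshot, send, recv = snapshot_and_copy_cmds(src_pool, dest_dataset, datetime.now())
    subprocess.run(snapshot, check=True)
    # Equivalent to the shell pipeline: zfs send -R ... | zfs recv -u ...
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv, stdin=sender.stdout)
    sender.stdout.close()  # lets the sender see a broken pipe if recv exits early
    receiver.wait()
    sender.wait()
    if sender.returncode or receiver.returncode:
        raise RuntimeError(f"send exited {sender.returncode}, recv exited {receiver.returncode}")


if __name__ == "__main__":
    # Hypothetical names; dest_dataset must not exist yet.
    run_copy("volume-1", "volume-2/backup-volume-1")
```

The snapshot itself is the "button press": it costs almost nothing to take mid-flow, and the slower copy can run afterwards.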

When I'm working and reach a state of mind called "the flow," I'm hyper-productive. This is true of many people. The state is fragile, and when I reach a valuable state that deserves a snapshot, I'm not going to disrupt the flow I'm in. The current state of the project's data is not as valuable as the states I might reach in the next few minutes or hours. Sounds crazy, but true. So these valuable states are never backed up.

I'm rambling. I allowed it only because I imagined it might be helpful. Maybe someone else in this forum has already said something similar. If so, the statistical likelihood that the idea is desired by an important group of people increases. For most of these years, TrueNAS has been great. If I can say something helpful, then I should.

Happy New Year will soon be here!
 

Vega Alpha

I've never used forums much and never really had the luck that I see so abundantly for others. Given that abundance, I know it's me, or my presentation, or my timing. Something.

Regardless, I wish you guys the best. May your TrueNAS be true and your New Year be... Um, I'll stop.

All the best!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
If you have a disk in a mirrored pool that's failed/failing, replace the disk--the manual for your version of Free/TrueNAS (which you haven't bothered to tell us) will tell you how. Note: The relevant feature will always be called "Replace"--if you're about to "add" or "extend", you're doing the wrong thing.
 

Vega Alpha

My FreeNAS Mini version FREENAS-MINI-2.0 is up to date with the latest TrueNAS OS, which at the time of this writing is TrueNAS-12.0-U7.

I have followed the documentation to replace a disk but hit a blocker. The step to "offline" the bad drive fails. The documentation describes what to do if the "offline" operation fails, but only when the error message is "Disk offline failed - no valid replicas." I tried to scrub the volume anyway, just in case it would help. It did not. The status of the disk remains "FAULTED." The TrueNAS documentation page "Documentation Hub/TrueNAS CORE and Enterprise/Storage/Disks/Replacement" does not offer a resolution. I do not see a clear path to replacing my "FAULTED" drive.

I have successfully moved the data in SMB shares on the "Degraded" volume-01 mirror to the "Online" volume-02 mirror. Along the way, I have learned there is no sound way to mount a drive in TrueNAS that can later be plugged into a Windows system and accessed without running into some other problem. The data hidden within the jails, and the jails themselves, still do not have backups that I know how to restore. I'm stuck.
 

danb35

The step to "offline" the bad drive fails. The documentation describes what to do if the "offline" operation fails, but only when the error message is "Disk offline failed - no valid replicas."
So you do not get the message of "no valid replicas"? What error do you get? What does your pool status page show?
 

Vega Alpha

I don't get an error message, so you are correct: I do not see a "no valid replicas" message anywhere on the screen.
I only know the "OFFLINE" step failed because the drive status does not transition to "OFFLINE".

The disk ada1p2 status is "FAULTED"
The volume-01 status is "DEGRADED"

The volume is mirrored, and the second drive ada2p2 is "ONLINE"
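A FAULTED member is one ZFS has already stopped using. To check member states from the shell instead of the Pools page, `zpool status volume-1` prints a NAME/STATE table (the thread uses both volume-1 and volume-01; use whatever name the Pools page shows). On TrueNAS the members usually appear as gptid/... labels rather than ada1p2, and `glabel status` maps one to the other. The Python sketch below pulls the STATE column out of that table; it assumes the standard `zpool status` layout and is not an official parser.

```python
BAD_STATES = {"FAULTED", "UNAVAIL", "REMOVED", "OFFLINE"}


def member_states(status_text: str) -> dict:
    """Map each name in the config table of `zpool status` output to its STATE."""
    states = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True  # header row of the config table
            continue
        if not in_table:
            continue
        if fields and fields[0] == "errors:":
            break  # end of the config section
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


def unhealthy_members(status_text: str) -> list:
    """Names whose state means ZFS is not using them (FAULTED, UNAVAIL, ...)."""
    return [name for name, state in member_states(status_text).items()
            if state in BAD_STATES]


if __name__ == "__main__":
    import subprocess
    out = subprocess.run(["zpool", "status", "volume-1"],
                         capture_output=True, text=True, check=True).stdout
    print(member_states(out))
    print("needs replacing:", unhealthy_members(out))
```

In this thread's situation, the pool and its mirror vdev would read DEGRADED while only the faulted member is reported as needing replacement.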
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Status Faulted is already inclusive of offline... no need to do that step.
 