Critical Alert

usopkhan

Dabbler
Joined
Jun 10, 2020
Messages
15
Hi guys... I came across this error recently...

CRITICAL
Device: /dev/ada4, 1 Currently unreadable (pending) sectors.

then

Pool khan1 state is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected.



What does this mean? What should I do now? I am on RAID 0 to maximize space.
Under Pools it still says Healthy:

Pools

khan1 check_circle HEALTHY: (90%) Used / 2.08 TiB Free


Would appreciate some advice.. thanks guys.

Version: FreeNAS-11.3-U3.2
Intel(R) Pentium(R) CPU G3250 @ 3.20GHz
ASUS VANGUARD B85
32GB RAM
26 TB storage
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
If what you mean by "on RAID-0" is that there is no redundancy in your vdev, start making a backup NOW. You may still lose data, but hopefully you can save some/most.
 

usopkhan

Dabbler
Joined
Jun 10, 2020
Messages
15
Thanks for the advice.
I do need some assistance with that... how do I make a backup? Thanks

Also, here's some output from the shell:


root@freenas[~]# zpool status -v
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:03 with 0 errors on Wed Sep 8 03:45:03 2021
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          ada1p2      ONLINE       0     0     0

errors: No known data errors

  pool: khan1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 17:42:11 with 0 errors on Sun Aug 15 17:42:12 2021
config:

        NAME                                          STATE     READ WRITE CKSUM
        khan1                                         ONLINE       2     0     0
          gptid/d6c9a620-afb1-11ea-8689-08626644cb6b  ONLINE       0     0     0
          gptid/d6f46e21-afb1-11ea-8689-08626644cb6b  ONLINE       0     0     0
          gptid/eef58609-afdf-11ea-9141-08626644cb6b  ONLINE       2     0     0
          gptid/ef0103ad-afdf-11ea-9141-08626644cb6b  ONLINE       0     0     0
          gptid/a9040569-bdee-11eb-bb60-08626644cb6b  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /mnt/khan1/media1/Movies/Sicario.2015.2160p.BluRay.x265.10bit.SDR.DTS-HD.MA.TrueHD.7.1.Atmos-SWTYBLZ/Sicario.2015.2160p.BluRay.x265.10bit.SDR.DTS-HD.MA.TrueHD.7.1.Atmos-SWTYBLZ.mkv

What should I do now, bro?
Should I just delete or replace the file? Will that solve the problem?

Thanking you in advance.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
It means that gptid/eef58609-afdf-11ea-9141-08626644cb6b, one of your HDDs, threw a hissy fit (it is the one showing 2 read errors). The file in question may or may not be toast (play it to see) and your pool may or may not be at extra risk, depending on why it threw a hissy fit.
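
If you don't want to read the table by eye, here is a rough sketch that pulls out every row with a non-zero error counter. The helper name zpool_error_devices is made up for illustration; it only filters text in the layout of the `zpool status` output posted above, and it assumes the columns are NAME STATE READ WRITE CKSUM.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints
# every pool/device row whose READ, WRITE or CKSUM counter is not 0.
# Typical use on the NAS (not run here):
#   zpool status khan1 | zpool_error_devices
zpool_error_devices() {
    awk '
        # Device rows have a state in column 2 and three counters after it.
        NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|UNAVAIL|OFFLINE|REMOVED)$/ {
            if ($3 != 0 || $4 != 0 || $5 != 0)
                print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}
```

With the output posted above this should list khan1 itself and the eef58609 disk. To find out which adaN that gptid is, `glabel status` on FreeNAS should show the mapping, though I am going from memory on the exact layout of its output.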

However, the major issue is that you have no redundancy / resiliency. A striped vdev is only to be used when the data is temporary and it doesn't matter if you lose it. If you lose a disk, you lose the pool. Note that if you pull a disk to swap in a new one, you also lose the pool.

So, what should you do now....

1. Buy several more disks (2 minimum, preferably 4) as you need resiliency and extra disk space (90% full = time to upgrade)
2. Copy your movie selection (and anything else) somewhere else
3. Delete the pool
4. Install new disks
5. Create a new pool, with RAIDZ2 (or similar) and create new datasets
6. Copy your data back
7. Oh, and one more thing - Post your full hardware spec, like disks and HBA adapter (if one is in use)
 

usopkhan

Dabbler
Joined
Jun 10, 2020
Messages
15
Version: FreeNAS-11.3-U3.2
Intel(R) Pentium(R) CPU G3250 @ 3.20GHz
ASUS VANGUARD B85
32GB RAM

26 TB storage

ada0
khan1
ZR10X3RS
5.46 TiB

ada1
Boot Pool
50026B768361C02A
111.79 GiB

ada2
khan1
ZDH5F6GF
3.64 TiB

ada3
khan1
WD-WCC7K6XTJPSC
3.64 TiB

ada4
khan1
WD-WX11D575Z9SA
5.46 TiB

ada5
khan1
WD-WX11D27HAZ87
5.46 TiB

I am planning to get an HBA to add more drives since I have used up all my SATA ports already.
I just don't know which one to get. Budget is quite low right now...

Appreciate your help.. Thanks a million...!
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
You posted the disk serial numbers, not model numbers.
Get an LSI board from a system dismantler rather than an el cheapo from China, and make sure it's flashed to IT Mode before you set anything up. If you get an external case for extra drives, carefully read the various guides on here before choosing one.

What country are you based in?
 

usopkhan

Dabbler
Joined
Jun 10, 2020
Messages
15
Sorry about the serial numbers.... I am based in Malaysia...

ada0
khan1
Seagate IronWolf
5.46 TiB

ada1
Boot Pool
Phison Driven SSDs (KINGSTON)
111.79 GiB

ada2
khan1
Seagate IronWolf
3.64 TiB

ada3
khan1
Western Digital Red
3.64 TiB

ada4
khan1
Western Digital Red
5.46 TiB

ada5
khan1
Western Digital Red
5.46 TiB

Thanks Again Bro
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
And the model numbers of the WD Reds?
Are they SMR or CMR?
 

usopkhan

Dabbler
Joined
Jun 10, 2020
Messages
15
the WD reds -

WD60EFRX -6TB ( x2 )
WD40EFRX – 4TB ( x1 )

EFRX means they are older drives I guess (64MB cache) - CMR I think.

Thanks @NugentS
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
As to backup: The easiest approach will be to do this via your client/Windows machine. Determine which files are critical to you, get one or several disks (USB will be easiest) and dump stuff there. I would not use Windows Explorer's copy but robocopy (command line, comes with Windows 10) or something like SyncBack. Not ideal, but speed is of the essence.
 

usopkhan

Dabbler
Joined
Jun 10, 2020
Messages
15
Thanks @ChrisRJ ... so by backup you mean me copying my files elsewhere, like duplicating them?

Another question is... if I manage to delete or replace the file with the error, will it solve my (critical error) problem?
* I just tried playing the file - it's working without any problem.

Lastly, does this error mean the drive affected is dying soon?

Thanks a million guys
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
@ChrisRJ is absolutely correct - you need to copy the files somewhere else, if you want to keep them

"Another question is... if I manage to delete or replace the file with the error, will it solve my (critical error) problem?"
If you delete the files involved then this error will go away (probably)

"Lastly, does this error mean the drive affected is dying soon?"
Maybe - when did you last run a long SMART test on the drives? HDDs normally don't suddenly die - they give warnings (normally ignored) and then die - the process can take days, weeks, months or even years. The major problem is that you are running striped with no resiliency: if an error does occur you have no way of fixing the corruption, and the loss of any disk means the loss of all data on the entire pool.
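
For the long SMART test, a rough sketch, assuming the suspect disk is /dev/ada4 (the one named in the alert) and that smartctl is available in the FreeNAS shell. The smartctl commands are left as comments because they have to be run on the NAS itself; smart_red_flags is a made-up helper name that just filters the attribute table.

```shell
# On the NAS (not run here):
#   smartctl -t long /dev/ada4       # start the long self-test (takes hours)
#   smartctl -l selftest /dev/ada4   # check the result once it has finished
#   smartctl -A /dev/ada4 | smart_red_flags

# Hypothetical helper: reads `smartctl -A` output on stdin and prints the
# raw values of the attributes that usually matter most for a sick disk.
# Assumes the usual 10-column ATA attribute table (raw value in column 10).
smart_red_flags() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
        print $2 "=" $10
    }'
}
```

Anything other than 0 on those three is worth taking seriously. The alert in the first post is the Current_Pending_Sector one.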

You have 3 x 6TB and 2 x 4TB drives, for a raw total of 26TB
RAIDZ treats every disk in the vdev as if it were the size of the smallest one (4TB here), so:
With Z2 you would have 12TB useable (in round numbers to nearest TB)
With Z1 (which this forum does not like, for good data integrity reasons), 16TB useable
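
The arithmetic behind those numbers, as a small sketch. raidz_usable is a made-up helper, sizes are whole TB, and it assumes all five disks go into a single RAIDZ vdev; it ignores ZFS metadata and padding overhead and the TB vs TiB difference, so real usable space will come out somewhat lower.

```shell
# Hypothetical helper: raidz_usable PARITY SIZE_TB [SIZE_TB ...]
# Rough usable capacity = (number of disks - parity disks) * smallest disk,
# because a RAIDZ vdev only uses as much of each disk as the smallest one has.
raidz_usable() {
    parity=$1
    shift
    count=$#
    smallest=$1
    for size in "$@"; do
        if [ "$size" -lt "$smallest" ]; then
            smallest=$size
        fi
    done
    echo $(( (count - parity) * smallest ))
}

raidz_usable 2 6 6 6 4 4   # RAIDZ2: prints 12
raidz_usable 1 6 6 6 4 4   # RAIDZ1: prints 16
```

So the 2TB extra on each of the 6TB drives is simply wasted in that layout.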

You really need to plan your storage, with resiliency in mind (assuming you care about the data) and then get the disks and ports required. TrueNAS and ZFS treat data integrity as the most important thing going, but you aren't letting it do most of its magic with RAID0/Striped. It can't fix errors and it can't survive a single disk failure
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
EFRX is CMR as far as I know. WD did sneak some SMR into the Red range at your drive sizes, but those are the EFAX models (WD40EFAX, WD60EFAX), not EFRX.
 

usopkhan

Dabbler
Joined
Jun 10, 2020
Messages
15
Great explanation... I totally understand it now... appreciate the help...
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
"Another question is... if I manage to delete or replace the file with the error, will it solve my (critical error) problem?"
NugentS said: "If you delete the files involved then this error will go away (probably)"
I have to disagree here. Possibly the error message will not be displayed anymore. But the error condition (=faulty hard disk) will stay. Deleting files is in absolutely no way going to help.

"Lastly, does this error mean the drive affected is dying soon?"
NugentS said: "Maybe - when did you last run a long SMART test on the drives? HDDs normally don't suddenly die"
If that has been the case for you, consider yourself lucky. I have lost 6 hard drives over the last couple of years. They all suddenly had unrecoverable errors out of the blue. I am running daily SMART tests and bi-weekly ZFS scrubs, and still ...

usopkhan said: "Great explanation... I totally understand it now... appreciate the help..."
I don't want to be rude or anything. And I am certain you honestly mean that. But the questions you have asked here, indicate otherwise from where I stand. My advice would be to find someone local who can help at your place.

In the meantime, leave the machine running and DO NOT TURN IT OFF. This would come with a very high risk of killing the hard disk completely, since spinning up a hard disk is the most stressful operation for it.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
"I don't want to be rude or anything. And I am certain you honestly mean that. But the questions you have asked here, indicate otherwise from where I stand. My advice would be to find someone local who can help at your place."

To be honest - I was thinking that as well.

ChrisRJ and I may not totally agree 100% - but we both think (know) that you are potentially in a world of "I have no data" soon.
 

usopkhan

Dabbler
Joined
Jun 10, 2020
Messages
15
Appreciate your kind advice here... I really mean it. I understand the limitations of my setup. It started as a fun project, then suddenly I got used to it - expanding it bit by bit.

I know I should not have set it up with RAID 0... but at this moment, this is what I could afford... my limitation is my budget...

I need to do a backup
get an LSI HBA card
get more disks
do a proper RAID setup....

Exciting times... I'm glad everything is still okay for now... I pray the disk will stay alive until I have enough budget... hehe

Thanks @ChrisRJ @NugentS
 