Hard Drive Troubleshooting - Massive Failures - Need Help Isolating the Problem(s)

arameen · Oct 4, 2017

I don't know why FreeNAS is so hard to use. By using I mean working with it as soon as problems occur.
I always find it hard to find answers. Last time I almost lost my pool, but thanks to someone's help here I managed to pull my data out of my encrypted pool. Lesson learned, so no more encrypted disks with FreeNAS.
Now I am using FreeNAS-11.0-U4 (54848d13b) and am on the edge of losing my pool again. I have 2 pools, one with 5 and one with 11 disks, both raidz3.
What happened is that FreeNAS has lately been complaining about drives giving various read and write errors. I thought the drive was dying, so I started replacing. The next drive FreeNAS had issues with I replaced too. Suddenly FreeNAS had issues with those newly replaced drives, so I got even more drives, and FreeNAS kept giving me errors. Long story short, after a RAM memory test, some cable replacement and SATA power switching that didn't help, my zpool status output is as follows:

Code:

status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Oct  5 00:21:05 2017
        410G scanned out of 34.8T at 255M/s, 39h22m to go
        15.3G resilvered, 1.15% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        Secondary_Raidz3                                DEGRADED     0     0   772
          raidz3-0                                      DEGRADED     0     0 3.02K
            17620392916775898278                        UNAVAIL      0     0     0  was /dev/gptid/8275e396-a83c-11e7-9cee-002590f5b804
            gptid/3a44142c-931c-11e7-b895-002590f5b804  ONLINE       0     0     0  (resilvering)
            gptid/33c047e7-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            gptid/34749735-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            6370505857967419013                         OFFLINE      0     0     0  was /dev/gptid/3536bf51-2292-11e7-9626-002590f5b804
            gptid/35e2d6ec-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            gptid/368b679d-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            gptid/3730ee56-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            gptid/37de7e53-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            replacing-9                                 UNAVAIL      0     0     0
              5660221525628801207                       UNAVAIL      0     0     0  was /dev/da8p2
              10093850100708201031                      UNAVAIL      0     0     0  was /dev/gptid/4f5f3806-a952-11e7-a2e0-002590f5b804
           gptid/39778368-2292-11e7-9626-002590f5b804   DEGRADED     0     0     0  too many errors

while my working pool looks like this

        NAME                                            STATE     READ WRITE CKSUM
        Main_Raidz3                                     ONLINE       0     0     0
          raidz3-0                                      ONLINE       0     0     0
            gptid/97e250ef-3b33-11e7-b14d-002590f5b804  ONLINE       0     0     0
            gptid/f68c027f-2124-11e7-a3b9-002590f5b804  ONLINE       0     0     0
            gptid/526e0a14-1fd8-11e7-baad-002590f5b804  ONLINE       0     0     0
            gptid/8d745ef0-2052-11e7-a7ad-002590f5b804  ONLINE       0     0     0
            gptid/42e6847a-3a70-11e7-b20f-002590f5b804  ONLINE       0     0     0

and this is glabel status output

Code:

gptid/33c047e7-2292-11e7-9626-002590f5b804     N/A  ada0p2
gptid/34749735-2292-11e7-9626-002590f5b804     N/A  ada1p2
gptid/35e2d6ec-2292-11e7-9626-002590f5b804     N/A  ada2p2
gptid/368b679d-2292-11e7-9626-002590f5b804     N/A  ada3p2
gptid/8d745ef0-2052-11e7-a7ad-002590f5b804     N/A  ada4p2
gptid/3730ee56-2292-11e7-9626-002590f5b804     N/A  da0p2
gptid/97e250ef-3b33-11e7-b14d-002590f5b804     N/A  da1p2
gptid/526e0a14-1fd8-11e7-baad-002590f5b804     N/A  da2p2
gptid/37de7e53-2292-11e7-9626-002590f5b804     N/A  da3p2
gptid/42e6847a-3a70-11e7-b20f-002590f5b804     N/A  da4p2
gptid/f68c027f-2124-11e7-a3b9-002590f5b804     N/A  da5p2
gptid/39778368-2292-11e7-9626-002590f5b804     N/A  da6p2
gptid/3a44142c-931c-11e7-b895-002590f5b804     N/A  da7p2
                              label/efibsd     N/A  da8p1
gptid/4462538f-a509-11e7-9fe5-d92aa311b103     N/A  da8p1
gptid/ec05a778-a561-11e7-ba31-002590f5b804     N/A  da9p1
gptid/3a1e0ab1-931c-11e7-b895-002590f5b804     N/A  da7p1

So as of now, FreeNAS has rebooted several times and is not completing the long resilvering. I cannot pull out any disk because the parity seems to be at its minimum level. I say seems because it is very hard to read which drive is online, offline, or having problems. I switched between checking the gptid with glabel status and the disk list in the GUI, but somehow I always end up not pulling the desired disk with the right serial out of the server.
This is one reason I find FreeNAS very hard to handle as soon as there are issues. As of now it's impossible for me to understand what is going on, except that the one newly added disk is being resilvered. Hopefully FreeNAS won't reboot before it finishes this resilvering.

Does anyone understand which drive is online and which drive is offline? Or what the h..l is going on? FreeNAS is complaining about several newly bought IronWolf NAS drives.
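
One way to make a listing like the one above easier to read is to group the pool members by state. The following is only a sketch: it assumes the zpool status text has been saved with the wrapped gptid lines rejoined, it uses just a few lines copied from this post as sample data, and the helper names parse_members and by_state are made up for illustration.

```python
# A few member lines copied from the zpool status output in this post,
# with the wrapped "...002590f5b" / "804" gptids rejoined.
ZPOOL_STATUS = """\
        NAME                                            STATE     READ WRITE CKSUM
        Secondary_Raidz3                                DEGRADED     0     0   772
          raidz3-0                                      DEGRADED     0     0 3.02K
            17620392916775898278                        UNAVAIL      0     0     0  was /dev/gptid/8275e396-a83c-11e7-9cee-002590f5b804
            gptid/3a44142c-931c-11e7-b895-002590f5b804  ONLINE       0     0     0  (resilvering)
            gptid/33c047e7-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            6370505857967419013                         OFFLINE      0     0     0  was /dev/gptid/3536bf51-2292-11e7-9626-002590f5b804
"""

# States that can appear in the STATE column of zpool status.
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def parse_members(text):
    """Return (name, state, note) for each leaf member (gptid label or numeric GUID)."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 5 or parts[1] not in STATES:
            continue  # header line, blank line, or anything unexpected
        name, state = parts[0], parts[1]
        if not (name.startswith("gptid/") or name.isdigit()):
            continue  # pool and vdev rows such as raidz3-0
        note = parts[5].strip() if len(parts) > 5 else ""
        rows.append((name, state, note))
    return rows


def by_state(rows):
    """Group member names by their state."""
    groups = {}
    for name, state, _note in rows:
        groups.setdefault(state, []).append(name)
    return groups


for state, names in by_state(parse_members(ZPOOL_STATUS)).items():
    print(state, len(names), names)
```

Members shown only as a long number are ones ZFS can no longer find under their old label; the "was /dev/gptid/..." note on the same line is the label to look for.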

Ericloewe · Oct 4, 2017

I'm not sure why you're blaming FreeNAS, but your first pool looks like a complete disaster. It's probably bad cabling, a bad controller and/or expander or bad power. Less likely, an absolutely atrocious bunch of hard drives.

rs225 · Oct 4, 2017

I would first look very carefully at the cabling and see if there is anything in common between the pool with problems, and the pool that seems fine.

If not that, this looks like a bad system component. Probably not RAM, since you've tested that, but there can be others. Are any of your disk controllers slot-based? If so, can you change the slot?

The worst symptoms are the reboots and the 700 top-level checksum errors. If those checksum errors aren't 'fixed' by finding the problem and correcting it, then the pool will never complete the resilver. That you have seen reboots is also not encouraging; that suggests there is bad metadata in the pool (this would only be from a problem in the server) and it is panicking when it hits that. If that is the case, your only resolution is going to be to back up what you can from that pool and then re-create it, hopefully after having found the underlying cause.

I would export the Main pool and physically remove the drives from the system for now, then reboot and see how the second pool looks. If any drives are missing, shut down and move them into the newly emptied bays.

SweetAndLow · Oct 4, 2017

Read the forum rules and post the correct information to get real help. Mainly we need your hardware specs and how your drives are attached. Your issues are the result of not following directions and reading. FreeNAS is not idiot-proof; it requires you to learn and to want to know a little bit about how things work. You should also burn in all HDDs, including the replacements, because there is a very good chance that they will die during the resilver, which makes a bad thing worse.

gptid/39778368-2292-11e7-9626-002590f5b804 <= why is this not indented correctly in your output you just gave us? Was that a typo or is this drive striped? Can you take a screenshot Storage -> volume status?

You also have 6 drives that appear to be failing or having errors, this means you have lost data already. There are also some drives that have failed and are marked UNAVAILABLE.

garm · Oct 4, 2017

SweetAndLow said:
gptid/39778368-2292-11e7-9626-002590f5b804 <= why is this not indented correctly in your output you just gave us? Was that a typo or is this drive striped? Can you take a screenshot Storage -> volume status?

You beat me ^^ was just about to ask that myself. It looks like there is a striped drive in that pool.

arameen · Oct 4, 2017

Ericloewe said:
I'm not sure why you're blaming FreeNAS, but your first pool looks like a complete disaster. It's probably bad cabling, a bad controller and/or expander or bad power. Less likely, an absolutely atrocious bunch of hard drives.

I find it very hard to troubleshoot FreeNAS. And there are no guides for that either, neither here nor anywhere else online.
And this is how it looks to me now: FreeNAS is doing something wrong, regardless of whether some hardware is failing. The gptid and serials are not in sync, and they should be. I even discovered a new disk serial for a disk that is not in my system. Had this been consistent, it would have been easier for me.
Add to that the fact that FreeNAS is supposed not to require any FreeBSD skills, and that handling it through the GUI is enough. That is not correct, as FreeBSD skills are needed, both for me and for others. So why is a Windows guy like me using FreeNAS? Well, because of ZFS of course. Not that I enjoy getting into these situations.

Anyway, I doubt it's the cable. I already replaced the cable months ago.
I hardly think so many drives are failing. I am talking about three sets of several drives causing issues as soon as they are introduced to the system.
It's not the RAM either.
The PSU? Yes. I'm not sure how to test that to confirm. But then why is my primary pool of 5 working perfectly?
The IBM M1015? Yes. That is my guess too, because this failing pool is partially connected to the M1015.
Hopefully you guys who are experts can see something that I cannot see and figure out where this is pointing or what the problem could be.

SweetAndLow · Oct 4, 2017

How are you pulling out the wrong drives? The serial numbers are written on the drive and listed in the GUI. You can also get the serial number from smartctl. There is no mismatch in serial numbers if you use smartctl. I think this is the recommended way anyways.

Can you provide a screen shot of your storage layout? Also run smartctl -a /dev/daX and adaX on all drives and provide that info please. I would like to see if your disks are bad or maybe it's something else.

arameen · Oct 5, 2017

rs225 said:
I would first look very carefully at the cabling and see if there is anything in common between the pool with problems, and the pool that seems fine.

If not that, this looks like a bad system component. Probably not RAM, since you've tested that, but there can be others. Are any of your disk controllers slot-based? If so, can you change the slot?

The worst symptom is the reboots and the 700 top level checksum errors. If those checksum errors aren't 'fixed' by finding the problem and correcting it, then the pool will never complete resilver. That you have seen reboots is also not encouraging; that suggests there is bad metadata in the pool (this would only be from a problem in the server) and it is panicking when it hits that. If that is the case, your only resolution is going to be to back what you can from that pool and then re-create it, hopefully after having found the underlying cause.

I would export the Main pool and physically remove the drives from the system for now, then reboot and see how the second pool looks. If any drives are missing, shut down and move them into the newly emptied bays.

Well replaced the cable once.
The troubled pool is connected to a controller. The healthy pool is connected to the motherboard.

Well, I doubt it's the cable. I replaced the cable before and even tried switching power cables between the drives to see if anything is wrong with any power cable. But the problem didn't seem to follow any power cable once switched between different drives.

The healthy pool is connected to the motherboard, while the troubled pool is partially (4 drives) connected to an IBM ServeRAID M1015 and the rest is connected to the motherboard.
I moved the IBM ServeRAID M1015 to another slot long ago.
One of my suspicions is that the card itself is failing, rather than the problem being related to the slot. Of course this is extremely hard to figure out when having only one card. I already wasted a lot of money getting new drives. I don't want to invest in another card unless I know more.
Well, I think I will go with your suggestion for now and start pulling the data out of the troubled pool. Last time I did it, it took weeks, and I was hoping never to need to do it again. I usually do that by copying the files to drives in Windows, then copying them back once the pool is OK. I don't know of any faster or better way than that. Any ideas?
I will also go with your second suggestion. I will disconnect the main, healthy pool and connect the troubled one to the motherboard only, avoiding the M1015. Hopefully there are enough ports for that on the motherboard.
Then I will let the pool run for a few days and see if there are any issues.
The working, healthy main pool makes me think that my be Quiet PSU is OK.

SweetAndLow · Oct 5, 2017

arameen said:
Well replaced the cable once.
The troubled pool is connected to a controller. The healthy pool is connected to the motherboard.

Well I doubt its cable. I replaced the cable since before and even tried switching power cable between the drives to see if anything is wrong with any powercable. but the problem didnt seem to follow any powercable once switched between different drives.

The healthy pool is connected to the motherboard. While the troubled pool is partially (4 drives) connected to an IBM ServeRAID M1015 and the rest connected to the motherboard.
I did long ago move the IBM ServeRAID M1015 to another slot.
One of my suspicitions is that the card itself its failing rather than it would be related to the slot itself. Ofcourse this is so extremely hard to figure out when having only one card. I already wasted a lot of money getting new drives. Don’t wanna invest in another card unless I know more.
Well I think I will go with your suggestion for now, start pulling out the data from the troubled pool. Last time I did it, it took weeks and I was hoping not to need to do it ever again. I usually do that by copying the files to drives in windows. Then copying it back once the pool is ok. Don’t know of any faster are better way than that any ideas ?
I will even go with your second suggestion. I will disconnected the main and healthy pool, connect the trouble one to the motherboard only avoiding the M1015. Hopefully there is enough slots for that on the motherboard.
Then I will let the pool run for few days and see if there is any issues.
The working healty main pool makes me think that my be Quiet PSU is ok

Are you ignoring me?

arameen · Oct 5, 2017

SweetAndLow said:
Are you ignoring me?

Absolutely not, there is no reason for doing that. I appreciate all the help I can get, especially from experienced ones like you ;)
I am still at work and replying from my phone, using Firefox, and it is not so easy doing that. I am a guy who always prefers PCs.
Anyway, I was replying to your 2 posts in one reply:
I did read the forum rules; hopefully I didn't miss anything. If I did, then enlighten me.
Regarding the setup, I think most of it is given below my profile.

I could add how the drives themselves are connected and how the 2 pools are set up.
One healthy raidz3 pool consisting of 5 disks, all SATA, connected directly to the motherboard. No problems there at all.
One very troubled raidz3 pool consisting of 11 drives:
6x 4TB Seagate NAS drives
3x 8TB Seagate IronWolf (those 8TB drives are only there as temporary replacements)
2x 4TB Seagate IronWolf
4 of those drives are connected to the M1015 (IT mode).
The rest are connected to the motherboard itself.

The problems started with the drives connected to the M1015, but later I had to replace at least one drive connected to the motherboard. So I definitely can NOT say for sure that all my problems are because of the M1015, even if I did suspect it.

As of now the status is that I cannot access my pool through the Samba share; it makes FreeNAS reboot. So I am letting it resilver; it will finish in less than 20 hours.
The strange thing is that prior to this I finished adding and resilvering another disk, so I thought my problems were almost gone. But it seems some disk is missing or unavailable all the time. There is no sync between the gptid and da numbers in the GUI.
That is why I have been pulling out the wrong disks. I know the disk has a visible serial. But the serial that the GUI tells me is wrong doesn't seem to be the one that FreeNAS itself thinks is wrong.
Add to that the fact that several disks are now missing their gptid and are shown as unavailable.
I don't want to do anything with FreeNAS now, so I cannot post any SMART results.
But I can see FreeNAS is complaining all the time about one drive, da7, telling me to back up now because of a failed SMART self-check. Of course I cannot replace anything until the resilvering is finished. At that point I need to know exactly which drive to pull out, because what I get from the GUI doesn't seem to be what FreeNAS thinks is a certain drive in the pool.
Not sure what you mean with
“gptid/39778368-2292-11e7-9626-002590f5b804 <= why is this not indented correctly in your output you just gave us? Was that a typo or is this drive striped? Can you take a screenshot Storage -> volume status?”
But I can confirm that there has been no striped drive; the only setup is the one I mentioned earlier.

I will have to wait with running smartctl -a /dev/daX and adaX until the pool is at least in an OK state; as of now I don't know if it's dead or will die soon. I have lots of new drives ready as replacements, if I can only be sure which serial to pull out.

Now another weird thing is, what is FreeNAS doing? I mean, this pool looks dead in the picture. The GUI is showing only 8 disks, while the pool should be 11 as raidz3. So what is it resilvering, considering the state of this pool and the number of disks left in it?

SweetAndLow · Oct 5, 2017

There is nothing stopping you from getting SMART data. It's a read-only operation. I see that the disk is not striped, which is great. I still think you are confused about the serial number thing: basically, SMART will tell you which serial number needs to be pulled. You then also need to look at the da number so you can get the right SMART data.
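
The lookup described here (gptid from the pool status, then the da name, then the serial reported by SMART) can be sketched as a small script. This is an illustration only: the sample text is copied from the glabel status and smartctl output posted in this thread, the function names are hypothetical, and on a live system the text would have to come from glabel status and smartctl -i /dev/daX.

```python
import re

# Three lines copied from the glabel status output in this thread.
GLABEL_STATUS = """\
gptid/3730ee56-2292-11e7-9626-002590f5b804     N/A  da0p2
gptid/37de7e53-2292-11e7-9626-002590f5b804     N/A  da3p2
gptid/3a44142c-931c-11e7-b895-002590f5b804     N/A  da7p2
"""

# Excerpts of the smartctl information section for da0 and da3, as posted in this thread.
SMARTCTL_INFO = {
    "da0": "Device Model:  ST4000VN000-1H4168\nSerial Number:  S3019PAC\n",
    "da3": "Device Model:  ST4000VN000-1H4168\nSerial Number:  W30124AF\n",
}


def gptid_to_disk(glabel_text):
    """Map each gptid label to its disk name, e.g. gptid/... -> da0."""
    mapping = {}
    for line in glabel_text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[0].startswith("gptid/"):
            continue
        label, _status, partition = parts
        # Strip the partition suffix: "da7p2" -> "da7", "ada0p2" -> "ada0".
        mapping[label] = re.sub(r"p\d+$", "", partition)
    return mapping


def parse_serial(smartctl_text):
    """Extract the value of the 'Serial Number:' line, or None if absent."""
    match = re.search(r"^Serial Number:\s*(\S+)", smartctl_text, re.MULTILINE)
    return match.group(1) if match else None


def serial_for_gptid(gptid, glabel_text, smart_info):
    """Follow gptid -> disk -> serial; return None when any step is missing."""
    disk = gptid_to_disk(glabel_text).get(gptid)
    if disk is None or disk not in smart_info:
        return None
    return parse_serial(smart_info[disk])


print(serial_for_gptid("gptid/3730ee56-2292-11e7-9626-002590f5b804",
                       GLABEL_STATUS, SMARTCTL_INFO))
```

The serial printed this way is the one to compare with the label on the physical drive before pulling anything.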

SweetAndLow · Oct 5, 2017

You can't use the cli to replace disks in FreeNAS. I'm not sure anyone here can help you because of all the non standard things you have done. I suspect your resilver will never actually finish. Give it some time and see what happens but you will most likely be destroying this pool.

rs225 · Oct 5, 2017

I hope the pool stabilizes when you get the main pool offline. But, the power supply is still a possible problem. When power is borderline for whatever reason, the behavior can get very random.

arameen · Oct 5, 2017

SweetAndLow said:
There is nothing stopping you for getting smart data. It's a read only operation. I see that disk is not stripped which is great. I still think your confused on the serial number thing, basically smart will tell you what serial number needs to be pulled. You then need to also look at the da number so you can get the right smart data.

OK, usually when I pull out a disk I do the following: get the pool status, then type glabel status, then match the gptid from the pool status with the gptid and da name in glabel status. Then I use the da number in the GUI to match it to the serial number of the disk.
You mean that

SweetAndLow said:
You can't use the cli to replace disks in FreeNAS. I'm not sure anyone here can help you because of all the non standard things you have done. I suspect your resilver will never actually finish. Give it some time and see what happens but you will most likely be destroying this pool.

10 hours left for the resilvering; it looks like it will finish.
I already noticed that the reboots happen as soon as I try to access the share, and the resilvering continues as soon as FreeNAS has rebooted.
For now I am just waiting and hoping that this resilvering is done tomorrow morning. The question is how to proceed when that is done. I mean, it's hard to read what is what on that screenshot I posted earlier of the pools.

Non-standard things? What non-standard things do you mean I did?
I only tried to replace disks through the GUI as instructed in the FreeNAS manual: offline, then replace.
I guarantee you that I don't know my way around the CLI or have the skills; I am a Windows guy who desires ZFS. The few commands I know are things I have found on the forum, and replacing a disk is not one of them.

Unless you meant something else I did that is non-standard?

arameen · Oct 5, 2017

rs225 said:
I hope the pool stabilizes when you get the main pool offline. But, the power supply is still a possible problem. When power is borderline for whatever reason, the behavior can get very random.

Agreed, let's wait the 10 hours and see what status the pool is in.
But if it is the power supply, isn't it strange that the main pool consisting of five disks is not affected at all?
Sure, it could be just luck, and only a question of time before those would be affected too.
Is there a good way to know if it is the power supply, other than replacing the power supply and seeing what happens?

Inxsible · Oct 5, 2017

I might have missed it, but couldn't it be the controller, like @Ericloewe mentioned earlier? You said your motherboard connected pool is healthy. It's only the controller connected pool giving you trouble.

arameen · Oct 5, 2017

Inxsible said:
I might have missed it, but couldn't it be the controller, like @Ericloewe mentioned earlier? You said your motherboard connected pool is healthy. It's only the controller connected pool giving you trouble.

Sure, as of now it is my main suspect.
As soon as the server is online and the SMART data is confirmed OK, I will try to connect the whole pool to the motherboard to see what happens. But before that I need to get the pool into a normal status and do the replacements correctly. Maybe the SMART data will show that it was, after all, very bad luck with more than one drive at once, even if I don't think so.

rogerh · Oct 5, 2017

arameen said:
sure as of now, it is my main suspect.
as soon as the server is online and the smart data is confirmed ok, i will try to connect the whole pool to the motherboard to see what happens. but before that i need to get the pool in normal statues and do the replacments correctly. Maybe smart data tells that after all it was very bad luck with more than one drive at once, even if i dont think that

As several people have told you, there is every reason to do smartctl -a on every drive (currently powered) now. It puts no stress on the drives, doesn't interfere with data or create a significant system load. And it will tell how many drives are definitely failing, how many can't be connected to, and how many are basically healthy. Easily obtained and very valuable information.
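
As a rough illustration of how such output could be triaged, the sketch below pulls the raw values out of the attribute table and reports the non-zero ones among a handful of counters. The choice of attributes to watch (IDs 5, 187, 197, 198 and 199) is an assumption based on commonly watched counters, not a FreeNAS rule, and the function names are made up. The sample rows are copied from the da0 output posted in this thread.

```python
# Attribute rows copied from the da0 smartctl output posted in this thread.
SMART_ATTRS = """\
  5 Reallocated_Sector_Ct  0x0033  100  100  010  Pre-fail  Always  -  0
187 Reported_Uncorrect  0x0032  100  100  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0012  100  100  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0010  100  100  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x003e  200  200  000  Old_age  Always  -  4
"""

# Assumption: these IDs are the ones worth flagging when their raw value is non-zero.
WATCH = {5, 187, 197, 198, 199}


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for rows with a plain integer raw value."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        raw = parts[9]
        if raw.isdigit():
            attrs[int(parts[0])] = (parts[1], int(raw))
    return attrs


def nonzero_watched(attrs):
    """Pick the watched attributes whose raw counter is above zero."""
    return {name: raw
            for attr_id, (name, raw) in attrs.items()
            if attr_id in WATCH and raw > 0}


print(nonzero_watched(parse_attributes(SMART_ATTRS)))
```

For da0 only UDMA_CRC_Error_Count is non-zero. That counter is often associated with the link between drive and controller (cable, backplane, port) rather than with the platters, which is worth comparing across the drives on the M1015 and those on the motherboard.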

If you care to put *all* the results here (in code tags) I am sure several people will laboriously read through it and confirm any questions you have about the results.

arameen · Oct 5, 2017

Outputs of smartctl -a /dev/daX for all detected drives in the troubled pool.
As mentioned before, the pool consists of 11 drives in a raidz3 configuration.
4 of those drives are SATA-connected to an IBM M1015 (IT mode), while the rest are SATA-connected to the motherboard directly. Except for one drive that needed to be replaced, all other troubled drives have been the ones connected to the M1015.
These are Seagate NAS drives, while the new ones that I tried to insert have been Seagate IronWolf 8TB.

da0 SMART output (this is an older drive)

Code:

=== START OF INFORMATION SECTION === 
Model Family:  Seagate NAS HDD 
Device Model:  ST4000VN000-1H4168 
Serial Number:  S3019PAC 
LU WWN Device Id: 5 000c50 0804fa622 
Firmware Version: SC46 
User Capacity:  4,000,787,030,016 bytes [4.00 TB] 
Sector Sizes:  512 bytes logical, 4096 bytes physical 
Rotation Rate:  5900 rpm 
Form Factor:  3.5 inches 
Device is:  In smartctl database [for details use: -P show] 
ATA Version is:  ACS-2, ACS-3 T13/2161-D revision 3b 
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) 
Local Time is:  Thu Oct  5 23:52:26 2017 CEST 
SMART support is: Available - device has SMART capability. 
SMART support is: Enabled 
 
=== START OF READ SMART DATA SECTION === 
SMART overall-health self-assessment test result: PASSED 
 
General SMART Values: 
Offline data collection status:  (0x82) Offline data collection activity 
  was completed without error. 
  Auto Offline Data Collection: Enabled. 
Self-test execution status:  (  0) The previous self-test routine completed 
  without error or no self-test has ever 
  been run. 
Total time to complete Offline 
data collection:  (  117) seconds. 
Offline data collection 
capabilities:  (0x7b) SMART execute Offline immediate. 
  Auto Offline data collection on/off support. 
  Suspend Offline collection upon new 
  command. 
  Offline surface scan supported. 
  Self-test supported. 
  Conveyance Self-test supported. 
  Selective Self-test supported. 
SMART capabilities:  (0x0003) Saves SMART data before entering 
  power-saving mode. 
  Supports SMART auto save timer. 
Error logging capability:  (0x01) Error logging supported. 
  General Purpose Logging supported. 
Short self-test routine 
recommended polling time:  (  1) minutes. 
Extended self-test routine
recommended polling time:  ( 500) minutes. 
Conveyance self-test routine 
recommended polling time:  (  2) minutes. 
SCT capabilities:  (0x10bd) SCT Status supported. 
  SCT Error Recovery Control supported. 
  SCT Feature Control supported. 
  SCT Data Table supported. 
 
SMART Attributes Data Structure revision number: 10 
Vendor Specific SMART Attributes with Thresholds: 
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE 
  1 Raw_Read_Error_Rate  0x000f  118  099  006  Pre-fail  Always  -  171867080 
  3 Spin_Up_Time  0x0003  091  091  000  Pre-fail  Always  -  0 
  4 Start_Stop_Count  0x0032  098  098  020  Old_age  Always  -  2213 
  5 Reallocated_Sector_Ct  0x0033  100  100  010  Pre-fail  Always  -  0 
  7 Seek_Error_Rate  0x000f  085  060  030  Pre-fail  Always  -  375074940 
  9 Power_On_Hours  0x0032  086  086  000  Old_age  Always  -  13123 
10 Spin_Retry_Count  0x0013  100  100  097  Pre-fail  Always  -  0 
12 Power_Cycle_Count  0x0032  100  100  020  Old_age  Always  -  115 
184 End-to-End_Error  0x0032  100  100  099  Old_age  Always  -  0 
187 Reported_Uncorrect  0x0032  100  100  000  Old_age  Always  -  0 
188 Command_Timeout  0x0032  100  099  000  Old_age  Always  -  4295032833 
189 High_Fly_Writes  0x003a  100  100  000  Old_age  Always  -  0 
190 Airflow_Temperature_Cel 0x0022  071  061  045  Old_age  Always  -  29 (Min/Max 23/30) 
191 G-Sense_Error_Rate  0x0032  100  100  000  Old_age  Always  -  0 
192 Power-Off_Retract_Count 0x0032  099  099  000  Old_age  Always  -  2212 
193 Load_Cycle_Count  0x0032  099  099  000  Old_age  Always  -  2215 
194 Temperature_Celsius  0x0022  029  040  000  Old_age  Always  -  29 (0 16 0 0 0) 
197 Current_Pending_Sector  0x0012  100  100  000  Old_age  Always  -  0 
198 Offline_Uncorrectable  0x0010  100  100  000  Old_age  Offline  -  0 
199 UDMA_CRC_Error_Count  0x003e  200  200  000  Old_age  Always  -  4 
 
SMART Error Log Version: 1 
No Errors Logged 
 
SMART Self-test log structure revision number 1 
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error 
# 1  Short offline  Completed without error  00%  9194  - 
# 2  Extended offline  Completed without error  00%  8946  - 
# 3  Short offline  Completed without error  00%  8937  - 
# 4  Extended offline  Completed without error  00%  8737  - 
# 5  Short offline  Completed without error  00%  8729  - 
# 6  Extended offline  Completed without error  00%  483  - 
# 7  Short offline  Completed without error  00%  475  - 
 
SMART Selective self-test log data structure revision number 1 
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS 
  1  0  0  Not_testing 
  2  0  0  Not_testing 
  3  0  0  Not_testing 
  4  0  0  Not_testing 
  5  0  0  Not_testing 
Selective self-test flags (0x0): 
  After scanning selected spans, do NOT read-scan remainder of disk. 
If Selective self-test is pending on power-up, resume after 0 minute delay.
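
A note on the large Command_Timeout raw value (4295032833) in the table above: smartctl can print vendor attributes as 16-bit words (for example with -v 188,raw16). Assuming this drive packs three 16-bit counters into the 48-bit raw field, which is an assumption and not something the output itself confirms, the value splits as in this sketch. The function name is made up for illustration.

```python
def split_raw16(raw48):
    """Split a 48-bit raw value into three 16-bit words, most significant first."""
    return [(raw48 >> shift) & 0xFFFF for shift in (32, 16, 0)]


# Raw value of attribute 188 Command_Timeout from the da0 output above.
print(split_raw16(4295032833))
```

Under that assumption the value reads as three small counters of 1 each rather than billions of timeouts. The da3 output shows 0 for the same attribute.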

da3 SMART output (this is an older drive)

Code:

=== START OF INFORMATION SECTION === 
Model Family:  Seagate NAS HDD 
Device Model:  ST4000VN000-1H4168 
Serial Number:  W30124AF 
LU WWN Device Id: 5 000c50 08ffe78a9 
Firmware Version: SC46 
User Capacity:  4,000,787,030,016 bytes [4.00 TB] 
Sector Sizes:  512 bytes logical, 4096 bytes physical 
Rotation Rate:  5900 rpm 
Form Factor:  3.5 inches 
Device is:  In smartctl database [for details use: -P show] 
ATA Version is:  ACS-2, ACS-3 T13/2161-D revision 3b 
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) 
Local Time is:  Fri Oct  6 00:01:28 2017 CEST 
SMART support is: Available - device has SMART capability. 
SMART support is: Enabled 
 
=== START OF READ SMART DATA SECTION === 
SMART overall-health self-assessment test result: PASSED 
 
General SMART Values: 
Offline data collection status:  (0x82) Offline data collection activity 
  was completed without error. 
  Auto Offline Data Collection: Enabled. 
Self-test execution status:  (  0) The previous self-test routine completed 
  without error or no self-test has ever 
  been run. 
Total time to complete Offline 
data collection:  (  107) seconds. 
Offline data collection 
capabilities:  (0x7b) SMART execute Offline immediate. 
  Auto Offline data collection on/off support. 
  Suspend Offline collection upon new 
  command. 
  Offline surface scan supported. 
  Self-test supported. 
  Conveyance Self-test supported. 
  Selective Self-test supported. 
SMART capabilities:  (0x0003) Saves SMART data before entering 
  power-saving mode. 
  Supports SMART auto save timer. 
Error logging capability:  (0x01) Error logging supported. 
  General Purpose Logging supported. 
Short self-test routine 
recommended polling time:  (  1) minutes. 
Extended self-test routine 
recommended polling time:  ( 509) minutes. 
Conveyance self-test routine 
recommended polling time:  (  2) minutes. 
SCT capabilities:  (0x10bd) SCT Status supported. 
  SCT Error Recovery Control supported. 
  SCT Feature Control supported. 
  SCT Data Table supported. 
 
SMART Attributes Data Structure revision number: 10 
Vendor Specific SMART Attributes with Thresholds: 
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE 
  1 Raw_Read_Error_Rate  0x000f  116  099  006  Pre-fail  Always  -  116290760 
  3 Spin_Up_Time  0x0003  091  091  000  Pre-fail  Always  -  0 
  4 Start_Stop_Count  0x0032  100  100  020  Old_age  Always  -  133 
  5 Reallocated_Sector_Ct  0x0033  100  100  010  Pre-fail  Always  -  0 
  7 Seek_Error_Rate  0x000f  081  060  030  Pre-fail  Always  -  132829057 
  9 Power_On_Hours  0x0032  092  092  000  Old_age  Always  -  7244 
10 Spin_Retry_Count  0x0013  100  100  097  Pre-fail  Always  -  0 
12 Power_Cycle_Count  0x0032  100  100  020  Old_age  Always  -  119 
184 End-to-End_Error  0x0032  100  100  099  Old_age  Always  -  0 
187 Reported_Uncorrect  0x0032  100  100  000  Old_age  Always  -  0 
188 Command_Timeout  0x0032  100  100  000  Old_age  Always  -  0 
189 High_Fly_Writes  0x003a  100  100  000  Old_age  Always  -  0 
190 Airflow_Temperature_Cel 0x0022  068  063  045  Old_age  Always  -  32 (Min/Max 28/33) 
191 G-Sense_Error_Rate  0x0032  100  100  000  Old_age  Always  -  0 
192 Power-Off_Retract_Count 0x0032  100  100  000  Old_age  Always  -  105 
193 Load_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  144 
194 Temperature_Celsius  0x0022  032  040  000  Old_age  Always  -  32 (0 20 0 0 0) 
197 Current_Pending_Sector  0x0012  100  100  000  Old_age  Always  -  0 
198 Offline_Uncorrectable  0x0010  100  100  000  Old_age  Offline  -  0 
199 UDMA_CRC_Error_Count  0x003e  200  200  000  Old_age  Always  -  3 
 
SMART Error Log Version: 1 
No Errors Logged 
 
SMART Self-test log structure revision number 1 
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error 
# 1  Extended offline  Completed without error  00%  6935  - 
# 2  Short offline  Completed without error  00%  6927  - 
# 3  Extended offline  Interrupted (host reset)  00%  5722  - 
# 4  Extended offline  Interrupted (host reset)  00%  5713  - 
# 5  Extended offline  Interrupted (host reset)  00%  5701  - 
# 6  Extended offline  Interrupted (host reset)  00%  5683  - 
# 7  Extended offline  Interrupted (host reset)  00%  5667  - 
# 8  Short offline  Completed without error  00%  5666  - 
 
SMART Selective self-test log data structure revision number 1 
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS 
  1  0  0  Not_testing 
  2  0  0  Not_testing 
  3  0  0  Not_testing 
  4  0  0  Not_testing 
  5  0  0  Not_testing 
Selective self-test flags (0x0): 
  After scanning selected spans, do NOT read-scan remainder of disk. 
If Selective self-test is pending on power-up, resume after 0 minute delay.

da6 SMART output

Code:

=== START OF INFORMATION SECTION === 
Model Family:  Western Digital Red 
Device Model:  WDC WD40EFRX-68WT0N0 
Serial Number:  WD-WCC4E0020880 
LU WWN Device Id: 5 0014ee 25e559406 
Firmware Version: 80.00A80 
User Capacity:  4,000,787,030,016 bytes [4.00 TB] 
Sector Sizes:  512 bytes logical, 4096 bytes physical 
Rotation Rate:  5400 rpm 
Device is:  In smartctl database [for details use: -P show] 
ATA Version is:  ACS-2 (minor revision not indicated) 
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) 
Local Time is:  Fri Oct  6 00:03:56 2017 CEST 
SMART support is: Available - device has SMART capability. 
SMART support is: Enabled 
 
=== START OF READ SMART DATA SECTION === 
SMART overall-health self-assessment test result: PASSED 
 
General SMART Values: 
Offline data collection status:  (0x00) Offline data collection activity 
  was never started. 
  Auto Offline Data Collection: Disabled. 
Self-test execution status:  (  0) The previous self-test routine completed 
  without error or no self-test has ever 
  been run. 
Total time to complete Offline 
data collection:  (55440) seconds. 
Offline data collection 
capabilities:  (0x7b) SMART execute Offline immediate. 
  Auto Offline data collection on/off support. 
  Suspend Offline collection upon new 
  command. 
  Offline surface scan supported. 
  Self-test supported. 
  Conveyance Self-test supported. 
  Selective Self-test supported. 
SMART capabilities:  (0x0003) Saves SMART data before entering 
  power-saving mode. 
  Supports SMART auto save timer. 
Error logging capability:  (0x01) Error logging supported. 
  General Purpose Logging supported. 
Short self-test routine 
recommended polling time:  (  2) minutes. 
Extended self-test routine 
recommended polling time:  ( 554) minutes. 
Conveyance self-test routine 
recommended polling time:  (  5) minutes. 
SCT capabilities:  (0x703d) SCT Status supported. 
  SCT Error Recovery Control supported. 
  SCT Feature Control supported. 
  SCT Data Table supported. 
 
SMART Attributes Data Structure revision number: 16 
Vendor Specific SMART Attributes with Thresholds: 
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE 
  1 Raw_Read_Error_Rate  0x002f  200  200  051  Pre-fail  Always  -  0 
  3 Spin_Up_Time  0x0027  171  170  021  Pre-fail  Always  -  8433 
  4 Start_Stop_Count  0x0032  100  100  000  Old_age  Always  -  246 
  5 Reallocated_Sector_Ct  0x0033  200  200  140  Pre-fail  Always  -  0 
  7 Seek_Error_Rate  0x002e  100  253  000  Old_age  Always  -  0 
  9 Power_On_Hours  0x0032  069  069  000  Old_age  Always  -  23063 
10 Spin_Retry_Count  0x0032  100  100  000  Old_age  Always  -  0 
11 Calibration_Retry_Count 0x0032  100  100  000  Old_age  Always  -  0 
12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  243 
192 Power-Off_Retract_Count 0x0032  200  200  000  Old_age  Always  -  104 
193 Load_Cycle_Count  0x0032  199  199  000  Old_age  Always  -  3587 
194 Temperature_Celsius  0x0022  121  108  000  Old_age  Always  -  31 
196 Reallocated_Event_Count 0x0032  200  200  000  Old_age  Always  -  0 
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0 
198 Offline_Uncorrectable  0x0030  100  253  000  Old_age  Offline  -  0 
199 UDMA_CRC_Error_Count  0x0032  200  200  000  Old_age  Always  -  0 
200 Multi_Zone_Error_Rate  0x0008  200  200  000  Old_age  Offline  -  0 
 
SMART Error Log Version: 1 
No Errors Logged 
 
SMART Self-test log structure revision number 1 
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error 
# 1  Extended offline  Completed without error  00%  22764  - 
# 2  Short offline  Completed without error  00%  22754  - 
# 3  Short offline  Completed without error  00%  8596  - 
# 4  Extended offline  Completed without error  00%  6926  - 
 
SMART Selective self-test log data structure revision number 1 
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS 
  1  0  0  Not_testing 
  2  0  0  Not_testing 
  3  0  0  Not_testing 
  4  0  0  Not_testing 
  5  0  0  Not_testing 
Selective self-test flags (0x0): 
  After scanning selected spans, do NOT read-scan remainder of disk. 
If Selective self-test is pending on power-up, resume after 0 minute delay.

ada0 SMART output (this is an older drive)

Code:

=== START OF INFORMATION SECTION === 
Model Family:  Seagate NAS HDD 
Device Model:  ST4000VN000-1H4168 
Serial Number:  Z301NHXV 
LU WWN Device Id: 5 000c50 066dce71a 
Firmware Version: SC44 
User Capacity:  4,000,787,030,016 bytes [4.00 TB] 
Sector Sizes:  512 bytes logical, 4096 bytes physical 
Rotation Rate:  5900 rpm 
Form Factor:  3.5 inches 
Device is:  In smartctl database [for details use: -P show] 
ATA Version is:  ACS-2, ACS-3 T13/2161-D revision 3b 
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) 
Local Time is:  Fri Oct  6 00:12:37 2017 CEST 
SMART support is: Available - device has SMART capability. 
SMART support is: Enabled 
 
=== START OF READ SMART DATA SECTION === 
SMART overall-health self-assessment test result: PASSED 
 
General SMART Values: 
Offline data collection status:  (0x82) Offline data collection activity 
  was completed without error. 
  Auto Offline Data Collection: Enabled. 
Self-test execution status:  (  0) The previous self-test routine completed 
  without error or no self-test has ever 
  been run. 
Total time to complete Offline 
data collection:  (  107) seconds. 
Offline data collection 
capabilities:  (0x7b) SMART execute Offline immediate. 
  Auto Offline data collection on/off support. 
  Suspend Offline collection upon new 
  command. 
  Offline surface scan supported. 
  Self-test supported. 
  Conveyance Self-test supported. 
  Selective Self-test supported. 
SMART capabilities:  (0x0003) Saves SMART data before entering 
  power-saving mode. 
  Supports SMART auto save timer. 
Error logging capability:  (0x01) Error logging supported. 
  General Purpose Logging supported. 
Short self-test routine 
recommended polling time:  (  1) minutes. 
Extended self-test routine 
recommended polling time:  ( 508) minutes. 
Conveyance self-test routine 
recommended polling time:  (  2) minutes. 
SCT capabilities:  (0x10bd) SCT Status supported. 
  SCT Error Recovery Control supported. 
  SCT Feature Control supported. 
  SCT Data Table supported. 
 
SMART Attributes Data Structure revision number: 10 
Vendor Specific SMART Attributes with Thresholds: 
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE 
  1 Raw_Read_Error_Rate  0x000f  117  099  006  Pre-fail  Always  -  163296560 
  3 Spin_Up_Time  0x0003  093  091  000  Pre-fail  Always  -  0 
  4 Start_Stop_Count  0x0032  100  100  020  Old_age  Always  -  261 
  5 Reallocated_Sector_Ct  0x0033  100  100  010  Pre-fail  Always  -  0 
  7 Seek_Error_Rate  0x000f  087  060  030  Pre-fail  Always  -  634318899 
  9 Power_On_Hours  0x0032  074  074  000  Old_age  Always  -  23109 
10 Spin_Retry_Count  0x0013  100  100  097  Pre-fail  Always  -  0 
12 Power_Cycle_Count  0x0032  100  100  020  Old_age  Always  -  241 
184 End-to-End_Error  0x0032  100  100  099  Old_age  Always  -  0 
187 Reported_Uncorrect  0x0032  100  100  000  Old_age  Always  -  0 
188 Command_Timeout  0x0032  100  100  000  Old_age  Always  -  3 
189 High_Fly_Writes  0x003a  100  100  000  Old_age  Always  -  0 
190 Airflow_Temperature_Cel 0x0022  069  055  045  Old_age  Always  -  31 (Min/Max 24/32) 
191 G-Sense_Error_Rate  0x0032  100  100  000  Old_age  Always  -  0 
192 Power-Off_Retract_Count 0x0032  100  100  000  Old_age  Always  -  75 
193 Load_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  256 
194 Temperature_Celsius  0x0022  031  045  000  Old_age  Always  -  31 (0 18 0 0 0) 
197 Current_Pending_Sector  0x0012  100  100  000  Old_age  Always  -  0 
198 Offline_Uncorrectable  0x0010  100  100  000  Old_age  Offline  -  0 
199 UDMA_CRC_Error_Count  0x003e  200  200  000  Old_age  Always  -  4 
 
SMART Error Log Version: 1 
No Errors Logged 
 
SMART Self-test log structure revision number 1 
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error 
# 1  Extended offline  Completed without error  00%  9424  - 
# 2  Short offline  Completed without error  00%  9410  - 
 
SMART Selective self-test log data structure revision number 1 
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS 
  1  0  0  Not_testing 
  2  0  0  Not_testing 
  3  0  0  Not_testing 
  4  0  0  Not_testing 
  5  0  0  Not_testing 
Selective self-test flags (0x0): 
  After scanning selected spans, do NOT read-scan remainder of disk. 
If Selective self-test is pending on power-up, resume after 0 minute delay.

ada1 SMART output (this is an older drive)

Code:

=== START OF INFORMATION SECTION === 
Model Family:  Seagate NAS HDD 
Device Model:  ST4000VN000-1H4168 
Serial Number:  S300VSWF 
LU WWN Device Id: 5 000c50 0753d5db1 
Firmware Version: SC44 
User Capacity:  4,000,787,030,016 bytes [4.00 TB] 
Sector Sizes:  512 bytes logical, 4096 bytes physical 
Rotation Rate:  5900 rpm 
Form Factor:  3.5 inches 
Device is:  In smartctl database [for details use: -P show] 
ATA Version is:  ACS-2, ACS-3 T13/2161-D revision 3b 
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) 
Local Time is:  Fri Oct  6 00:13:33 2017 CEST 
SMART support is: Available - device has SMART capability. 
SMART support is: Enabled 
 
=== START OF READ SMART DATA SECTION === 
SMART overall-health self-assessment test result: PASSED 
 
General SMART Values: 
Offline data collection status:  (0x82) Offline data collection activity 
  was completed without error. 
  Auto Offline Data Collection: Enabled. 
Self-test execution status:  (  0) The previous self-test routine completed 
  without error or no self-test has ever 
  been run. 
Total time to complete Offline 
data collection:  (  128) seconds. 
Offline data collection 
capabilities:  (0x7b) SMART execute Offline immediate. 
  Auto Offline data collection on/off support. 
  Suspend Offline collection upon new 
  command. 
  Offline surface scan supported. 
  Self-test supported. 
  Conveyance Self-test supported. 
  Selective Self-test supported. 
SMART capabilities:  (0x0003) Saves SMART data before entering 
  power-saving mode. 
  Supports SMART auto save timer. 
Error logging capability:  (0x01) Error logging supported. 
  General Purpose Logging supported. 
Short self-test routine 
recommended polling time:  (  1) minutes. 
Extended self-test routine 
recommended polling time:  ( 532) minutes. 
Conveyance self-test routine 
recommended polling time:  (  2) minutes. 
SCT capabilities:  (0x10bd) SCT Status supported. 
  SCT Error Recovery Control supported. 
  SCT Feature Control supported. 
  SCT Data Table supported. 
 
SMART Attributes Data Structure revision number: 10 
Vendor Specific SMART Attributes with Thresholds: 
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE 
  1 Raw_Read_Error_Rate  0x000f  111  099  006  Pre-fail  Always  -  37782800 
  3 Spin_Up_Time  0x0003  093  092  000  Pre-fail  Always  -  0 
  4 Start_Stop_Count  0x0032  100  100  020  Old_age  Always  -  257 
  5 Reallocated_Sector_Ct  0x0033  100  100  010  Pre-fail  Always  -  0 
  7 Seek_Error_Rate  0x000f  087  060  030  Pre-fail  Always  -  632648685 
  9 Power_On_Hours  0x0032  074  074  000  Old_age  Always  -  23108 
10 Spin_Retry_Count  0x0013  100  100  097  Pre-fail  Always  -  0 
12 Power_Cycle_Count  0x0032  100  100  020  Old_age  Always  -  239 
184 End-to-End_Error  0x0032  100  100  099  Old_age  Always  -  0 
187 Reported_Uncorrect  0x0032  100  100  000  Old_age  Always  -  0 
188 Command_Timeout  0x0032  100  099  000  Old_age  Always  -  4 
189 High_Fly_Writes  0x003a  100  100  000  Old_age  Always  -  0 
190 Airflow_Temperature_Cel 0x0022  069  053  045  Old_age  Always  -  31 (Min/Max 24/32) 
191 G-Sense_Error_Rate  0x0032  100  100  000  Old_age  Always  -  0 
192 Power-Off_Retract_Count 0x0032  100  100  000  Old_age  Always  -  77 
193 Load_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  257 
194 Temperature_Celsius  0x0022  031  047  000  Old_age  Always  -  31 (0 18 0 0 0) 
197 Current_Pending_Sector  0x0012  100  100  000  Old_age  Always  -  0 
198 Offline_Uncorrectable  0x0010  100  100  000  Old_age  Offline  -  0 
199 UDMA_CRC_Error_Count  0x003e  200  200  000  Old_age  Always  -  1 
 
SMART Error Log Version: 1 
No Errors Logged 
 
SMART Self-test log structure revision number 1 
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error 
# 1  Extended offline  Completed without error  00%  9433  - 
 
SMART Selective self-test log data structure revision number 1 
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS 
  1  0  0  Not_testing 
  2  0  0  Not_testing 
  3  0  0  Not_testing 
  4  0  0  Not_testing 
  5  0  0  Not_testing 
Selective self-test flags (0x0): 
  After scanning selected spans, do NOT read-scan remainder of disk. 
If Selective self-test is pending on power-up, resume after 0 minute delay.

ada2 SMART output (this is an older drive)

Code:

=== START OF INFORMATION SECTION === 
Model Family:  Seagate NAS HDD 
Device Model:  ST4000VN000-1H4168 
Serial Number:  S3019679 
LU WWN Device Id: 5 000c50 0802a8a02 
Firmware Version: SC46 
User Capacity:  4,000,787,030,016 bytes [4.00 TB] 
Sector Sizes:  512 bytes logical, 4096 bytes physical 
Rotation Rate:  5900 rpm 
Form Factor:  3.5 inches 
Device is:  In smartctl database [for details use: -P show] 
ATA Version is:  ACS-2, ACS-3 T13/2161-D revision 3b 
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s) 
Local Time is:  Fri Oct  6 00:14:51 2017 CEST 
SMART support is: Available - device has SMART capability. 
SMART support is: Enabled 
 
=== START OF READ SMART DATA SECTION === 
SMART overall-health self-assessment test result: PASSED 
 
General SMART Values: 
Offline data collection status:  (0x82) Offline data collection activity 
  was completed without error. 
  Auto Offline Data Collection: Enabled. 
Self-test execution status:  (  0) The previous self-test routine completed 
  without error or no self-test has ever 
  been run. 
Total time to complete Offline 
data collection:  (  107) seconds. 
Offline data collection 
capabilities:  (0x7b) SMART execute Offline immediate. 
  Auto Offline data collection on/off support. 
  Suspend Offline collection upon new 
  command. 
  Offline surface scan supported. 
  Self-test supported. 
  Conveyance Self-test supported. 
  Selective Self-test supported. 
SMART capabilities:  (0x0003) Saves SMART data before entering 
  power-saving mode. 
  Supports SMART auto save timer. 
Error logging capability:  (0x01) Error logging supported. 
  General Purpose Logging supported. 
Short self-test routine 
recommended polling time:  (  1) minutes. 
Extended self-test routine 
recommended polling time:  ( 485) minutes. 
Conveyance self-test routine 
recommended polling time:  (  2) minutes. 
SCT capabilities:  (0x10bd) SCT Status supported. 
  SCT Error Recovery Control supported. 
  SCT Feature Control supported. 
  SCT Data Table supported. 
 
SMART Attributes Data Structure revision number: 10 
Vendor Specific SMART Attributes with Thresholds: 
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE 
  1 Raw_Read_Error_Rate  0x000f  118  099  006  Pre-fail  Always  -  187246184 
  3 Spin_Up_Time  0x0003  092  091  000  Pre-fail  Always  -  0 
  4 Start_Stop_Count  0x0032  100  100  020  Old_age  Always  -  121 
  5 Reallocated_Sector_Ct  0x0033  100  100  010  Pre-fail  Always  -  0 
  7 Seek_Error_Rate  0x000f  085  060  030  Pre-fail  Always  -  382872866 
  9 Power_On_Hours  0x0032  085  085  000  Old_age  Always  -  13591 
10 Spin_Retry_Count  0x0013  100  100  097  Pre-fail  Always  -  0 
12 Power_Cycle_Count  0x0032  100  100  020  Old_age  Always  -  102 
184 End-to-End_Error  0x0032  100  100  099  Old_age  Always  -  0 
187 Reported_Uncorrect  0x0032  100  100  000  Old_age  Always  -  0 
188 Command_Timeout  0x0032  100  001  000  Old_age  Always  -  472453766982 
189 High_Fly_Writes  0x003a  100  100  000  Old_age  Always  -  0 
190 Airflow_Temperature_Cel 0x0022  071  064  045  Old_age  Always  -  29 (Min/Max 23/30) 
191 G-Sense_Error_Rate  0x0032  100  100  000  Old_age  Always  -  0 
192 Power-Off_Retract_Count 0x0032  100  100  000  Old_age  Always  -  87 
193 Load_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  127 
194 Temperature_Celsius  0x0022  029  040  000  Old_age  Always  -  29 (0 17 0 0 0) 
197 Current_Pending_Sector  0x0012  100  100  000  Old_age  Always  -  0 
198 Offline_Uncorrectable  0x0010  100  100  000  Old_age  Offline  -  0 
199 UDMA_CRC_Error_Count  0x003e  200  190  000  Old_age  Always  -  63497 
 
SMART Error Log Version: 1 
No Errors Logged 
 
SMART Self-test log structure revision number 1 
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error 
# 1  Extended offline  Completed without error  00%  10326  - 
# 2  Short offline  Completed without error  00%  10308  - 
# 3  Extended offline  Completed without error  00%  236  - 
# 4  Short offline  Completed without error  00%  229  - 
 
SMART Selective self-test log data structure revision number 1 
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS 
  1  0  0  Not_testing 
  2  0  0  Not_testing 
  3  0  0  Not_testing 
  4  0  0  Not_testing 
  5  0  0  Not_testing 
Selective self-test flags (0x0): 
  After scanning selected spans, do NOT read-scan remainder of disk. 
If Selective self-test is pending on power-up, resume after 0 minute delay.
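
A side note on the huge Command_Timeout raw value (472453766982) reported for ada2 above: on Seagate drives this raw field is often read as three packed 16-bit counters rather than as one number. That reading is an assumption here, nothing in this thread or in the smartctl output confirms it for this model. A minimal Python sketch of the split, with `split_raw48` being a made-up helper name:

```python
def split_raw48(raw):
    """Split a 48-bit SMART raw value into three 16-bit words (high, mid, low)."""
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw value does not fit in 48 bits")
    return (raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF


# Command_Timeout raw value copied from the ada2 attribute table.
words = split_raw48(472453766982)
print(words)  # (110, 112, 24390)
```

Read this way the value is three moderate counters instead of one absurdly large one, although what each word counts is vendor specific and not documented here.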

arameen · Oct 5, 2017

ada3 SMART output (this is an older drive)

Code:

=== START OF INFORMATION SECTION ===
Model Family:  Seagate NAS HDD
Device Model:  ST4000VN000-1H4168
Serial Number:  S3019PGW
LU WWN Device Id: 5 000c50 0804fa3a4
Firmware Version: SC46
User Capacity:  4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  5900 rpm
Form Factor:  3.5 inches
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:  Fri Oct  6 00:16:05 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
  was completed without error.
  Auto Offline Data Collection: Enabled.
Self-test execution status:  (  0) The previous self-test routine completed
  without error or no self-test has ever
  been run.
Total time to complete Offline
data collection:  (  107) seconds.
Offline data collection
capabilities:  (0x7b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  1) minutes.
Extended self-test routine
recommended polling time:  ( 501) minutes.
Conveyance self-test routine
recommended polling time:  (  2) minutes.
SCT capabilities:  (0x10bd) SCT Status supported.
  SCT Error Recovery Control supported.
  SCT Feature Control supported.
  SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x000f  118  099  006  Pre-fail  Always  -  193620432
  3 Spin_Up_Time  0x0003  093  092  000  Pre-fail  Always  -  0
  4 Start_Stop_Count  0x0032  100  100  020  Old_age  Always  -  173
  5 Reallocated_Sector_Ct  0x0033  100  100  010  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x000f  087  060  030  Pre-fail  Always  -  490358292
  9 Power_On_Hours  0x0032  080  080  000  Old_age  Always  -  18337
10 Spin_Retry_Count  0x0013  100  100  097  Pre-fail  Always  -  0
12 Power_Cycle_Count  0x0032  100  100  020  Old_age  Always  -  164
184 End-to-End_Error  0x0032  100  100  099  Old_age  Always  -  0
187 Reported_Uncorrect  0x0032  100  100  000  Old_age  Always  -  0
188 Command_Timeout  0x0032  100  099  000  Old_age  Always  -  1
189 High_Fly_Writes  0x003a  100  100  000  Old_age  Always  -  0
190 Airflow_Temperature_Cel 0x0022  068  058  045  Old_age  Always  -  32 (Min/Max 26/33)
191 G-Sense_Error_Rate  0x0032  100  100  000  Old_age  Always  -  0
192 Power-Off_Retract_Count 0x0032  100  100  000  Old_age  Always  -  59
193 Load_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  179
194 Temperature_Celsius  0x0022  032  042  000  Old_age  Always  -  32 (0 19 0 0 0)
197 Current_Pending_Sector  0x0012  100  100  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0010  100  100  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x003e  200  200  000  Old_age  Always  -  5

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed without error  00%  4677  -
# 2  Short offline  Completed without error  00%  4663  -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
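
To compare the attribute tables above without reading every row by eye, something like the following Python sketch could pull the raw values out of `smartctl`-style attribute rows. The helper name `parse_smart_attributes` and the `CRC_LIMIT` cut-off are made up for illustration; the sample rows are copied from the ada2 and ada3 outputs. UDMA_CRC_Error_Count is generally associated with the link between drive and controller (cable, backplane, port) rather than with the platters, so a drive that stands out there may point at cabling rather than at the disk itself.

```python
import re

# One row of the attribute table: ID, name, flag, VALUE, WORST, THRESH,
# TYPE, UPDATED, WHEN_FAILED, then the raw value (leading integer only).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for every attribute row in text."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[match.group(2)] = int(match.group(6))
    return attrs


CRC_LIMIT = 100  # hypothetical cut-off, chosen only for this illustration

samples = {
    "ada2": """
188 Command_Timeout  0x0032  100  001  000  Old_age  Always  -  472453766982
199 UDMA_CRC_Error_Count  0x003e  200  190  000  Old_age  Always  -  63497
""",
    "ada3": """
188 Command_Timeout  0x0032  100  099  000  Old_age  Always  -  1
199 UDMA_CRC_Error_Count  0x003e  200  200  000  Old_age  Always  -  5
""",
}

crc_counts = {
    drive: parse_smart_attributes(text)["UDMA_CRC_Error_Count"]
    for drive, text in samples.items()
}
suspects = [drive for drive, count in crc_counts.items() if count > CRC_LIMIT]
print(crc_counts)  # {'ada2': 63497, 'ada3': 5}
print(suspects)    # ['ada2']
```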

da7: no output. This drive is probably already failing.
FreeNAS constantly warns about this drive with "FAILED SMART self-check. BACK UP DATA NOW!" and "Failed SMART usage Attribute: 10 Spin_Retry_Count."
This is a Seagate IronWolf NAS 8TB. I bought a few of these recently, but so far FreeNAS has had issues with all of them.

Zpool status:

Code:

[root@freenas ~]# zpool status -v Secondary_Raidz3
  pool: Secondary_Raidz3
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Oct  5 00:21:05 2017
        26.9T scanned out of 34.8T at 282M/s, 8h11m to go
        15.5G resilvered, 77.25% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        Secondary_Raidz3                                DEGRADED     0     0 6.82K
          raidz3-0                                      DEGRADED     0     0 27.3K
            17620392916775898278                        UNAVAIL      0     0     0  was /dev/gptid/8275e396-a83c-11e7-9cee-002590f5b804
            gptid/3a44142c-931c-11e7-b895-002590f5b804  ONLINE       0     0     0  (resilvering)
            gptid/33c047e7-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            gptid/34749735-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            6370505857967419013                         OFFLINE      0     0     0  was /dev/gptid/3536bf51-2292-11e7-9626-002590f5b804
            gptid/35e2d6ec-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            gptid/368b679d-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            gptid/3730ee56-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            gptid/37de7e53-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors
            replacing-9                                 UNAVAIL      0     0     0
              5660221525628801207                       UNAVAIL      0     0     0  was /dev/da8p2
              10093850100708201031                      UNAVAIL      0     0     0  was /dev/gptid/4f5f3806-a952-11e7-a2e0-002590f5b804
            gptid/39778368-2292-11e7-9626-002590f5b804  DEGRADED     0     0     0  too many errors

errors: Permanent errors have been detected in the following files:

        Secondary_Raidz3:<0x0>
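
To see how much redundancy that status leaves, the member states of raidz3-0 can be tallied. This is only a rough Python sketch: the member list is transcribed by hand from the output above with gptid names shortened, and it assumes the usual raidz3 rule that a vdev survives at most three missing members.

```python
from collections import Counter

RAIDZ_PARITY = 3  # raidz3 tolerates at most three missing members

# Top-level members of raidz3-0, transcribed from the zpool status output
# (gptid names shortened to their first eight characters).
members = [
    ("17620392916775898278", "UNAVAIL"),
    ("3a44142c", "ONLINE"),  # still resilvering
    ("33c047e7", "DEGRADED"),
    ("34749735", "DEGRADED"),
    ("6370505857967419013", "OFFLINE"),
    ("35e2d6ec", "DEGRADED"),
    ("368b679d", "DEGRADED"),
    ("3730ee56", "DEGRADED"),
    ("37de7e53", "DEGRADED"),
    ("replacing-9", "UNAVAIL"),
    ("39778368", "DEGRADED"),
]

states = Counter(state for _, state in members)
missing = states["UNAVAIL"] + states["OFFLINE"]
spare_parity = RAIDZ_PARITY - missing

print(dict(states))
print(f"missing members: {missing}, parity left: {spare_parity}")
```

With three members UNAVAIL or OFFLINE, the tally leaves no parity to spare, and the one ONLINE member is still resilvering.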

The question is: once I wake up tomorrow and the resilvering is hopefully done, should I replace da7?
That is, offline it from the GUI, remove it, insert a new drive, and replace?
If I cannot offline it from the GUI, how do I proceed?
Or should I do something else?
