Failed disk plus unbootable OS USB thumb drive led to incorrect iSCSI extents and more

Status
Not open for further replies.

klennan

Cadet
Joined
Aug 2, 2018
Messages
8
I've managed to develop several problems all at once.

One of the six 3TB disks in my RAIDz1 failed, so I ordered a new disk.
In the meantime I thought I'd replace the OS thumb drives with faster 32GB Patriot ones. The boot device was mirrored, so I replaced one (no issues), then replaced the other a few days later, but FreeNAS (11 U4) wasn't happy. It also rejected the other original drive, so I was left with only the single new Patriot.

The new 3TB disk arrived, so I powered down, swapped disks and powered on: no bootable drive found. No matter what, I can't convince the machine to boot from that new Patriot drive. I tried the previous thumb drives with no luck either. I don't know; does FreeNAS wipe them when it replaces them?

FINALLY, last night I got the PC to boot a newly imaged Patriot (the 2nd one; I've left the first one unmodified). The "newest" backup I had was from March, so I loaded that, but it was from before I had configured the iSCSI extents. It was 11 U1; I then had it upgrade to 11 U5, but instead it pulled down 11.2 BETA (or STABLE, depending on which output I believe). Whatever, it boots.

The new 3TB disk fails gpart create -s gpt with "gpart: geom 'ada0': File exists" even though I've written nothing to it, and gpart destroy -F /dev/ada0 gives "gpart: Input/output error".
Running the destroy command a 2nd time produces "gpart: Device not configured". So that's left me with no redundancy.

I have an SSD in it which was home to an iSCSI extent, presented to a Windows server, and contains (hopefully) a few virtual machines. However, it shows a 25GB partition when it should be 250GB. Windows is confused. I might have now lost those VMs.

On the RAIDz, I have 3 other iSCSI extents - WSUS, Veeam Server, and Veeam Client. The WSUS disk recreated just fine, presented, recognized, it's happy. It was the one I cared least about.
Veeam Client - backups of my main desktop, no biggie if I lost it. But it presents and shows no partitions.
Veeam Server - backups of those VMs on the ssd, plus lots of others (homelab). It presents with a 25GB partition (RAW if I online it), when it should be 3TB.  

-- SOLVED!
These three extents had been created with 4k blocks and presented that way. I went through various options until I deleted the extent (not the underlying file) and recreated it with 4096 selected as the block size. Presented & attached them, and it's all good there!
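Why the block size mattered can be shown with simple arithmetic. A GPT disk keeps its primary header at LBA 1 and records partition boundaries as LBA numbers, so every byte offset an initiator computes scales with the logical block size the target advertises. A minimal sketch; the helper name and the 1,000,000-LBA partition are hypothetical, chosen only for illustration:

```shell
#!/bin/sh
# Hypothetical helper illustrating LBA-to-byte arithmetic; not part of FreeNAS or Windows.

# Byte offset of an LBA for a given logical block size.
lba_offset() {
    lba=$1
    block_size=$2
    echo $(( lba * block_size ))
}

# The primary GPT header always sits at LBA 1, so its byte position depends on the block size.
echo "GPT header at byte $(lba_offset 1 512) with 512-byte blocks"
echo "GPT header at byte $(lba_offset 1 4096) with 4096-byte blocks"

# A partition recorded as 1,000,000 LBAs (hypothetical) spans 8x more bytes at 4096 than at 512.
echo "1000000 LBAs = $(lba_offset 1000000 512) bytes at 512, $(lba_offset 1000000 4096) bytes at 4096"
```

Presumably that is why recreating the extents with 4096 fixed things: the data in the underlying files was never touched, only the block size Windows used to interpret it.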

So I've got ONE issue remaining: the 3TB hard disk that won't partition for FreeNAS.

please help :(
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Your first time posting, and you just joined today. Welcome to the forum. I sure wish you would have come along earlier. We have a lot of resources here that could have helped you.
klennan said: I've managed to develop several problems all at once.
I am really sorry you have run into a bit of trouble.
klennan said: Running the destroy command a 2nd time produces "gpart: Device not configured" So that's left me with no redundancy.
Which is why we have been suggesting RAIDz2 as a minimum for many years here on the forum, particularly with drives larger than 1TB.
klennan said: That new 3TB disk fails to gpart -s gpt - "gpart: geom 'ada0': File exists" when I've written nothing to it, and "gpart: Input/output error" when using gpart destroy -F /dev/ada0
Are you doing this from the command line, or doing a replace in the GUI like you should?
klennan said: So I've got two issues - the 3TB hard disk that won't partition for FreeNAS,
Let's deal with the pool health issue first, because that is probably more important in the long run. The iSCSI extents may or may not be lost, but if the pool is lost, they are gone for sure.
The drive should not be partitioned before you use the GUI to execute the replace. The GUI partitions the drive. Since you have tried to do it manually, you may have created a problem with the partition table. If you look at the documentation:
http://doc.freenas.org/11/storage.html#view-disks
In the 'View Disks' tab you should be able to select the new disk and click the 'Wipe' button at the bottom of the page. That should set the disk straight. Then you need to do a replace within the GUI. Here is a guide:
https://forums.freenas.org/index.php?resources/replacing-a-failed-failing-disk.75/
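If the GUI route is not available, a rough command-line analogue is to zero both ends of the disk, which is where the protective MBR and primary GPT (start) and the backup GPT (end) live. This is a minimal sketch under stated assumptions, not what the FreeNAS 'Wipe' button actually runs: `wipe_ends` is a hypothetical helper, it destroys the partition table on the target, and it does not touch ZFS labels stored inside partitions.

```shell
#!/bin/sh
# wipe_ends TARGET SECTOR_SIZE MEDIA_BYTES [WIPE_BYTES]
# Zero the first and last WIPE_BYTES (default 1 MiB) of a disk or image file.
# Hypothetical helper: this is NOT what the FreeNAS "Wipe" button runs.
# DESTRUCTIVE: it removes the partition table on TARGET.
wipe_ends() {
    target=$1
    sector=$2
    media=$3
    wipe=${4:-1048576}

    # Raw disk writes must be whole sectors, so insist on sector-aligned sizes.
    if [ $(( media % sector )) -ne 0 ] || [ $(( wipe % sector )) -ne 0 ]; then
        echo "wipe_ends: sizes must be multiples of the sector size" >&2
        return 1
    fi

    total=$(( media / sector ))   # size of TARGET in sectors
    count=$(( wipe / sector ))    # sectors to zero at each end
    if [ "$count" -gt "$total" ]; then
        echo "wipe_ends: target is smaller than the wipe size" >&2
        return 1
    fi

    # Start of the disk: protective MBR, primary GPT header and partition table.
    dd if=/dev/zero of="$target" bs="$sector" count="$count" conv=notrunc 2>/dev/null || return 1

    # End of the disk: backup GPT partition table and header.
    dd if=/dev/zero of="$target" bs="$sector" count="$count" \
        seek=$(( total - count )) conv=notrunc 2>/dev/null || return 1
}
```

On FreeBSD the sector size and media size in bytes should be the second and third fields printed by `diskinfo ada0` (verify on your system before running anything), e.g. `wipe_ends /dev/ada0 512 3000592982016`. Only use it on a disk that is not part of any pool; GEOM may still refuse the write with "Operation not permitted" if something has the disk open.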

PS. You should always do burn-in testing on a drive prior to adding it to the pool. Here is a guide:
https://forums.freenas.org/index.php?resources/hard-drive-burn-in-testing.92/
 

klennan

Hi Chris, thanks for responding.

Chris Moore said: Your first time posting, and you just joined today. Welcome to the forum. I sure wish you would have come along earlier. We have a lot of resources here that could have helped you.

Long time lurker, first time poster.

Chris Moore said: Which is why we have been suggesting RAIDz2 as a minimum for many years here on the forum, particularly with drives larger than 1TB.

I didn't want to sacrifice that much space.

Chris Moore said: Are you doing this from the command line, or doing a replace in the GUI like you should?

I do everything through the GUI until the GUI doesn't work. In this case -
upload_2018-8-2_12-29-44.png


results in -
upload_2018-8-2_12-30-28.png


Which drove me to the command line to investigate deeper.

And
upload_2018-8-2_12-32-13.png


results in
upload_2018-8-2_12-32-42.png
 

Chris Moore

klennan said: I didn't want to sacrifice that much space.
I understand, and my first pool in FreeNAS was RAIDz1, but with one disk already failed you are living on the raw edge: a single further drive failure loses the pool.
For just $77 more, you could have had RAIDz2 and still have a drive of redundancy:
https://www.ebay.com/itm/HP-Seagate...-3-5-SATA-6-0GB-s-Enterprise-HDD/153070795097
I built several pools over the years using re-purposed drives. I even built a RAIDz3 at one point, but I figure RAIDz2 is good enough since I have backups in place.
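The "$77 more" point is easiest to see with rough capacity arithmetic: each parity disk in a RAIDz vdev costs about one disk of space, so one more drive (presumably what the $77 buys) in a RAIDz2 gives back the space the second parity disk takes. A back-of-the-envelope sketch, assuming equal 3TB disks (in a RAIDz vdev a larger disk only contributes as much as the smallest one) and ignoring ZFS metadata, padding and the TB-vs-TiB difference; `raidz_data` is a hypothetical helper:

```shell
#!/bin/sh
# raidz_data DISKS SIZE_TB PARITY
# Rough data capacity of one RAIDz vdev: (disks - parity) * disk size.
# Ignores ZFS metadata, padding, slop space and TB vs TiB.
raidz_data() {
    disks=$1
    size=$2
    parity=$3
    echo $(( (disks - parity) * size ))
}

echo "6 x 3TB RAIDz1: $(raidz_data 6 3 1) TB"   # -> 15 TB
echo "6 x 3TB RAIDz2: $(raidz_data 6 3 2) TB"   # -> 12 TB
echo "7 x 3TB RAIDz2: $(raidz_data 7 3 2) TB"   # -> 15 TB
```

Seven disks in RAIDz2 hold roughly the same data as six in RAIDz1, but survive two simultaneous failures instead of one.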
klennan said: I do everything through the GUI until the GUI doesn't work. In this case:
Where did you get the replacement disk?
klennan said: New 3TB disk arrived
If it already had something on it, which is what the error indicates, it must not have been quite as 'new' as one might hope.
It may be defective or it might be configured for some specialty purpose.
 

klennan

The new disk came from Newegg, marked as new. It's a Seagate IronWolf.

I'm running through the burn-in test suggestions. So far it's putting out these numbers; however, I don't know if they're a carryover from the failed drive, as it's plugged into the same port.
Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   006    Pre-fail  Always       -       151632
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       6
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       597
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       6
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       6
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   073   073   045    Old_age   Always       -       27 (Min/Max 25/27)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       6
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       6
194 Temperature_Celsius     0x0022   027   040   000    Old_age   Always       -       27 (0 19 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   178   008   000    Old_age   Always       -       332


And if you didn't see, I managed to get my iSCSI extent issue sorted out!
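SMART counters are stored on the drive itself, so they are not carried over from a previous disk on the same port. UDMA_CRC_Error_Count, however, counts transfer errors on the link between drive and controller, so a high raw value on a nearly new drive is generally blamed on the cable, port or backplane rather than the platters. A minimal sketch for pulling out the attributes most people check during burn-in; the function name and the choice of attributes are assumptions, not a FreeNAS tool:

```shell
#!/bin/sh
# smart_watchlist: read `smartctl -A` output on stdin and print the name and
# raw value of attributes commonly watched on a new drive:
#   5   Reallocated_Sector_Ct   sectors the drive has remapped
#   197 Current_Pending_Sector  unstable sectors waiting to be remapped
#   198 Offline_Uncorrectable   uncorrectable sectors found by offline scans
#   199 UDMA_CRC_Error_Count    interface transfer errors (cable/port/backplane)
smart_watchlist() {
    # Column 1 is the attribute ID, column 2 its name, column 10 the raw value.
    awk '$1 ~ /^(5|197|198|199)$/ { print $2, $10 }'
}
```

Usage: `smartctl -A /dev/ada0 | smart_watchlist`. For the table above it would report 0 for the three sector counters and 332 for UDMA_CRC_Error_Count.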
 

Chris Moore

klennan said: -- SOLVED! These three extents had been created as 4k blocks and presented that way. I went through various options until I deleted the extent (not underlying file) and recreated while selecting 4096 for the block size. Presented & attached them, and it's all good there!
That is excellent news. I thought that might be a matter of a setting.
klennan said: The new disk came from Newegg, marked as new.
I am looking at the numbers and they don't look bad. Part of the burn-in process should wipe the disk. I am curious what might have been on the disk though.
What is the model number?
 

Chris Moore

PS. If you enclose output like that SMART table in code tags, it is usually easier to read.
 

Chris Moore

Seek_Error_Rate and Raw_Read_Error_Rate are numbers that constantly increase on Seagate drives because of the way they are reported. Each raw value is actually two numbers packed together, so it isn't as bad as it looks.
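The "two numbers packed together" point can be made concrete. A commonly cited interpretation, not officially documented by Seagate and therefore an assumption here, is that the 48-bit raw value of these attributes holds an error count in the upper 16 bits and an operation count in the lower 32 bits. Under that assumption, a hypothetical `seagate_split` helper decodes the two raw values from the table above:

```shell
#!/bin/sh
# seagate_split RAW
# Split a Seagate Raw_Read_Error_Rate / Seek_Error_Rate raw value.
# ASSUMPTION: upper 16 bits = error count, lower 32 bits = operation count.
seagate_split() {
    raw=$1
    errors=$(( raw >> 32 ))
    ops=$(( raw & 0xFFFFFFFF ))
    echo "errors=$errors operations=$ops"
}

seagate_split 151632   # Raw_Read_Error_Rate -> errors=0 operations=151632
seagate_split 597      # Seek_Error_Rate     -> errors=0 operations=597
```

Read that way, both attributes show zero errors; the large raw numbers are just operation counts.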
 

klennan

Code:
root@Shockwave:/mnt/SSD/iSCSI # badblocks -b 4096 -ws /dev/ada0
badblocks: Operation not permitted while trying to open /dev/ada0


Not sure if that's indicative of something.
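On FreeBSD-based systems like FreeNAS, "Operation not permitted" when opening a whole disk for writing is commonly caused by GEOM refusing writes to a provider that something else (for example an active swap partition) already has open. A commonly suggested workaround for burn-in is to set kern.geom.debugflags=0x10 for the duration of the test. It was not tried in this thread, it disables a safety check, and it will not help if the drive itself is the problem, so treat this as a sketch: `burn_in` is a hypothetical wrapper around the badblocks command from the post, meant only for a disk that is not part of any pool.

```shell
#!/bin/sh
# burn_in DISK
# Run a destructive badblocks pass on /dev/DISK with GEOM's write protection
# temporarily relaxed, then restore the previous setting.
# WARNING: destroys all data on DISK. Hypothetical helper, not from the thread.
burn_in() {
    disk=$1
    if [ -z "$disk" ]; then
        echo "usage: burn_in adaN" >&2
        return 2
    fi
    old=$(sysctl -n kern.geom.debugflags) || return 1
    # 0x10 roughly means "allow writes to providers that are already open".
    sysctl kern.geom.debugflags=0x10 >/dev/null || return 1
    # Same invocation as in the post: 4096-byte blocks, write-mode test, show progress.
    badblocks -b 4096 -ws "/dev/$disk"
    rc=$?
    # Put the safety check back regardless of the badblocks result.
    sysctl kern.geom.debugflags="$old" >/dev/null
    return "$rc"
}
```

Usage: `burn_in ada0`, run as root. If the write-mode pass still fails with the flag set, the problem is more likely the drive or its connection than GEOM.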
 

Chris Moore

It isn't good.

 

Chris Moore

I hate to say it, but at this point I would suggest putting it in a Windows system to see if you can partition and format it there.
 

klennan

Windows didn't complain about the disk. Created & formatted partitions, no problem. FreeNAS still refuses to do anything with it.
upload_2018-8-6_8-46-46.png


upload_2018-8-6_8-47-31.png


This is the odd part:
Code:
root@Shockwave:~ # gpart show ada0
=>        40  5860533088  ada0  GPT  (2.7T)
          40  5860533088        - free -  (2.7T)

root@Shockwave:~ # gpart destroy -F ada0
gpart: Operation not permitted
root@Shockwave:~ # gpart show ada0
=>         0  5860533168  ada0  (none)  (2.7T)
           0  5860533168        - free -  (2.7T)

root@Shockwave:~ # gpart create -s gpt /dev/ada0
gpart: Operation not permitted
root@Shockwave:~ # gpart show ada0
=>        40  5860533088  ada0  GPT  (2.7T)
          40  5860533088        - free -  (2.7T)

The CLI claims the commands aren't working, but 'gpart show' says otherwise.

I'm a few clicks away from sending this thing back to Newegg.
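The numbers in that gpart output are counts of 512-byte sectors. The whole disk is 5860533168 sectors, about 2.73 TiB, which gpart shows as (2.7T) for a drive sold as 3TB. With a GPT in place, 80 of those sectors (40 at each end in this output) fall outside the usable range; that is where the scheme keeps its protective MBR and its primary and backup headers and tables, plus padding. A small conversion sketch; the helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers for reading `gpart show` numbers (sector counts).

# Sectors times sector size, in bytes (integer shell arithmetic).
sectors_to_bytes() {
    echo $(( $1 * $2 ))
}

# Same in binary terabytes; awk is used for floating point, 1 TiB = 1024^4 bytes.
sectors_to_tib() {
    awk -v s="$1" -v b="$2" 'BEGIN { printf "%.2f\n", s * b / (1024 ^ 4) }'
}

sectors_to_bytes 5860533168 512   # -> 3000592982016
sectors_to_tib 5860533168 512     # -> 2.73
echo "outside the usable range: $(( 5860533168 - 5860533088 )) sectors"   # -> 80
```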
 


Chris Moore

klennan said: I'm a few clicks away from sending this thing back to Newegg.
Before you do that...
I see that you are running BETA2. Did you have a BETA1 or some other previous installation of FreeNAS that you can roll back to? I would like to test whether the disk replace works there before we call it a bad disk, because I have been fighting issues all weekend that appear to be related to the BETA software.
klennan said: FINALLY last night, I get the PC to boot a newly imaged Patriot, the 2nd one, I've left the first one unmodified. "newest" backup I had was from March. So I load that, but that was before I had iSCSI extents configured. It was 11 u1, I then had it upgrade to 11 u5, but instead it sucked down 11.2 BETA (or STABLE depending on which output I believe). Whatever, it boots.
I know that your configuration information will not be there, but if we can boot a previous version of FreeNAS and get the replace to work, the pool should be healthy again, and then you should be able to boot back into the FreeNAS installation that includes your iSCSI extent configuration. It would also help to know whether the problem with replacing a disk is due to the BETA2 release, because other people have been having other problems with it, like not being able to delete files without crashing the system.
 

Chris Moore

klennan said: Windows didn't complain about the disk. Created & formatted partitions, no problem.
I hate to keep throwing things at you before you have had a chance to reply, but we could also try using diskpart in Windows to delete the partition table. This is command-line stuff that isn't widely known, so here is a link explaining how; just make sure you have the correct disk selected:

http://knowledge.seagate.com/articles/en_US/FAQ/005929en
 

klennan

Chris Moore said: Before you do that...
Too late. I submitted the RMA request and ordered the 4TB disk you suggested earlier. Minus the restocking fee + shipping, I might come out a couple dollars ahead and end up with a better disk.

Chris Moore said: I see that you are running BETA2. Did you have a BETA1 or some other previous installation of FreeNAS that you can roll back to?
I didn't have any Beta1 installs, but I've got 11 U5 installing now. Fingers crossed this thing comes back to life with it.
(Update) So that was a bad idea. I hadn't applied the new ZFS options, so I figured my stuff would be safe. What I didn't think about was the database upgrade from 11.1 to 11.2. The downgrade boots, but complains about columns not found in the database at every turn.
 

klennan

Got my new 4TB disk in and installed. FreeNAS complained about existing ZFS labels just like with the last disk; however, this time, with the Force option enabled, it accepted the disk as a replacement. My array is now "healthy."
Using 11.2 Beta2.
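For reference, a command-line counterpart to forcing past stale ZFS labels is to inspect and clear them explicitly before the replace. This was not what was done in the thread, and it is only a minimal sketch: it assumes the stale labels sit on DEV (a whole disk such as /dev/ada0 or a leftover partition such as /dev/ada0p2; both names are examples), uses `zdb -l` to print any labels found and `zpool labelclear -f` to remove them, and `clear_stale_labels` is a hypothetical helper. Clearing labels on the wrong device can cost you a pool member, so double-check the device name.

```shell
#!/bin/sh
# clear_stale_labels DEV
# Show, then clear, leftover ZFS labels on a device that is NOT part of any
# imported pool. Hypothetical helper; check the device name before running.
clear_stale_labels() {
    dev=$1
    if [ -z "$dev" ]; then
        echo "usage: clear_stale_labels /dev/adaN[pM]" >&2
        return 2
    fi
    echo "Labels currently on $dev:"
    zdb -l "$dev"
    # -f: treat labels from exported or foreign pools as inactive and clear them anyway.
    zpool labelclear -f "$dev"
}
```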
 