Slow spa_sync on reboot, FreeNAS can't boot

Status
Not open for further replies.

guglez

Explorer
Joined
Feb 21, 2014
Messages
56
Hi,

I have an HP MicroServer N54L with 10 GB of ECC RAM.
I have 4 drives in two pools:

2x1 TB WD black RE, stripe
2x3 TB WD Red, mirror

A few days ago I upgraded the ZFS pools via the GUI. Several hours later the server hung and I did a power-cycle reset. After that I could open the GUI but could not perform any actions in it. I also tried to run the reboot command over SSH, with no luck. Now it can't boot; a screenshot is attached. I removed the 2x1 TB drives, but the issue is still there.

What can I do here?
 

Attachments

  • Снимок экрана 2018-01-02 в 10.37.39.png

guglez

Explorer
Joined
Feb 21, 2014
Messages
56
I don't know what that was, but on the fifth reboot it was able to mount the pool. I now have an alert about the health of my disks.

What happened here? How can this behaviour be explained?
Code:


ADA0
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       643
  3 Spin_Up_Time            0x0027   177   174   021    Pre-fail  Always       -       6108
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       123
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22188
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       122
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       37
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       1659597
194 Temperature_Celsius     0x0022   121   110   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

ADA1
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   174   172   021    Pre-fail  Always       -       6291
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       121
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22188
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       121
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       37
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       1690765
194 Temperature_Celsius     0x0022   119   109   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0



ADA2
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always       -       83080
  3 Spin_Up_Time            0x0027   168   167   021    Pre-fail  Always       -       4566
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       171
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   042   042   000    Old_age   Always       -       42818
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       170
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       69
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       101
194 Temperature_Celsius     0x0022   112   097   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       55
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   184   000    Old_age   Offline      -       20


ADA3
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       3
  3 Spin_Up_Time            0x0027   171   169   021    Pre-fail  Always       -       4425
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       171
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   053   053   000    Old_age   Always       -       34455
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       170
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       69
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       101
194 Temperature_Celsius     0x0022   113   099   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

 

Attachments

  • Снимок экрана 2018-01-02 в 15.34.30.png

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Are you on 11.1?

Two of your drives (ADA0 and ADA1) have very high load cycle counts, which suggests you haven't run WDIDLE3.EXE on them.

Do you have scrubs and SMART tests set with email notification to you?
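
To put the load cycle counts mentioned above in perspective, the raw Load_Cycle_Count can be divided by Power_On_Hours. The following is a minimal Python sketch that uses only the figures posted in this thread; the dictionary and function names are illustrative, not part of any tool.

```python
# Raw SMART values copied from the output posted earlier in the thread:
# (Load_Cycle_Count, Power_On_Hours) for each drive.
DRIVES = {
    "ada0": (1659597, 22188),
    "ada1": (1690765, 22188),
    "ada2": (101, 42818),
    "ada3": (101, 34455),
}


def load_cycles_per_hour(load_cycles: int, power_on_hours: int) -> float:
    """Average number of head load/unload cycles per powered-on hour."""
    if power_on_hours <= 0:
        raise ValueError("power_on_hours must be positive")
    return load_cycles / power_on_hours


if __name__ == "__main__":
    for name, (cycles, hours) in DRIVES.items():
        rate = load_cycles_per_hour(cycles, hours)
        print(f"{name}: {rate:.2f} load cycles per hour")
```

With these numbers, ada0 and ada1 average roughly 75 load cycles per hour, which is more than one head park per minute, while ada2 and ada3 stay far below one per hour.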
 

guglez

Explorer
Joined
Feb 21, 2014
Messages
56
Yes, 11.1.
I get email notifications after every scrub and in case of an error. SMART tests are disabled. And yes, I never had a chance to run WDIDLE3.EXE on my drives. As far as I understand, it's not possible to run it under FreeNAS.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
You should immediately run SMART short tests on your drives, and schedule automatic tests every month.

Any drives that you don't replace should have WDIDLE3.EXE run on them.
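
The short tests suggested above can be started from a shell with `smartctl -t short`, and the results read back with `smartctl -l selftest`. The sketch below is a small Python wrapper around those two commands, assuming the device names ada0 to ada3 from the SMART output in this thread; the function names are hypothetical, and by default it only prints the commands instead of running them.

```python
import subprocess
from typing import List

# Device names taken from the SMART output earlier in the thread (assumption:
# they map to /dev/ada0 ... /dev/ada3 on this system).
DEVICES = ["ada0", "ada1", "ada2", "ada3"]


def short_test_command(device: str) -> List[str]:
    """Build the smartctl command that starts a short self-test."""
    return ["smartctl", "-t", "short", f"/dev/{device}"]


def selftest_log_command(device: str) -> List[str]:
    """Build the smartctl command that prints the self-test log."""
    return ["smartctl", "-l", "selftest", f"/dev/{device}"]


def start_short_tests(devices: List[str], dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute the short-test command for each device."""
    commands = [short_test_command(d) for d in devices]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # smartctl encodes status bits in its exit code, so a non-zero
            # code is not treated as a Python exception here.
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    start_short_tests(DEVICES)
```

A short test usually finishes within a few minutes, after which the command from `selftest_log_command` shows whether it completed without error. For the recurring monthly schedule, the FreeNAS GUI's own S.M.A.R.T. test scheduling is probably the more convenient route than a script.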
 

guglez

Explorer
Joined
Feb 21, 2014
Messages
56
My happiness did not last long. Tonight the server hung again. How can I determine the cause? It responds to pings. SSH asks for a password, but then nothing happens. On the console I only see messages about pending sectors on my hard drives. The web interface does not load, but the port is open.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
In your case, I don't know for sure. There have been a few reports of non-responsive/hung systems on 11.1.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
You should try to file a bug report. If not, the problem is probably already under investigation.

Make sure you watch your drive health also.
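
One way to keep an eye on drive health between scheduled tests is to check a few SMART attributes automatically. The following is a minimal Python sketch, not part of FreeNAS: it parses text in the `smartctl -A` table layout shown earlier in the thread and reports watched attributes with a non-zero raw value. The list of watched attributes and all function names are assumptions made for illustration.

```python
from typing import Dict, List

# Attributes whose raw value is expected to stay at zero on a healthy drive.
# This selection is an assumption for the sketch, not an official list.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)

# Three rows for ADA2, copied from the output posted in this thread.
SAMPLE_ADA2 = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       55
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
"""


def parse_attributes(text: str) -> Dict[str, int]:
    """Map attribute names to raw values from a smartctl -A style table."""
    attrs: Dict[str, int] = {}
    for line in text.splitlines():
        # Ten columns; the last one (RAW_VALUE) may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header and anything that is not a data row
        raw = fields[9].split()[0]
        if raw.isdigit():
            attrs[fields[1]] = int(raw)
    return attrs


def find_warnings(attrs: Dict[str, int]) -> List[str]:
    """Return a message for each watched attribute with a non-zero raw value."""
    return [f"{name} = {attrs[name]}" for name in WATCHED if attrs.get(name, 0) > 0]


if __name__ == "__main__":
    for message in find_warnings(parse_attributes(SAMPLE_ADA2)):
        print("ada2:", message)
```

Run against the ADA2 rows, this reports 55 pending sectors and 3 offline-uncorrectable sectors, the same kind of condition the console messages about pending sectors point to.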
 