Disk I/O and ARC Requests Spike every morning before office opens

Status
Not open for further replies.

rbabich

Dabbler
Joined
Jul 28, 2014
Messages
14
Build FreeNAS-9.3-STABLE-201605170422
Platform Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Memory 7974MB


This seems to have started in the last few weeks, or at least that is when people started to complain.
When people arrive at the office at 7:30am, they are not able to reach the shared drives on FreeNAS. The connection is attempted and then times out.

Looking at the reports I see a spike in Disk I/O and ARC Requests starting around 6:30am, well before anyone is in the office.
SdBA2mZ.jpg


Z6WdWEY.jpg


This is what our traffic usually looks like:
lJoKuGG.jpg

hlqdoEL.jpg


My backups finish around 2 in the morning, well before these issues arise.

I am not sure where to start looking on this one. Been digging through the forums and have not seen anyone else with this issue.
Any suggestions would be greatly appreciated.


Narwhal bacons at 12:00am
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
I am not sure what the meaning of this is?

“The Narwhal Bacons at Midnight” is a catchphrase that was created for Redditors to identify themselves in public places. It is used in fanart, rage comics, and is often referenced as an inside joke in Reddit threads.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Sorry, I have 1.3 TiB
I wonder how you actually have the pool configured. Can you post the output of zpool status?

It would help if you posted the output in [ code ] tags so it looks like this:
Code:
  pool: Vol1
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Sep 4 17:01:17 2017
        76.7G scanned out of 5.05T at 90.5M/s, 16h0m to go
        18.3G resilvered, 1.48% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        Vol1                                            DEGRADED     0     0 87.3K
          raidz2-0                                      DEGRADED     0     0  175K
            gptid/bd936cf5-9894-11e6-b64f-d05099c0c5b3  ONLINE       0     0     0
            gptid/be3e1755-9894-11e6-b64f-d05099c0c5b3  ONLINE       0     0     0
            gptid/bef699fe-9894-11e6-b64f-d05099c0c5b3  ONLINE       0     0     0
            gptid/bfe92d1c-9894-11e6-b64f-d05099c0c5b3  ONLINE       0     0     0
            gptid/c0cf3a07-9894-11e6-b64f-d05099c0c5b3  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
 

rbabich

Dabbler
Joined
Jul 28, 2014
Messages
14
here we go.
Thanks.

Code:
[root@xxxxxshares ~]# zpool status
  pool: xxxxxshare
 state: ONLINE
  scan: scrub repaired 0 in 2h7m with 0 errors on Sun Oct 22 02:07:39 2017
config:

        NAME                                          STATE     READ WRITE CKSUM
        xxxxxshare                                    ONLINE       0     0     0
          gptid/e740e9c0-f54f-11e6-a344-005056a8c07f  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Fri Oct 27 03:45:43 2017
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors

 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
When are your SMART tests scheduled to run?
Since there is only one drive in the pool, it is limited to the performance of one drive, which could make the system slow.
 

rbabich

Dabbler
Joined
Jul 28, 2014
Messages
14
When are your SMART tests scheduled to run?
Since there is only one drive in the pool, it is limited to the performance of one drive, which could make the system slow.

I didn't think about SMART; let me check that right now.
During the day, speed and connectivity are not an issue, just between about 6 and 8am.
 

rbabich

Dabbler
Joined
Jul 28, 2014
Messages
14
When are your SMART tests scheduled to run?
Since there is only one drive in the pool, it is limited to the performance of one drive, which could make the system slow.

I had left the SMART test interval at the default. It does a short test every Sunday.
 

chris crude

Patron
Joined
Oct 13, 2016
Messages
210
What are your hard drives attached to if you say you have 3? Some kind of RAID card?
 

chris crude

Patron
Joined
Oct 13, 2016
Messages
210
I'm not as experienced as most around here, but it looks like your PERC card is not set up to pass your drives through. Chris Moore's zpool status shows all of his drives; yours shows a single drive even though you have 3.
FreeNAS wants control of all the disks, not a RAID array given to it.
I would say you have some conflict between the card and FreeNAS, but you say it just started, and your version number tells me an update didn't cause this, so I'm not sure.
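
To make that comparison concrete, here is a rough Python sketch that counts the leaf devices each pool reports in `zpool status` output. The function name `leaf_devices` and the list of grouping-vdev prefixes are my own assumptions for illustration, not an exhaustive parser, and a single leaf device is only a hint of hardware RAID underneath, not proof.

```python
# Hypothetical helper: list the leaf devices per pool from `zpool status` text.
# The grouping-vdev names below are assumed typical values, not a full list.
GROUP_PREFIXES = ("mirror-", "raidz", "spare-", "replacing-")
GROUP_NAMES = ("logs", "cache", "spares")


def leaf_devices(zpool_status_text):
    """Return {pool_name: [leaf device names]} parsed from `zpool status` text."""
    pools = {}
    pool = None
    in_config = False
    for raw in zpool_status_text.splitlines():
        line = raw.strip()
        if line.startswith("pool:"):
            pool = line.split(":", 1)[1].strip()
            pools[pool] = []
            in_config = False
        elif line.startswith("config:"):
            in_config = True
        elif line.startswith("errors:"):
            in_config = False
        elif in_config and pool and line:
            name = line.split()[0]
            # Skip the header row and the row for the pool itself.
            if name in ("NAME", pool):
                continue
            # Skip rows that only group other devices.
            if name.startswith(GROUP_PREFIXES) or name in GROUP_NAMES:
                continue
            pools[pool].append(name)
    return pools


# Output posted earlier in this thread.
SAMPLE = """\
  pool: xxxxxshare
 state: ONLINE
  scan: scrub repaired 0 in 2h7m with 0 errors on Sun Oct 22 02:07:39 2017
config:

        NAME                                          STATE     READ WRITE CKSUM
        xxxxxshare                                    ONLINE       0     0     0
          gptid/e740e9c0-f54f-11e6-a344-005056a8c07f  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors
"""

for pool_name, devices in leaf_devices(SAMPLE).items():
    print(pool_name, len(devices), devices)
```

If a data pool that is supposed to sit on 3 physical disks reports only one leaf device, ZFS is most likely seeing a single volume presented by the controller.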
 

Waco

Explorer
Joined
Dec 29, 2014
Messages
53
Don't put a RAID behind ZFS. It essentially removes any chance you have of fixing drive errors / data errors via the built-in mechanisms in ZFS.

If I had to guess, your RAID controller is running consistency checks a lot more often than you realize. They run in the background, but they can intrude on user IO if they're not set up properly. Default scan intervals for a PERC card are once a week for both a consistency check and a patrol read (I can't remember how they're spaced out though).

Any user cronjobs that start up around that time?
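
One way to check is to dump the crontabs (for example with `crontab -l`) and look for entries whose hour field falls in the problem window. Below is a minimal Python sketch of that filter. The function names and the sample script paths are made up for illustration, and it only understands the plain five-field syntax (`*`, numbers, lists, ranges, steps), not `@daily`-style shortcuts. For the system-wide `/etc/crontab`, which has an extra user column, the user name will show up at the start of the reported command.

```python
def expand_field(field, lo, hi):
    """Expand one cron field such as '*', '6', '6-8', '1,3' or '*/2' into a set of ints."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def jobs_in_window(crontab_text, first_hour=6, last_hour=8):
    """Return (matching_hours, command) for entries that can fire in the hour window."""
    window = set(range(first_hour, last_hour + 1))
    hits = []
    for line in crontab_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and @reboot/@daily style shortcuts.
        if not line or line.startswith(("#", "@")):
            continue
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # environment assignments such as MAILTO=root
        try:
            hours = expand_field(fields[1], 0, 23)
        except ValueError:
            continue  # hour field uses syntax this sketch does not handle
        overlap = sorted(hours & window)
        if overlap:
            hits.append((overlap, fields[5]))
    return hits


# Hypothetical crontab contents; the script paths are placeholders.
SAMPLE = """\
# minute hour mday month wday command
MAILTO=root
0 2 * * * /usr/local/bin/backup.sh
30 6 * * 1-5 /usr/local/bin/reindex.sh
*/15 * * * * /usr/local/bin/heartbeat.sh
"""

for hours, command in jobs_in_window(SAMPLE):
    print(hours, command)
```

Anything that shows up only in the 6 to 8am window and touches the share is worth a closer look, since the Disk I/O spike in the reports starts around 6:30am.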
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You will lose all your data one day because you are using RAID with ZFS. It will also make performance terrible. You should try reading the manual and best practices because you need to rebuild this system.
 

Waco

Explorer
Joined
Dec 29, 2014
Messages
53
You will lose all your data one day because you are using RAID with ZFS. It will also make performance terrible. You should try reading the manual and best practices because you need to rebuild this system.
It doesn't necessarily make performance terrible, but it certainly doesn't help for IO scheduling.
 