Very slow space reclamation

Status
Not open for further replies.

sandaco

Dabbler
Joined
Jan 15, 2017
Messages
10
Hi Gurus,
I have a 3 × RAIDZ2 array built out of enterprise 8 TB drives in a SAS enclosure connected to a SAS 3008 adapter. The drives seem to be healthy according to SMART.
After extensive snapshot cleanup, removal of a child dataset, and a reboot, it takes a very long time to import the pool. After the import there is a very small amount of read activity on all the drives, and the amount of free space does not increase. The pool reports it has to free 44.1 TB, but progress is very slow. There is no/minimal processor usage on the server (24 cores, 2.4 GHz, 48 GB RAM).

I have experimented with the vfs.zfs.vdev.* settings to see if I can improve concurrent I/O, but they have no effect; I/O is limited to ~100 IOPS distributed across the drives, about 4-5 IOPS per drive.
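For reference, these are the kinds of vdev queue sysctls I was playing with; the values below are only illustrative, and the exact names may differ between releases:

Code:
# Current per-vdev I/O scheduler limits (FreeBSD/FreeNAS 11.x names)
sysctl vfs.zfs.vdev.max_active
sysctl vfs.zfs.vdev.async_write_min_active
sysctl vfs.zfs.vdev.async_write_max_active

# Example of the kind of change I tried (illustrative value, not a recommendation)
sysctl vfs.zfs.vdev.async_write_max_active=20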


Is there a tunable to improve disk utilization and speed up space reclamation?



Code:
 zpool status MIR
  pool: MIR
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
		continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jan 27 19:57:06 2018
		0 scanned at 0/s, 0 issued at 0/s, 157T total
		0 resilvered, 0.00% done, no estimated completion time
config: 

		NAME											STATE	 READ WRITE CKSUM
		MIR											 ONLINE	   0	 0	 0
		  raidz2-0									  ONLINE	   0	 0	 0
			gptid/66e02f58-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/674d5c05-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/67bfe02b-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/683dae34-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/68ac69a1-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/691c2fb8-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/6990bb62-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/6a02b3e3-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
		  raidz2-1									  ONLINE	   0	 0	 0
			gptid/6a7741b5-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/6ae8e468-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/6b5fea1d-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/6bd92302-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/6c50b6e6-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/6cc71f8c-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/6d3d3c2b-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
			gptid/6db228f7-8b45-11e5-848b-90e2ba2bc0b8  ONLINE	   0	 0	 0
		  raidz2-2									  ONLINE	   0	 0	 0
			gptid/4a2bb2e3-876f-11e6-b6b8-7845c4f283a4  ONLINE	   0	 0	 0
			gptid/4aa8bb63-876f-11e6-b6b8-7845c4f283a4  ONLINE	   0	 0	 0
			gptid/4b2825b5-876f-11e6-b6b8-7845c4f283a4  ONLINE	   0	 0	 0
			gptid/4ba9b48e-876f-11e6-b6b8-7845c4f283a4  ONLINE	   0	 0	 0
			gptid/4c235dd0-876f-11e6-b6b8-7845c4f283a4  ONLINE	   0	 0	 0
			gptid/4ca161b5-876f-11e6-b6b8-7845c4f283a4  ONLINE	   0	 0	 0
			gptid/4d1e660f-876f-11e6-b6b8-7845c4f283a4  ONLINE	   0	 0	 0
			gptid/4d9b30f9-876f-11e6-b6b8-7845c4f283a4  ONLINE	   0	 0	 0



Code:
# zfs list MIR
NAME   USED  AVAIL  REFER  MOUNTPOINT
MIR	131T  9.67T  1.54M  /MIR


Code:
 # zpool get all  MIR
NAME  PROPERTY					   VALUE						  SOURCE
MIR   size						   174T						   -
MIR   capacity					   88%							-
MIR   altroot						-							  default
MIR   health						 ONLINE						 -
MIR   guid						   12821844164206093981		   default
MIR   version						-							  default
MIR   bootfs						 -							  default
MIR   delegation					 on							 default
MIR   autoreplace					off							default
MIR   cachefile					  -							  default
MIR   failmode					   continue					   local
MIR   listsnapshots				  off							default
MIR   autoexpand					 on							 local
MIR   dedupditto					 0							  default
MIR   dedupratio					 1.25x						  -
MIR   free						   19.7T						  -
MIR   allocated					  154T						   -
MIR   readonly					   off							-
MIR   comment						-							  default
MIR   expandsize					 -							  -
MIR   freeing						44.1T						  default
MIR   fragmentation				  0%							 -
MIR   leaked						 0							  default
MIR   feature@async_destroy		  active						 local
MIR   feature@empty_bpobj			active						 local
MIR   feature@lz4_compress		   active						 local
MIR   feature@multi_vdev_crash_dump  enabled						local
MIR   feature@spacemap_histogram	 active						 local
MIR   feature@enabled_txg			active						 local
MIR   feature@hole_birth			 active						 local
MIR   feature@extensible_dataset	 active						 local
MIR   feature@embedded_data		  active						 local
MIR   feature@bookmarks			  enabled						local
MIR   feature@filesystem_limits	  enabled						local
MIR   feature@large_blocks		   active						 local
MIR   feature@sha512				 enabled						local
MIR   feature@skein				  enabled						local

 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Huge dataset deletes happen asynchronously in the background. If you reboot, the pool hangs at import until that delete finishes; you shouldn't reboot a system that is deleting. Everything should be normal now that the import has finished.

Messing with sysctls and tunables almost never gets people what they want, because they don't understand the settings.
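If you want to check whether a background destroy is still in flight (without rebooting), the pool's freeing property and the pool history will show it; something like this should do, using your pool name:

Code:
# Bytes still queued for reclamation by the async destroy
zpool get freeing MIR

# Confirm when the destroy was actually issued
zpool history MIR | tail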
 

sandaco

Dabbler
Joined
Jan 15, 2017
Messages
10
Yes, the space reclamation happens in the background; I've been monitoring it.
It seems that after a reboot the volume actually does get imported (after about 5-10 minutes), so the space reclamation does not block the import. I have been tracking the reclamation, and it has freed over 10 TB.
I used a simple script to track the freeing process; the machine took about 5.5 hours to reclaim the first ~7 TB, but now it's going much faster, freeing ~1 TB every couple of minutes.

Code:
#!/bin/csh
# Append a timestamped sample of the pool's "freeing" counter every minute
while ( 1 )
    echo `date` `zpool get -p -o property,value freeing MIR | tail -n 1` >> freeing_report.txt
    sleep 60
end

The questions are: why is it extremely slow at the beginning, and (if anything) what can be done to improve it?
Code:
Wed Jan 31 09:31:04 UTC 2018 freeing 48399387149568
....
Wed Jan 31 14:56:55 UTC 2018 freeing 40024879780704
Wed Jan 31 14:57:55 UTC 2018 freeing 39814475151360
Wed Jan 31 14:58:55 UTC 2018 freeing 39594954942816
Wed Jan 31 14:59:55 UTC 2018 freeing 39388459219200
Wed Jan 31 15:00:55 UTC 2018 freeing 39164775485472
Wed Jan 31 15:01:55 UTC 2018 freeing 38995021490880
Wed Jan 31 15:02:55 UTC 2018 freeing 38995021490880
Wed Jan 31 15:03:55 UTC 2018 freeing 38995021490880
Wed Jan 31 15:04:55 UTC 2018 freeing 38835870807552
Wed Jan 31 15:05:55 UTC 2018 freeing 38811553033344
Wed Jan 31 15:06:55 UTC 2018 freeing 38676466946208
Wed Jan 31 15:07:55 UTC 2018 freeing 38676466946208
Wed Jan 31 15:08:55 UTC 2018 freeing 38557146132960
Wed Jan 31 15:09:55 UTC 2018 freeing 38410422252384
Wed Jan 31 15:10:55 UTC 2018 freeing 38293786221600
Wed Jan 31 15:11:55 UTC 2018 freeing 38192557192608
Wed Jan 31 15:12:55 UTC 2018 freeing 38083761896544

Code:
Wed Jan 31 15:13:55 UTC 2018 freeing 37857629558112
Wed Jan 31 15:14:55 UTC 2018 freeing 37646225329440
Wed Jan 31 15:15:55 UTC 2018 freeing 37448160081696
Wed Jan 31 15:16:56 UTC 2018 freeing 37127908053600
Wed Jan 31 15:17:56 UTC 2018 freeing 37127908053600
Wed Jan 31 15:18:56 UTC 2018 freeing 37127908053600
Wed Jan 31 15:19:56 UTC 2018 freeing 37127908053600
Wed Jan 31 15:20:56 UTC 2018 freeing 37127908053600
Wed Jan 31 15:21:56 UTC 2018 freeing 37127908053600
Wed Jan 31 15:22:56 UTC 2018 freeing 37127908053600
Wed Jan 31 15:23:56 UTC 2018 freeing 36984932135424
Wed Jan 31 15:24:56 UTC 2018 freeing 36720284904576
Wed Jan 31 15:25:56 UTC 2018 freeing 36720284904576
Wed Jan 31 15:26:56 UTC 2018 freeing 36720284904576
Wed Jan 31 15:27:56 UTC 2018 freeing 36720284904576
Wed Jan 31 15:28:56 UTC 2018 freeing 36720284904576
Wed Jan 31 15:29:56 UTC 2018 freeing 36720284904576
Wed Jan 31 15:30:56 UTC 2018 freeing 36720284904576
Wed Jan 31 15:31:57 UTC 2018 freeing 36720284904576
Wed Jan 31 15:32:57 UTC 2018 freeing 36720284904576
Wed Jan 31 15:33:57 UTC 2018 freeing 36720284904576
Wed Jan 31 15:34:57 UTC 2018 freeing 36720284904576
Wed Jan 31 15:35:57 UTC 2018 freeing 36720284904576
Wed Jan 31 15:36:57 UTC 2018 freeing 36720284904576
Wed Jan 31 15:37:57 UTC 2018 freeing 36720284904576
Wed Jan 31 15:38:57 UTC 2018 freeing 36720284904576
Wed Jan 31 15:39:57 UTC 2018 freeing 36513627171840
Wed Jan 31 15:40:57 UTC 2018 freeing 36513627171840
Wed Jan 31 15:41:57 UTC 2018 freeing 36513627171840
Wed Jan 31 15:42:57 UTC 2018 freeing 36513627171840
Wed Jan 31 15:43:57 UTC 2018 freeing 36513627171840
Wed Jan 31 15:44:57 UTC 2018 freeing 36513627171840
Wed Jan 31 15:45:57 UTC 2018 freeing 36513627171840
Wed Jan 31 15:46:57 UTC 2018 freeing 36513627171840
Wed Jan 31 15:47:57 UTC 2018 freeing 36513627171840
Wed Jan 31 15:48:57 UTC 2018 freeing 36513627171840
Wed Jan 31 15:49:57 UTC 2018 freeing 36425900268576
Wed Jan 31 15:50:57 UTC 2018 freeing 36326236800672

Code:
Wed Jan 31 16:14:58 UTC 2018 freeing 33135809895552
Wed Jan 31 16:15:58 UTC 2018 freeing 33135809895552
Wed Jan 31 16:16:58 UTC 2018 freeing 33102080994528
Wed Jan 31 16:17:58 UTC 2018 freeing 33013926499008
Wed Jan 31 16:18:58 UTC 2018 freeing 32893026741120
Wed Jan 31 16:19:58 UTC 2018 freeing 32759402160576
Wed Jan 31 16:20:58 UTC 2018 freeing 32606552605536
Wed Jan 31 16:21:58 UTC 2018 freeing 32515364333088
Wed Jan 31 16:22:58 UTC 2018 freeing 32515364333088
Wed Jan 31 16:23:58 UTC 2018 freeing 32515364333088
Wed Jan 31 16:24:58 UTC 2018 freeing 32417295613248
Wed Jan 31 16:33:58 UTC 2018 freeing 30983852434080
Wed Jan 31 16:35:09 UTC 2018 freeing 30726921607200
Wed Jan 31 16:36:09 UTC 2018 freeing 30461738981856
Wed Jan 31 16:37:09 UTC 2018 freeing 30324191832480
Wed Jan 31 16:38:09 UTC 2018 freeing 30048074120064
Wed Jan 31 16:39:09 UTC 2018 freeing 29850841745088
Wed Jan 31 16:40:09 UTC 2018 freeing 29850841745088
Wed Jan 31 16:41:09 UTC 2018 freeing 29727333059424
Wed Jan 31 16:42:09 UTC 2018 freeing 29594220417216
Wed Jan 31 16:43:09 UTC 2018 freeing 29375104816512
Wed Jan 31 16:44:09 UTC 2018 freeing 29135182509888
Wed Jan 31 16:45:09 UTC 2018 freeing 28914044927040
Wed Jan 31 16:46:09 UTC 2018 freeing 28763119729344
 

sandaco

Dabbler
Joined
Jan 15, 2017
Messages
10
I have to add that the average is now about 100 IOPS per drive (2-2.5k on the pool), which is about 20-25 times more than in the earlier stage. Meaning the resources allow for much more performance, but something was preventing proper utilization of those resources to speed up the process.
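For anyone following along, the per-drive and pool-wide rates can be watched with the stock tools, roughly like this:

Code:
# Per-disk load (physical providers only)
gstat -p

# Per-vdev and pool-wide I/O, sampled every 5 seconds
zpool iostat -v MIR 5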
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
It looks like you have dedup on. That might slow down deletes, because every block that is deleted has to have its reference count reduced. In any case, freeing up space is not a time-critical operation, so it is probably rate limited.
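If it is rate limiting, the knob that usually governs the async destroy on FreeBSD/FreeNAS is the per-txg cap on freed blocks. I'm not certain it is exposed on every release, but if it is, you can at least look at it before deciding whether to touch it:

Code:
# Per-txg cap on blocks freed by the background destroy
# (may not exist on all releases; raising it on a dedup pool can hurt)
sysctl vfs.zfs.free_max_blocks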
 

sandaco

Dabbler
Joined
Jan 15, 2017
Messages
10
Fair points,
But why would such limits affect an otherwise idle machine? If there is no I/O happening on the pool except for the space reclamation, why would it be throttled by 20x?
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
It might be worth filing a bug report. One reason for it staying slow could be consideration for memory pressure with very large dedup tables, which have a tendency to overfill memory and really degrade performance.
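To gauge whether the dedup table really is the problem, the pool can report its DDT statistics; a commonly quoted rule of thumb is a few hundred bytes of RAM per DDT entry, which you can compare against the 48 GB in the box:

Code:
# Dedup table (DDT) histogram and entry counts
zpool status -D MIR

# Rough RAM estimate: total DDT entries x ~320 bytes per entry (rule of thumb)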
 

mjt5282

Contributor
Joined
Mar 19, 2013
Messages
139
Scrubs and resilvers in FreeNAS 11.1 use a newer algorithm (you can see your server is using it because it scans the data into memory and "sorts" it before it actually starts issuing the scrub I/O). The 0/s in your output is the scan-phase rate, which was really, really slow; luckily it sped up. I've noticed in my own scrubs this month that the scan can slow down and speed up, and is generally faster than the secondary step, issuing the scrub I/O, which is tracked separately. Something in your server config (HW or SW) was preventing the scan from proceeding at a normal rate. I also suspect memory and/or dedup (which I have never turned on).
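If you want to see which scan-related knobs your particular build actually exposes before touching anything, sysctl can list them (names vary a bit between releases):

Code:
# List the scrub/resilver/scan related ZFS sysctls on this build
sysctl vfs.zfs | egrep 'scan|scrub|resilver'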
 