remonv76
Dabbler
- Joined: Dec 27, 2014
- Messages: 49
Hi there,
I need some help understanding why we are getting slow performance with 2x RMS-200 8GB in mirror as the SLOG device for our RAIDZ2 pool (3x VDEVs). As you can see in the attachment, write performance reaches 704MB/s at 128KB, then drops and yo-yos up and down.
Now when we add these 2x 8GB RMS-200 as a striped SLOG (16GB in total), it maxes out at 1.09GB/s (around the maximum of the 10Gbps interface) after 128KB. (second attachment)
We also have 2x Samsung DCT 983 1.9T NVMe drives, on which we created 2x 16GB partitions. We added these as a mirror SLOG to the pool, and again we get the same maximum performance of 1.09GB/s as with the 2x RMS-200 in stripe configuration. (no screenshot)
The interesting part is that we also created 2x 8GB partitions on the Samsung NVMe drives and added these as a mirror SLOG. We get the same performance issues as with the RMS-200 in mirror. So it has to do with the size of the SLOG. It seems 8GB is just not enough, and somehow the system throttles the speed earlier than the 60% dirty data threshold. But I need to monitor how much is in the transaction group during the test. I tried a dirty data script, but that shows only 260MB in use of the 4GB. So I'm lost...
We tried tuning the dirty_data settings (lowering the transaction group size from 4GB to 3.5GB or 2GB), but nothing seems to work. We do get slightly better performance up to 512KB, and then the performance drops again.
Also, a single RMS-200 gives the same yo-yo effect at a maximum of 700MB/s.
We also tried changing the VDEVs to mirrors, but we get the same performance issues as with RAIDZ2.
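To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It only restates figures from this post (704MB/s, 1.09GB/s, 10Gbps, 8GB and 16GB SLOG). It assumes GB means 10^9 bytes and ignores protocol overhead, and the helper name `seconds_to_write` is made up for illustration. It shows how close the stripe result is to line rate and how long it would take to write one SLOG's worth of data at each rate. It does not show that the SLOG actually fills up.

```python
# Figures taken from the post; GB is assumed to mean 10**9 bytes.
LINK_GBPS = 10                       # 10Gbps interface
link_gb_per_s = LINK_GBPS / 8        # 1.25 GB/s before any protocol overhead

# Peak write rates reported in the post, in GB/s.
observed = {
    "2x RMS-200 8GB mirror": 0.704,
    "2x RMS-200 stripe (16GB)": 1.09,
}

def seconds_to_write(size_gb, rate_gb_per_s):
    """Time to write size_gb once at a sustained rate (hypothetical helper)."""
    return size_gb / rate_gb_per_s

for name, rate in observed.items():
    print(f"{name}: {rate / link_gb_per_s:.0%} of raw line rate")

# How long one full pass over the SLOG capacity would take at those rates.
print(f"8GB at 0.704 GB/s: {seconds_to_write(8, 0.704):.1f} s")
print(f"16GB at 1.09 GB/s: {seconds_to_write(16, 1.09):.1f} s")
```

So the stripe and the 16GB mirror are most likely limited by the network, while the 8GB mirror tops out at a bit more than half of the raw line rate.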
Config test setup:
Memory 512GB
Pool: 18x 1.8T SAS drives configured as 3 VDEVs. (19TB pool size)
SLOG: 2x RMS-200 8GB PCIe NVMe
L2ARC: 2x Samsung DCT 983 1.9T NVMe
Environment: VMware infra - TrueNAS Core 13.x / NFS shares - Windows VPS for testing.
sysctl -a | grep zfs.dirty
vfs.zfs.dirty_data_sync_percent: 20
vfs.zfs.dirty_data_max_max: 4294967296
vfs.zfs.dirty_data_max: 4294967296
vfs.zfs.dirty_data_max_max_percent: 25
vfs.zfs.dirty_data_max_percent: 10
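As a sanity check on these values, here is a small Python sketch of how I understand the thresholds to be derived in OpenZFS: dirty_data_max is a percentage of RAM capped at dirty_data_max_max, a txg sync is triggered at dirty_data_sync_percent of that, and the write throttle starts at 60% of it. The 60% figure (`delay_min_dirty_percent`) is not in the sysctl output above and is assumed to be at its default, and 512GB of memory is treated as 512 GiB.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

ram_bytes = 512 * GIB                 # "Memory 512GB", assumed to be GiB

# Values from the sysctl output in this post.
dirty_data_max_percent = 10
dirty_data_max_max = 4294967296
dirty_data_sync_percent = 20

# Assumption: default write throttle start, not shown in the sysctl output.
delay_min_dirty_percent = 60

# 10% of RAM would be about 51 GiB, so the 4 GiB cap is what applies here.
dirty_data_max = min(ram_bytes * dirty_data_max_percent // 100, dirty_data_max_max)

# Dirty data level at which a txg sync is triggered.
sync_threshold = dirty_data_max * dirty_data_sync_percent // 100

# Dirty data level at which writes start to be delayed.
delay_threshold = dirty_data_max * delay_min_dirty_percent // 100

print(f"dirty_data_max : {dirty_data_max // MIB} MiB")
print(f"sync threshold : {sync_threshold // MIB} MiB")
print(f"delay threshold: {delay_threshold // MIB} MiB")
```

If these assumptions hold, a txg sync should be triggered at roughly 819MB of dirty data and the throttle should not start before roughly 2.4GB.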
DTrace dirty data output during the block size test, from 512B up to around 8MB:
31 80621 none:txg-syncing 25MB of 4096MB used --- start test 512B
21 80621 none:txg-syncing 28MB of 4096MB used
3 80621 none:txg-syncing 38MB of 4096MB used
3 80621 none:txg-syncing 63MB of 4096MB used
23 80621 none:txg-syncing 67MB of 4096MB used
3 80621 none:txg-syncing 84MB of 4096MB used
0 80621 none:txg-syncing 121MB of 4096MB used
3 80621 none:txg-syncing 150MB of 4096MB used
30 80621 none:txg-syncing 217MB of 4096MB used
3 80621 none:txg-syncing 261MB of 4096MB used ----- around 128KB
12 80621 none:txg-syncing 269MB of 4096MB used
29 80621 none:txg-syncing 260MB of 4096MB used
27 80621 none:txg-syncing 258MB of 4096MB used
11 80621 none:txg-syncing 258MB of 4096MB used
8 80621 none:txg-syncing 264MB of 4096MB used
20 80621 none:txg-syncing 259MB of 4096MB used
13 80621 none:txg-syncing 259MB of 4096MB used
1 80621 none:txg-syncing 268MB of 4096MB used
3 80621 none:txg-syncing 261MB of 4096MB used
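To turn this output into percentages, a minimal Python sketch could parse the "used of max" pairs and report the peak. The sample lines are copied from the output above; the regular expression and the name `parse_dirty` are just my own illustration, not part of the DTrace script.

```python
import re

# A few lines copied from the DTrace output above.
SAMPLE = """\
31 80621 none:txg-syncing 25MB of 4096MB used
3 80621 none:txg-syncing 261MB of 4096MB used
12 80621 none:txg-syncing 269MB of 4096MB used
3 80621 none:txg-syncing 261MB of 4096MB used
"""

# Matches e.g. "269MB of 4096MB used" and captures both numbers.
USED_RE = re.compile(r"(\d+)MB of\s+(\d+)MB used")

def parse_dirty(text):
    """Return a list of (used_mb, limit_mb) tuples found in the text."""
    return [(int(used), int(limit)) for used, limit in USED_RE.findall(text)]

samples = parse_dirty(SAMPLE)
peak_used, limit = max(samples)          # tuples sort by used_mb first
peak_percent = 100 * peak_used / limit

print(f"peak dirty data: {peak_used}MB of {limit}MB ({peak_percent:.1f}%)")
```

The peak of 269MB is under 7% of the 4096MB limit, so at the moment each txg starts syncing the dirty data is nowhere near the 20% sync setting or the 60% throttle level.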
Who can tell me what the problem could be and if there is something we could try? Maybe @jgreco knows what is going on here.
	
		
			
		
		
	
			