Serious doubt about iSCSI in 11.2-rc2

Status
Not open for further replies.

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hi,

The first server (Dell T-110; 8 GB ECC RAM; Intel i3-2100 CPU @ 3.10 GHz; 5x IronWolf 3 TB in RAIDZ2) is doing fine with FreeNAS 11.2-rc2.
The second server (Dell T-130; 16 GB ECC RAM; Intel Xeon E3-1230 v6 @ 3.5 GHz; 5x IronWolf 4 TB in RAIDZ2), not so much.

After a lot of problems with the USB boot device, which kept detaching within 2 days at most, I replaced the USB key and installed 2 of them as a mirror.

After a few days without USB problems, I started moving my data off Server 2 and back to Server 1 so that I could rebuild the pool. When I tried to empty the iSCSI datastore, the transfer started at around 400 Mbps as expected, but then dropped to about 20. On the console, I found a million lines of swap errors. Because the second server is still new, I ran a second set of memtest passes to check whether the RAM was faulty. Still no errors after several more complete memtest runs. I have no jails, no plugins, no SMB, no AFP; only NFS, iSCSI and ZFS replication tasks are running. As for the replication, the full sync had already completed, and only very small daily changes remained to be propagated.

Server 1 did not suffer any problem at all with the same data: sending and receiving the same data, offering the same NFS shares, over the same network, everything the same EXCEPT for iSCSI. iSCSI is the only thing that I never configured on Server 1.

Considering how memory problems often translate into random, strange and seemingly unrelated malfunctions, I suspect there may be a memory leak or something similar in iSCSI in v11.2-rc2. Unfortunately, I have to build that server, configure it, deploy the data and stage it for long enough before sending it offsite. As such, I cannot keep investigating the incident.
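
For anyone who has the time to chase this where I could not, here is a minimal sketch of how swap usage could be logged to see whether it keeps growing while iSCSI is under load. It assumes the usual column layout of `swapinfo -k` on FreeBSD (device, 1K-blocks, used, avail, capacity). The function names and the sample text are made up for illustration; the sample is not output from my servers.

```python
import subprocess

def parse_swapinfo(text):
    """Return (used_kib, total_kib) summed over the swap devices
    listed in `swapinfo -k` style output."""
    used = total = 0
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, blank lines and the "Total" summary row.
        if len(fields) < 4 or not fields[0].startswith("/dev/"):
            continue
        total += int(fields[1])
        used += int(fields[2])
    return used, total

def swap_used_percent(text):
    """Percentage of swap in use, or 0.0 if no swap device is listed."""
    used, total = parse_swapinfo(text)
    return 100.0 * used / total if total else 0.0

# Illustrative sample only (hypothetical devices and numbers).
SAMPLE = """\
Device          1K-blocks     Used    Avail Capacity
/dev/ada0p1       2097152   524288  1572864    25%
/dev/ada1p1       2097152   524288  1572864    25%
Total             4194304  1048576  3145728    25%
"""

if __name__ == "__main__":
    try:
        # On the NAS itself, read the live values.
        out = subprocess.check_output(["swapinfo", "-k"], universal_newlines=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        # Elsewhere, fall back to the sample so the sketch still runs.
        out = SAMPLE
    print("swap used: {:.1f}%".format(swap_used_percent(out)))
```

Running this every few minutes during a large iSCSI transfer and keeping the output would show whether swap use climbs steadily, which is what a leak would look like.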

The iSCSI client was ESXi 6.7 U1, latest build and patches. The connection was direct (no switch), using the server's built-in Broadcom NIC and an extra PCIe Intel NIC (2 cables), and configured to distribute I/O equally over the 2 links (max IOP=1).

Should you need more details about my configuration, it will be my pleasure to provide them.

Sorry for not having the time to investigate further...

Heracles31
 
dlavigne

Guest
Please create a report at bugs.freenas.org and post the issue number here.
 

Heracles

Hi,

Thanks for the reply. I created the report and received issue number #59733

Memtest did another 10 runs without a single error during the night. Pretty sure the RAM is good...

Have a nice day,
 

Heracles

Hi,

I noticed this bug, which may very well be related to the same problem:

#58308

The guy has way more RAM than I do, and his server is doing basically nothing special apart from iSCSI. Despite its 96G of RAM, with 16 dedicated to the system, it still drops to using swap in a way that freezes the system.

A few people commenting on his ticket said that they observed the same kind of activity in all 11.2 versions.

Good luck finding the bug and fixing it,
 

Heracles

Hi again,

For the issue I created, I will add the sysconfig dump to the ticket once I am back home. While reading, I found this:
https://forums.freenas.org/index.php?threads/swap_pager_getswapspace-failed.70303/

It describes a Python script that goes crazy because of snapshots of jails that were removed from the system. Technically, none of the snapshots I replicated between my servers were from jails, existing or not, but the symptom described in that thread also matches what I experienced with my servers.
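
To compare a console log against that thread, a small sketch like the one below could count how many lines mention the error named in its title. It only assumes that the lines contain the text `swap_pager_getswapspace`. The function name and the sample log lines are hypothetical, not taken from my console.

```python
def count_swap_pager_lines(lines):
    """Count log lines that mention the swap_pager_getswapspace error."""
    return sum(1 for line in lines if "swap_pager_getswapspace" in line)

# Made-up log lines for illustration only.
SAMPLE_LOG = [
    "kernel: swap_pager_getswapspace(16): failed",
    "kernel: swap_pager_getswapspace(16): failed",
    "kernel: some unrelated message",
    "kernel: swap_pager_getswapspace(4): failed",
]

if __name__ == "__main__":
    print("matching lines:", count_swap_pager_lines(SAMPLE_LOG))
```

On a real system, the same function could be fed the lines of a saved console log file instead of the sample list.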
 