iSCSI stops working

tmueko

Explorer
Joined
Jun 5, 2012
Messages
82
Tonight, iSCSI on FreeNAS-8.3.1 stopped working, again.

The installation is 2 ESXi hosts with about 15 VMs and 1 FreeNAS box just running iSCSI. Snapshots are taken every hour and mirrored to a FreeBSD 9.1 machine.

The error message was this:
Code:
Apr 17 04:02:41 wolfgang istgt[1818]: istgt_iscsi.c:4764:istgt_iscsi_transfer_out: ***ERROR*** iscsi_read_pdu() failed, r2t_sent=0
Apr 17 04:02:41 wolfgang istgt[1818]: istgt_iscsi.c:5852:worker: ***ERROR*** iscsi_task_transfer_out() failed on <IQN>:wolfgang,t,0x0001(<IQN>.esxi02,i,0x00023d000001)


After stopping and starting the iSCSI service using the FreeNAS GUI, the VMs on that machine came back to life, so I don't think it's a hardware problem (we already changed the hardware running FreeNAS).

The error message on the ESXi side was:
Code:
Device t10.FreeBSD_iSCSI_Disk______120000010_______________________
performance has deteriorated. I/O latency increased from average
value of 18663 microseconds to 1812759 microseconds.
warning
17.04.2013 05:24:38
192.168.222.11


There was no problem with the zpool:
Code:
[root@wolfgang] ~# zpool status -v
  pool: daten
 state: ONLINE
  scan: none requested
config:

	NAME                                          STATE     READ WRITE CKSUM
	daten                                         ONLINE       0     0     0
	  gptid/d9f131e8-a1a6-11e2-8e6c-0025909ac99e  ONLINE       0     0     0


The FreeNAS hardware is:
Supermicro X9DR3-F
2x Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
32 GB RAM
3ware 9750-4i with 4 x 3TB WD red

Code:
[root@wolfgang] ~# tw_cli /c0 show all
/c0 Driver Version = 10.80.00.003
/c0 Model = 9750-4i
/c0 Available Memory = 488MB
/c0 Firmware Version = FH9X 5.12.00.013
/c0 Bios Version = BE9X 5.11.00.007
/c0 Boot Loader Version = BT9X 6.00.00.004
/c0 Serial Number = SV23104102
/c0 PCB Version = Rev 001
/c0 PCHIP Version = B4
/c0 ACHIP Version = 05000e00
/c0 Controller Phys = 8
/c0 Connections = 4 of 128
/c0 Drives = 4 of 127
/c0 Units = 1 of 127
/c0 Active Drives = 4 of 127
/c0 Active Units = 1 of 32
/c0 Max Drives Per Unit = 32
/c0 Total Optimal Units = 1
/c0 Not Optimal Units = 0 
/c0 Disk Spinup Policy = 1
/c0 Spinup Stagger Time Policy (sec) = 1
/c0 Auto-Carving Policy = off
/c0 Auto-Carving Size = 2048 GB
/c0 Auto-Rebuild Policy = on
/c0 Rebuild Mode = Adaptive
/c0 Rebuild Rate = 1
/c0 Verify Mode = Adaptive
/c0 Verify Rate = 1
/c0 Controller Bus Type = PCIe
/c0 Controller Bus Width = 8 lanes
/c0 Controller Bus Speed = 5.0 Gbps/lane

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    OK             -       -       256K    8381.87   RiW    ON     

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   2.73 TB   SATA  0   -            WDC WD30EFRX-68AX9N0
p1    OK             u0   2.73 TB   SATA  1   -            WDC WD30EFRX-68AX9N0
p2    OK             u0   2.73 TB   SATA  2   -            WDC WD30EFRX-68AX9N0
p3    OK             u0   2.73 TB   SATA  3   -            WDC WD30EFRX-68AX9N0


I also attached the output of "sysctl -a", "dmesg" and "arc_summary.py", in case it helps.
 

Attachments

  • arc_summary.txt.zip (2.5 KB)
  • sysctl-a.txt.zip (22.1 KB)
  • dmesg.txt.zip (5.6 KB)

tmueko

Explorer
Joined
Jun 5, 2012
Messages
82
OK, the system is not under heavy load, but I will try vfs.zfs.txg.synctime_ms=200 and vfs.zfs.txg.timeout=1.
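
A minimal sketch of how these could be tried, assuming both are runtime-settable sysctls on this FreeBSD build; if they only take effect at boot, they belong in the loader tunables (System -> Tunables in the FreeNAS GUI) instead:
Code:
# apply at runtime and watch behavior; revert by writing the previous values back
sysctl vfs.zfs.txg.synctime_ms=200
sysctl vfs.zfs.txg.timeout=1

# to persist across reboots, add them as loader tunables, e.g. in /boot/loader.conf:
#   vfs.zfs.txg.synctime_ms="200"
#   vfs.zfs.txg.timeout="1"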

Is OpenIndiana/Nexenta known to have the same problems?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, you can't just try making random setting changes. You pretty much have to walk through and characterize the specific problem. There are a variety of underlying problems here. In the old days, we UNIX guys were expected to be able to characterize and isolate problems, and then make tuning changes to solve them. ZFS is very much that sort of a beast. You might luck out with OpenIndiana or Nexenta, but generally speaking the problem is probably that your pool is a lot slower than ZFS thinks it is. Also, iSCSI is very intolerant of high latency, which ZFS is very good at creating under high I/O pressure. You actually have to understand the parameters, then figure out what actually makes sense (by testing), then set it, test it again, maybe a few iterations, and then it works very well.

1531 describes my exploration of this. We have boxes that are ultimately destined to run ESXi but also provide some level of file service, so if you put FreeNAS on the bare metal, what it finds is a very large server platform with just a few drives. A Xeon E3-1230 and 32 GB is massive compared to four mid-2000s-era 400 GB drives in RAIDZ2.

But if you just try to pick some random settings to change, you're pretty much screwed.
 

tmueko

Explorer
Joined
Jun 5, 2012
Messages
82
buff, punch ... you hit me.

OK, we bought 6x 600 GB SAS drives and 2x SSDs. We partitioned the SSDs into a 20 GB mirrored ZIL and used the rest as unmirrored cache (sketched below).
In addition we added a 4-port gigabit Ethernet card for multipath. Every non-iSCSI job was removed.
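
For reference, a rough sketch of how such a split can be laid out from the shell on FreeBSD; the da4/da5 device names and the GPT labels are placeholders, and on FreeNAS the same result is normally achieved through the GUI volume manager:
Code:
# partition each SSD: 20 GB for the SLOG, the rest for L2ARC (placeholder devices da4/da5)
gpart create -s gpt da4
gpart add -t freebsd-zfs -s 20G -l slog0 da4
gpart add -t freebsd-zfs -l l2arc0 da4
gpart create -s gpt da5
gpart add -t freebsd-zfs -s 20G -l slog1 da5
gpart add -t freebsd-zfs -l l2arc1 da5

# attach to the existing pool: mirrored log, unmirrored (striped) cache
zpool add daten log mirror gpt/slog0 gpt/slog1
zpool add daten cache gpt/l2arc0 gpt/l2arc1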

So we'll give it a try and report back in a month. :)
 

tmueko

Explorer
Joined
Jun 5, 2012
Messages
82
All the hardware is in. Now we get this, which is still "broken pipe", right?

Code:
istgt[2879]: istgt_iscsi.c:1261:istgt_iscsi_write_pdu_internal: ***ERROR*** writev() failed (errno=32,<IQN>,time=0)
2013-05-06T13:43:13+02:00 192.168.222.2 istgt[2879]: istgt_iscsi.c:5414:sender: ***ERROR*** iscsi_write_pdu() failed on <IQN>:uvn,t,0x0001(<IQN>,i,0x00023d000008)
istgt[2879]: istgt_iscsi.c:1261:istgt_iscsi_write_pdu_internal: ***ERROR*** writev() failed (errno=32,<IQN>,time=0)
[...]
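
(Errno 32 on FreeBSD is EPIPE, i.e. "Broken pipe": the target is writing to a TCP connection the initiator has already torn down. If the system headers are installed, the mapping can be checked on the box itself:)
Code:
# errno 32 == EPIPE ("Broken pipe")
grep -w EPIPE /usr/include/sys/errno.h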
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Right, but very specifically it's a sign that the initiator has dropped the connection while FreeNAS is trying to write data to it. Which probably doesn't make sense to you. So to fully understand this, basically:

1) Lots of stuff is happening
2) ZFS gets to a point where it wants to flush a huge amount of writes to disk
3) ZFS flushes to disk, probably taking long enough to cause iSCSI to pause
4) The initiator goes "hmm, this is unacceptable" and drops the connection
5) ZFS finishes, the iSCSI target tries to send "command completed" to the initiator, finds the socket is no longer connected, and logs the error
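
(One rough way to watch this happening, assuming the pool name from the zpool output above: leave a per-second I/O view running while the VMs are busy. Writes sit near zero, then spike hard each time a transaction group flushes.)
Code:
# per-second pool I/O; look for bursts in the write bandwidth column
zpool iostat daten 1

# or per-disk busy%/latency on FreeBSD
gstat -a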

So here's the thing. I ran into this with a kind-of pathological case, a very large server with a very small pool. 1531 walks through my exploration and resolution of the problem. The good news? The trick is basically to not let ZFS go absolutely bonkers, and instead of focusing on throughput in MBytes/sec, you focus on responsiveness. The bad news? Your pool still needs to be fast enough to deal with the overall I/O demands (or suffer slowdowns) but once you've got a reasonable pool for your workload, this devolves into an exercise in tuning.

So you can either read through 1531 - and for best success you really ought to read it all, especially since your hardware configuration somewhat resembles what I was working with - or we can very slowly step through the process here in the forum.

The TL;DR is this: you have to find the right magic numbers for the amount of write traffic your pool can safely sustain, configure that into ZFS, and then test, test, test under various stressors, including heavy write loads while a scrub is running. The most reliable write numbers I came up with ended up being about 1/2-2/3 of the speed that a pure dd file write would have led you to believe the pool was capable of. Since a production pool is never likely to be *just* servicing one particular task, this should even make sort-of sense. But also don't get too aggressive or paranoid. ZFS actually has some code that allows it to self-tune this sort of thing. The underlying problem is that it starts out with no idea of the pool's capabilities and guesses far too large. So we're just helping teach it.
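
(To make that concrete, a rough sketch of the measure-then-cap loop, not a recommendation of specific values: the file path and the numbers are placeholders, and vfs.zfs.write_limit_override is the old FreeBSD 9.x-era write-throttle knob, so confirm it exists on your build before relying on it:)
Code:
# 1) measure raw sequential write speed of the pool (placeholder path/size)
dd if=/dev/zero of=/mnt/daten/ddtest bs=1M count=8192

# 2) say dd reports ~300 MB/s; budget roughly 1/2-2/3 of that per txg.
#    with a 5 second txg timeout that is ~150 MB/s * 5 s = ~750 MB:
sysctl vfs.zfs.write_limit_override=786432000

# 3) re-test under load (VM traffic plus a scrub) and iterate
zpool scrub daten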
 