iscsi_execute() failed

Status
Not open for further replies.

mpfusion

Contributor
Joined
Jan 6, 2014
Messages
198
I use VirtualBox to access a few iSCSI targets. Everything is working fine and the performance is acceptable. But the log file is filled with errors:

Code:
Apr  3 21:10:11 freenas istgt[2473]: istgt_iscsi.c:3640:istgt_iscsi_op_scsi: ***ERROR*** StatSN(2548/2583) QCmdSN(34) error
Apr  3 21:10:11 freenas istgt[2473]: istgt_iscsi.c:5053:istgt_iscsi_execute: ***ERROR*** iscsi_op_scsi() failed
Apr  3 21:10:11 freenas istgt[2473]: istgt_iscsi.c:5728:worker: ***ERROR*** iscsi_execute() failed on iqn.2013-11.freenas:win-office,t,0x0001(iqn.2009-08.com.sun.virtualbox.init
iator:01:10.0.50.21,i,0x80123456dc46)
Apr  3 21:10:12 freenas istgt[2473]: Login from iqn.2009-08.com.sun.virtualbox.initiator:01:10.0.50.21 (10.0.50.21) on iqn.2013-11.freenas:win-office LU1 (10.0.50.15:3260,1), IS
ID=80123456de2d, TSIH=2320, CID=1, HeaderDigest=off, DataDigest=off
Apr  3 21:10:12 freenas istgt[2473]: istgt_iscsi.c:3640:istgt_iscsi_op_scsi: ***ERROR*** StatSN(2/38) QCmdSN(35) error
Apr  3 21:10:12 freenas istgt[2473]: istgt_iscsi.c:5053:istgt_iscsi_execute: ***ERROR*** iscsi_op_scsi() failed
Apr  3 21:10:12 freenas istgt[2473]: istgt_iscsi.c:5728:worker: ***ERROR*** iscsi_execute() failed on iqn.2013-11.freenas:win-office,t,0x0001(iqn.2009-08.com.sun.virtualbox.init
iator:01:10.0.50.21,i,0x80123456de2d)
Apr  3 21:10:13 freenas istgt[2473]: Login from iqn.2009-08.com.sun.virtualbox.initiator:01:10.0.50.21 (10.0.50.21) on iqn.2013-11.freenas:win-office LU1 (10.0.50.15:3260,1), IS
ID=80123456de2e, TSIH=2321, CID=1, HeaderDigest=off, DataDigest=off
Apr  3 21:10:14 freenas istgt[2473]: istgt_iscsi.c:3640:istgt_iscsi_op_scsi: ***ERROR*** StatSN(2/48) QCmdSN(45) error
Apr  3 21:10:14 freenas istgt[2473]: istgt_iscsi.c:5053:istgt_iscsi_execute: ***ERROR*** iscsi_op_scsi() failed
Apr  3 21:10:14 freenas istgt[2473]: istgt_iscsi.c:5728:worker: ***ERROR*** iscsi_execute() failed on iqn.2013-11.freenas:win-office,t,0x0001(iqn.2009-08.com.sun.virtualbox.init
iator:01:10.0.50.21,i,0x80123456de2e)
Apr  3 21:10:15 freenas istgt[2473]: Login from iqn.2009-08.com.sun.virtualbox.initiator:01:10.0.50.21 (10.0.50.21) on iqn.2013-11.freenas:win-office LU1 (10.0.50.15:3260,1), IS
ID=80123456de2f, TSIH=2322, CID=1, HeaderDigest=off, DataDigest=off
Apr  3 21:10:15 freenas istgt[2473]: istgt_iscsi.c:3640:istgt_iscsi_op_scsi: ***ERROR*** StatSN(2/39) QCmdSN(36) error
Apr  3 21:10:15 freenas istgt[2473]: istgt_iscsi.c:5053:istgt_iscsi_execute: ***ERROR*** iscsi_op_scsi() failed
Apr  3 21:10:15 freenas istgt[2473]: istgt_iscsi.c:5728:worker: ***ERROR*** iscsi_execute() failed on iqn.2013-11.freenas:win-office,t,0x0001(iqn.2009-08.com.sun.virtualbox.init
iator:01:10.0.50.21,i,0x80123456de2f)
Apr  3 21:10:16 freenas istgt[2473]: Login from iqn.2009-08.com.sun.virtualbox.initiator:01:10.0.50.21 (10.0.50.21) on iqn.2013-11.freenas:win-office LU1 (10.0.50.15:3260,1), IS
ID=80123456de30, TSIH=2323, CID=1, HeaderDigest=off, DataDigest=off
Apr  3 21:10:16 freenas istgt[2473]: istgt_iscsi.c:3640:istgt_iscsi_op_scsi: ***ERROR*** StatSN(2/45) QCmdSN(42) error
Apr  3 21:10:16 freenas istgt[2473]: istgt_iscsi.c:5053:istgt_iscsi_execute: ***ERROR*** iscsi_op_scsi() failed
Apr  3 21:10:16 freenas istgt[2473]: istgt_iscsi.c:5728:worker: ***ERROR*** iscsi_execute() failed on iqn.2013-11.freenas:win-office,t,0x0001(iqn.2009-08.com.sun.virtualbox.init
iator:01:10.0.50.21,i,0x80123456de30)
Apr  3 21:10:17 freenas istgt[2473]: Login from iqn.2009-08.com.sun.virtualbox.initiator:01:10.0.50.21 (10.0.50.21) on iqn.2013-11.freenas:win-office LU1 (10.0.50.15:3260,1), IS
ID=80123456de31, TSIH=2324, CID=1, HeaderDigest=off, DataDigest=off
Apr  3 21:10:17 freenas istgt[2473]: istgt_iscsi.c:3640:istgt_iscsi_op_scsi: ***ERROR*** StatSN(2/48) QCmdSN(45) error
Apr  3 21:10:17 freenas istgt[2473]: istgt_iscsi.c:5053:istgt_iscsi_execute: ***ERROR*** iscsi_op_scsi() failed
Apr  3 21:10:17 freenas istgt[2473]: istgt_iscsi.c:5728:worker: ***ERROR*** iscsi_execute() failed on iqn.2013-11.freenas:win-office,t,0x0001(iqn.2009-08.com.sun.virtualbox.init
iator:01:10.0.50.21,i,0x80123456de31)
Apr  3 21:10:18 freenas istgt[2473]: Login from iqn.2009-08.com.sun.virtualbox.initiator:01:10.0.50.21 (10.0.50.21) on iqn.2013-11.freenas:win-office LU1 (10.0.50.15:3260,1), IS
ID=80123456de32, TSIH=2325, CID=1, HeaderDigest=off, DataDigest=off
Apr  3 21:10:18 freenas istgt[2473]: istgt_iscsi.c:3640:istgt_iscsi_op_scsi: ***ERROR*** StatSN(2/41) QCmdSN(38) error
Apr  3 21:10:18 freenas istgt[2473]: istgt_iscsi.c:5053:istgt_iscsi_execute: ***ERROR*** iscsi_op_scsi() failed


Maybe someone has come across these errors before. Note that the initiator performs a new login every second. This is certainly not supposed to happen.

Any ideas how to fix this?

Specs:
FreeNAS-9.2.1.3-RELEASE-x64
G2020, 32GiB ECC RAM, pool is healthy, cable connections are ok
 

mpfusion

Contributor
Joined
Jan 6, 2014
Messages
198
No, I did not.

However, yesterday I upgraded to FreeNAS-9.2.1.4.1-RELEASE-x64 and I can't find any iscsi errors in my log file so far. But that might be related to smbd flooding the logs (thanks for making me check, else I wouldn't have noticed), but that's a different issue I have to look into.

I will report back the next days if this has magically been fixed by updating virtualbox or freenas or if it's still an issue.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That error is usually the result of a pool that can't keep up with the I/O load from VMs(yes, this is extremely common for FreeNAS servers hosting VMs). The solution is to do serious upgrades to your box... think 128GB of RAM, different vdev layout in your pool, more spindles(or going strictly SSD), etc.

Those errors are basically telling you that data is being lost.
 

mpfusion

Contributor
Joined
Jan 6, 2014
Messages
198
OK. I can confirm that the issue is not fixed, yet. I still get those errors.

@cyberjock: I can reproduce the issue with just one VM running being almost idle (no significant IO). I guess 32GiB of memory and two vdevs (four spindles) should be fine to run a single VM. That's at least what you suggest in some forum posts.

128GiB is totally out of the question, since we would need totally new hardware (and a sponsor btw). An SSH for the VMs could be an option, but I'm not sure it will solve the issue.

I'm not convinced that the issue is too much IO for the freenas box. Do you have any reference that this issue is related to high IO? I can't find any.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
@cyberjock: I can reproduce the issue with just one VM running being almost idle (no significant IO). I guess 32GiB of memory and two vdevs (four spindles) should be fine to run a single VM. That's at least what you suggest in some forum posts.

Yeah. I'd say a single VM should be able to run(albeit slow but without errors). There's the *possiblity* your pool is fragmented to an extent that it wouldn't work. While possible, I'd have to think you'd be complaining about the whole server being pitifully slow before you'd be complaining about this.

128GiB is totally out of the question, since we would need totally new hardware (and a sponsor btw). An SSH for the VMs could be an option, but I'm not sure it will solve the issue.

Yeah, most people are like "what!? $2000 worth of hardware!? no thanks! Usually this is accompanied by a comment like "You are an idiot.. I've worked in IT for longer than you've walked upright, and file servers don't need that quantity of RAM!"

As for the SSH.. not sure I follow you on your thought process.

I'm not convinced that the issue is too much IO for the freenas box. Do you have any reference that this issue is related to high IO? I can't find any.

My reference is experience with that issue and troubleshooting stuff over the last 2 years for people. Learned a few things over that time. Like I said before though, that's *usually* the problem. Yours may or may not fall into that category.

As you admitted above, a system with 128GB of RAM isn't an option. The normal flowpath for this problem is:

1. Person X builds a server.. they build it cheap... aka 32GB of RAM or less.
2. They setup the server and its slow, but works.
3. Loading begins to increase as they do more and more VM stuff and/or add more VMs.
4. At some point in the future(sometimes hours, days, weeks, or months) the errors start and they wonder what they did wrong. They start trying to backtrack if they made a settings change recently, upgraded FreeNAS,etc.
5. They show up here asking how to solve the problem and I tell them their system is woefully undersized for the expected task.
6. They freak out, tell me what an idiot I am and that I don't know what I'm talking about. Often they throw the "I've been in IT for (some high number) years.. you are full of sh*t if you think a file server needs more than 32GB of RAM".
7. They go back and start doing stupid things to mitigate the problem. Come up empty handed and decide FreeNAS and ZFS isn't for them. Sometimes they do the forums the honor of a post that says things like "Your devs are idiots if they don't know how to manage resources well enough to have a fast server that requires less than 128GB of RAM" and "this forum is where a bunch of egotistical idiots post.. there's no actual information here".

You didn't do the last 2, but I've been around the block with this more than once.

You'd have to look and see what your initiator is doing. Something weird is going on, but without more detailed logs I really can't help. The number of things that could potentially go wrong causing that error is almost endless. :(
 

mpfusion

Contributor
Joined
Jan 6, 2014
Messages
198
Yeah. I'd say a single VM should be able to run(albeit slow but without errors). There's the *possiblity* your pool is fragmented to an extent that it wouldn't work. While possible, I'd have to think you'd be complaining about the whole server being pitifully slow before you'd be complaining about this.

I'm quite satisfied with the performance. No issues there. The pool has never been more than 60% full. This doesn't look like a fragmentation issue to me.

As for the SSH.. not sure I follow you on your thought process.

Typo (muscle memory), I meant an SSD could be an option.

1. Person X builds a server.. they build it cheap... aka 32GB of RAM or less.

Before purchasing hardware I did spend quite some time reading all relevant forum posts about the design of a decent rig. You recommended ≥32GiB when VMs are involved. And mirrors. We neither need plety of storage nor high performance. But our data is valuable, that's why I decided to go with ZFS. It's just a few VMs which are idle most of the time and some mutual storage. I might be wrong, but I think the specs are decent for our purpose.

2. They setup the server and its slow, but works.

The performance is decent. The system actually works great if you ignore those errors.

4. At some point in the future(sometimes hours, days, weeks, or months) the errors start and they wonder what they did wrong. They start trying to backtrack if they made a settings change recently, upgraded FreeNAS,etc.

The errors were present from day 1 with just a single VM spun up.

5. They show up here asking how to solve the problem and I tell them their system is woefully undersized for the expected task.

Our system is woefully undersized? Even a single running VM causes those errors. I think 32GiB and four spindles should be fine for that. Performance is not the issue here, it's errors.

You'd have to look and see what your initiator is doing. Something weird is going on, but without more detailed logs I really can't help. The number of things that could potentially go wrong causing that error is almost endless.

I agree. Actually I have the gut feeling that it's something related to virtualbox rather than freenas. I'll try to set up different initiators or run the VMs off another iSCSI target. But it'll take some time.

Before digging into the issue I wanted to ask the freenas community first. Maybe someone had the same problem before and knows the solution. That would save several hours of work for me.

I didn't say you're an idiot (you're a little cocky, though) and I really appreciate your help. I'm thankful for advice and opinions. You have more experience with freenas/zfs than I do. But I can't just buy a new (possibly woefully oversized) rig just to see that it didn't solve the issue, which so far, only is an error in the log file, no known data loss (yet).
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
curious, what does your memory usage look like? There's no swap usage is there?

Can you track disk access latencies in virtualbox?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Oh, I agree that buying $2500+ of hardware to find out that's not your problem isn't your first best option. I'd try to narrow down the exact result of your problem if possible. That's going to take some hours worth of research and testing though.

And while you say "performance is not the issue here, it's errors" I'm not 100% convinced that the errors aren't related to performance. Not saying they *are*, just that I don't think they've really been ruled out. If ZFS caches writes and starts timing out iscsi it'll get it's panties in a bind and start having problems. So you can absolutely have a problem where your pool is not able to keep up with the I/O but your VMs appear to behave at a respectable speed.

I'd start collecting empirical evidence and try to rule out other things but don't dismiss performance as not being a possibility unless you have solid irrefutable empirical evidence that's not a problem. It may be slim for your situation, but it's pretty regular(and common) around here for many users. It is by far the biggest complaint for iSCSI users. ZFS is pretty complex and far far too many people don't appreciate the excessively complex monster called ZFS. Never be fooled by how simple you think ZFS is. Just yesterday someone asked me why I was searching for a certain string of hex on a disk(I was looking for ZFS metadata). And despite my rather deep knowledge of ZFS I'm shocked regularly. Every day I'm slightly more shocked at how complex and dynamic ZFS is at tuning itself for most situations and handling load changes on the fly.
 

mpfusion

Contributor
Joined
Jan 6, 2014
Messages
198
There's no swap usage is there?

As you can see from the stats, due to memory leaks (bug #4759) swap was heavily used before the 9.2.1.4 upgrade two days ago. This is now fixed.

LmNP1Dj.png
whIDuoK.png


Can you track disk access latencies in virtualbox?

I have no clue. There's no GUI element for that. It surely does some logging, but I don't know if it also logs disk latency. I'll have to read the manual.
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
Has this happened since the .4 upgrade?

It absolutely could have caused istgt transaction group problems.
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
No, it was also present in FreeNAS-9.2.1.3-RELEASE-x64. Version 9.2.1.4 wasn't even released when I posted the thread.

Ahh sorry, totally missed the post about it not happening since the upgrade.

I would be surprised if that wasn't causing your issues.
 

mpfusion

Contributor
Joined
Jan 6, 2014
Messages
198
Just to clarify: The errors mentioned in the OP show up at least in the following versions
  • FreeNAS-9.2.1.3-RELEASE-x64
  • FreeNAS-9.2.1.4-RELEASE-x64
  • FreeNAS-9.2.1.4.1-RELEASE-x64
 

mpfusion

Contributor
Joined
Jan 6, 2014
Messages
198
I could reproduce the errors using a Debian machine as iSCSI target. The messages are different, but there's obviously something wrong, too. So my assumption is that it's an issue with Virtualbox's iSCSI initiator, rather than the target. I will approach the Virtualbox guys and ask for opinions.
 
Status
Not open for further replies.
Top