So run 4 instances or 2 instances total?
My patented, super-annoying "Yes" answer applies here.
Here it is running on two different VMs, 2 instances total. However, I noticed that even though I started both at the same time, they seem to have completed at different times. I may need to find a better way to benchmark this if CrystalDiskMark is not the best tool here.
View attachment 11254
Part of the problem is that in general, ZFS and benchmarks aren't really good bedfellows. ZFS is doing a ton of stuff that isn't particularly deterministic in the sorts of ways that make for a good benchmark.
If you look at that last round of testing, you came very close to doubling aggregate performance merely by running two copies at a time. I bet if you run three copies, you start seeing some falloff, because the underlying disk pool is showing signs of being pretty busy. It isn't just the straight-line read speeds that matter with ZFS (your drives are probably ~100MB/sec) but also the seek speeds. Since ZFS is a CoW filesystem, the more free space is available on the pool, the less fragmentation there is, and the faster writes will be. Reads are basically sped up through massive amounts of ARC and L2ARC, which you don't have in great supply on an E3 platform.
So my best guess is that if you try again with three VMs, you'll wind up with read speeds ~100-120MB/sec, write speeds marginally lower, and when you look at your disks with gstat they'll be pegged. You've probably found the limits here.
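Since the two runs finished at different times, simply adding up the per-VM numbers can overstate what the pool actually delivered. A rough sketch of one way to compute an aggregate figure is below. The function name and the sample numbers are made up for illustration and are not taken from the attachment; it assumes you can note how much data each instance moved and when it started and stopped.

```python
def aggregate_mb_per_sec(runs):
    """Aggregate throughput across concurrent benchmark runs.

    runs: list of (megabytes_moved, start_sec, end_sec), one tuple per VM.
    Divides all data moved by the wall-clock window from the earliest
    start to the latest finish.
    """
    total_mb = sum(mb for mb, _, _ in runs)
    window = max(end for _, _, end in runs) - min(start for _, start, _ in runs)
    return total_mb / window


# Hypothetical example: two VMs each move 4000 MB, started together,
# but one finishes 10 seconds after the other.
runs = [(4000, 0.0, 40.0), (4000, 0.0, 50.0)]

# Naive figure: sum of each VM's own reported rate.
naive = sum(mb / (end - start) for mb, start, end in runs)

# Wall-clock figure: what the pool delivered over the whole test window.
combined = aggregate_mb_per_sec(runs)

print(f"sum of per-VM rates: {naive:.0f} MB/sec")
print(f"wall-clock aggregate: {combined:.0f} MB/sec")
```

The gap between the two figures grows the further apart the runs finish, so it is worth checking before reading too much into a "doubled" result.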
With seven 500GB vdevs, you have 3.5TB of storage. For maximum performance, don't use more than 1.5TB of it. Even at that, you're likely to notice over time that the numbers you get will drop somewhat. They'll get to a certain level and then stabilize. Look at my frequent discussions on fragmentation for help understanding this.
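To put the capacity numbers above in terms of a fill percentage, here is a quick sketch of the arithmetic. The function name is made up for illustration, and the 1.5TB ceiling is the rule of thumb from this post, not a hard ZFS limit.

```python
def pool_fill_guidance(vdev_count, vdev_size_gb, max_used_gb):
    """Return (total pool size in GB, suggested ceiling as a fraction)."""
    total_gb = vdev_count * vdev_size_gb
    return total_gb, max_used_gb / total_gb


# Seven 500GB vdevs, with a suggested ceiling of 1.5TB (1500GB) used.
total_gb, fraction = pool_fill_guidance(7, 500, 1500)

print(f"pool size: {total_gb / 1000:.1f} TB")
print(f"suggested ceiling: {fraction:.0%} of the pool")
```

That works out to keeping the pool a bit under half full.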