In the ZFS model, the average IOPS of a vdev closely resembles that of the slowest member device. With other RAID implementations, many other factors can come into play, because a lot of hardware options are focused on getting as much performance as possible out of a single RAID5 or RAID6 group. Part of the strategy for ZFS was not to worry as much about that; the system was designed for large storage servers, and for environments where RAIDZ was going to be used for nearline storage rather than high-performance applications. RAIDZ can also be made much faster by using several small RAIDZ vdevs, which are then striped at the pool level. The design considerations for ZFS are targeted at systems with dozens of drives, Sun Thumper style.
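To put rough numbers on that rule of thumb, here is a back-of-the-envelope sketch in Python. The function names and the per-drive IOPS figures are made up for illustration, and it only models the "slowest member" approximation described above, so a real pool will not match it exactly.

```python
def raidz_vdev_iops(member_iops):
    """Rule of thumb: a RAIDZ vdev delivers roughly the IOPS of its slowest member."""
    return min(member_iops)


def pool_iops(vdevs):
    """The pool stripes across vdevs, so the per-vdev estimates roughly add up."""
    return sum(raidz_vdev_iops(members) for members in vdevs)


# Hypothetical drives: eleven disks at ~150 IOPS each, plus one slower disk at 120.
drives = [150] * 11 + [120]

one_wide_vdev = [drives]                      # a single 12-disk RAIDZ vdev
two_small_vdevs = [drives[:6], drives[6:]]    # two 6-disk RAIDZ vdevs, striped

wide = pool_iops(one_wide_vdev)
striped = pool_iops(two_small_vdevs)
print(wide, striped)
```

With these made-up figures the single wide vdev is held to the 120 IOPS of its slowest disk, while the two smaller vdevs estimate out to 150 + 120. The same twelve drives give roughly twice the IOPS simply because there are two vdevs to keep busy.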
I don't know what Nimble is actually doing behind the scenes for storage, but there are lots of places to optimize if you are offering a product like that. ZFS suffers because it has to try to be a generalized solution, which means that some options that might be available to the Nimble designers are not things that ZFS's designers could rely on.
A read from a mirror can potentially be fulfilled by either drive in the mirror, yes.
With ZFS, while we like to call it "RAIDZ", what is actually happening is more of a stripe, where each block's data is written out across the disks along with a parity block (or two or three for Z2, Z3). This basically causes the disks in the vdev to operate in a seek-synchronous manner, so the slowest disk becomes the controlling factor for speed. Does that make sense?
With a mirror, there should be an advantage in that the data can be written out in parallel to both disks, and there is no parity block to write. You can also have multiple vdevs, which is the natural configuration if you have four or more drives, and at that point each vdev can be busy with a separate I/O operation. So writes should be faster as well.
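A rough Python sketch of the mirror case, for comparison. The names and the 150 IOPS per-drive figure are hypothetical, and the read estimate is a best case that assumes reads get spread evenly over both sides of each mirror, which is not guaranteed.

```python
def mirror_write_iops(member_iops):
    # Every write goes to all members in parallel, so the slowest one sets the pace.
    return min(member_iops)


def mirror_read_iops(member_iops):
    # Best case: any member can serve a read, so the members' IOPS add up.
    return sum(member_iops)


# Hypothetical four-drive pool laid out as two 2-way mirror vdevs.
mirror_pool = [[150, 150], [150, 150]]

# Each vdev can be busy with a separate I/O, so the per-vdev estimates add up.
pool_write = sum(mirror_write_iops(members) for members in mirror_pool)
pool_read = sum(mirror_read_iops(members) for members in mirror_pool)

# The same four drives as one RAIDZ vdev, using the "slowest member" rule of thumb.
raidz_estimate = min(150, 150, 150, 150)

print(pool_write, pool_read, raidz_estimate)
```

Under these assumptions the two mirror vdevs estimate out to about twice the write IOPS and up to four times the read IOPS of the single RAIDZ vdev, at the cost of usable capacity.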
Since ARC plus an L2ARC will tend to be efficient at caching often-read content, that works in favor of a properly sized ZFS system. It probably doesn't do much for benchmark results, unless you design a benchmark that takes the behaviour of L2ARC into account.
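One way to see why the cache matters so much in real use and so little in a naive benchmark is a simple weighted-average latency estimate. This is only a sketch in Python: the function name, the hit ratios, and the latency figures are all assumptions chosen for illustration, not measurements of any real system.

```python
def effective_read_latency_ms(arc_hit, l2arc_hit, arc_ms, l2arc_ms, disk_ms):
    """Weighted average read latency.

    arc_hit   -- fraction of reads served from ARC (RAM)
    l2arc_hit -- fraction of ARC misses that are served from L2ARC
    """
    arc_miss = 1.0 - arc_hit
    return (arc_hit * arc_ms
            + arc_miss * l2arc_hit * l2arc_ms
            + arc_miss * (1.0 - l2arc_hit) * disk_ms)


# Hypothetical latencies: RAM 0.01 ms, SSD 0.1 ms, spinning disk 10 ms.
warm = effective_read_latency_ms(0.8, 0.5, 0.01, 0.1, 10.0)

# A benchmark that never rereads data sees no cache hits at all.
cold = effective_read_latency_ms(0.0, 0.0, 0.01, 0.1, 10.0)

print(round(warm, 3), round(cold, 3))
```

With these made-up numbers the warmed-up system averages about 1 ms per read while the cold benchmark run sees the full 10 ms of the disks, which is why a benchmark needs to account for the cache to say anything about it.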