After additional testing and number crunching, this appears to be a combined effect of Ubuntu (which, no matter what I do, insists on issuing 1MB writes to the PVSCSI driver) and the rather arbitrary way VMware handles its I/O queuing.
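If you want to see where the 1MB requests come from on the guest side, one place to look is the block layer's request-size caps for the PVSCSI-backed disk. Here's a minimal sketch of that check; the device name is a placeholder, and the idea of lowering max_sectors_kb is my own suggestion rather than something from the testing above:

```python
# Read the Linux block-layer limits that govern the largest request the kernel
# will build for a disk; this is typically why writes arrive as 1MB I/Os.
# "sdb" is a placeholder -- substitute the PVSCSI-backed disk on your system.
from pathlib import Path

dev = "sdb"
queue = Path(f"/sys/block/{dev}/queue")
print("max_sectors_kb    :", (queue / "max_sectors_kb").read_text().strip())
print("max_hw_sectors_kb :", (queue / "max_hw_sectors_kb").read_text().strip())

# Lowering max_sectors_kb (e.g. writing 512 to it as root) caps the request
# size the kernel will issue, which is one way to test whether the 1MB
# request size itself contributes to the latency.
```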
For new writes to a thin-provisioned volume, the vSphere hypervisor appears to throttle writes to the LUN to a queue depth of 32, with only about 3 actually in flight; hence the lower throughput but excellent latency you see above. On overwrites of an existing block in VMFS, however, vSphere doesn't throttle anything by default and instantly fills the queue with 1MB writes. At the PVSCSI default device queue depth of 64, that works out to roughly the 27-30ms of latency I am seeing; if my math is correct, that many 1MB writes moving in parallel across a 10 gigabit link adds up to about that much queuing delay. The same math holds for my gigabit results, which explains why the two sets of numbers line up almost exactly a factor of ten apart.

Adding a second link with multipathing (and the IOPS=1 round-robin tuning) muted about 10ms of latency: the extra path split the 64-deep queue into two 32s, which lowered per-path throughput but brought latency down to 19-20ms, and it added about 200MB/s of additional write bandwidth, but that was all. If I unleashed the guest OS to fill the 128-deep queue the PVSCSI driver is configured for, I'm fairly sure I would have gotten closer to 2GB/s, but I would still hit a wall at around 28ms of latency; or around 50ms if you fill that 128-deep queue over a single link.
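To make the arithmetic concrete, here is a rough sketch of the model I'm using: every 1MB write takes about 0.8ms just to cross a 10 gigabit link, and an I/O joining a full queue waits, on average, behind roughly half of it. The usable link rate and the half-queue assumption are simplifications on my part, but the results land close to what I measured:

```python
# Crude FIFO model: each 1MB write must cross the wire, so an I/O joining a
# full queue of depth N waits on average behind about N/2 writes ahead of it.
def avg_queue_latency_ms(queue_depth, io_size_mb=1.0, link_gbps=10.0):
    link_mb_per_s = link_gbps * 1000 / 8             # ~1250 MB/s usable on 10GbE (rough)
    service_ms = io_size_mb / link_mb_per_s * 1000   # ~0.8 ms per 1MB write
    return (queue_depth / 2) * service_ms

print(avg_queue_latency_ms(64))                   # ~26 ms  -- near the 27-30ms seen on overwrites
print(avg_queue_latency_ms(128))                  # ~51 ms  -- the ~50ms full-queue, single-link case
print(avg_queue_latency_ms(64, link_gbps=1.0))    # ~256 ms -- roughly 10x worse on gigabit
```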
After all that, the only explanation I can come up with for read bandwidth being that low is that Linux, even with multiple dd threads running and even on bare metal, may only be issuing reads to the array at QD=1. In esxtop I never see any queuing at all on reads, which makes me awfully suspicious that this is what's happening.
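One way to test that suspicion would be to drive reads at a known queue depth and watch whether esxtop (or the array) ever reports more than one outstanding I/O. A rough sketch of what I have in mind is below; the device path, I/O size, and thread count are all assumptions to adjust. It keeps one synchronous read per thread in flight and uses O_DIRECT so the requests actually reach the device instead of the page cache:

```python
# Hypothetical read test: keep QUEUE_DEPTH synchronous reads in flight at once,
# using O_DIRECT so they hit the device rather than the page cache.
import mmap, os, threading, time

DEVICE = "/dev/sdb"        # placeholder -- point at the disk under test (reads only)
BLOCK_SIZE = 1024 * 1024   # 1MB reads, matching the write size discussed above
QUEUE_DEPTH = 8            # outstanding reads to keep in flight
RUNTIME_S = 10             # assumes the device is large enough for a sequential pass

def reader(idx, results):
    fd = os.open(DEVICE, os.O_RDONLY | os.O_DIRECT)
    buf = mmap.mmap(-1, BLOCK_SIZE)            # page-aligned buffer, required by O_DIRECT
    offset = idx * BLOCK_SIZE
    deadline = time.time() + RUNTIME_S
    done = 0
    while time.time() < deadline:
        os.preadv(fd, [buf], offset)           # one synchronous read per thread
        done += BLOCK_SIZE
        offset += QUEUE_DEPTH * BLOCK_SIZE     # stride so the threads don't overlap
    results[idx] = done
    os.close(fd)

results = [0] * QUEUE_DEPTH
threads = [threading.Thread(target=reader, args=(i, results)) for i in range(QUEUE_DEPTH)]
for t in threads: t.start()
for t in threads: t.join()
print("aggregate read MB/s:", round(sum(results) / RUNTIME_S / 1e6, 1))
```

If throughput scales as you raise QUEUE_DEPTH, the path can clearly take more than one outstanding read and the bottleneck is whatever dd is doing; if it doesn't, the QD=1 theory gains weight.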
Now, I'm by no means saying there isn't a problem (random 48ms write latency spikes reported by vCenter while the environment is idle are hard to explain, even in light of all this, as is the 96ms+ NFS and software iSCSI latency just attempting to boot and use a single low-rent Windows VM). But one moral of the story, I think, is that when someone suggests you run tests of this nature during troubleshooting, you may be extremely surprised to find that going all solid-state for storage, adding the suggested 10G cards, and throwing all the DRAM and CPU in the world at the problem still can't help your latency once you fill up your transport.