Well, there's a lot going on when you're doing that work, and much of it isn't limited by your zpool itself. For example:
The processing time needed to accomplish the job. Even though the task may not be CPU intensive, the added latency as the streams are multiplexed means that while the multiplexing is happening you likely have little to no transfer rate over the LAN. This probably lasts only a few ms at a time, so unless you look at that level of resolution you probably can't even see this behavior.
The pool has latency while attempting to read and then write. It can't always do the reading, and it can't always do the writing, so some happy medium has to be found. ZFS takes turns reading, then writing. If ZFS needs to do some writes and your workstation is waiting for reads, that's lost time.
The whole data path between the CPU on your workstation and the actual bits on your zpool is filled with latency. Not much in your eyes and mine. But when you request a block of data (regardless of the size) and it takes 3-10ms to retrieve that information (which is about what I'd expect under ideal circumstances), and the next request doesn't go out until the previous one comes back, you can only retrieve roughly 100-330 blocks of data in a second. That's not very much when each block is small.
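To put rough numbers on that, here's a quick back-of-the-envelope sketch. It assumes requests are issued strictly one at a time (no read-ahead, no overlapping requests), which is a simplification, and the block sizes are just examples I picked, not anything measured on your system.

```python
def blocks_per_second(latency_ms: float) -> float:
    """Requests completed per second when each one must finish before the next starts."""
    return 1000.0 / latency_ms


def throughput_mib_per_s(latency_ms: float, block_kib: float) -> float:
    """Throughput in MiB/s for serialized requests of block_kib KiB each."""
    return blocks_per_second(latency_ms) * block_kib / 1024.0


if __name__ == "__main__":
    # 3 ms and 10 ms are the latency bounds from the post; block sizes are assumed examples.
    for latency_ms in (3, 10):
        for block_kib in (4, 128, 1024):
            print(
                f"{latency_ms} ms per request, {block_kib} KiB blocks: "
                f"{blocks_per_second(latency_ms):.0f} requests/s, "
                f"{throughput_mib_per_s(latency_ms, block_kib):.1f} MiB/s"
            )
```

The point is that per-request latency, not raw pool speed, sets the ceiling whenever the program waits for each small read before asking for the next one.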
I've ripped audio streams out of my movies before for languages I don't speak. Despite the fact that I can copy the file from my server to local storage in less than 20 seconds, multiplexing the exact same file over the network takes several minutes. So I'd say that what you are seeing is completely normal for your workload. If you have money to blow and you do this as a full-time job, you might want to look into spending money on RAM. After all, time *is* money. But I wouldn't necessarily expect to see your multiplexing drop from 5 minutes to 2 minutes or less by adding more RAM.
If time is money, you may find it worthwhile to copy the file to a local SSD, do the multiplexing there, and then copy the result back to the server. Or something similar.
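If you wanted to automate that, something like the following would do it. This is only a sketch: `process_locally` and its parameters are names I made up, the "server" paths are assumed to be on a share that's already mounted on your workstation, and `work` stands in for whatever tool you actually run.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def process_locally(
    remote_src: Path,
    remote_dst: Path,
    work: Callable[[Path, Path], None],
    scratch_root: Optional[Path] = None,
) -> None:
    """Copy remote_src to local scratch space, run work(local_in, local_out), copy the result back.

    scratch_root is an assumed location on a local SSD; None uses the system temp directory.
    """
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        local_in = Path(scratch) / remote_src.name
        local_out = Path(scratch) / ("out_" + remote_src.name)
        shutil.copy2(remote_src, local_in)   # one large sequential read from the server
        work(local_in, local_out)            # all the small reads and writes stay local
        shutil.copy2(local_out, remote_dst)  # one large sequential write back
    # The scratch directory and both local copies are removed when the with block exits.
```

The idea is that the server only ever sees two big sequential transfers, which is the case where the network copy is fast, and all the latency-sensitive work happens on the local disk.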
The problem isn't with the pool or its performance, it's with the "whole picture" (pun not intended).
I'm not sure what your use-case is for what you do or what program you use. I use mkvtoolnix. It actually has a FreeBSD version that you can absolutely run in a FreeBSD jail. If you are savvy with the CLI and are okay with writing out 100+ character command line arguments you can probably do some seriously fast multiplexing. I've thought about creating a script and putting it on the forums that would automatically strip out languages that aren't some predetermined language. But, when I look at the complexity of it, and the amount of time it would take to do, it's really not worth my time as a volunteer. When I looked at it before I was thinking of a script that was somewhat user configurable yet simple. I guesstimated it would take me something like 3-5 days to put the whole thing together, test it to its limits, then put it out there. But crowd-sourcing isn't exactly something this community is good at. ;)
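For what it's worth, the core of the "keep only one language" idea might look something like this. It's a minimal sketch, not the script I had in mind: `build_mkvmerge_cmd` is a made-up helper, the file names are placeholders, and it assumes your mkvmerge accepts ISO 639-2 language codes (like `eng`) for `--audio-tracks` and `--subtitle-tracks`, so check `mkvmerge --help` on your version first. It only builds the command; actually running it is left commented out.

```python
import shlex
from typing import List, Sequence


def build_mkvmerge_cmd(src: str, dst: str, keep_langs: Sequence[str]) -> List[str]:
    """Build an mkvmerge command that keeps only audio/subtitle tracks in keep_langs.

    Assumes mkvmerge accepts language codes (not just track IDs) for these options.
    """
    langs = ",".join(keep_langs)
    return [
        "mkvmerge",
        "-o", dst,                    # output file
        "--audio-tracks", langs,      # keep only audio tracks tagged with these languages
        "--subtitle-tracks", langs,   # same for subtitle tracks
        src,
    ]


if __name__ == "__main__":
    # Placeholder file names; "eng" is an example language code.
    cmd = build_mkvmerge_cmd("movie.mkv", "movie.eng.mkv", ["eng"])
    print(shlex.join(cmd))
    # To actually run it (requires mkvmerge on your PATH):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

One catch a real script would have to handle: if none of the tracks are tagged with the language you ask for, you could end up with a file that has no audio at all, so you'd want to inspect the tracks before committing to the output.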