Given that the high CPU load is confirmed, the pcap probably isn't as useful. The rest would still be interesting to look at.
Are you able to test with smaller packet sizes? That might help to distinguish packet-throughput problems (i.e. connection tracking) from byte-throughput problems.
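
If no traffic generator is at hand, a test along these lines could be sketched in Python. This is only a rough illustration: the function names (send_udp, summarize, sweep), the payload sizes and the use of UDP are all assumptions made for the example, not anything taken from your setup. It also only counts what the sender manages to emit, so the numbers that matter would still have to be read at the receiver or from the router's interface counters, and the Python sender itself may become the bottleneck at small sizes.

```python
import socket
import time


def send_udp(host, port, payload_size, duration):
    """Send fixed-size UDP datagrams for `duration` seconds.

    Returns (packets_sent, elapsed_seconds). Sender-side counts only.
    """
    payload = b"\x00" * payload_size
    packets = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        start = time.monotonic()
        deadline = start + duration
        while time.monotonic() < deadline:
            sock.sendto(payload, (host, port))
            packets += 1
        elapsed = time.monotonic() - start
    return packets, elapsed


def summarize(packets, payload_size, elapsed):
    """Return (packets per second, payload Mbit/s) for one run."""
    pps = packets / elapsed
    mbit_per_s = packets * payload_size * 8 / elapsed / 1e6
    return pps, mbit_per_s


def sweep(host, port, sizes=(64, 256, 1024, 1400), duration=5.0):
    """Run one test per payload size (sizes are arbitrary example values)."""
    results = []
    for size in sizes:
        packets, elapsed = send_udp(host, port, size, duration)
        pps, mbit_per_s = summarize(packets, size, elapsed)
        results.append((size, pps, mbit_per_s))
    return results
```

Calling sweep() with the address and port of a host behind the router gives one (size, pps, Mbit/s) row per payload size. Roughly speaking, if the packet rate seen on the far side stays about the same while the bit rate shrinks with the packet size, that points at a per-packet cost; if the bit rate stays about the same instead, it points at a per-byte cost.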
- Jonathan Morton