Based on the results, even the 2 GHz quad-core A53 in the TP-Link XDR 6088 can reach 818 Mbits/sec. I doubted that the Raspberry Pi 4's result of only 394 Mbits/sec was accurate, since it has a quad-core A72 @ 1.5 GHz. So I switched back to the archlinuxarm-based PiKVM distro that my Raspberry Pi 4 usually runs, which uses an armv7l kernel rather than the aarch64 kernel of Raspberry Pi OS, and ran the benchmark again. The result astonished me.

| Device / CPU | OS / Kernel / iperf Param | Speed |
| --- | --- | --- |
| Raspberry Pi 4 / BCM2711* | Debian bookworm / 6.1.63 | 394 Mbits/sec |
| Raspberry Pi 4 / BCM2711* | archlinux / 6.1.61 (armv7l) | 665 Mbits/sec |

With the armv7l kernel, throughput is about 69% higher (665 vs. 394 Mbits/sec). Why?
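For reference, the 69% figure comes straight from the two table rows. A minimal sketch of the arithmetic (the function and variable names are mine, just for illustration):

```python
def speedup_percent(baseline_mbps: float, new_mbps: float) -> float:
    """Return how much faster new_mbps is than baseline_mbps, in percent."""
    return (new_mbps / baseline_mbps - 1.0) * 100.0


# Values taken from the table above.
aarch64_mbps = 394.0  # Debian bookworm, kernel 6.1.63
armv7l_mbps = 665.0   # archlinux, kernel 6.1.61 (armv7l)

print(f"armv7l is {speedup_percent(aarch64_mbps, armv7l_mbps):.1f}% faster")
# prints "armv7l is 68.8% faster"
```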
I searched the web and found a thread describing the same confusion, but about AES rather than the ChaCha20 used by WireGuard (wg) [1]. It may be that the kernel's ChaCha20 implementation is not well optimized on aarch64. I am leaving this issue here to record any further investigation of this performance problem.
[1] https://forums.raspberrypi.com/viewtopic.php?t=317075
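As a possible first step for that investigation, one could compare which ChaCha20 drivers each kernel registers. The sketch below parses `/proc/crypto` and lists entries whose `name` contains `chacha20`, highest priority first. The helper names are my own, and this is only an indirect hint: as far as I know, in-kernel WireGuard calls the kernel's ChaCha20-Poly1305 library code directly rather than going through the crypto API, so the listed drivers may not be exactly what wg uses.

```python
from pathlib import Path


def parse_proc_crypto(text: str) -> list[dict[str, str]]:
    """Split /proc/crypto-style text into one dict per blank-line-separated entry."""
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def chacha20_drivers(text: str) -> list[tuple[str, int]]:
    """Return (driver, priority) pairs for chacha20 entries, highest priority first."""
    found = []
    for entry in parse_proc_crypto(text):
        if "chacha20" in entry.get("name", ""):
            priority = entry.get("priority", "0")
            found.append((entry.get("driver", "?"), int(priority) if priority.isdigit() else 0))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    try:
        crypto_text = Path("/proc/crypto").read_text()
    except OSError:
        crypto_text = ""  # Not on Linux, or the file is unreadable.
    for driver, priority in chacha20_drivers(crypto_text):
        print(f"{driver} (priority {priority})")
```

Running this on both the aarch64 and armv7l installs and comparing the output would show whether a NEON-accelerated driver is registered on each. Note that crypto modules are often loaded on demand, so an entry may only appear after the algorithm has been used.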