Measuring GPU Passthrough Overhead Using vfio-pci on AI Linux. I’m currently in the stage I’m assessing the impact of hypervisor on GPU performance. Does it make difference if I’m running AI workloads on Kubernetes or on hypervisor? It looks like the impact on performance of using GPU Passthrough seems negligable.
Abylay Ospan who is one of Kernel maintainers:
The performance impact of GPU passthrough via vfio-pci in AI Linux (Sbnb Linux) is impressively low-averaging around 1-2% across a range of LLM models. This makes it a highly viable option for running accelerated inference inside virtual machines, enabling isolation and flexibility without compromising performance.