Proxmox GPU Passthrough: Configuring for AI/ML Workloads

The rapid adoption of AI/ML workloads has driven explosive demand for GPU computing. Proxmox VE's GPU passthrough capability lets virtual machines access physical GPUs directly, delivering near-bare-metal performance for LLM inference, model training, and image generation. This article provides a detailed walkthrough from IOMMU/VFIO configuration to running AI workloads.

For running AI/ML infrastructure on Kubernetes, Kubo On-Premise provides a fully managed K8s environment including GPU node management, with automated GPU resource scheduling and operations.

GPU Passthrough Fundamentals

GPU passthrough is a virtualization technique that gives VMs direct access to physical GPUs through VFIO (Virtual Function I/O), minimizing hypervisor overhead.

Understanding IOMMU

The IOMMU (Input-Output Memory Management Unit) is the hardware component that translates the addresses a device uses for DMA into physical memory addresses, restricting each device to the memory assigned to it. This isolation is the prerequisite for passthrough.

Intel: VT-d (Virtualization Technology for Directed I/O)
AMD: AMD-Vi (AMD I/O Virtualization Technology)

Supported GPUs

Category	GPU	VRAM	Recommended Use
Entry	RTX 3060 / RTX 4060	8-12 GB	Small-scale LLM inference
Mid-range	RTX 3080 / RTX 4060 Ti	10-16 GB	Medium models, fine-tuning
High-end	RTX 3090 / RTX 4090	24 GB	Large LLMs, multi-model
Datacenter	NVIDIA P40 / A100	24-80 GB	Production AI workloads
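
As a rough guide for matching a model to the VRAM tiers in this table, the memory needed for the weights alone is approximately the parameter count times the bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the function name `weights_gb` is illustrative, and the figure ignores KV cache, activations, and runtime overhead, so real usage is higher.

```shell
# Estimate memory for model weights only, in GB (10^9 bytes).
# $1 = parameters in billions
# $2 = bits per parameter (16 for FP16/BF16, 4 for 4-bit quantization)
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 8 16    # 8B model at 16-bit
weights_gb 70 4    # 70B model at 4-bit
```

By this estimate an 8B model at 16-bit needs about 16 GB for weights and fits a 24 GB card with room to spare, while a 70B model needs about 35 GB even at 4-bit, more than a single 24 GB card holds.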

The Proxmox forum GPU passthrough guide notes that NVIDIA GPUs have the broadest support across AI stacks. AMD GPUs require ROCm but work with Ollama on supported distributions.

BIOS and Proxmox Host Configuration

BIOS Settings

Verify the following BIOS settings:

Disable Secure Boot: Eliminates certificate registration requirements
Enable VT-d (Intel) / IOMMU (AMD): Activate in the motherboard BIOS menus
Update firmware to latest version: Improves passthrough compatibility

Kernel Parameters

Add IOMMU parameters to the GRUB command line:

bash

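# Edit /etc/default/grub (GRUB-booted hosts)
# For Intel CPUs
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# For AMD CPUs (the AMD IOMMU driver is enabled by default, so only iommu=pt is needed)
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"

# Update GRUB and reboot
update-grub
reboot

# Hosts booted with systemd-boot (e.g. ZFS root on UEFI): edit /etc/kernel/cmdline instead,
# then run: proxmox-boot-tool refresh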

The iommu=pt option (passthrough mode) gives devices that stay with the host an identity (1:1) DMA mapping, so address translation is applied only to devices assigned to VMs. This avoids translation overhead for host I/O.
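
To confirm after the reboot that the running kernel actually received these parameters, the command line can be checked token by token. This is a minimal sketch; `has_param` is an illustrative helper name, not a Proxmox tool.

```shell
# Succeed if a kernel command line contains the given parameter as a whole token.
# $1 = command line string, $2 = parameter to look for
has_param() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with a literal command line
has_param "quiet intel_iommu=on iommu=pt" iommu=pt && echo "iommu=pt present"
```

On the host, pass the live command line instead: `has_param "$(cat /proc/cmdline)" iommu=pt`. Matching whole tokens keeps `iommu=on` from being mistaken for `intel_iommu=on`.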

After reboot, verify IOMMU is enabled:

bash

dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
# Should display "DMAR: IOMMU enabled" (Intel) or AMD-Vi initialization messages (AMD)
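
Kernel log wording varies between kernel versions and vendors, so a complementary check is to count the IOMMU groups the kernel exposes in sysfs: when the IOMMU is active, `/sys/kernel/iommu_groups` contains one directory per group. The helper name `count_iommu_groups` below is illustrative.

```shell
# Count IOMMU groups under a sysfs-style directory.
# $1 = directory to inspect (defaults to /sys/kernel/iommu_groups)
count_iommu_groups() {
  dir=${1:-/sys/kernel/iommu_groups}
  n=0
  for g in "$dir"/*/; do
    # An unmatched glob stays literal and fails the -d test, so empty dirs count as 0
    [ -d "$g" ] && n=$((n + 1))
  done
  echo "$n"
}
```

Running `count_iommu_groups` with no argument on the host should print a number greater than zero; a result of 0 suggests the IOMMU is disabled in firmware or on the kernel command line.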

VFIO Module Configuration

Load VFIO modules and blacklist host GPU drivers:

bash

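# Add VFIO modules to /etc/modules
# (vfio_virqfd was merged into vfio in kernel 6.2; list it only on older kernels)
cat >> /etc/modules <<EOF
vfio
vfio_iommu_type1
vfio_pci
EOF

# Blacklist host GPU drivers
# (snd_hda_intel also drives onboard audio; omit it if the host needs sound)
cat > /etc/modprobe.d/blacklist-gpu.conf <<EOF
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist snd_hda_intel
EOF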
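
Because `cat >>` appends unconditionally, re-running the module step adds duplicate lines to /etc/modules. A small idempotent helper avoids that; `add_line_once` is an illustrative name, not part of Proxmox.

```shell
# Append a line to a file only if that exact line is not already present.
# $1 = file, $2 = line
add_line_once() {
  touch "$1"
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

Usage on the host would look like `for m in vfio vfio_iommu_type1 vfio_pci; do add_line_once /etc/modules "$m"; done`. The `-x` flag matches whole lines only, so an existing `vfio_pci` entry does not hide a missing `vfio` entry.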

Binding the GPU to VFIO

Identify the GPU's PCI address and its vendor:device IDs, then bind every function of the card (VGA and audio) to the VFIO driver:

bash

# Find GPU PCI address
lspci | grep -i nvidia
# Example: 01:00.0 VGA compatible controller: NVIDIA Corporation ...
# Example: 01:00.1 Audio device: NVIDIA Corporation ...

# Get vendor:device IDs
lspci -n -s 01:00
# Example: 01:00.0 0300: 10de:2684 (rev a1)
# Example: 01:00.1 0403: 10de:22ba (rev a1)

# Bind GPU to VFIO
cat > /etc/modprobe.d/vfio.conf <<EOF
options vfio-pci ids=10de:2684,10de:22ba disable_vga=1
softdep nvidia pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
EOF

# Update initramfs and reboot
update-initramfs -u -k all
reboot
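
The `ids=` value can be derived from the `lspci -n` output instead of being copied by hand. A minimal sketch, assuming the usual three-column layout (slot, class, vendor:device) shown in the example output; `pci_ids_csv` is an illustrative name.

```shell
# Read `lspci -n` output on stdin and print the vendor:device IDs
# as one comma-separated list, ready for the vfio-pci ids= option.
pci_ids_csv() {
  awk '{ print $3 }' | paste -sd, -
}

# Sample input copied from the example output in this section
printf '%s\n' \
  '01:00.0 0300: 10de:2684 (rev a1)' \
  '01:00.1 0403: 10de:22ba (rev a1)' | pci_ids_csv
```

On the host, `lspci -n -s 01:00 | pci_ids_csv` produces the list for whichever card sits in that slot.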

Verify the binding after reboot:

bash

lspci -nnk -s 01:00
# Should show "Kernel driver in use: vfio-pci"
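
Every function of the card needs to be bound, not only the VGA function. The sketch below scans `lspci -nnk` output and flags any device that is not using vfio-pci; `check_vfio_bound` is an illustrative name and the report format is an assumption of this example.

```shell
# Read `lspci -nnk` output on stdin; print each device whose driver is not vfio-pci.
# Exit status is 0 only when at least one device is listed and all use vfio-pci.
check_vfio_bound() {
  awk '
    function report() {
      if (dev != "" && drv != "vfio-pci") {
        printf "%s: %s\n", dev, (drv == "" ? "no driver" : drv)
        bad = 1
      }
    }
    /^[^ \t]/               { report(); dev = $1; drv = "" }   # new device line
    /Kernel driver in use:/ { drv = $NF }
    END                     { report(); exit (bad || dev == "") }
  '
}
```

Run it as `lspci -nnk -s 01:00 | check_vfio_bound`. No output and a zero exit status mean both the VGA and audio functions are bound to vfio-pci.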

With Kubo, integrating GPU nodes into Kubernetes and automating GPU resource scheduling via NVIDIA Device Plugin becomes straightforward.

Creating the VM and Attaching the GPU

VM Configuration

Create a VM optimized for GPU passthrough:

bash

# Create VM via Web UI or CLI
qm create 200 --name ai-workstation --memory 32768 --cores 8 \
  --machine q35 --bios ovmf \
  --cpu host \
  --scsihw virtio-scsi-single \
  --net0 virtio,bridge=vmbr0

# Add EFI disk
qm set 200 --efidisk0 local-lvm:1,efitype=4m

# Add OS disk
qm set 200 --scsi0 local-lvm:100,ssd=1,discard=on

Critical settings:

Setting	Value	Reason
Machine type	q35	Required for PCIe passthrough
BIOS	OVMF (UEFI)	Proper PCIe device initialization
CPU type	host	Full CPU feature exposure
SCSI Controller	VirtIO SCSI single	Dedicated controller per disk, allowing per-disk I/O threads

GPU Passthrough Configuration

Add the GPU device to the VM configuration:

bash

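# Add to /etc/pve/qemu-server/200.conf
# Pass through all functions of the GPU (VGA + Audio)
hostpci0: 01:00,pcie=1,x-vga=on
# x-vga=on makes the GPU the VM's primary display; headless compute-only VMs can omit it

# Or via Web UI:
# Hardware → Add → PCI Device → Select GPU
# Check "All Functions" and "PCI-Express"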
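
Before the first boot, it can help to confirm that the settings passthrough depends on are all present in the VM configuration. The sketch below reads the `key: value` lines printed by `qm config` and reports anything missing; `check_vm_config` and its messages are illustrative, not Proxmox output.

```shell
# Read `qm config <vmid>` output on stdin and report settings that
# would block PCIe passthrough. Exit status is 0 when nothing is missing.
check_vm_config() {
  awk '
    {
      i = index($0, ":"); if (i == 0) next
      key = substr($0, 1, i - 1); val = substr($0, i + 1)   # split at the first colon only
      if (key == "machine" && val ~ /q35/) machine = 1
      if (key == "bios" && val ~ /ovmf/) bios = 1
      if (key ~ /^hostpci[0-9]+$/ && val ~ /pcie=(1|on)/) pcie = 1
    }
    END {
      if (!machine) { print "machine type is not q35"; bad = 1 }
      if (!bios)    { print "bios is not ovmf"; bad = 1 }
      if (!pcie)    { print "no hostpci entry with pcie=1"; bad = 1 }
      exit bad + 0
    }
  '
}
```

For the VM created above, run `qm config 200 | check_vm_config`. Splitting at the first colon matters because the hostpci value itself contains a colon (`01:00`).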

Installing NVIDIA Drivers in the Guest

Install NVIDIA drivers and CUDA toolkit inside the VM:

bash

# For Ubuntu 24.04
# Install NVIDIA driver
sudo apt update
sudo apt install -y nvidia-driver-560

# Install CUDA toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-12-6

# Verify GPU operation
nvidia-smi
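
For scripted checks, `nvidia-smi` can emit CSV through its `--query-gpu` and `--format` options. The sketch below reads that CSV and verifies that the guest sees a GPU with at least a given amount of VRAM; `has_vram_mib` and the 20000 MiB threshold in the usage line are assumptions for illustration.

```shell
# Read "name, memory.total" CSV lines on stdin (memory in MiB, no units).
# Print each GPU and succeed only if at least one has $1 MiB or more.
has_vram_mib() {
  awk -F', *' -v need="$1" '
    { printf "%s: %d MiB\n", $1, $2; if ($2 + 0 >= need + 0) ok = 1 }
    END { exit (ok ? 0 : 1) }
  '
}
```

Inside the VM: `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | has_vram_mib 20000`. A failure here, with `nvidia-smi` itself working, usually means a different card was passed through than intended.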

Running AI/ML Workloads

LLM Inference with Ollama

Ollama is one of the simplest ways to run LLMs in a GPU passthrough environment:

bash

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Verify service status
systemctl status ollama

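# Download and run a model
# (the 70b tag is a roughly 40 GB download; on GPUs with 24 GB or less,
#  a smaller tag such as llama3.1:8b fits entirely in VRAM)
ollama pull llama3.1:70b
ollama run llama3.1:70b

# API access (default: port 11434); "stream": false returns a single JSON object
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:70b",
  "prompt": "Explain GPU passthrough in Proxmox",
  "stream": false
}'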

High-Performance Inference with vLLM

For production environments that need high throughput and an OpenAI-compatible API, vLLM is a strong choice:

bash

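# Install vLLM in a virtual environment (Ubuntu 24.04 blocks system-wide pip installs)
sudo apt install -y python3-venv
python3 -m venv ~/vllm-env && . ~/vllm-env/bin/activate
pip install vllm

# Start OpenAI-compatible API server
# Llama models are gated on Hugging Face: accept the license and set HF_TOKEN first.
# The 70B model needs roughly 140 GB in 16-bit precision; with a single GPU, substitute a
# smaller or quantized model, or raise --tensor-parallel-size to the number of GPUs passed through.
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.9 \
  --port 8000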
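
Loading a large model can take several minutes, during which the port refuses connections. A generic retry helper lets scripts wait for the server before sending requests. This is a sketch: `wait_until` is an illustrative name, and the `/health` path in the usage note is assumed to be available on the server started above (`/v1/models` is an alternative probe).

```shell
# Retry a command until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = seconds to sleep between attempts, rest = command to run
wait_until() {
  tries=$1; delay=$2; shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_until 60 5 curl -fsS http://localhost:8000/health && echo "vLLM is ready"` polls every five seconds for up to five minutes.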

Docker + NVIDIA Container Toolkit

For container-based AI workloads, use the NVIDIA Container Toolkit:

bash

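# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
# Add repository and install
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Run GPU-accelerated container
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi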

For running AI workloads at scale on Kubernetes, Kubo On-Premise provides a fully managed environment with automatic NVIDIA Device Plugin configuration, GPU resource quotas, and multi-tenant GPU sharing.

Troubleshooting

Common Issues and Solutions

Issue	Cause	Solution
IOMMU groups not isolated	No ACS support	Add `pcie_acs_override=downstream,multifunction` to the kernel command line (weakens isolation between devices; use only on trusted hosts)
Out of memory on VM start	PCIe memory pinning	Reduce VM RAM, configure hugepages
NVIDIA Code 43 error (Windows guests)	Older drivers refuse to run in a VM	Use driver 465 or newer; on older drivers set `cpu: host,hidden=1`
GPU reset issues	Common with AMD GPUs	Use `vendor-reset` kernel module
Audio device conflicts	GPU audio collision	Blacklist `snd_hda_intel`

bash

# Check IOMMU groups
for d in /sys/kernel/iommu_groups/*/devices/*; do
  n=${d#*/iommu_groups/}; n=${n%%/*}
  printf 'IOMMU Group %s ' "$n"
  lspci -nns "${d##*/}"
done
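
What matters in that listing is whether anything else shares a group with the GPU, since all devices in a group must be assigned together. The sketch below filters the listing for such "group mates"; it assumes the `IOMMU Group N <slot> ...` line format produced by the loop above, and `group_mates` is an illustrative name.

```shell
# Read the "IOMMU Group N <slot> ..." listing on stdin and print every device
# that shares a group with the given slot prefix (e.g. 01:00) but is not part of it.
group_mates() {
  awk -v slot="$1" '
    { line[NR] = $0; grp[NR] = $3; addr[NR] = $4
      if (index($4, slot) == 1) target[$3] = 1 }       # remember the GPU group(s)
    END {
      for (i = 1; i <= NR; i++)
        if ((grp[i] in target) && index(addr[i], slot) != 1) print line[i]
    }
  '
}
```

Assuming the loop is saved as a script (hypothetically `iommu-groups.sh`), `sh iommu-groups.sh | group_mates 01:00` prints nothing when the GPU's group is clean. Any device it does print, apart from PCI bridges, would have to be passed through with the GPU or separated by moving the card to another slot.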

The Proxmox Wiki PCI Passthrough guide contains detailed troubleshooting procedures.

Conclusion

Proxmox VE GPU passthrough delivers near-bare-metal GPU performance for AI/ML workloads when IOMMU and VFIO are properly configured. Modern hardware and driver maturity have significantly simplified the setup process.

For running GPU-accelerated AI workloads at production scale on Kubernetes, Kubo On-Premise is the optimal platform. Combine Proxmox GPU passthrough with Kubo's fully managed K8s to build scalable AI infrastructure.

For consultation on GPU passthrough environment design, contact us to discuss your requirements.
