Proxmox GPU Passthrough: Configuring for AI/ML Workloads

The rapid adoption of AI/ML workloads has driven explosive demand for GPU computing. Proxmox VE's GPU passthrough capability lets virtual machines access physical GPUs directly, delivering near-bare-metal performance for LLM inference, model training, and image generation. This article provides a detailed walkthrough from IOMMU/VFIO configuration to running AI workloads.

For running AI/ML infrastructure on Kubernetes, Kubo On-Premise provides a fully managed K8s environment including GPU node management, with automated GPU resource scheduling and operations.

GPU Passthrough Fundamentals

GPU passthrough is a virtualization technique that gives VMs direct access to physical GPUs through VFIO (Virtual Function I/O), minimizing hypervisor overhead.

Understanding IOMMU

IOMMU (Input-Output Memory Management Unit) is the hardware component that translates device addresses, creating isolated "lanes" for each device. This is the prerequisite for passthrough.

  • Intel: VT-d (Virtualization Technology for Directed I/O)
  • AMD: AMD-Vi (AMD I/O Virtualization Technology)

Supported GPUs

| Category | GPU | VRAM | Recommended Use |
|---|---|---|---|
| Entry | RTX 3060 / RTX 4060 | 8-12 GB | Small-scale LLM inference |
| Mid-range | RTX 3080 / RTX 4060 Ti | 10-16 GB | Medium models, fine-tuning |
| High-end | RTX 3090 / RTX 4090 | 24 GB | Large LLMs, multi-model |
| Datacenter | NVIDIA P40 / A100 | 24-80 GB | Production AI workloads |

The Proxmox forum GPU passthrough guide notes that NVIDIA GPUs have the broadest support across AI stacks. AMD GPUs require ROCm but work with Ollama on supported distributions.

BIOS and Proxmox Host Configuration

BIOS Settings

Verify the following BIOS settings:

  1. Disable Secure Boot: Eliminates certificate registration requirements
  2. Enable VT-d (Intel) / IOMMU (AMD): Activate in the motherboard BIOS menus
  3. Update firmware to latest version: Improves passthrough compatibility

Kernel Parameters

Add IOMMU parameters to the GRUB command line:

bash
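# Edit /etc/default/grub and set ONE of the following lines
# For Intel CPUs
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# For AMD CPUs (the AMD IOMMU is active by default once enabled in the BIOS)
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

# Update GRUB and reboot
# (hosts that boot via systemd-boot, e.g. ZFS root on UEFI, instead edit
# /etc/kernel/cmdline and run: proxmox-boot-tool refresh)
update-grub
reboot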

The iommu=pt option (passthrough mode) lets devices that stay with the host bypass IOMMU address translation, while devices assigned to VMs are still translated and isolated. This lowers DMA overhead on the host.

After reboot, verify IOMMU is enabled:

bash
dmesg | grep -e DMAR -e IOMMU
# Intel hosts should display "DMAR: IOMMU enabled" or similar; AMD hosts print AMD-Vi lines instead
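
The dmesg check can be complemented by confirming that the parameter reached the running kernel and that IOMMU groups were created. The following is a minimal sketch: the helper names `has_kernel_param` and `iommu_group_count` are invented for illustration, and their optional path arguments exist only so the helpers can be pointed at sample data.

```shell
# has_kernel_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Split the command line on whitespace and compare each word exactly.
    for word in $(cat "$file"); do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# iommu_group_count [GROUPS_DIR]
# Prints the number of IOMMU groups; 0 means the IOMMU is not active.
iommu_group_count() {
    dir=${1:-/sys/kernel/iommu_groups}
    count=0
    for g in "$dir"/*/; do
        [ -d "$g" ] && count=$((count + 1))
    done
    echo "$count"
}

# Example (on the Proxmox host after reboot):
#   has_kernel_param iommu=pt && echo "iommu=pt is active"
#   echo "IOMMU groups: $(iommu_group_count)"
```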

VFIO Module Configuration

Load VFIO modules and blacklist host GPU drivers:

bash
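# Add VFIO modules to /etc/modules
# (vfio_virqfd is only a separate module on kernels older than 6.2;
# on newer kernels, as shipped with Proxmox VE 8, omit that line)
cat >> /etc/modules <<EOF
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
EOF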

# Blacklist host GPU drivers
cat > /etc/modprobe.d/blacklist-gpu.conf <<EOF
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist snd_hda_intel
EOF
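
After the next reboot it is worth confirming that the modules actually loaded. The sketch below assumes the VFIO drivers are built as loadable modules (so they appear in /proc/modules); `missing_vfio_modules` is a hypothetical helper name, and the file argument is there only for testing against sample data.

```shell
# missing_vfio_modules [PROC_MODULES_FILE]
# Prints each expected VFIO module that is not listed as loaded.
missing_vfio_modules() {
    file=${1:-/proc/modules}
    # vfio_virqfd is left out on purpose: kernels 6.2 and newer
    # no longer ship it as a separate module.
    for mod in vfio vfio_iommu_type1 vfio_pci; do
        # Each /proc/modules line starts with the module name and a space.
        grep -q "^$mod " "$file" || echo "$mod"
    done
}

# Example (on the Proxmox host):
#   missing_vfio_modules    # no output means all three are loaded
```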

Binding the GPU to VFIO

Identify the GPU's PCI address and its vendor:device IDs, then bind both functions (video and audio) to the VFIO driver:

bash
# Find GPU PCI address
lspci -v | grep -i nvidia
# Example: 01:00.0 VGA compatible controller: NVIDIA Corporation ...
# Example: 01:00.1 Audio device: NVIDIA Corporation ...

# Get vendor:device IDs
lspci -n -s 01:00
# Example: 01:00.0 0300: 10de:2684 (rev a1)
# Example: 01:00.1 0403: 10de:22ba (rev a1)

# Bind GPU to VFIO
cat > /etc/modprobe.d/vfio.conf <<EOF
options vfio-pci ids=10de:2684,10de:22ba disable_vga=1
softdep nvidia pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
EOF

# Update initramfs and reboot
update-initramfs -u -k all
reboot
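
Copying the IDs by hand is error-prone, so the ids= value can also be derived from the lspci output. This is a small sketch under the assumption that the input has the `lspci -n` layout shown above (address, class, vendor:device); `vfio_ids` is an illustrative name, not an existing tool.

```shell
# vfio_ids
# Reads `lspci -n` output on stdin and prints the vendor:device pairs
# as a comma-separated list suitable for "options vfio-pci ids=...".
vfio_ids() {
    # Field 3 of each line is the vendor:device pair, e.g. 10de:2684.
    awk '{ printf "%s%s", (NR > 1 ? "," : ""), $3 } END { if (NR) print "" }'
}

# Example (on the Proxmox host, GPU at 01:00 as in the article):
#   echo "options vfio-pci ids=$(lspci -n -s 01:00 | vfio_ids)"
```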

Verify the binding after reboot:

bash
lspci -nnk -s 01:00
# Should show "Kernel driver in use: vfio-pci"
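
Both functions of the card (video and audio) need to be bound, and it is easy to miss one when reading the output by eye. A minimal sketch that flags any function not owned by vfio-pci, assuming the usual `lspci -nnk` layout where device lines start in column 1 and detail lines are indented; `unbound_functions` is a hypothetical helper name.

```shell
# unbound_functions
# Reads `lspci -nnk` output on stdin and prints the PCI address of every
# function whose "Kernel driver in use" is missing or not vfio-pci.
unbound_functions() {
    awk '
        # A device header line starts in column 1 with the PCI address.
        /^[0-9a-f]/ {
            if (addr != "" && driver != "vfio-pci") print addr
            addr = $1; driver = ""
        }
        /Kernel driver in use:/ { driver = $NF }
        END { if (addr != "" && driver != "vfio-pci") print addr }
    '
}

# Example (on the Proxmox host):
#   lspci -nnk -s 01:00 | unbound_functions    # no output means fully bound
```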

With Kubo, integrating GPU nodes into Kubernetes and automating GPU resource scheduling via NVIDIA Device Plugin becomes straightforward.

Creating the VM and Attaching the GPU

VM Configuration

Create a VM optimized for GPU passthrough:

bash
# Create VM via Web UI or CLI
qm create 200 --name ai-workstation --memory 32768 --cores 8 \
  --machine q35 --bios ovmf \
  --cpu host \
  --scsihw virtio-scsi-single \
  --net0 virtio,bridge=vmbr0

# Add EFI disk
qm set 200 --efidisk0 local-lvm:1,efitype=4m

# Add OS disk
qm set 200 --scsi0 local-lvm:100,ssd=1,discard=on

Critical settings:

| Setting | Value | Reason |
|---|---|---|
| Machine type | q35 | Required for PCIe passthrough |
| BIOS | OVMF (UEFI) | Proper PCIe device initialization |
| CPU type | host | Full CPU feature exposure |
| SCSI Controller | VirtIO SCSI | Maximum disk performance |
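
These settings can be checked against an existing VM's configuration file. The sketch below assumes the `key: value` layout of /etc/pve/qemu-server/<vmid>.conf and does not distinguish the main section from snapshot sections; `check_passthrough_conf` is an invented helper name.

```shell
# check_passthrough_conf CONF_FILE
# Prints one line per critical setting that the VM config lacks.
check_passthrough_conf() {
    conf=$1
    grep -Eq '^machine: .*q35'      "$conf" || echo "machine type is not q35"
    grep -Eq '^bios: ovmf'          "$conf" || echo "BIOS is not OVMF"
    grep -Eq '^cpu: host'           "$conf" || echo "CPU type is not host"
    grep -Eq '^scsihw: virtio-scsi' "$conf" || echo "SCSI controller is not VirtIO SCSI"
}

# Example (on the Proxmox host, VM ID 200 as in the article):
#   check_passthrough_conf /etc/pve/qemu-server/200.conf
```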

GPU Passthrough Configuration

Add the GPU device to the VM configuration:

bash
# Add to /etc/pve/qemu-server/200.conf
# Pass through GPU (VGA + Audio)
hostpci0: 01:00,pcie=1,x-vga=on

# Or via Web UI:
# Hardware → Add → PCI Device → Select GPU
# Check "All Functions" and "PCI-Express"
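
The same device entry can be applied from the CLI with `qm set` instead of editing the file. A small sketch for building the option value follows; `hostpci_spec` is a hypothetical helper. It treats x-vga as optional on the assumption that a compute-only guest does not need the GPU as its primary display.

```shell
# hostpci_spec PCI_ADDR [primary]
# Prints a hostpci option value for `qm set`. Passing "primary" as the
# second argument adds x-vga=1, which marks the GPU as the VM's main display.
hostpci_spec() {
    spec="$1,pcie=1"
    if [ "${2:-}" = "primary" ]; then
        spec="$spec,x-vga=1"
    fi
    echo "$spec"
}

# Example (on the Proxmox host, VM ID 200 as in the article):
#   qm set 200 --hostpci0 "$(hostpci_spec 01:00 primary)"
```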

Installing NVIDIA Drivers in the Guest

Install NVIDIA drivers and CUDA toolkit inside the VM:

bash
# For Ubuntu 24.04
# Install NVIDIA driver
sudo apt update
sudo apt install -y nvidia-driver-560

# Install CUDA toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-12-6

# Verify GPU operation
nvidia-smi
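
Beyond the default table, nvidia-smi can emit machine-readable output, which makes it easy to check whether the passed-through GPU has enough VRAM for a given model. A minimal sketch, assuming the CSV layout produced by the query shown in the comment (name, then memory in MiB); `gpu_fits` is an illustrative name.

```shell
# gpu_fits REQUIRED_MIB
# Reads "name, memory_mib" CSV lines on stdin and prints the names of
# GPUs whose total memory is at least REQUIRED_MIB.
gpu_fits() {
    awk -F', ' -v need="$1" '$2 + 0 >= need + 0 { print $1 }'
}

# Example (inside the guest VM):
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits \
#     | gpu_fits 16000
```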

Running AI/ML Workloads

LLM Inference with Ollama

Ollama is one of the simplest ways to run LLMs in a GPU passthrough environment:

bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Verify service status
systemctl status ollama

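# Download and run a model
# Note: the 70b tag is a roughly 40 GB download at 4-bit quantization; on
# 8-24 GB GPUs it is partly offloaded to the CPU, so a smaller tag such as
# llama3.1:8b is a better fit there
ollama pull llama3.1:70b
ollama run llama3.1:70b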

# API access (default: port 11434)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:70b",
  "prompt": "Explain GPU passthrough in Proxmox"
}'
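
By default the generate endpoint streams its answer as a sequence of JSON objects; setting "stream": false returns a single JSON object instead, which is easier to handle in scripts. The sketch below builds such a request body; `ollama_payload` is a hypothetical helper that only escapes backslashes and double quotes, so it assumes a single-line prompt.

```shell
# ollama_payload MODEL PROMPT
# Prints a JSON request body for /api/generate with streaming disabled.
ollama_payload() {
    # Escape backslashes and double quotes so the prompt stays valid JSON.
    prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"model": "%s", "prompt": "%s", "stream": false}\n' "$1" "$prompt"
}

# Example (inside the guest VM, with the Ollama service running):
#   curl -s http://localhost:11434/api/generate \
#     -d "$(ollama_payload llama3.1:70b 'Explain GPU passthrough in Proxmox')"
```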

High-Performance Inference with vLLM

For production environments requiring high throughput, vLLM is a common choice:

bash
# Install vLLM
pip install vllm

# Start OpenAI-compatible API server
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.9 \
  --port 8000
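
Before launching a model it helps to estimate whether its weights fit in VRAM at all. As a rough rule, weight memory is parameter count times bytes per parameter. The sketch below covers weights only and ignores the KV cache and activations, so real usage is higher; `weights_gib` is an illustrative helper using integer arithmetic.

```shell
# weights_gib PARAMS_BILLION BYTES_PER_PARAM
# Prints the approximate size of the model weights in GiB (rounded down).
weights_gib() {
    echo $(( $1 * 1000000000 * $2 / 1073741824 ))
}

# 16-bit weights use 2 bytes per parameter:
#   weights_gib 70 2   # prints 130
#   weights_gib 8 2    # prints 14
```

By this estimate a 70B model in 16-bit precision does not fit on a single GPU from the table above, so it needs either a larger --tensor-parallel-size across several GPUs or a smaller or quantized model.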

Docker + NVIDIA Container Toolkit

For container-based AI workloads, use the NVIDIA Container Toolkit:

bash
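# Install NVIDIA Container Toolkit: add the signing key and apt repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Run GPU-accelerated container
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi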

For running AI workloads at scale on Kubernetes, Kubo On-Premise provides a fully managed environment with automatic NVIDIA Device Plugin configuration, GPU resource quotas, and multi-tenant GPU sharing.

Troubleshooting

Common Issues and Solutions

| Issue | Cause | Solution |
|---|---|---|
| IOMMU groups not isolated | No ACS support | Add pcie_acs_override=downstream,multifunction to GRUB |
| Out of memory on VM start | PCIe memory pinning | Reduce VM RAM, configure hugepages |
| NVIDIA Code 43 error | Driver detects virtual environment | Verify cpu: host, use latest drivers |
| GPU reset issues | Common with AMD GPUs | Use vendor-reset kernel module |
| Audio device conflicts | GPU audio collision | Blacklist snd_hda_intel |

bash
# Check IOMMU groups
for d in /sys/kernel/iommu_groups/*/devices/*; do
  n=${d#*/iommu_groups/}; n=${n%%/*}
  printf 'IOMMU Group %s ' "$n"
  lspci -nns "${d##*/}"
done
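
To answer the narrower question of whether the GPU shares its group with unrelated devices, the group can be read from the device's own sysfs entry. A minimal sketch, assuming the standard sysfs layout where each PCI device directory has an iommu_group link; `iommu_group_peers` is a hypothetical helper, and the address must include the PCI domain (for example 0000:01:00.0).

```shell
# iommu_group_peers PCI_ADDR [SYSFS_PCI_DIR]
# Prints every other device that shares an IOMMU group with PCI_ADDR.
iommu_group_peers() {
    dir=${2:-/sys/bus/pci/devices}
    for dev in "$dir/$1"/iommu_group/devices/*; do
        [ -e "$dev" ] || continue
        name=${dev##*/}
        # Skip the device itself; everything else is a peer.
        [ "$name" = "$1" ] || echo "$name"
    done
}

# Example (on the Proxmox host):
#   iommu_group_peers 0000:01:00.0
# Seeing only the GPU's own audio function (0000:01:00.1) is expected.
```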

The Proxmox Wiki PCI Passthrough guide contains detailed troubleshooting procedures.

Conclusion

Proxmox VE GPU passthrough delivers near-bare-metal GPU performance for AI/ML workloads when IOMMU and VFIO are properly configured. Modern hardware and driver maturity have significantly simplified the setup process.

For running GPU-accelerated AI workloads at production scale on Kubernetes, Kubo On-Premise is the optimal platform. Combine Proxmox GPU passthrough with Kubo's fully managed K8s to build scalable AI infrastructure.

For consultation on GPU passthrough environment design, contact us to discuss your requirements.
