PCI passthrough

From InstallGentoo Wiki v2
>Echelon1

Revision as of 18:47, 4 March 2020

PCI passthrough is a technology that allows you to directly present an internal PCI device to a virtual machine. The device acts as if it were directly driven by the VM, and the VM detects the PCI device as if it were physically connected. PCI passthrough is also often known as IOMMU, although this is a bit of a misnomer: the IOMMU is the hardware technology that provides this feature, but it also provides other features such as some protection from DMA attacks or the ability to address 64-bit memory spaces with 32-bit addresses.

As you can imagine, the most common application for PCI passthrough, at least on the chansphere, is vidya, since PCI passthrough allows a VM direct access to your graphics card with the end result of being able to play games with nearly the same performance as if you were running your game directly on your computer. PCI passthrough, however, has many other applications: it lets you use a Digium phone card on an Asterisk VOIP server contained within a VM; if you have a PCI RAID controller not supported by a proprietary VM hypervisor like VMware ESX, you can pass it through to your VM; and it lets you use a high-end audio card for professional audio production that only has drivers for Windows and Mac OS X.

Intended audience

The following guide is intended for people with at least a working knowledge of Linux configuration using the command line. If you're capable of installing Arch Linux and setting up a desktop system, or if you're capable of setting up a web, mail, DNS/DHCP/LDAP/SMB or a business application server you should be capable of following through this guide without too much trouble. If for whatever reason you can't stray away from the graphical utilities of Linux Mint or must rely on Ubuntu to make your system work because everything there Just Werks™, this guide is probably beyond your capabilities.

On top of the previous assumptions, the following specific areas of knowledge will greatly help you to understand this guide:

  • Virtual machines: Being familiar with at least Oracle VM VirtualBox, and having a basic understanding of virtualization on AMD64, virtual machines, simulated devices and hypervisors will go a long way in helping you on this tutorial.
  • Intermediate-level configuration tasks: You will have to load kernel modules, modify your bootloader's kernel boot parameters, manually add a file to your system's directories, and understand the output of lspci and lsusb.

Why PCI passthrough?

  1. It lets you ditch Windows while letting you play vidya safely. PCI passthrough is your big fat middle finger against Macrohard and its recent quest to aggressively strong-arm the entire world into surrendering to the privacy-raping botnet that is Windows 10. A lot of free software enthusiasts have felt the need to leave videogames behind on account of not being libre and try as hard as possible to justify themselves with "I've outgrown them" or "Videogames are empty entertainment". Others have to begrudgingly dual boot between Windows for gaming and Linux for everything else. With PCI passthrough, you don't need to do any of that -- you will be running Windows 10 safely sealed in a VM with simulated hardware and no access (in theory) to the rest of your system, keeping your Linux and your hardware protected from NSA backdoors and from Microsoft's indiscriminate mass surveillance campaign.
  2. It protects you from GPU malware. Guess what? It is possible to run nearly undetectable malware on your graphics card. But if your graphics card is facing a VM, the compromise will be limited to your VM, keeping your physical OS safe.
  3. You can actually encrypt your Windows with it. The old days of safe encrypted Windows installations with TrueCrypt ended with 64-bit Windows 7, which requires one boot partition and one system partition; now, you have to entrust yourself to Microsoft BitLocker, which is shit crypto by virtue of being tightly closed source and unauditable. But... if you use Linux as physical OS and keep your Windows on a VM, you will be able to profit from the tried-and-known-good solutions that are VeraCrypt and cryptsetup, and Microsoft BitLocker is not going to be a problem as your physical Linux is going to be the one that provides full disk encryption.
  4. It helps you keep a completely libre operating system. Device drivers are usually the very first place where a strictly libre Linux starts getting tainted with commercial code. But if you delegate the one or two devices that have no libre drivers to a VM through PCI passthrough, that's a whole 'nother story.
  5. Some niche applications require it. For example, if you want to use a MIDI sequencer or other kinds of advanced audio production hardware, you only have two choices: you dual boot between Windows and Linux, or you present this hardware to your VM through PCI passthrough. Or if you want to run an Asterisk phone exchange with Digium land-line cards, it is a very good idea to run it on a VM with PCI passthrough in order to isolate your server from the rest of your system.
  6. You can block Microsoft's botnet servers without having to spend on a separate router. Simply block them on your host system and your Windows guest will never be able to send anything to Microsoft.
  7. It's the closest thing there is to an entirely libre computer. Your Windows instance will be running on simulated hardware powered by open source, auditable code, and everything your Windows will see except for the passed-through devices will be libre. It's still not enough for an entirely libre computer, but it's at least a layer of freedom between a proprietary OS and proprietary hardware.

Prerequisites

  • A CPU that supports Intel VT-d or AMD-Vi. Check your CPU datasheet to confirm this. Just about any CPU made after 2010 will support it, but there's always the oddball el cheapo CPU that doesn't.
  • A motherboard that supports the aforementioned technologies. To find this out, check in your motherboard's BIOS configuration for an option to enable IOMMU or something similar. Chances are that your motherboard will support it if it's from 2013 or newer, but make sure to check since this is a niche technology and some manufacturers may save costs by axing it from their motherboards or delivering a defective implementation (such as Gigabyte's 2015-2016 series) simply because NORPs never use it.
  • At least two GPUs: one for your physical OS, another for your VM. (You can in theory run your computer headless through SSH or a serial console, but it might not work and you risk locking yourself away from your computer if you do so).
  • A Linux distribution with recent-ish packages. This means Debian and CentOS Stable will probably not work, as they run very old kernels and even older versions of QEMU that might have buggy, partial or broken PCI passthrough support. If you have a "stable" distro and PCI passthrough doesn't work for you, try switching your package manager to the "testing" branch before you try anything else.
  • Optional but recommended: a KVM switch. If your monitor has only one single input, you'll find yourself switching a lot between your physical OS's GPU and your passed-through GPU.
  • Possibly required: an extra mouse and keyboard. Depending on your KVM switch, you might have to get another keyboard for your physical OS, and another mouse if you intend to run graphical sessions on your physical OS. Due to the great variety of KVM switches out there (mine, for example, has PS/2 mouse and keyboard inputs and draws power from these connectors), you'll probably be able to determine what you need only after you have bought your switch.

Step 0: Compile IOMMU support if you use Gentoo

If you run /g/entoo, the distro where everything Just Never Works and you have to set it up on your own, you'll have to compile IOMMU support into your kernel. You will need to set up the following options:

  1. CONFIG_IOMMU_SUPPORT (Device Drivers -> IOMMU Hardware Support)
  2. The following options if you have an AMD CPU:
    1. CONFIG_AMD_IOMMU (Device Drivers -> IOMMU Hardware Support -> AMD IOMMU Support)
    2. CONFIG_AMD_IOMMU_V2 (Device Drivers -> IOMMU Hardware Support -> AMD IOMMU Support -> AMD IOMMU Version 2 driver)
  3. The following options if you have an Intel CPU:
    1. CONFIG_INTEL_IOMMU (Device Drivers -> IOMMU Hardware Support -> Support for Intel IOMMU using DMA Remapping Devices)
    2. CONFIG_INTEL_IOMMU_SVM (Device Drivers -> IOMMU Hardware Support -> Support for Intel IOMMU using DMA Remapping Devices -> Support for Shared Virtual Memory with Intel IOMMU)
  4. CONFIG_IRQ_REMAP (Device Drivers -> IOMMU Hardware Support -> Support for Interrupt Remapping)
  5. CONFIG_VFIO (Device Drivers -> VFIO Non-Privileged userspace driver framework)
  6. CONFIG_VFIO_PCI (Device Drivers -> VFIO Non-Privileged userspace driver framework -> VFIO support for PCI devices)
  7. CONFIG_VFIO_PCI_VGA (Device Drivers -> VFIO Non-Privileged userspace driver framework -> VFIO support for PCI devices -> VFIO PCI support for VGA devices)

Make sure you have these options enabled, rebuild your kernel with your favorite method (plain normal make or a nice and easy genkernel, doesn't matter), but don't reboot yet. You will do that on the next step. (Note: you'll probably want to postpone the build for the next step if you use an EFI stub kernel, because we're going to change the kernel command line which must be hard-coded in that kind of kernel).
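If you'd rather not eyeball menuconfig, the option list above can be checked mechanically. What follows is only a minimal sketch: the function name check_iommu_config is made up for this guide, and it assumes your kernel configuration lives in a plain-text .config file (on Gentoo that is usually /usr/src/linux/.config).

```shell
#!/bin/sh
# Report which of the Step 0 kernel options are enabled in a .config file.
# check_iommu_config is a hypothetical helper name, not a standard tool.
# $1: path to the kernel .config file
# $2: "amd" or "intel" to also check the vendor-specific options (optional)
check_iommu_config() (
    config_file=$1
    opts="CONFIG_IOMMU_SUPPORT CONFIG_IRQ_REMAP CONFIG_VFIO CONFIG_VFIO_PCI CONFIG_VFIO_PCI_VGA"
    case $2 in
        amd)   opts="$opts CONFIG_AMD_IOMMU CONFIG_AMD_IOMMU_V2" ;;
        intel) opts="$opts CONFIG_INTEL_IOMMU CONFIG_INTEL_IOMMU_SVM" ;;
    esac
    missing=0
    for opt in $opts; do
        # An enabled option shows up as OPTION=y (built in) or OPTION=m (module)
        if grep -q "^${opt}=[ym]\$" "$config_file"; then
            echo "ok       $opt"
        else
            echo "MISSING  $opt"
            missing=1
        fi
    done
    # Exit status 1 if at least one option is missing, 0 otherwise
    exit "$missing"
)
```

Call it as check_iommu_config /usr/src/linux/.config amd (or intel) before rebuilding; any line starting with MISSING is an option you still have to switch on.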

Step 1: Check for IOMMU support, enable if you don't have it

Start out by modifying the kernel command line on your bootloader to enable IOMMU, and rebuild your kernel if you build it as an EFI stub. For this, you need two parameters: iommu=on, and then amd_iommu=on or intel_iommu=on depending on whether you have an AMD or Intel CPU. Your kernel command line should look a bit like this:

linux /vmlinuz-4.6.0-1-amd64 root=UUID=XYZUVWIJK ro quiet iommu=on amd_iommu=on

Reboot your system, and check if AMD-Vi or Intel VT-d were enabled by checking your kernel log. On AMD, use grep AMD-Vi; on Intel, use grep -e DMAR -e IOMMU:

# dmesg | grep AMD-Vi
[    0.953668] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    0.953669] AMD-Vi:  Extended features:  PreF PPR GT IA
[    0.953672] AMD-Vi: Interrupt remapping enabled
[    0.953768] AMD-Vi: Lazy IO/TLB flushing enabled
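If the kernel log stays silent, it is worth confirming that the parameters actually reached the running kernel before blaming your hardware. The helper below is a small sketch (has_kernel_param is a made-up name) that looks for a parameter as a whole word, so that iommu=on is not mistaken for the tail end of amd_iommu=on.

```shell
#!/bin/sh
# Succeed if a kernel command line contains the given parameter as a whole word.
# has_kernel_param is a hypothetical helper name used only in this guide.
# $1: the kernel command line, e.g. "$(cat /proc/cmdline)"
# $2: the parameter to look for, e.g. amd_iommu=on
has_kernel_param() {
    # Pad both sides with spaces so the first and last parameters match too
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

On the running system you would call it as has_kernel_param "$(cat /proc/cmdline)" amd_iommu=on && echo present. If the parameter is absent, your bootloader configuration was most likely not regenerated or not saved.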

Gentoo users: You might be missing a kernel configuration option, which would prevent the kernel from confirming that IOMMU is up and running. If you have already enabled all the kernel options described above and you still can't see anything about IOMMU or VT-d in the kernel logs, try looking for your platform's IOMMU driver running as a kernel-space process:

# ps -ef | grep -i iommu
root        66     2  0 04:19 ?        00:00:00 [amd_iommu_v2]

Step 2: Find out your IOMMU groups

Your devices will be organized into IOMMU groups, which are the smallest sets of physical devices that can be passed to a VM and depend on how your motherboard is wired and organized. For example, if you want direct access to your motherboard's audio chip, but it's on the same IOMMU group as your IDE controller and your SMBus (the one that provides access to thermometers, voltage sensors, fan speedometers and so on), you're going to have to give up your audio chip, your IDE controller and your SMBus all combined.

To check this info, use this command:

$ for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d); do echo "IOMMU group $(basename "$iommu_group")"; for device in $(ls -1 "$iommu_group"/devices/); do echo -n $'\t'; lspci -nns "$device"; done; done
IOMMU group 1
        00:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) Processor Root Port [1022:1412]
        01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
        01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98]
IOMMU group 7
        00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:780b] (rev 14)
        00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH IDE Controller [1022:780c]
        00:14.2 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller [1022:780d] (rev 01)
        00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:780e] (rev 11)

As shown here, we're not exactly fine with our IOMMU group 7. That group has plenty of devices apart from the audio chip, which means that if we want direct access to audio on a VM, our physical OS is going to have to give up the audio chip along with everything else in the group. Fortunately, in our case this is probably going to be a minor nuisance, since these devices are rarely used nowadays: the IDE controller (i.e. no more old DVD-RW drive for you), the SMBus sensor access chip, and the ISA bridge (which has been obsolete since like 1998). As for IOMMU group 1, we're more or less fine. As shown on the Arch Linux guide on PCI passthrough, we have a PCI bridge on the same IOMMU group as our graphics card and HDMI audio output. This means our PCI slot is provided by both the PCH (the successor of the north/south bridge) and the CPU, which in turn means it is possible for other devices apart from our GPU to be on that IOMMU group. This is OK in our case because we were lucky to have only the GPU, its HDMI audio and the PCI bridge on that group, but you're going to have to be wary to not tell the VM to grab your PCI bridge.
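Once you know which device you want, you usually only care about its own group rather than the full listing. Here is a small sketch that prints the group-mates of a single device; iommu_group_members is a hypothetical name, and the optional second argument (an alternative sysfs root) exists only so the function can be tried out against a fake directory tree.

```shell
#!/bin/sh
# Print every device that shares an IOMMU group with the given PCI device.
# iommu_group_members is a hypothetical helper name used only in this guide.
# $1: full PCI address including the domain, e.g. 0000:01:00.0
# $2: sysfs mount point (optional, defaults to /sys)
iommu_group_members() (
    dev=$1
    sysfs=${2:-/sys}
    # Each PCI device directory has an iommu_group symlink pointing at its group
    group_dir="$sysfs/bus/pci/devices/$dev/iommu_group/devices"
    if [ ! -d "$group_dir" ]; then
        echo "no IOMMU group found for $dev (is the IOMMU enabled?)" >&2
        exit 1
    fi
    for member in "$group_dir"/*; do
        basename "$member"
    done
)
```

For the GPU in this guide, iommu_group_members 0000:01:00.0 should list the PCI bridge, the GPU and its HDMI audio function. Everything on that list except the bridge is what you will hand over to the VM.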

Step 3: Block access on your physical OS to the GPU

Now that we have identified the IOMMU group where our GPU lives, we will now prevent your OS from letting the display driver gain access to the GPU by binding it to a placeholder driver. By far the easiest way to do so is with vfio-pci, which is a modern PCI passthrough driver designed to pretty much Just Work out of the box with minimal configuration. You can use pci-stub if you want to (or if your kernel is older than 4.1), but you might have a hard time getting it to work.

Start out by checking if you have vfio-pci on your system:

# modinfo vfio-pci
filename:       /lib/modules/4.6.0-1-amd64/kernel/drivers/vfio/pci/vfio-pci.ko
description:    VFIO PCI - User Level meta-driver
author:         Alex Williamson <alex.williamson@redhat.com>
license:        GPL v2
version:        0.2
srcversion:     E7D052C136278ABB60D003E
depends:        vfio,irqbypass,vfio_virqfd
intree:         Y
vermagic:       4.6.0-1-amd64 SMP mod_unload modversions
parm:           ids:Initial PCI IDs to add to the vfio driver, format is "vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]" and multiple comma separated entries can be specified (string)
parm:           nointxmask:Disable support for PCI 2.3 style INTx masking.  If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag. (bool)
parm:           disable_vga:Disable VGA resource access through vfio-pci (bool)
parm:           disable_idle_d3:Disable using the PCI D3 low power state for idle, unused devices (bool)

We do have vfio-pci, so now we will tell it to isolate our GPU and HDMI audio from our physical OS. You can use configuration file /etc/modprobe.d/vfio.conf, but in my case I prefer to use a little script I found on an Arch Linux Forum post. Save this script under /usr/bin/vfio-bind, and make it executable with chmod 755 /usr/bin/vfio-bind:

/usr/bin/vfio-bind
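#!/bin/bash
# Bind every PCI device given as an argument (e.g. 0000:01:00.0) to vfio-pci.
modprobe vfio-pci
for dev in "$@"; do
        # Read the device's vendor and device IDs from sysfs
        vendor=$(cat "/sys/bus/pci/devices/$dev/vendor")
        device=$(cat "/sys/bus/pci/devices/$dev/device")
        # Detach the device from whatever driver currently owns it
        if [ -e "/sys/bus/pci/devices/$dev/driver" ]; then
                echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"
        fi
        # Register the ID pair with vfio-pci so that it claims the device
        echo "$vendor" "$device" > /sys/bus/pci/drivers/vfio-pci/new_id
done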

Return to your IOMMU groups and take note of the device number of your GPU and HDMI audio, which in this case are 01:00.0 (the GPU) and 01:00.1 (the HDMI audio), and call that script as follows:

vfio-bind 0000:[device number]

In this case:

vfio-bind 0000:01:00.0 0000:01:00.1
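Before moving on, it doesn't hurt to confirm that the devices really ended up on vfio-pci. The sketch below reads the driver symlink that sysfs keeps for each PCI device; pci_driver_of is a made-up name, and the optional second argument is only there so the function can be pointed at a fake sysfs tree for testing.

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI device, or "none".
# pci_driver_of is a hypothetical helper name used only in this guide.
# $1: full PCI address including the domain, e.g. 0000:01:00.0
# $2: sysfs mount point (optional, defaults to /sys)
pci_driver_of() (
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -e "$link" ]; then
        # The symlink points at the driver's directory; its last component
        # is the driver name
        basename "$(readlink "$link")"
    else
        echo none
    fi
)
```

Both pci_driver_of 0000:01:00.0 and pci_driver_of 0000:01:00.1 should print vfio-pci. If you still see your regular display driver there, the unbind step did not take. lspci -nnk -s 01:00.0 shows the same information on its "Kernel driver in use" line.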

Step 4: Set up your VM

We're finally done configuring our physical OS, so the next step is to set up a VM that takes the GPU. Depending on your GPU you might have to use the SeaBIOS firmware (good ol' IBM PC BIOS) or the OVMF firmware (a brand new and shiny libre UEFI BIOS).

For that, the best suggestion is to be a man, break away from the coziness of virt-manager and libvirt, and call QEMU directly from the command line, because some commands are not supported by libvirt or are difficult or complex to set up as a libvirt VM definition. Don't worry, QEMU's parameters are rather straightforward and you pretty much just need to properly indent and space them to make sense out of them.

Start out by plugging your monitor to your passed-through GPU and summoning a VM as follows:

qemu-system-x86_64 -enable-kvm -m 512 -cpu host,kvm=off \
-smp <number of virtual CPUs>,sockets=1,cores=<number of CPU cores>,threads=<2 if your Intel or AMD Ryzen CPU has HyperThreading, 1 otherwise> \
-device vfio-pci,host=<device number of your GPU>,x-vga=on -device vfio-pci,host=<device number of your HDMI output, omit this entire section if you don't have it> \
-vga none

Filling in the values for our system:

qemu-system-x86_64 -enable-kvm -m 1024 -cpu host,kvm=off \
-smp 2,sockets=1,cores=2,threads=1 \
-device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
-vga none

Or if you want to use OVMF:

qemu-system-x86_64 -enable-kvm -m 1024 -cpu host,kvm=off \
-smp 2,sockets=1,cores=2,threads=1 \
-device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
-drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/ovmf_code_x64.bin \
-drive if=pflash,format=raw,file=/usr/share/ovmf/x64/ovmf_vars_x64.bin \
-vga none


Explanation:

  • kvm=off: This parameter hides the KVM hypervisor signature. Nvidia's display drivers don't play nicely with hypervisors and can cause weird, cryptic errors if they find out they're running on a VM, because Nvidia is the Apple of GPUs and makes normalfag cards for normalfags with normalfag setups and Windows 10. However, it turns out that these drivers, as of July 2016, just check for a hypervisor signature and that's it, so hiding it should do the trick.
  • x-vga=on: Required for VGA assignment.
  • -vga none: Also required for VGA assignment. This disables QEMU's simulated VGA device, the one that receives graphical output and passes it on to a VNC/Spice server so you can open it with virt-manager.

Now prepare your VM launch command on a console, plug your monitor to your passed-through GPU, and hit Enter. If you see BIOS output on your screen and a message that says the BIOS couldn't find a bootable medium (this is expected since our VM doesn't have any storage media), you just finished setting up GPU passthrough on your VM and that BIOS output you're seeing is your VM assuming direct control over your GPU.
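The -smp line is the one people most often get wrong, because its first number has to agree with the topology that follows it. As a sketch, assuming a single-socket guest as in the examples above, the value can be derived from the core and thread counts (smp_arg is a hypothetical helper, not part of QEMU):

```shell
#!/bin/sh
# Build the -smp argument for a single-socket guest.
# smp_arg is a hypothetical helper name used only in this guide.
# $1: number of CPU cores to give the guest
# $2: threads per core (2 if you use HyperThreading/SMT, 1 otherwise)
smp_arg() {
    # Total virtual CPUs = sockets (1) * cores * threads per core
    echo "-smp $(( $1 * $2 )),sockets=1,cores=$1,threads=$2"
}
```

smp_arg 2 1 reproduces the line used in this guide, and smp_arg 4 2 would describe a guest with 4 cores and 2 threads each, i.e. 8 virtual CPUs.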

Step 5: Piece together a VM suitable for vidya

Gentoo users: Before proceeding with this step, check to see if you have compiled QEMU with USE="usb". Otherwise you will not have the USB passthrough function and you will therefore be unable to assign a USB keyboard and mouse to your VM. Notice that this flag is not the same as usbredir -- the usb flag means USB passthrough, whereas the usbredir flag means supporting a namesake tool that allows you to transmit USB I/O to a remote VM over TCP/IP.

Now that we have a VM with a functional GPU over PCI passthrough, we will piece together the rest of the VM.

Start out by creating the HD image. When creating your image, consider the disk space your games will take, add at least 25 GB to account for temporary installation files, and then add another 40 GB for Windows alone. For the sake of this tutorial we will be building a minimal testing environment with only Windows 7 and League of Legends, which combined take up about 20 GB of storage.

You have two choices when creating a disk image. A raw image is a simple bit-by-bit representation of an actual HD's contents. A qcow2 image is a compact image format that only contains non-blank sectors (blank here means containing all binary zeroes) and therefore takes up less disk space, plus it has some cool features like snapshots or compression. All of that comes at the expense of CPU overhead when performing disk I/O, as the system must figure out on which byte of the file a given sector actually is.

Create your qcow2 disk image as follows:

qemu-img create -f qcow2 /root/IOMMU.qcow2 20G

Or if you want to create a raw image:

dd if=/dev/zero of=/root/IOMMU.img bs=1G count=0 seek=20
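The 20G used here only suits the minimal test environment. For a real setup, the sizing rule of thumb from the start of this step can be written down as a one-liner; image_size_gb is a made-up name and the two constants are simply the 25 GB and 40 GB figures quoted above.

```shell
#!/bin/sh
# Suggested disk image size in GB, following this guide's rule of thumb:
# games + 25 GB for temporary installation files + 40 GB for Windows.
# image_size_gb is a hypothetical helper name used only in this guide.
# $1: disk space your games will take, in GB
image_size_gb() {
    echo $(( $1 + 25 + 40 ))
}
```

For example, qemu-img create -f qcow2 /root/IOMMU.qcow2 "$(image_size_gb 100)G" would create an image for 100 GB worth of games. Since a qcow2 image only stores non-blank sectors, asking for a generous size does not cost you that much disk space up front.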

Now we piece together a VM with some basic hardware:

qemu-system-x86_64 -enable-kvm -m <MB of RAM you want to use> -cpu host,kvm=off \
-smp 2,sockets=1,cores=2,threads=1 \
 \
-device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
-vga none \
 \
-drive file=/root/IOMMU.qcow2,id=disk,format=qcow2,if=none \
-device ide-hd,bus=ide.1,drive=disk \
-boot order=dc \
 \
-soundhw hda \
 \
-drive file=/home/niconiconii/torrents/WINDOWS-7-SDLG-EDITION.100%REALNOFAKE.FULL.CRACK-MEDICINA-PIRATA-KEYGEN.1ZELDA-INTERCAMBIABLE.FULL-NO-RIP.+TUTORIAL-LOQUENDO.iso,id=virtiocd,if=none \
-device ide-cd,bus=ide.1,drive=virtiocd

Explanation:

  • -m: Worth mentioning to make sure you choose the right amount of RAM for your VM. Windows Vista and later eat at least 1 GB of it. However, if you plan to divide your computer usage between vidya and a Linux desktop session, you won't be able to use all your system's RAM.
  • -drive file=...,id=XYZ,format=qcow2: To define a logical storage you must first define the location of the drive's image file, its format, and assign it a name.
  • -device ide-hd,bus=ide.1,drive=XYZ: This is where you assign the image file defined before to a piece of storage hardware. On Windows it is a good idea to use IDE HDs, because Windows doesn't really like SCSI storage hardware and might do anything from refusing to install without a virtio SCSI driver disc to throwing a BSOD while booting due to the lack of a virtio SCSI driver.
  • -device ide-cd: Same as before, but with a CD drive.
  • -boot order=dc: Indicates the boot order via a nomenclature based off traditional drive letters on Windows. As configured here, we're scanning for bootable media first on the CD drive, which is drive "d" (in Windows it's usually D:\), then on the HD, which is drive "c" (in Windows it's usually C:\). This boot order has the advantage that if you assign a virtual CD drive, e.g. to install Windows or use a partitioning tool, it will load the CD instead of the HD, but if you remove the CD you'll only have the HD and your system will therefore boot from the HD. Of course, you can replace this predefined boot order with a menu that will default to booting from the hard drive, in which case you use -boot menu=on instead.
  • -soundhw hda: Adds a simulated sound chip that outputs your computer's sound through your physical OS's audio system. If you have issues with it (which is rather likely given just how difficult it is to deal with audio on Linux), consider also giving your VM access to your sound chip over PCI passthrough.

Now that we have our gaming VM, you can now begin installing Windows. We will assume that if you've gotten this far through the tutorial you have the skills required to install Windows and therefore don't need any explanation. Just don't panic if your display appears in funky 16 color 640x480 mode: if it does so, it's most likely because either Windows or your Linux kernel can't into your video card's legacy VGA addresses, and chances are it will work normally once you install your video driver.

Once you have Windows installed, get a DirectX game with 3D graphics (such as League of Legends) and test how it works. Your game should work without a single hitch and with little or no performance difference between your setup and a plain ordinary physically installed Windows. If it does, then congratulations -- you have finally broken your vidya free from the botnet. To give you an idea, with some basic tuning Overwatch runs at 90 FPS on my AMD FX 8350 with a passed-through Radeon RX 480 using Ultra settings, about 90% of the framerate I believe I'd get if I ran it natively on my system.

Do note that it is a good idea to gradually sunset your games on Windows instead of suddenly migrating everything to your VM, e.g. start out by playing League only once a day on your VM, then gradually start playing more on your VM and less on your physical OS, as the extra layers of complexity involved in a VM PCI passthrough scenario greatly increase the number of points of failure and can cause unexpected issues.

Tuning

After you're done configuring your KVM gaming VM, you can try some system tweaks and adjustments to see if you get better performance.

Memory hugepages

Your programs don't access your memory directly. Instead, they address pages that amount to 4 KB of memory at a time, and these pages are translated by your CPU's MMU (the normal one, not the IOMMU) into actual physical addresses. Of course, this extra step makes memory lookups slower -- and when your program is an x86 VM that has yet another paged memory model, that can slow down your VM's performance quite a bit in some scenarios. To mitigate this, you can use a technology called hugepages, which replaces some of your 4 KB pages with large 2 MB pages and increases performance by cutting down on page lookups and amount of memory set aside for your system's page table. The downside of this is that you will have to manually remove your hugepages after stopping your VM, and that you'll have to reserve physical memory for your VM because hugepages cannot be swapped to hard disk or ZRAM. (Transparent hugepages supposedly address this limitation, but they have a history of causing more problems than solutions).

Gentoo users: Before using hugepages you need to compile the hugetlbfs into your kernel. To do so, on your Linux kernel configuration enable CONFIG_HUGETLBFS (File systems -> Pseudo filesystems -> HugeTLB file system support) and check that CONFIG_HUGETLB_PAGE is also enabled.

Start out by setting up your hugepages filesystem (note: that's hugetlbfs twice in the first command):

# mount -t hugetlbfs hugetlbfs /dev/hugepages
# sysctl vm.nr_hugepages=1024 # 1024 pages * 2 MB/page = 2 GB of hugepages

Then use your hugepages FS as memory location by adding this parameter to your KVM command line:

-mem-path /dev/hugepages

When you're done using your VM, remove your hugepages and unmount your hugepage FS:

# sysctl vm.nr_hugepages=0
# umount /dev/hugepages
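The 1024 in the sysctl call above is just 2 GB divided by the 2 MB page size, so the number has to change whenever you change the value you pass to -m. A minimal sketch of that arithmetic follows; hugepages_for_mb is a hypothetical name, and the 2048 kB default assumes the usual 2 MB hugepage size, which you can confirm with grep Hugepagesize /proc/meminfo.

```shell
#!/bin/sh
# Number of hugepages needed to back a guest with the given amount of RAM.
# hugepages_for_mb is a hypothetical helper name used only in this guide.
# $1: guest RAM in MB (the value you pass to -m)
# $2: hugepage size in kB (optional, defaults to 2048, i.e. 2 MB pages)
hugepages_for_mb() (
    page_kb=${2:-2048}
    # Convert MB to kB, then round up so the guest's memory always fits
    echo $(( ($1 * 1024 + page_kb - 1) / page_kb ))
)
```

For a guest started with -m 4096 you would run sysctl vm.nr_hugepages="$(hugepages_for_mb 4096)", then check with grep HugePages_ /proc/meminfo that the kernel actually managed to reserve that many pages.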


CPU affinity pinning

Your VM's virtual CPU cores are each represented by a standard POSIX thread spawned by QEMU's main process. By default, the Linux kernel will dispatch each thread to whatever CPU core it feels like, and as a result your virtual CPU 0 can be running at one moment on your physical CPU 0, at another moment on your physical CPU 6, and so on. This is usually fine for ordinary processes, but not when it comes to gaming VMs, because as you might have guessed, this CPU change adds a very slight delay in the thread's execution flow... and it just so happens that VMs running videogames are real-time processor-intensive apps that suffer greatly with even a very slight processing delay.

While there is no way to completely prevent this issue without outright ditching the concept of multitasking, you can mitigate it by locking (pinning) your virtual CPU threads to each one of your physical CPU cores. This will prevent the operating system's scheduler from migrating your VM threads between CPU cores (it will only pause your VM's threads when dispatching another program to the same CPU core), which in turn will greatly increase the responsiveness and processing power of your VM -- as in, about 30 extra FPS on Overwatch at 1440x900 using an AMD FX 8350 and a Radeon RX 480. (With this technique you can also get a lot of stuff done with very little performance difference: if you have an 8 core CPU you can set aside one core to download stuff, two cores and your physical OS's USB 3.0 controller for a Windows VM running HDD Regenerator on two external hard drives, and 5 cores for gaming -- that's in fact what virtualization was invented for, to optimize resource utilization on IBM mainframes).

While libvirt supports CPU pinning right out of the box with minimal configuration required, if you don't use libvirt you can still easily perform CPU pinning with this little script here (note: you will have to install the Korn shell, because it uses the while read loop that saves standard input to an iterator variable which is specific to KSH):

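#! /usr/bin/ksh
# Usage: qemu-cpupin <VM identifier> <# of VM logical CPUs> <first physical CPU> [-n]
physicalcpu=$3
# List the QEMU threads whose command line matches the identifier, sorted by
# CPU usage (highest first, on the assumption that the busiest threads are the
# virtual CPUs), and keep only their thread IDs
ps -Leo tid,%cpu,args | grep qemu | grep -- "$1" | grep -v grep | sort -k 2 -rn | sed "s/^ *\([0-9]*\).*/\1/" |
head -n "$2" | while read QEMU_pid; do
   if [[ $4 = "-n" ]]; then
      # Dry run: only print the command that would be executed
      echo taskset -pc $physicalcpu $QEMU_pid
   else
      # Pin this thread to a single physical CPU
      taskset -pc $physicalcpu $QEMU_pid
   fi
   # The next thread goes to the next physical CPU
   let physicalcpu+=1
done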

Save this script under /usr/bin/qemu-cpupin, then call it as follows:

qemu-cpupin <VM identifier> <# of VM logical CPUs> <first physical CPU to pin> <-n for dry run>

Example: If you have a VM whose drive image file is /var/lib/libvirt/images/IOMMU.qcow2, which has 6 virtual CPU cores, the following command will show you how it would pin its 6 virtual CPU threads to your physical CPU cores 0, 1, 2, 3, 4 and 5:

qemu-cpupin /var/lib/libvirt/images/IOMMU.qcow2 6 0 -n

Explanation:

  • /var/lib/libvirt/images/IOMMU.qcow2 -- This is what we use to identify the VM whose virtual CPUs will be pinned. It can be anything that appears in the command line you used to launch QEMU (as shown by ps -ef), as long as no other running VM has the same string in its command line. The QEMU option -name is supposed to fulfill this function, but your storage image file's full path is fine too, as it would be a bit of a violation of physics to have two different VMs simultaneously accessing the same image file.
  • 6 -- Your VM has 6 virtual CPU cores.
  • 0 -- You will start from physical CPU core 0, and work upwards to CPU core 5.
  • -n -- You know better than to trust something an internet stranger just pasted on a wiki, so you will do a dry run to confirm that your virtual CPUs will actually be pinned properly to your physical CPU cores.
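
The heart of the script is the mapping from thread IDs to consecutive physical cores. As a rough sketch (the function name pin_plan is made up for this example and is not part of qemu-cpupin), that mapping can be isolated in a small function that reads thread IDs on standard input and only prints the taskset commands, much like the -n dry run does:

```shell
#!/bin/sh
# pin_plan is a hypothetical helper, not part of qemu-cpupin.
# Reads one thread ID per line on stdin and prints (does not run) one
# taskset command per thread, using consecutive CPUs starting at $1.
pin_plan() {
    plan_cpu=$1
    while read -r plan_tid; do
        echo "taskset -pc $plan_cpu $plan_tid"
        plan_cpu=$((plan_cpu + 1))
    done
}

# Example with three made-up thread IDs, starting at physical core 2:
#   printf '4242\n4243\n4244\n' | pin_plan 2
```

After a real (non dry) run, calling taskset -pc <thread ID> without a CPU list prints that thread's current affinity, which is a quick way to confirm that the pin took effect.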

Troubleshooting

If your setup doesn't work, or if it does but has some strange issues, do yourself a favor and save yourself a lot of time and effort by checking these common points of failure:

  1. Does PCI passthrough even work with your hardware? Your motherboard might not support it even if it claims it does. Don't blame yourself for having purchased incompatible hardware though: manufacturers often find it tempting to skimp on IOMMU support in the name of cost reduction just because only a few enthusiasts and professionals actually use it, and the only way of finding out is by doing quite a bit of homework before purchasing something. Back in late 2015 Gigabyte actually released a motherboard series that turned out to be at best partially compatible with PCI passthrough, and don't expect your computer to have fully functional IOMMU support if you bought it at a big name store like Best Buy that sells computers designed under the assumption that their buyers are too unskilled to run anything other than a normal, ordinary setup of Windows 10.
  2. Are your kernel and KVM actually up to date? This means more than just issuing a system update on your package manager and calling it a day: you need to actually check your kernel and KVM version and see if they match with the most up to date versions as advertised on the Linux kernel and QEMU's web sites. This is important, because "stable" or "long-term support" distros like Debian, CentOS or Ubuntu LTS run Jurassic-age packages that can be literally two years behind the latest version in the name of overall stability. If you don't have the latest kernel or QEMU and your system is already fully up to date, you'll have to either switch your package manager to track your distro's "testing"/"unstable"/"UAT" release train and then upgrade your entire package tree (e.g. Debian Stretch as of June 2016), migrate your entire system to a more cutting-edge distro like Arch, or manually download, compile and side-load the latest version of QEMU published on the web site.
  3. Is the IOMMU enabled in your BIOS configuration? Some CPU features like IOMMU or even hardware-accelerated virtualization are disabled by default in your motherboard's BIOS just because most users never use them. Look in your BIOS configuration for something like "virtualization", "virtual I/O", "IOMMU" or similar terms and make sure you have them all enabled.
  4. Is your system accessing your GPU during boot before assigning it to your VM? Some strange issues may arise if your GPU is used for video output before it gets bound to vfio-pci (this often happens if you're passing through an AMD APU's video chip). If this is your case, try adding video=efifb:off to your kernel command line.
  5. If you're not using hugepages, is your system swapping memory? Having a VM's memory space paged to the HD can greatly hurt its performance even if you use an SSD, and it can do so in annoying ways like having choppy, stuttering and distorted audio output. If you find out your VM is having its memory paged out, you'll have to lower your VM's memory allocation or use hugepages.
  6. Are you using fancy vendor-specific GPU crap on your guest OS? Due to the business model of the video card industry, it's not uncommon to have your display drivers come with lots of vendor-specific bundleware. The Asus Radeon RX 480 DUAL card, for example, comes with an Asus GPUTweak utility that lets you tinker with your card's power and overclock levels. However, these tools often don't like QEMU's simulated hardware, and as a result they can glitch your entire system or make it bluescreen. To prevent this, try uninstalling your display drivers and any bundleware that might have gotten into your computer (use Safe Mode if you need to), then download Nvidia or AMD's official drivers and install these and only these.
  7. Does your VM hang when installing drivers for passed-through hardware? If so, try temporarily replacing your passed-through hardware with QEMU's usual emulated hardware. For example, if you're having a hard time getting through a Windows 10 display driver or a GeForce Experience driver update, you can modify your Windows VM's QEMU command line to use the usual emulated video system, start another VM with access to your graphics card, VNC into your virtual machine (you might have to use an SSH tunnel), crank down the RAM allocation on your gaming VM and start it. Once you're done, you will be able to get your VM to process the offending driver updates.
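
A few of the checks above can be scripted. The following is only a sketch: the function names are made up for this example, and the paths are passed as parameters so the functions can be tried on test data. It assumes a Linux host, where the kernel exposes one directory per IOMMU group under /sys/kernel/iommu_groups and reports swapped-out memory on the VmSwap line of /proc/<pid>/status.

```shell
#!/bin/sh
# Hypothetical helpers for points 3 and 5 of the checklist above.

# Point 3: count IOMMU groups. A result of 0 suggests the IOMMU is
# disabled in the BIOS or on the kernel command line.
iommu_group_count() {
    igc_dir=${1:-/sys/kernel/iommu_groups}
    igc_count=0
    for igc_group in "$igc_dir"/*; do
        [ -d "$igc_group" ] && igc_count=$((igc_count + 1))
    done
    echo "$igc_count"
}

# Point 5: print how many kB of a process are currently swapped out,
# given the path to its /proc/<pid>/status file.
swapped_kb() {
    awk '/^VmSwap:/ { print $2 }' "$1"
}

# Example usage (not run automatically):
#   uname -r                            # point 2: running kernel version
#   qemu-system-x86_64 --version        # point 2: installed QEMU version
#   iommu_group_count                   # point 3
#   lspci -nnk                          # point 4: "Kernel driver in use" per device
#   swapped_kb /proc/<QEMU pid>/status  # point 5
```

A nonzero value from swapped_kb for your QEMU process means part of the VM's memory has been paged out, which is the situation described in point 5.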

Additional links and resources