PCI passthrough
PCI passthrough is a technology that lets you present an internal PCI device directly to a virtual machine. The VM drives the device itself and detects it as if it were physically connected. PCI passthrough is also often called IOMMU, although this is a bit of a misnomer: the IOMMU is the hardware technology that makes passthrough possible, but it also provides other features, such as some protection from DMA attacks and the ability to let devices limited to 32-bit addresses reach memory in a 64-bit address space.
As you can imagine, the most common application for PCI passthrough, at least on the chansphere, is vidya: it gives a VM direct access to your graphics card, so you can play games with nearly the same performance as if you were running them directly on your computer. PCI passthrough has many other applications, however. It lets you use a Digium phone card on an Asterisk VOIP server contained within a VM. If you have a PCI RAID controller that a proprietary hypervisor like VMware ESX doesn't support, you can pass it through to your VM. And it lets you use a high-end audio card for professional audio production that only has drivers for Windows and Mac OS X.
Intended audience
The following guide is intended for people with at least a working knowledge of Linux configuration using the command line. If you're capable of installing Arch Linux and setting up a desktop system, or if you're capable of setting up a web, mail, DNS/DHCP/LDAP/SMB or a business application server you should be capable of following through this guide without too much trouble. If for whatever reason you can't stray away from the graphical utilities of Linux Mint or must rely on Ubuntu to make your system work because everything there Just Werks™, this guide is probably beyond your capabilities.
Why PCI passthrough?
- It lets you ditch Windows while letting you play vidya safely. PCI passthrough is your big fat middle finger against Macrohard and its recent quest to aggressively strong-arm the entire world into surrendering to the privacy-raping botnet that is Windows 10. A lot of free software enthusiasts have felt the need to leave videogames behind on account of not being libre and try as hard as possible to justify themselves with "I've outgrown them" or "Videogames are empty entertainment". Others have to begrudgingly dual boot between Windows for gaming and Linux for everything else. With PCI passthrough, you don't need to do any of that -- you will be running Windows 10 safely sealed in a VM with simulated hardware and no access (in theory) to the rest of your system, keeping your Linux and your hardware protected from NSA backdoors and from Microsoft's indiscriminate mass surveillance campaign.
- It protects you from GPU malware. Guess what? It is possible to run nearly undetectable malware on your graphics card. But if your graphics card is facing a VM, the compromise will be limited to your VM, keeping your physical OS safe.
- You can actually encrypt your Windows with it. The old days of safe encrypted Windows installations with Truecrypt ended with 64-bit Windows 7, which requires one boot partition and one system partition; now, you have to entrust yourself to Microsoft BitLocker, which is shit crypto by virtue of being tightly closed source and unauditable. But... if you use Linux as physical OS and keep your Windows on a VM, you will be able to profit from the tried-and-known-good solutions that are VeraCrypt and cryptsetup, and Microsoft BitLocker is not going to be a problem as your physical Linux is going to be the one that provides full disk encryption.
- It helps you keep a completely libre operating system. Device drivers are usually the very first place where a strictly libre Linux starts getting tainted with commercial code. But if you delegate the one or two devices that have no libre drivers to a VM through PCI passthrough, that's a whole 'nother story.
- Some niche applications require it. For example, if you want to use a MIDI sequencer or other kinds of advanced audio production hardware, you only have two choices: you dual boot between Windows and Linux, or you present this hardware to your VM through PCI passthrough. Or if you want to run an Asterisk phone exchange with Digium land-line cards, it is a very good idea to run it on a VM with PCI passthrough in order to isolate your server from the rest of your system.
- You can block Microsoft's botnet servers without having to spend on a separate router. Simply block them on your host system and your Windows guest will never be able to send anything to Microsoft.
- It's the closest thing there is to an entirely libre computer. Your Windows instance will be running on simulated hardware powered by open source, auditable code, and everything your Windows will see except for the passed-through devices will be libre. It's still not enough for an entirely libre computer, but it's at least a layer of freedom between a proprietary OS and proprietary hardware.
Prerequisites
- A CPU that supports Intel VT-d or AMD-Vi. Check your CPU datasheet to confirm this.
- A motherboard that supports the aforementioned technologies. To find this out, check in your motherboard's BIOS configuration for an option to enable IOMMU or something similar. Chances are that your motherboard will support it if it's from 2013 or newer, but make sure to check since this is a niche technology and some manufacturers may save costs by axing it from their motherboards or delivering a defective implementation (such as Gigabyte's 2015-2016 series) simply because NORPs never use it.
- At least two GPUs: one for your physical OS, another for your VM. (You can in theory run your computer headless through SSH or a serial console, but you risk locking yourself away from your computer if you do so).
- A Linux distribution with recent-ish packages. This means Debian and CentOS Stable will probably not work, as they run very old kernels and even older versions of QEMU that might have buggy, partial or broken PCI passthrough support. If you have a "stable" distro and PCI passthrough doesn't work for you, try switching your package manager to the "testing" branch before you try anything else.
Step 0: Compile IOMMU support if you use Gentoo
If you run /g/entoo, the distro where everything Just Never Works and you have to set it up on your own, you'll have to compile IOMMU support into your kernel. You will need to set up the following options:
- CONFIG_IOMMU_SUPPORT (Device Drivers -> IOMMU Hardware Support)
- The following options if you have an AMD CPU:
  - CONFIG_AMD_IOMMU (Device Drivers -> IOMMU Hardware Support -> AMD IOMMU Support)
  - CONFIG_AMD_IOMMU_V2 (Device Drivers -> IOMMU Hardware Support -> AMD IOMMU Support -> AMD IOMMU Version 2 driver)
- The following options if you have an Intel CPU:
  - CONFIG_INTEL_IOMMU (Device Drivers -> IOMMU Hardware Support -> Support for Intel IOMMU using DMA Remapping Devices)
  - CONFIG_INTEL_IOMMU_SVM (Device Drivers -> IOMMU Hardware Support -> Support for Intel IOMMU using DMA Remapping Devices -> Support for Shared Virtual Memory with Intel IOMMU)
- CONFIG_IRQ_REMAP (Device Drivers -> IOMMU Hardware Support -> Support for Interrupt Remapping)
- CONFIG_VFIO (Device Drivers -> VFIO Non-Privileged userspace driver framework)
- CONFIG_VFIO_PCI (Device Drivers -> VFIO Non-Privileged userspace driver framework -> VFIO support for PCI devices)
- CONFIG_VFIO_PCI_VGA (Device Drivers -> VFIO Non-Privileged userspace driver framework -> VFIO support for PCI devices -> VFIO PCI support for VGA devices)
Make sure you have these options enabled, rebuild your kernel with your favorite method (plain normal make or a nice and easy genkernel, doesn't matter), but don't reboot yet. You will do that on the next step.
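If you'd rather not eyeball menuconfig for every option in the list above, something like the following can report what is missing. This is only a sketch: the function name missing_kernel_opts is made up for this guide, it reads the kernel config as text on stdin, and it treats both =y and =m as "enabled".

```shell
# Sketch: print every option from the argument list that is NOT enabled
# in the kernel config read from stdin.
# (missing_kernel_opts is a hypothetical helper name, not a standard tool.)
missing_kernel_opts() (
    config=$(cat)
    for opt in "$@"; do
        # An option counts as enabled when set to y (built in) or m (module).
        if ! printf '%s\n' "$config" | grep -q -E "^${opt}=(y|m)$"; then
            printf '%s\n' "$opt"
        fi
    done
)
```

On an AMD box you could feed it your source tree's config, e.g. missing_kernel_opts CONFIG_IOMMU_SUPPORT CONFIG_AMD_IOMMU CONFIG_AMD_IOMMU_V2 CONFIG_VFIO CONFIG_VFIO_PCI CONFIG_VFIO_PCI_VGA < /usr/src/linux/.config. No output means nothing from the list is missing.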
Step 1: Check for IOMMU support, enable if you don't have it
Start out by modifying the kernel command line on your bootloader to enable IOMMU. For this, you need two parameters: iommu=on, and then amd_iommu=on or intel_iommu=on depending on whether you have an AMD or Intel CPU. Your kernel command line should look a bit like this:
linux /vmlinuz-4.6.0-1-amd64 root=UUID=XYZUVWIJK ro quiet iommu=on amd_iommu=on
Reboot your system, and check if AMD-Vi or Intel VT-d were enabled by checking your kernel log. On AMD, use grep AMD-Vi; on Intel, use grep -e DMAR -e IOMMU:
# dmesg | grep AMD-Vi
[ 0.953668] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[ 0.953669] AMD-Vi: Extended features: PreF PPR GT IA
[ 0.953672] AMD-Vi: Interrupt remapping enabled
[ 0.953768] AMD-Vi: Lazy IO/TLB flushing enabled
Gentoo users: It is possible that you are missing a kernel configuration option, which prevents the kernel from confirming that IOMMU is up and running. If you have already enabled all the kernel options described above and you still can't see anything about IOMMU or VT-d in the kernel logs, try looking for your platform's IOMMU driver running as a kernel thread:
# ps -ef | grep -i iommu
root 66 2 0 04:19 ? 00:00:00 [amd_iommu_v2]
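If you want a single check that covers both vendors, the two greps above can be folded into one helper. This is a rough sketch, not a definitive test: iommu_vendor_from_log is a made-up name, it only looks for the same strings as the greps in this step, and a match is a hint rather than proof that passthrough will work.

```shell
# Sketch: read a kernel log on stdin and print "amd", "intel" or "none"
# depending on which IOMMU messages show up.
# (iommu_vendor_from_log is a hypothetical helper name.)
iommu_vendor_from_log() (
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'AMD-Vi'; then
        echo amd
    elif printf '%s\n' "$log" | grep -q -e 'DMAR' -e 'IOMMU'; then
        # Same patterns as the Intel grep above.
        echo intel
    else
        echo none
    fi
)
```

Run it as dmesg | iommu_vendor_from_log (as root if your distro restricts dmesg).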
Step 2: Find out your IOMMU groups
Your devices will be organized into IOMMU groups, which are the smallest sets of physical devices that can be passed to a VM and depend on how your motherboard is wired and organized. For example, if you want direct access to your motherboard's audio chip, but it's on the same IOMMU group as your IDE controller and your SMBus (the one that provides access to thermometers, voltage sensors, fan speedometers and so on), you're going to have to give up your audio chip, your IDE controller and your SMBus all combined.
To check this info, use this command:
$ for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d); do echo "IOMMU group $(basename "$iommu_group")"; for device in $(ls -1 "$iommu_group"/devices/); do echo -n $'\t'; lspci -nns "$device"; done; done
IOMMU group 1
    00:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 10h-1fh) Processor Root Port [1022:1412]
    01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
    01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98]
IOMMU group 7
    00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:780b] (rev 14)
    00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH IDE Controller [1022:780c]
    00:14.2 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller [1022:780d] (rev 01)
    00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:780e] (rev 11)
As shown here, IOMMU group 7 is bad news: it holds tons of devices apart from the audio chip, which means that if we want direct audio access on a VM, our physical OS is going to have to give up the audio chip, the IDE controller (i.e. no more old DVD-RW drive for you), the SMBus sensor access chip, and the ISA bridge (that one is fortunately a minor nuisance since ISA has been obsolete since like 1998). As for IOMMU group 1, we're more or less fine. As shown on the Arch Linux guide on PCI passthrough, we have a PCI bridge in the same IOMMU group as our graphics card and HDMI audio output. This means our PCI slot is provided by both the PCH (the successor of the north/south bridge) and the CPU, which in turn means it is possible for other devices apart from our GPU to end up in that IOMMU group. This is OK in our case because we were lucky to have only the GPU, its HDMI audio and the PCI bridge in that group, but be careful not to tell the VM to grab your PCI bridge.
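Once you know which device you're after, you can also ask the question the other way around: "what else lives in this device's group?". The sketch below does that by following the iommu_group symlink in sysfs. The function name iommu_group_peers and its optional second argument (an alternate sysfs root, only there so you can try it on a fake directory tree) are inventions of this guide.

```shell
# Sketch: list every device that shares an IOMMU group with the given one.
# $1 = full PCI address (e.g. 0000:01:00.0)
# $2 = sysfs root, defaults to /sys (hypothetical knob, handy for dry runs)
iommu_group_peers() (
    dev=$1
    sysfs=${2:-/sys}
    link="$sysfs/bus/pci/devices/$dev/iommu_group"
    [ -e "$link" ] || { echo "no IOMMU group found for $dev" >&2; exit 1; }
    # iommu_group is a symlink pointing at .../kernel/iommu_groups/<N>
    group_dir=$(readlink -f "$link")
    echo "IOMMU group $(basename "$group_dir")"
    for peer in "$group_dir"/devices/*; do
        printf '\t%s\n' "$(basename "$peer")"
    done
)
```

For the example system, iommu_group_peers 0000:01:00.0 should list the PCI bridge, the GPU and its HDMI audio, i.e. the same members shown for group 1 above.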
Step 3: Block access on your physical OS to the GPU
Now that we have identified the IOMMU group where our GPU lives, we will now prevent your OS from letting the display driver gain access to the GPU by binding it to a placeholder driver. By far the easiest way to do so is with vfio-pci, which is a modern PCI passthrough driver designed to Just Work out of the box with straightforward configuration. You can use pci-stub if you want to (or if your kernel is older than 4.1), but you might have a hard time getting it to work.
Start out by checking if you have vfio-pci on your system:
# modinfo vfio-pci
filename:       /lib/modules/4.6.0-1-amd64/kernel/drivers/vfio/pci/vfio-pci.ko
description:    VFIO PCI - User Level meta-driver
author:         Alex Williamson <alex.williamson@redhat.com>
license:        GPL v2
version:        0.2
srcversion:     E7D052C136278ABB60D003E
depends:        vfio,irqbypass,vfio_virqfd
intree:         Y
vermagic:       4.6.0-1-amd64 SMP mod_unload modversions
parm:           ids:Initial PCI IDs to add to the vfio driver, format is "vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]" and multiple comma separated entries can be specified (string)
parm:           nointxmask:Disable support for PCI 2.3 style INTx masking. If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag. (bool)
parm:           disable_vga:Disable VGA resource access through vfio-pci (bool)
parm:           disable_idle_d3:Disable using the PCI D3 low power state for idle, unused devices (bool)
We do have vfio-pci, so now we will tell it to isolate our GPU and HDMI audio from our physical OS. You can do this with the configuration file /etc/modprobe.d/vfio.conf, but in my case I prefer to use a little script I found on an Arch Linux Forum post. Save this script under /usr/bin/vfio-bind, and make it executable with chmod 755 /usr/bin/vfio-bind:
/usr/bin/vfio-bind
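#!/bin/bash
# Bind each PCI device given as an argument (e.g. 0000:01:00.0) to vfio-pci.
modprobe vfio-pci
for dev in "$@"; do
    vendor=$(cat "/sys/bus/pci/devices/$dev/vendor")
    device=$(cat "/sys/bus/pci/devices/$dev/device")
    # Detach the device from whatever driver currently owns it.
    if [ -e "/sys/bus/pci/devices/$dev/driver" ]; then
        echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"
    fi
    # Register the vendor/device pair so vfio-pci claims the device.
    echo "$vendor $device" > /sys/bus/pci/drivers/vfio-pci/new_id
done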
Return to your IOMMU groups and take note of the device number of your GPU and HDMI audio, which in this case are 01:00.0 (the GPU) and 01:00.1 (the HDMI audio), and call that script as follows:
vfio-bind 0000:[device number]
In this case:
vfio-bind 0000:01:00.0 0000:01:00.1
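Two loose ends from this step can be sketched in a few lines. The first helper builds the line you would put in /etc/modprobe.d/vfio.conf if you go the configuration file route; it relies on the ids parameter shown in the modinfo output above, and it assumes vfio-pci gets loaded before your GPU's regular driver, which is distro-specific and not handled here. The second helper tells you which driver currently owns a device, so you can confirm the bind worked. Both function names, and the optional sysfs-root argument of the second one, are made up for this guide.

```shell
# Sketch: build the modprobe.d options line from vendor:device IDs
# (the [1002:6779]-style pairs printed by lspci -nn).
# (vfio_conf_line is a hypothetical helper name.)
vfio_conf_line() (
    ids=$(printf '%s,' "$@")
    # Strip the trailing comma left by the printf above.
    printf 'options vfio-pci ids=%s\n' "${ids%,}"
)

# Sketch: print the driver currently bound to a PCI device, or "none".
# $1 = full PCI address; $2 = sysfs root, defaults to /sys
# (the second argument is a hypothetical knob for dry runs on a fake tree).
bound_driver() (
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -e "$link" ]; then
        basename "$(readlink -f "$link")"
    else
        echo none
    fi
)
```

For the example cards, vfio_conf_line 1002:6779 1002:aa98 gives you the line for vfio.conf, and after running vfio-bind both bound_driver 0000:01:00.0 and bound_driver 0000:01:00.1 should answer vfio-pci.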
Step 4: Set up your VM
We're finally done configuring our physical OS, so the next step is to set up a VM that takes the GPU. Depending on your GPU you might have to use the SeaBIOS firmware (good ol' IBM PC BIOS) or the OVMF firmware (a brand new and shiny libre UEFI BIOS).
For that, the best suggestion is to be a man, break away from the coziness of virt-manager and libvirt, and call QEMU directly from the command line, because some commands are not supported by libvirt or are difficult or complex to set up as a libvirt VM definition. Don't worry, QEMU's parameters are rather straightforward and you pretty much just need to properly indent and space them to make sense out of them.
Start out by plugging your monitor to your passed-through GPU and summoning a VM as follows:
qemu-system-x86_64 -enable-kvm -m 512 -cpu host,kvm=off \
    -smp <number of virtual CPUs>,sockets=1,cores=<number of CPU cores>,threads=<2 if your Intel CPU has Hyper-Threading, 1 otherwise> \
    -device vfio-pci,host=<device number of your GPU>,x-vga=on -device vfio-pci,host=<device number of your HDMI output, omit this entire section if you don't have it> \
    -vga none
Filling in the values for our system:
qemu-system-x86_64 -enable-kvm -m 1024 -cpu host,kvm=off \
    -smp 2,sockets=1,cores=2,threads=1 \
    -device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
    -vga none
Or if you want to use OVMF:
qemu-system-x86_64 -enable-kvm -m 1024 -cpu host,kvm=off \
    -smp 2,sockets=1,cores=2,threads=1 \
    -device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
    -drive if=pflash,format=raw,readonly,file=/usr/share/ovmf/x64/ovmf_code_x64.bin \
    -drive if=pflash,format=raw,file=/usr/share/ovmf/x64/ovmf_vars_x64.bin \
    -vga none
Explanation:
- kvm=off: This parameter hides the KVM hypervisor signature. Nvidia's display drivers don't play nicely with hypervisors and can cause weird, cryptic errors if they find out they're running on a VM, because Nvidia is the Apple of GPUs and makes normalfag cards for normalfags with normalfag setups and Windows 10. However, it turns out that these drivers, for now, just check for a hypervisor signature and that's it, so hiding it should do the trick.
- x-vga=on: Required for VGA assignment.
- -vga none: Also required for VGA assignment. This disables QEMU's simulated VGA device, the one that receives graphical output and passes it on to a VNC/Spice server so you can open it with virt-manager.
If you see BIOS output on your GPU and a message that says the BIOS couldn't find any bootable media (normal since we're not currently passing any storage media), you just finished setting up GPU passthrough on your VM and that BIOS output you're seeing is your VM assuming direct control over your GPU.
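If you end up retyping the test command for different cards, a tiny generator saves some grief. The sketch below only prints the SeaBIOS test command from this step with your device numbers filled in; it doesn't run anything. The function name passthrough_cmdline is made up, and the memory and -smp values are simply the ones used in the filled-in example above.

```shell
# Sketch: print the QEMU test command from this step for a given GPU.
# $1 = GPU device number (e.g. 01:00.0)
# $2 = HDMI audio device number, optional
# (passthrough_cmdline is a hypothetical helper name.)
passthrough_cmdline() (
    gpu=$1
    audio=${2:-}
    cmd="qemu-system-x86_64 -enable-kvm -m 1024 -cpu host,kvm=off"
    cmd="$cmd -smp 2,sockets=1,cores=2,threads=1"
    cmd="$cmd -device vfio-pci,host=$gpu,x-vga=on"
    # The HDMI audio function is only added when an address was given.
    if [ -n "$audio" ]; then
        cmd="$cmd -device vfio-pci,host=$audio"
    fi
    cmd="$cmd -vga none"
    printf '%s\n' "$cmd"
)
```

Call it as passthrough_cmdline 01:00.0 01:00.1, read the result, and paste it into your shell once it looks right.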
Step 5: Piece together a VM suitable for vidya
Gentoo users: Before proceeding with this step, check to see if you have compiled QEMU with USE="usb". Otherwise you will not have the USB passthrough function and you will therefore be unable to assign a USB keyboard and mouse to your VM. Notice that this flag is not the same as usbredir -- the usb flag means USB passthrough, whereas the usbredir flag means supporting a namesake tool that allows you to transmit USB I/O to a remote VM over TCP/IP.
Now that we have a VM with a functional GPU over PCI passthrough, we will piece together the rest of the VM around it.
Start out by creating the HD image. When creating your image, consider the disk space your games will take, add at least 10 GB to account for temporary installation files, and then add another 10 GB for Windows alone. For the sake of this tutorial we will be building a minimal testing environment with only Windows 7 and League of Legends, which combined take up about 20 GB of storage.
You have two choices when creating a disk image. A raw image is a simple bit-by-bit representation of an actual HD's contents. A qcow2 image is a compact image format that only contains non-blank sectors (blank here means containing all binary zeroes) and therefore takes up less disk space, plus it has some cool features like snapshots and compression. All of that comes at the expense of CPU overhead when performing disk I/O, as the system must figure out at which byte of the file a given sector actually is.
Create your qcow2 disk image as follows:
qemu-img create -f qcow2 /root/IOMMU.qcow2 20G
Or if you want to create a raw image:
dd if=/dev/zero of=/root/IOMMU.img bs=1G count=0 seek=20
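The sizing rule from the start of this step (your games, plus 10 GB for temporary installation files, plus 10 GB for Windows) is simple enough to script. The helper below is just that arithmetic; image_size_gb is a made-up name and the numbers are the rule of thumb from this guide, not hard requirements.

```shell
# Sketch: image size in GB following the rule of thumb in this step:
# games + 10 GB of temporary installation files + 10 GB for Windows.
# $1 = disk space your games need, in GB
# (image_size_gb is a hypothetical helper name.)
image_size_gb() {
    echo $(( $1 + 10 + 10 ))
}
```

For example, with a hypothetical 8 GB of games you would create the image with qemu-img create -f qcow2 /root/IOMMU.qcow2 "$(image_size_gb 8)G", and qemu-img info /root/IOMMU.qcow2 will then show you the virtual size next to the space actually used on disk.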
Now we piece together a VM with some basic hardware:
qemu-system-x86_64 -enable-kvm -m <MB of RAM you want to use> -cpu host,kvm=off \
    -smp 2,sockets=1,cores=2,threads=1 \
    \
    -device vfio-pci,host=01:00.0,x-vga=on -device vfio-pci,host=01:00.1 \
    -vga none \
    \
    -drive file=/root/IOMMU.qcow2,id=disk,format=qcow2,if=none \
    -device ide-hd,bus=ide.1,drive=disk \
    -boot order=dc \
    \
    -soundhw hda \
    \
    -drive file=/home/niconiconii/torrents/WINDOWS-7-SDLG-EDITION.100%REALNOFAKE.FULL.CRACK-MEDICINA-PIRATA-KEYGEN.1ZELDA-INTERCAMBIABLE.FULL-NO-RIP.+TUTORIAL-LOQUENDO.iso,id=virtiocd,if=none \
    -device ide-cd,bus=ide.1,drive=virtiocd
Explanation:
- -m: Worth mentioning to make sure you choose the right amount of RAM for your VM. Windows Vista and later eat at least 1 GB of it. However, if you plan to divide your computer usage between vidya and a Linux desktop session, you won't be able to use all your system's RAM.
- -drive file=...,id=XYZ,format=qcow2: To define a logical storage you must first define the location of the drive's image file, its format, and assign it a name.
- -device ide-hd,bus=ide.1,drive=XYZ: This is where you assign the image file defined before to a storage device. On Windows it is a good idea to use IDE HDs, because Windows doesn't really like SCSI storage hardware and might do anything from refusing to install without a virtio SCSI driver disc to throwing a BSOD while booting due to the lack of a virtio SCSI driver.
- -device ide-cd: Same as before, but with a CD drive.
- -boot order=dc: Indicates the boot order via a nomenclature based off traditional drive letters on Windows. As configured here, we're scanning for bootable media first on the CD drive, which is drive "d" (in Windows it's usually D:\), then on the HD, which is drive "c" (in Windows it's usually C:\). This boot order has the advantage that if you assign a virtual CD drive, e.g. to install Windows or use a partitioning tool, it will load the CD instead of the HD, but if you remove the CD you'll only have the HD and your system will therefore start the HD. Of course, you can replace this predefined boot order with a menu that will default to booting from the hard drive, in which case you use -boot menu=on instead.
- -soundhw hda: Adds a simulated sound chip that outputs your computer's sound through your physical OS's audio system. If you have issues with it (which is rather likely given just how difficult it is to deal with audio on Linux), consider also giving your VM access to your sound chip over PCI passthrough.
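The Gentoo note at the start of this step mentions USB passthrough for your keyboard and mouse, but the command above doesn't include it. One way to do it, sketched here as an assumption rather than as part of the original setup, is QEMU's usb-host device, which grabs a host USB device by its vendor and product ID (the xxxx:yyyy pair printed by lsusb). The function name usb_host_args is made up, and the IDs in the usage example are placeholders, not real recommendations.

```shell
# Sketch: turn vendor:product pairs (as printed by lsusb) into QEMU
# USB passthrough arguments.
# (usb_host_args is a hypothetical helper name.)
usb_host_args() (
    # -usb enables the VM's USB bus; each usb-host device attaches
    # one physical USB device to it.
    args="-usb"
    for id in "$@"; do
        vendor=${id%%:*}
        product=${id##*:}
        args="$args -device usb-host,vendorid=0x$vendor,productid=0x$product"
    done
    printf '%s\n' "$args"
)
```

With placeholder IDs, usb_host_args 046d:c077 04d9:1603 prints the arguments to append to your QEMU command line. Keep in mind that whatever you pass through disappears from your physical OS while the VM is running, so don't hand over your only keyboard unless you have another way in.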
Now that we have our gaming VM, you can now begin installing Windows. We will assume that if you've gotten this far through the tutorial you have the skills required to install Windows and therefore don't need any explanation. Just don't panic if your display appears in funky 16 color 640x480 mode: if it does so, it's most likely because either Windows or your Linux kernel can't into your video card's legacy VGA addresses, and chances are it will work normally once you install your video driver.
Once you have Windows installed, get a DirectX game with 3D graphics (such as League of Legends) and test how it works. Your game should work without a single hitch and with little or no performance difference between your setup and a plain ordinary physically installed Windows. If it does, then congratulations -- you have finally broken your vidya free from the botnet.
Do note that it is a good idea to gradually sunset your games on Windows instead of suddenly migrating everything to your VM, e.g. start out by playing League only once a day on your VM, then gradually start playing more on your VM and less on your physical OS, as the extra layers of complexity involved in a VM PCI passthrough scenario greatly increase the number of points of failure and can cause unexpected issues.
Tuning
After you're done configuring your KVM gaming VM, you can try some system tweaks and adjustments to see if you get better performance.
Memory hugepages
Your x86 CPU doesn't access your memory directly. Instead, it addresses pages that amount to 4 KB of memory at a time. Of course, this memory model makes memory lookups slower -- and when you do that on an x86 VM that has yet another paged memory model, that can slow down your VM's performance quite a bit in some scenarios. To mitigate this, you can use a technology called hugepages, which replaces some of your 4 KB pages with large 2 MB pages and increases performance by cutting down on page lookups and amount of memory set aside for your system's page table. The downside of this is that you will have to manually remove your hugepages after stopping your VM, and that you'll have to reserve physical memory for your VM because hugepages cannot be swapped to hard disk or ZRAM.
Gentoo users: Before using hugepages you need to compile the hugetlbfs into your kernel. To do so, on your Linux kernel configuration enable CONFIG_HUGETLBFS (File systems -> Pseudo filesystems -> HugeTLB file system support) and check that CONFIG_HUGETLB_PAGE is also enabled.
Start out by setting up your hugepages filesystem (note: that's hugetlbfs twice in the first command)
# mount -t hugetlbfs hugetlbfs /dev/hugepages
# sysctl vm.nr_hugepages=1024 # 1024 pages * 2 MB/page = 2 GB of hugepages
Then use your hugepages FS as memory location by adding this parameter to your KVM command line:
-mem-path /dev/hugepages
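The page count you reserve has to cover the RAM you give the VM with -m, so it helps to compute it instead of guessing. The sketch below is only the arithmetic from the comment above (RAM in MB divided by 2 MB per page, rounded up); hugepages_for_mb is a made-up name, and it assumes the 2 MB hugepage size described in this section.

```shell
# Sketch: number of 2 MB hugepages needed to back a VM with the given
# amount of RAM in MB, rounded up.
# (hugepages_for_mb is a hypothetical helper name.)
hugepages_for_mb() {
    echo $(( ($1 + 1) / 2 ))
}
```

For a VM started with -m 4096 you would run sysctl vm.nr_hugepages="$(hugepages_for_mb 4096)", and grep Huge /proc/meminfo lets you confirm how many pages the kernel actually managed to reserve.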
When you're done using your VM, remove your hugepages and unmount your hugepage FS:
# sysctl vm.nr_hugepages=0
# umount /dev/hugepages
Troubleshooting
If your setup doesn't work, or if it does but has some strange issues, do yourself a favor and save probably a lot of time and effort by checking these common points of failure:
- Are your kernel and KVM actually up to date? This means more than just issuing a system update on your package manager and calling it a day: you need to actually check your kernel and KVM version and see if they match with the most up to date versions as advertised on the Linux kernel and QEMU's web sites. This is important, because "stable" or "long-term support" distros like Debian, CentOS or Ubuntu LTS run Jurassic-age packages that can be literally two years behind the latest version in the name of overall stability. If you don't have the latest kernel or QEMU and your system is already fully up to date, you'll have to either switch your package manager to track your distro's "testing"/"unstable"/"UAT" release train and then upgrade your entire package tree (e.g. Debian Stretch as of June 2016), or migrate your entire system to a more cutting-edge distro like Arch.
- Is the IOMMU enabled on your BIOS configuration? Some CPU features like IOMMU or even hardware-accelerated virtualization are disabled by default on your motherboard's BIOS just because normalfags never use them. Look in your BIOS configuration for something like "virtualization", "virtual I/O", "IOMMU" or similar terms and make sure you have them all enabled.
- Is your system accessing your GPU during boot before assigning it to your VM? Some strange issues may arise if your GPU receives video output before getting bound to vfio-pci. If this is your case, try adding video=efifb:off to your kernel command line.
- If you're not using hugepages, is your system swapping memory? Having a VM's memory space paged to the HD can greatly hurt its performance even if you use an SSD, and it can do so in annoying ways like having choppy, stuttering and distorted audio output. If you find out your VM is having its core paged, you'll have to lower your VM's memory allocation or use hugepages.
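For the swapping question, the kernel already keeps the answer in the VmSwap field of /proc/<pid>/status. The sketch below pulls that number out; swapped_kb is a made-up name, and it reads the status file on stdin so you can point it at whichever QEMU process you care about.

```shell
# Sketch: print the VmSwap value (in kB) from a /proc/<pid>/status file
# read on stdin, or 0 if the field is absent.
# (swapped_kb is a hypothetical helper name.)
swapped_kb() {
    awk '$1 == "VmSwap:" { print $2; found = 1 } END { if (!found) print 0 }'
}
```

Assuming a single running VM, swapped_kb < "/proc/$(pidof qemu-system-x86_64)/status" tells you how many kB of it are sitting in swap. Anything well above zero while you're playing is a good reason to lower -m or switch to hugepages.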
Additional links and resources
- How to do PCI passthrough on Arch Linux using OVMF as your KVM guest's BIOS
- GPU passthrough database -- add your hardware configuration, distro, kernel and QEMU versions and any comments if you could or could never get it working.