Testing and Configuring PCI Passthrough in Proxmox

Server Information
    Specifications
    Initial Setup Configuration
Objective
IOMMU (Input-Output Memory Management Unit)
    Summary
    Verifying that IOMMU is enabled
    Verifying that IOMMU interrupt remapping is enabled
    Verifying IOMMU isolation
Additional Proxmox Configuration
    VFIO Modules
    Blacklisting Drivers
    Adding GPU to VFIO
Vendor Reset
Setting Up the Windows VM
    Initial Setup
        General
        OS
        System
        Disk
        CPU
        Memory
        Network
    Additional Setup
Windows 11 Setup

Server Information

Specifications

CPU: Intel Core i7-12700K

GPU: Gigabyte Radeon RX 5700 XT

Motherboard: ASRock B760M PG Lightning LGA 1700

RAM: 32GB Team T-Force DDR5 6000MHz

PSU: EVGA Supernova G2 750W

SSD: Samsung 980 Pro 2TB


Initial Setup Configuration

Filesystem: ext4

Disk(s): /dev/nvme0n1

Country: Canada

Timezone: America/Toronto

Keymap: en-us

Email:

Management Interface: enp5s0

Hostname: piscesxxiii

IP CIDR: 10.0.0.2/24

Gateway: 10.0.0.1

DNS: 10.0.0.1

Objective


        The main use of Proxmox is to create and host virtual machines on one physical device. One of the main attractions of this is sharing physical hardware across multiple machines (i.e. sharing one processor across several virtual machines, or splitting up RAM between them). This, in turn, makes virtual machines extremely versatile, portable, and easy to work with. However, some workloads require a virtual machine to have direct access to a specific physical device. This leads to the concept of passthrough, and the focus of this document - PCI Passthrough.

        PCI Passthrough allows a physical device to be used inside a virtual machine without the host sitting in between in any way. This allows more direct communication between the virtual machine and the hardware. However, it comes with the drawback that the device is no longer available to the host, and therefore not accessible to any other virtual machine.[1] This means PCI Passthrough should only be done when it is absolutely necessary, as it goes against many of the advantages of virtual machines.

        The main objective of this exercise is to pass a dedicated graphics card (AMD Radeon RX 5700 XT) to a Windows 11 virtual machine hosted on Proxmox, so that the CPU and RAM can still be shared with other machines. The goal is for part of the server to be usable for gaming by anyone who remotes in, preferably using Parsec or Steam In-Home Streaming. This isn’t without its challenges, which will be discussed as the document goes on, but I’m confident I will be able to get something working properly, or at least semi-properly.

        I will be pulling from a few sources for this - the official Proxmox wiki, some GitHub repositories for 3rd-party fixes, and some Reddit and forum threads. So without any further ado, let’s get on with the show and look through the steps to take to get PCI Passthrough working.

IOMMU (Input-Output Memory Management Unit)


Summary

        One of the first things we need to check is that IOMMU (Input-Output Memory Management Unit) is enabled and working on our host. Without going into too much detail, the IOMMU is responsible for connecting an I/O bus to system memory. The reason this is useful for virtualization is that it allows device-visible virtual addresses to be mapped to physical devices.[2] To break it down further, it lets you explicitly tell a virtualized device to use a specific physical device. In our use case, it allows the physical GPU to be mapped to virtual addresses that can be handed to a virtual machine.

        In the past, IOMMU was a somewhat obscure feature, but in recent years it has become standard on most motherboards, CPUs, and GPUs.[3] There are a few things to keep in mind, however - one of the main ones is that it is recommended to use OVMF (Open Virtual Machine Firmware) instead of SeaBIOS (essentially UEFI vs. Legacy). It’s also worth knowing that if you are using older hardware, you should double-check that it supports IOMMU. With the hardware we have, we will have no issues with IOMMU support.


Verifying that IOMMU is enabled

        The first step is to go into the shell of the host, reboot, and then run the following command:

dmesg | grep -e DMAR -e IOMMU

The command above looks into the kernel ring buffer (dmesg)[4], and then uses grep to display any line that contains either “DMAR” or “IOMMU”.[5]

If you get a message along the lines of “DMAR: IOMMU enabled”, then you are all good to go. If you don’t, then something is wrong and you need to dig deeper.[6] We did not get that message, so we will need to continue. We can start by trying to modify our GRUB file to enable Intel IOMMU. If you aren’t aware of what GRUB is, it is essentially the program on Linux systems that loads and manages the boot process.[7] First, we need to type in the following command:

nano /etc/default/grub

Doing this will bring us into our GRUB configuration file. Most of the file will be commented out, save for five lines towards the top. It will look either the same as, or similar to, this:

# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""

# If your computer has multiple operating systems installed, then you
# probably want to run os-prober. However, if your computer is a host
# for guest OSes installed via LVM or raw disk devices, running
# os-prober can cause damage to those guest OSes as it mounts
# filesystems to look for things.
#GRUB_DISABLE_OS_PROBER=false

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"


What we’re going to try next is to modify the line that says “GRUB_CMDLINE_LINUX_DEFAULT” to the following:[8]

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off"

After doing so, we’re going to press Ctrl+X to exit, press Y to accept the changes, and then press Enter to confirm the name of the file. Then we’re going to perform the following commands to update GRUB and then reboot the host.

update-grub
shutdown -r now
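
A quick aside: the above assumes the host boots through GRUB. If your Proxmox installation boots with systemd-boot instead (for example, a ZFS-on-root install on UEFI), the same kernel options would go on the single line in /etc/kernel/cmdline rather than in /etc/default/grub, and you would apply them with proxmox-boot-tool instead of update-grub. A rough sketch of that alternative (not needed for the GRUB-based setup used here):

nano /etc/kernel/cmdline
proxmox-boot-tool refresh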

After the restart, we’re going to run the following command again:

dmesg | grep -e DMAR -e IOMMU

Doing so gets us the following output:

[    0.005666] ACPI: DMAR 0x0000000061DDF000 000088 (v02 INTEL  EDK2     00000002      01000013)
[    0.005697] ACPI: Reserving DMAR table memory at [mem 0x61ddf000-0x61ddf087]
[    0.036141] DMAR: IOMMU enabled
[    0.087434] DMAR: Host address width 39
[    0.087435] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.087438] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap 1c0000c40660462 ecap 29a00f0505e
[    0.087439] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.087443] DMAR: dmar1: reg_base_addr fed91000 ver 5:0 cap d2008c40660462 ecap f050da
[    0.087444] DMAR: RMRR base: 0x0000006c000000 end: 0x000000707fffff
[    0.087446] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.087447] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.087447] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.088325] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.243484] pci 0000:00:02.0: DMAR: Skip IOMMU disabling for graphics
[    0.283013] DMAR: No ATSR found
[    0.283013] DMAR: No SATC found
[    0.283014] DMAR: IOMMU feature fl1gp_support inconsistent
[    0.283014] DMAR: IOMMU feature pgsel_inv inconsistent
[    0.283015] DMAR: IOMMU feature nwfs inconsistent
[    0.283015] DMAR: IOMMU feature dit inconsistent
[    0.283015] DMAR: IOMMU feature sc_support inconsistent
[    0.283016] DMAR: IOMMU feature dev_iotlb_support inconsistent
[    0.283016] DMAR: dmar0: Using Queued invalidation
[    0.283018] DMAR: dmar1: Using Queued invalidation
[    0.284687] DMAR: Intel(R) Virtualization Technology for Directed I/O
[    2.380897] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.

As you can see, on the third line, we now have the following message:

[    0.036141] DMAR: IOMMU enabled

With this message, we can confirm that IOMMU is in fact enabled, and functional. Knowing this leads us to our next step.


Verifying that IOMMU interrupt remapping is enabled

        Now that we know that IOMMU is working, we need to make sure that interrupt remapping is working properly as well. Interrupt remapping essentially allows interrupts from peripheral devices to be intercepted and routed to specific CPU cores.[9] This is essential when we want to take the interrupts coming from the GPU and deliver them to the vCPUs of the particular VM it is passed to. Without it, Proxmox will throw errors saying the device could not be assigned, or something else along those lines.

Checking is very simple, and can be done with one command, just like before:

dmesg | grep 'remapping'

This gets us the following result:

[    0.087447] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.088325] DMAR-IR: Enabled IRQ remapping in x2apic mode

The line we want to pay attention to is the second one, which says that IRQ remapping has been enabled in x2apic mode. Having this line tells us that our interrupt remapping is working with no issues.[10] If you do not have this line, you can enable unsafe interrupts with the following command:

echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf

As I did not need to enable this, I am unsure of the consequences, and I would recommend doing more research before enabling it to ensure that it is safe for your system.


Verifying IOMMU isolation

        The last step we need to take for IOMMU is to make sure that each PCI device you want to pass to a VM has a dedicated IOMMU group of its own. You can see your groups by entering the following command:[11]

pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist ""

Of course, replace {nodename} with the name of your node. So in our case, it would be the following:

pvesh get /nodes/piscesxxiii/hardware/pci --pci-class-blacklist ""

Entering that command on my server shows that my Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (Radeon RX 5700 XT Gaming OC) is in IOMMU group 12 all by itself, so we’re all good to go.

One thing to mention is that the GPU’s audio device is in a different IOMMU group (13). I am unsure if this will cause any issues with sound later, but it is something to keep in mind. With all three of these steps taken, we can be sure that our IOMMU is set up and fully working.
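
If you want to double-check the grouping outside of pvesh, the same information can be read directly from sysfs. Here is a small shell sketch, assuming the standard /sys/kernel/iommu_groups layout and that lspci is available:

# List every IOMMU group and the PCI devices it contains
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        printf "    "
        lspci -nns "${d##*/}"
    done
done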

Additional Proxmox Configuration


        Going forward, the steps taken are a little more scattered and less centered around one main technology, so we’re just going to take it step by step.

VFIO Modules

        Additional modules have to be loaded for VFIO (Virtual Function I/O) to be fully functional, which we need for the VM to have direct access to a piece of PCIe hardware. To enable the required modules, enter the following command:

nano /etc/modules

This opens the modules file, in which we have to add the following lines:

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Press Ctrl+X to exit, press Y to accept the changes, and then press Enter to confirm the name of the file.[12]
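
If you would rather not open an editor, the same four lines can be appended in one shot from the shell. A small sketch of that alternative (it assumes the modules are not already listed, so avoid running it twice):

cat >> /etc/modules << 'EOF'
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
EOF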


Blacklisting Drivers

        In addition, we don’t want the Proxmox host to access the dedicated GPU either. We can prevent this by using the following commands:

echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
echo
"blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo
"blacklist nvidia" >> /etc/modprobe.d/blacklist.conf

echo "blacklist amdgpu" >> /etc/modprobe.d/blacklist.conf

This will block the ‘radeon’, ‘nouveau’, ‘nvidia’, and ‘amdgpu’ drivers from being used on the host machine.


Adding GPU to VFIO

        Now with IOMMU and VFIO set up, and with the dedicated GPU blacklisted from being used by Proxmox, we can add the GPU to VFIO so that we can pass it to our VM. This is a bit of a lengthy process that will require some note taking, but it is otherwise straightforward. We start with this command:

lspci -v

This will list out a large amount of data, but we want to narrow it down to anything mentioning our dedicated GPU. In this use case, we have the following:

03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev c1) (prog-if 00 [VGA controller])
       Subsystem: Gigabyte Technology Co., Ltd Radeon RX 5700 XT Gaming OC
       Flags: bus master, fast devsel, latency 0, IRQ 154, IOMMU group 12
       Memory at 6000000000 (64-bit, prefetchable) [size=8G]
       Memory at 6200000000 (64-bit, prefetchable) [size=256M]
       I/O ports at 4000 [size=256]
       Memory at 70800000 (32-bit, non-prefetchable) [size=512K]
       Expansion ROM at 70880000 [disabled] [size=128K]
       Capabilities: [48] Vendor Specific Information: Len=08 <?>
       Capabilities: [50] Power Management version 3
       Capabilities: [64] Express Legacy Endpoint, MSI 00
       Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
       Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
       Capabilities: [150] Advanced Error Reporting
       Capabilities: [200] Physical Resizable BAR
       Capabilities: [240] Power Budgeting <?>
       Capabilities: [270] Secondary PCI Express
       Capabilities: [2a0] Access Control Services
       Capabilities: [2b0] Address Translation Service (ATS)
       Capabilities: [2c0] Page Request Interface (PRI)
       Capabilities: [2d0] Process Address Space ID (PASID)
       Capabilities: [320] Latency Tolerance Reporting
       Capabilities: [400] Data Link Feature <?>
       Capabilities: [410] Physical Layer 16.0 GT/s <?>
       Capabilities: [440] Lane Margining at the Receiver <?>
       Kernel driver in use: amdgpu
       Kernel modules: amdgpu

03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio
       Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio
       Flags: bus master, fast devsel, latency 0, IRQ 151, IOMMU group 13
       Memory at 708a0000 (32-bit, non-prefetchable) [size=16K]
       Capabilities: [48] Vendor Specific Information: Len=08 <?>
       Capabilities: [50] Power Management version 3
       Capabilities: [64] Express Legacy Endpoint, MSI 00
       Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
       Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
       Capabilities: [150] Advanced Error Reporting
       Capabilities: [2a0] Access Control Services
       Kernel driver in use: snd_hda_intel
       Kernel modules: snd_hda_intel

The main thing we want to focus on is the address at the beginning of each device (03:00.0 and 03:00.1 in this case). Seeing as they both start with ‘03:00’, these two devices are functions of the same physical card, and that shared address is the first piece of information we need to note. Let’s call it our prefix.

After gathering this information, we then have to enter the following command. Note that we are using the same prefix at the end of the command that we found in the previous step.

lspci -n -s 03:00

Entering this brings us this as a result:

03:00.0 0300: 1002:731f (rev c1)
03:00.1 0403: 1002:ab38

Now the pieces of information we are looking for are the alphanumeric codes at the end (1002:731f and 1002:ab38 in this case). These are our vendor and device ID pairs. With those in hand, we enter the following command:

echo "options vfio-pci ids=1002:731f,1002:ab38 disable_vga=1"> /etc/modprobe.d/vfio.conf

Now, after doing all of this, and updating all of these modules, we enter the following command:

update-initramfs -u

And then restart:[13]

shutdown -r now
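
After the host comes back up, an optional sanity check is to re-run lspci against the GPU’s prefix and look at the kernel driver lines. Assuming the blacklist and vfio.conf changes took effect, both functions should now report vfio-pci as the kernel driver in use instead of amdgpu and snd_hda_intel:

lspci -nnk -s 03:00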

Now with this, our GPU should be ready to pass through to Windows. However, extra steps will probably have to be taken, as there is an infamous ‘reset bug’ with AMD cards, and an additional 3rd-party fix may have to be applied.

Vendor Reset


        This brings us to the AMD-specific part of this document. For a few generations of AMD GPUs, there is a nasty bug where, whenever a VM with a GPU dedicated to it restarted, the GPU could not be handed back to it until the host restarted. Not quite ideal for a Proxmox setup. This is why we have to work around it with a fix called “vendor-reset”, a 3rd-party fix developed on GitHub. The repository can be found below:

GitHub - gnif/vendor-reset: Linux kernel vendor specific hardware reset module for sequences that are too complex/complicated to land in pci_quirks.c

This tool will stop the GPU from being held hostage by the host, and allow it to be passed back to the designated VM.

First, we want to ensure the package repositories are set up properly. By default, Proxmox uses the enterprise repository, but without a license you have limited ability to update. I opted to use the no-subscription repository, which has a reputation for being less stable, but is fully accessible. This can be set by selecting your node in the Proxmox Web GUI, clicking Updates, and then Repositories. Click Add, select No-Subscription from the dropdown menu, and lastly disable the enterprise repository.
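
For reference, the same repository change can be made from the shell instead of the GUI. A sketch, assuming Proxmox VE 8 on Debian bookworm (swap in the codename for your release):

# Add the no-subscription repository
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list
# Disable the enterprise repository by commenting out its entry
sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list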

Once this is done, run the following command:

apt update && apt dist-upgrade

Allow all updates to finish. After this, we can use the following command to install the Proxmox kernel headers:

apt install pve-headers

Then, we want to install the required build tools for vendor-reset:

apt install git dkms build-essential

Then, we want to clone the repository and move into the directory we just downloaded:

git clone https://github.com/gnif/vendor-reset.git
cd vendor-reset

Now, typically you would move straight on to the build, but I ran into an error with a problematic dependency and had to jump through hoops to get it uninstalled and then reinstalled, which meant running the following command:

apt-get install linux-headers-6.5.11-5-pve

After doing this, I was able to build and install vendor-reset (don’t miss the period!):

dkms install .
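
If you want to confirm the build succeeded before moving on, dkms can report the state of the module; it should list vendor-reset as installed for the running kernel:

dkms status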

Now, just like the VFIO modules we loaded earlier, we have to add vendor-reset to our modules file so that it is loaded on boot. Instead of going into nano, we can just echo it to the file with the following command:

echo "vendor-reset" >> /etc/modules

And since we updated the modules, we have to enter this command to rebuild our initramfs:

update-initramfs -u

Lastly, to load everything up, restart the host system:[14]

shutdown -r now

With this done, the reset bug should be fixed, and the GPU should be cleanly released and bound back to the specified VM across VM reboots, even when the host does not restart.

https://github.com/gnif/vendor-reset/issues/46#issuecomment-1295482826

The workaround in the issue linked above appears to have fixed additional reset problems as well.
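
I have not dug into exactly what that linked comment changes, but a tweak that frequently comes up alongside vendor-reset on newer kernels (5.15 and later) is forcing the kernel to use the device-specific reset that vendor-reset provides. A hedged sketch of that idea, assuming the GPU sits at 0000:03:00.0 - note that this does not persist across host reboots, so it would need to be reapplied (for example from a hookscript or a small systemd unit):

echo device_specific > /sys/bus/pci/devices/0000:03:00.0/reset_method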

Setting Up the Windows VM


Initial Setup

        Now that we have our GPU passthrough configured and our vendor reset fixed, we can actually create the Windows VM. In this case, I will be going with a Windows 11 VM, just for the sake of it being the newest available, and for long term support. So let’s go into the Proxmox Web GUI, and then click on Create VM in the top right.

General

        Set the node to the same node on which you’ve done all the previous configuration. Give the VM a VM ID and a Name.

OS

        Select “CD/DVD disc image file (iso)”, and then select the local storage where your ISOs are kept. Assuming we have our Windows 11 ISO ready to go, select it from the ISO image dropdown. On the right side, under Guest OS, select Microsoft Windows as the Type and 11/2022 as the Version. We will add the additional drive for the VirtIO drivers later.

System

        Leave Graphic card as Default. Set Machine to q35. Ensure BIOS is set to OVMF (UEFI), and add an EFI disk, selecting any storage drive you want. Set SCSI controller to VirtIO SCSI single, and check off Add TPM. Select a storage drive for TPM Storage, and ensure the version is set to v2.0.

Disk

        Set your disk size to whatever you desire. In this case, it will be set to 500GB.

CPU

        Set as many sockets and cores as desired. In this case, I will be setting it to 1 socket, 10 cores.

Memory

        Set your memory as desired. In this case, I will be setting it to 16384 (equivalent to 16GB in megabytes).

Network

        Set your model to VirtIO (paravirtualized), and leave the rest as is.
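
As a point of reference, the GUI choices above roughly correspond to a single qm create command. This is only a sketch - the VM ID (100), VM name, storage name (local-lvm), bridge (vmbr0), and ISO filename are assumptions you would swap for your own values:

qm create 100 --name win11-gaming --ostype win11 \
  --machine q35 --bios ovmf --efidisk0 local-lvm:1,efitype=4m,pre-enrolled-keys=1 \
  --scsihw virtio-scsi-single --scsi0 local-lvm:500 \
  --tpmstate0 local-lvm:1,version=v2.0 \
  --sockets 1 --cores 10 --memory 16384 \
  --net0 virtio,bridge=vmbr0 \
  --cdrom local:iso/Win11.iso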


Additional Setup

        Alright, now that our VM is created, let’s go into Hardware and add our GPU. Click Add at the top and then click PCI Device. Select Raw Device, and then select your dedicated GPU from the dropdown list. You’ll notice it has the same prefix as before - neat! Check off All Functions as well. Then click on Advanced, enable PCI-Express, and ensure ROM-Bar is enabled as well.
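
For the command-line inclined, the same GPU attachment can be done with qm set. A sketch, assuming VM ID 100 and the 03:00 prefix we found earlier (omitting the function number passes all functions, matching the All Functions checkbox):

qm set 100 --hostpci0 0000:03:00,pcie=1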

        Then, click on Add again and then add a CD/DVD drive. Select your storage with the VirtIO Drivers .iso, and then select it from the ISO image dropdown.

Alright! We should be all good to go. Fire up the VM, and go through the setup like normal, ensuring that we select Windows 11 Pro during the setup, as we want to be able to RDP in.


Windows 11 Setup


        Alright! We are now at a point where we have a Windows VM. There are just a few more quirks to work out. The first thing we’re going to do is either set a static IP inside of Windows, or reserve a DHCP address on your router. This will help with troubleshooting later, as we may have to use RDP to access the computer.

        The second thing we will want to do is to either plug the GPU physically into a monitor, or use an HDMI dummy plug. Most GPUs will not output properly without a monitor connected, so using either will allow any remote connection to display properly.

        Next, we will want to configure Windows to log in automatically. This may seem counter-intuitive, and it can be skipped for the sake of security, but the point is that the programs that allow remote control (Parsec and Steam) do not launch until a user is logged in, forcing us to log in over RDP or locally before we can access the machine with Parsec or Steam. If the account logs in automatically whenever the VM starts, Parsec and Steam will be accessible immediately.

        To configure Windows to automatically login, open up Registry Editor and move to the following location:

Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon

After navigating there, either create or modify the following keys:[15]

Name               Type        Data
AutoAdminLogon     REG_SZ      1
DefaultPassword    REG_SZ      *your password here*
DefaultUserName    REG_SZ      *your username here*
ForceAutoLogon     REG_DWORD   1
ForceUnlockLogon   REG_DWORD   1
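
If you would rather script these registry changes than click through Registry Editor, the same keys can be set from an elevated Command Prompt with reg.exe. A sketch - swap in your real username and password:

reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /v AutoAdminLogon /t REG_SZ /d 1 /f
reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /v DefaultUserName /t REG_SZ /d "your username here" /f
reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /v DefaultPassword /t REG_SZ /d "your password here" /f
reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /v ForceAutoLogon /t REG_DWORD /d 1 /f
reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /v ForceUnlockLogon /t REG_DWORD /d 1 /f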

Of course, we will want to install graphics drivers, Steam, etc. Everything that you would usually install on a fresh gaming machine. After all, this is essentially what it is.


[1] https://pve.proxmox.com/wiki/PCI_Passthrough#Introduction

[2] https://learn.microsoft.com/en-us/windows-hardware/drivers/display/iommu-model

[3] https://pve.proxmox.com/wiki/PCI_Passthrough#Requirements

[4] https://man7.org/linux/man-pages/man1/dmesg.1.html

[5] https://linuxcommand.org/lc3_man_pages/grep1.html

[6] https://pve.proxmox.com/wiki/PCI_Passthrough#Verify_IOMMU_is_enabled

[7] https://www.codecademy.com/resources/blog/grub-linux/#:~:text=GRUB%20is%20the%20program%20on,the%20few%20still%20being%20maintained.

[8] https://forum.proxmox.com/threads/pci-passthrough-iommu-on-not-working.124111/

[9] https://summerofcode.withgoogle.com/archive/2016/projects/5087715448061952

[10] https://pve.proxmox.com/wiki/PCI_Passthrough#Verify_IOMMU_interrupt_remapping_is_enabled

[11] https://pve.proxmox.com/wiki/PCI_Passthrough#Verify_IOMMU_isolation

[12] https://www.reddit.com/r/homelab/comments/b5xpua/the_ultimate_beginners_guide_to_gpu_passthrough/

[13] https://www.reddit.com/r/homelab/comments/b5xpua/the_ultimate_beginners_guide_to_gpu_passthrough/

[14] https://www.nicksherlock.com/2020/11/working-around-the-amd-gpu-reset-bug-on-proxmox/

[15] https://docs.learnondemandsystems.com/lod/vm-auto-login.md#edit-windows-registry