VMware vSphere / ESXi 7.0 GA work-around for GPU passthrough issues including disabled-after-reboot bug and UI bug
A new and improved workaround is available as of Jun 18 2020, details below.
When I was upgrading my primary Supermicro SuperServer Workstation / Datacenter, I ran into some strange problems getting passthrough working. I'd get everything squared away and boot my Windows 10 VM with my AMD Radeon 7750 GPU successfully passed through, as I've been doing for many years; see:
- What fits in any home virtualization lab, has 8 Xeon cores, 6 drives, 128 GB memory, and 3 4K outputs from a Windows 10 VM? Your new Supermicro SuperServer Workstation!
Jul 15 2015
I went all-in with my ESXi 7.0 upgrade, so in my most crucial (but backup-protected) Windows 10 VM, I also upgraded my virtual hardware to version 17 and updated my VMware Tools as well. After a reboot of my ESXi host, I noticed my Windows 10 VM wouldn't boot up. Then I spotted the reason: my passthrough settings weren't persisting through reboots. This was nerve-wracking, as I had work the next morning and had to figure out a way to get things square again without resorting to falling back to 6.7U3, and/or reverting to backups of 1.8TB of data.
Thankfully, I found a work-around for my new ESXi 7.0. Warning: it's pretty wonky, but quick and easy. It's not permanent, though; you have to repeat it after every ESXi host reboot. If you've found a better way around this, by all means drop a comment below to let us all know!
Note that I currently have no valid way of reporting such bugs to VMware; I'm still working on that. When a brand-new dot-zero release like vSphere 7.0 comes out on April 2nd and isn't yet on the VMware Compatibility Guide, opening a per-incident ticket isn't an option. I tried! I'll explain all that in another article soon.
Meanwhile, after a few dozen attempts and reboots, I found a workaround that I published a video of back on April 9, 2020, and now this article will hopefully help others as well. Strangely enough, over 500 folks have seen that video already, so unfortunately, I suspect I'm not alone with my issue. I hope the next patch release fixes this issue, which I've also posted to the VMTN forum.
New as of June 18 2020, and tested successfully in my home lab!
Note, this is not a fix; it's merely a stop-gap workaround until a hopefully much more elegant fix comes along. At least it persists: it's not something you have to redo after every reboot, so that's good.
Follow the method shown in William Lam's new article Passthrough of Integrated GPU [iGPU] for standard Intel NUC, where he explains that the issue stems from ESXi claiming the VGA driver. But beware: you will no longer see ESXi boot on screen before your auto-started VM with GPU passthrough comes up! Here's the one-line SSH command to issue; then reboot, that's it!
esxcli system settings kernel set -s vga -v FALSE
Presumably, when a better resolution comes along in a subsequent ESXi 7.x release, we can issue this command to undo the change:
esxcli system settings kernel set -s vga -v TRUE
returning to the original ESXi 6.x behavior where your display shows your familiar black and yellow ESXi DCUI boot sequence, followed by a VGA hand-over to your GPU accelerated VM later on when that VM is auto-started.
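If you'd like to confirm where the setting stands before and after your reboot, esxcli can also list kernel settings. A quick sketch (the exact output columns may vary a bit between ESXi builds):

```shell
# Show the configured vs. runtime value of the 'vga' kernel setting.
# After 'set -s vga -v FALSE' but before a reboot, expect the
# Configured value to read FALSE while the Runtime value still reads TRUE.
esxcli system settings kernel list -o vga
```

Once the Runtime value also shows FALSE after a reboot, ESXi is no longer claiming the VGA device, and your GPU passthrough settings should persist.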
Alternatively, you can work around this GPU mapping issue without changing anything via ESXi, but it's not sticky, so you'll need to do this UI operation after every ESXi reboot:
- In vSphere Client or ESXi Host Client, set both of your AMD GPU devices (video & audio) to passthrough
- Reboot the server
- After the reboot, if you use ESXi Host Client and notice the Passthrough status shows "Enabled/Needs reboot" instead of active, toggle both AMD devices off and then on again; you'll now see them both active, with no reboot required
- Now you can start your VM that uses the PCI device
- If you find your mappings are wrong and your VM still won't start, remove the PCI devices from the VM, then re-add them. This is covered in more detail in the video below.
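If you'd rather toggle passthrough from an SSH session than click through the UI after each reboot, ESXi 7.0 also exposes passthrough state via esxcli. A hedged sketch: the PCI addresses below are hypothetical placeholders (get your real ones from the list command), and I haven't verified these flags on every build, so check `esxcli hardware pci pcipassthru set --help` on your host first:

```shell
# List PCI devices and their current passthrough enablement state:
esxcli hardware pci pcipassthru list

# Toggle passthrough off, then back on, for the GPU's video and audio
# functions. 0000:03:00.0 / 0000:03:00.1 are example addresses only;
# substitute the addresses of your own AMD GPU's two functions:
esxcli hardware pci pcipassthru set -d 0000:03:00.0 -e false
esxcli hardware pci pcipassthru set -d 0000:03:00.1 -e false
esxcli hardware pci pcipassthru set -d 0000:03:00.0 -e true
esxcli hardware pci pcipassthru set -d 0000:03:00.1 -e true
```

This mirrors the off-and-on-again UI toggle described above; re-run `esxcli hardware pci pcipassthru list` afterward to confirm both functions show as enabled before starting the VM.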
All vSphere 7 articles.
All vSphere 7 videos.
- Solved: Upgrade from 6.7 to 7.0 and unsupported hardware
Apr 10 2020 by zwbee at VMware Technology Network Forums
I went to Host/Manage/Hardware and the HBA was showing up with the correct name. However, its passthrough setting had been switched to inactive. I toggled it to active, but there was an error in addition to the usual "Reboot required" message. I figured it didn't work, but I tried rebooting anyway. After reboot, the device now showed as passthrough active. Promising!