VMware vSphere / ESXi 7.0 GA work-around for GPU passthrough issues including disabled-after-reboot bug and UI bug

Posted by Paul Braren on Apr 25 2020 (updated on Apr 26 2020) in
  • ESXi
  • HowTo
  • HomeLab
  • HomeServer
  • Virtualization
  • vSphere7

    When I was upgrading my primary Supermicro SuperServer Workstation / Datacenter, I ran into some strange problems with getting passthrough working. What would happen is that I'd get everything squared away, and boot my Windows 10 VM with my AMD Radeon 7750 GPU successfully passed through, as I've been doing for many years, see:

    superserverworkstation
    VM Hardware Version 17.

    I went all-in with my ESXi 7.0 upgrade, so in my most crucial (but backup-protected) Windows 10 VM, I also upgraded my virtual hardware to version 17 and updated VMware Tools as well. After a reboot of my ESXi host, I noticed my Windows 10 VM wouldn't boot up. Then I found the reason: my passthrough settings weren't persisting through reboots. This was nerve-wracking, as I had work the next morning and had to figure out a way to get things squared away again without falling back to 6.7U3 and/or reverting to backups of 1.8TB of data.

    Thankfully, I found a work-around for my new ESXi 7.0 host. Warning: it's pretty wonky, but quick and easy. It's not permanent though; you have to do this after every ESXi host reboot. If you've found a better way around this, by all means drop a comment below to let us all know!

    Note that I currently have no valid way of reporting such bugs to VMware; I'm still working on that. When a new dot-zero release like vSphere 7.0, which came out on April 2nd, isn't on the VMware Compatibility Guide, at least not yet, opening a per-incident ticket isn't an option. I tried! I'll explain all that in another article soon.

    Meanwhile, after a few dozen attempts and reboots, I found a workaround, which I published a video of back on April 9 of 2020, and now this article will hopefully help others as well. Strangely enough, over 500 folks have seen that video already, so unfortunately, I suspect I'm not alone with my issue. I hope the next patch release fixes this issue, which I've also posted to the VMTN forum.

    Workaround

    1. In vSphere Client or ESXi Host Client, set both of your AMD GPU devices (video & audio) to passthrough
    2. Reboot the server
    3. After the reboot, if the ESXi Host Client shows a Passthrough status of "Enabled/Needs reboot" instead of "Active", toggle both AMD devices off and then on again. You'll now see them both Active, with no reboot required
    4. Now you can start your VM that uses the PCI device
    5. If you find your mappings are wrong and your VM still won't start, remove the PCI devices from the VM, then re-add them. This is covered in more detail in the video below.
    Apr 10 2020 - VMware ESXi 7.0 GPU passthrough configuration doesn't persist after rebooting, here's a work-around
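
    For anyone who would rather script step 3 than click through the Host Client, here is a minimal sketch in Python. It is not what I did above; it only builds the command lines and runs nothing. It assumes the `esxcli hardware pci pcipassthru set` command in ESXi 7.0 (with `-d` device ID, `-e` enable, `-a` apply now) behaves like the "Toggle passthrough" button, which I have not verified against this bug. The PCI addresses and the helper names `devices_needing_toggle` and `toggle_commands` are hypothetical, made up for illustration.

```python
from typing import Dict, List

# Assumption: this esxcli namespace stands in for the Host Client's
# "Toggle passthrough" button. The article itself only uses the UI.
ESXCLI_SET = ["esxcli", "hardware", "pci", "pcipassthru", "set"]

# Status text as shown in the ESXi Host Client after the reboot (step 3).
STUCK_STATUS = "Enabled/Needs reboot"


def devices_needing_toggle(statuses: Dict[str, str]) -> List[str]:
    """Return the PCI addresses whose passthrough status is stuck."""
    return [dev for dev, status in statuses.items() if status == STUCK_STATUS]


def toggle_commands(device_ids: List[str]) -> List[List[str]]:
    """Build the off-then-on command sequence; nothing is executed here."""
    commands = []
    for enable in ("false", "true"):  # all devices off first, then all on
        for dev in device_ids:
            commands.append(ESXCLI_SET + ["-d", dev, "-e", enable, "-a"])
    return commands


if __name__ == "__main__":
    # Hypothetical addresses for the GPU's video and audio functions.
    observed = {
        "0000:03:00.0": "Enabled/Needs reboot",
        "0000:03:00.1": "Enabled/Needs reboot",
    }
    for cmd in toggle_commands(devices_needing_toggle(observed)):
        print(" ".join(cmd))
```

    Printing the commands rather than running them is deliberate: you can review each line, swap in your own device addresses, and paste them into an SSH session on the host once you're comfortable.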

    Screenshots

    Select both AMD items, then click "Toggle passthrough" button.
    Click "Toggle passthrough" button again.
    Notice the Passthrough column at right, both AMD devices now show Active, you're all set, no reboot required.
    vSphere Client. Everything is working peachy now on my ESXi 7.0, but I have some quick toggling to do after every reboot.
    ESXi Host Client.

    See also at TinkerTry

    All vSphere 7 articles.

    All vSphere 7 videos.


    See also

    • Solved: Upgrade from 6.7 to 7.0 and unsupported hardware
      Apr 10 2020 by zwbee at VMware Technology Network Forums

      I went to Host/Manage/Hardware and the HBA was showing up with the correct name. However, its passthrough setting had been switched to inactive. I toggled it to active, but there was an error in addition to the usual "Reboot required" message. I figured it didn't work, but I tried rebooting anyway. After reboot, the device now showed as passthrough active. Promising!