How to fix inaccurate Xeon D Hardware Health sensor readings on VMware ESXi 6.5.x, since sfcbd CIM service is off by default

Posted by Paul Braren on Apr 23 2017 (updated on Nov 26 2017) in
  • Virtualization
  • ESXi
  • HowTo
  • HomeLab
  • HomeServer
  • Backstory

    See, everything is very healthy, ESXi 6.0 or 6.5!

    Back during the early test period of VMware's ESXi 6.5, I had inquired with VMware about a problem I had noticed on my Supermicro SuperServer SYS-5028D-TN4T 8 Core Bundle 2, featuring Xeon D-1541. You see, with 6.0, all VMware GUIs would show accurate hardware health details, with no extra effort beyond installing ESXi 6.0. Yep, no need to add OEM-specific CIM drivers here. You would expect this, since this is the only home-lab friendly system (a quiet mini-tower) that's on the ESXi VCG, aka the VMware Compatibility Guide.

    Fan RPMs and Voltage readings were off

    When I fresh-installed ESXi 6.5, I noticed that my fan speeds (RPMs) and my voltage numbers were showing values that were way off. Some health info was missing entirely, as seen in the screenshots below. I had been wondering if this offset was related to the postponed inclusion of the Xeon D-1541 on the VCG for 6.5, a potential issue I'm currently re-investigating very carefully. I'm also wondering when the rest of the Xeon D systems, like the Supermicro SuperServer SYS-E200-8D that William Lam has recently been writing great things about, will be on the VCG too. I will let you all know my findings soon.

    Meanwhile, back to my issue, which seemed related to CIM, the Common Information Model. Did VMware change something related to CIM in their 6.5 release? I also noticed the issue was there with my SYS-5028D-TN4T Xeon D-1567 12 Core Bundle 2 system.

    Honestly, I didn't worry about it much, as I could get around the issue by just setting up health alerts from within the IPMI/BMC interface itself, rather than from ESXi. But I certainly prefer to have things working correctly.

    Turns out I later got my answer. The sfcbd CIM service is now off by default for new installs, which was exactly my circumstance. I was informed that this new default reduced memory and CPU footprint, and that hostd is now reporting hardware status from the /dev/ipmi driver, the same library that the CIM provider uses.

    The workaround for the unfortunate side-effects of this change is detailed below. It basically reverts to the same behavior as the 6.0.x releases, with the wbem services back on. The fix persists across reboots, and can be applied from the ESXi Shell or from ESXCLI.

    VMware has recently also updated the related KB 1025757. Before you proceed, note that:

    1. you are performing these operations as root, at your own risk
    2. this fix was tested using VMware ESXi 6.5.0d, other 6.5.x versions haven't been as carefully tested
    3. this fix was tested using BIOS 1.1c and IPMI 3.46, with these exact BIOS settings
    4. this article is not official VMware documentation, it's merely a convenient work-around technique that may help in Xeon D labs.
    5. before you start, back up the ESXi 6.5.x install you've already got! If it's on USB or SD, use one of the home-lab-friendly methods, such as USB Image Tools under Windows, as detailed by Florian Grehl
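
    As a complement to imaging the boot media, a configuration-only backup can also be taken from the ESXi Shell. This is a minimal sketch, not the imaging method mentioned above: it assumes `vim-cmd` is on the path (as it is on ESXi), and the function name `backup_esxi_config` is hypothetical. It saves the host configuration bundle only, not the full USB or SD image.

```shell
#!/bin/sh
# Sketch only: saves the ESXi *configuration* bundle, not a full image of the boot media.
# backup_esxi_config is a hypothetical helper name, not part of ESXi.

# Flush pending config changes to disk, then ask the host to build a config bundle.
backup_esxi_config() {
    if ! command -v vim-cmd >/dev/null 2>&1; then
        echo "vim-cmd not found; run this on the ESXi host itself" >&2
        return 1
    fi
    vim-cmd hostsvc/firmware/sync_config &&
    vim-cmd hostsvc/firmware/backup_config
}

# Only attempt the backup when vim-cmd is available (i.e. on ESXi).
if command -v vim-cmd >/dev/null 2>&1; then
    backup_esxi_config
fi
```

    On the host, the second command should report where the generated bundle can be downloaded from; copy it somewhere safe before changing any wbem settings.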

    How to restore full Health Status monitoring on Xeon D

    Shut down all your VMs or put your system in maintenance mode, then issue the following 3 commands in an SSH session on your affected ESXi host, as root:

    esxcli system wbem set --ws-man false
    esxcli system wbem set --enable true
    reboot

    That's it. Once the reboot is complete, you'll see normal values, as featured in the photo gallery below, best viewed fullscreen on a 1920x1080 monitor.
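
    To confirm that the setting stuck after the reboot, the current wbem state can be queried with `esxcli system wbem get`. The sketch below is an illustration only: the helper name `wbem_is_enabled` is hypothetical, and it assumes the command prints a line of the form `Enabled: true` (possibly indented) when the services are on.

```shell
#!/bin/sh
# Assumption: `esxcli system wbem get` prints a line like "Enabled: true"
# (possibly indented) when the wbem services are switched on.
# wbem_is_enabled is a hypothetical helper name, not part of ESXi.

# Reads wbem settings text on stdin; succeeds only if an "Enabled: true" line is present.
wbem_is_enabled() {
    grep -Eiq '^[[:space:]]*Enabled:[[:space:]]*true[[:space:]]*$'
}

# Only query the host when esxcli is actually available (i.e. on ESXi itself).
if command -v esxcli >/dev/null 2>&1; then
    if esxcli system wbem get | wbem_is_enabled; then
        echo "wbem is enabled"
    else
        echo "wbem is disabled (the default for fresh ESXi 6.5 installs)"
    fi
fi
```

    Anchoring the pattern to the start of the line keeps other settings that also end in `true` from producing a false positive.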

    How to revert to partial Health Status monitoring on Xeon D

    Shut down all your VMs or put your system in maintenance mode, then issue the following 3 commands in an SSH session on your affected ESXi host, as root:

    esxcli system wbem set --ws-man true
    esxcli system wbem set --enable false
    reboot
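
    The fix and the revert differ only in the two boolean values passed to `esxcli system wbem set`, so they can be captured in one small helper. This is a sketch under stated assumptions: `wbem_commands` and its mode names `full` and `partial` are hypothetical, chosen here to mirror the two section titles. The helper only prints the commands, so nothing changes on the host until you run them yourself.

```shell
#!/bin/sh
# Hypothetical helper (not from VMware): prints the esxcli commands for a mode.
#   full    -> wbem services on, as in the fix
#   partial -> ESXi 6.5 default, as in the revert
wbem_commands() {
    case "$1" in
        full)    ws_man=false; enable=true ;;
        partial) ws_man=true;  enable=false ;;
        *)
            echo "usage: wbem_commands full|partial" >&2
            return 2
            ;;
    esac
    echo "esxcli system wbem set --ws-man $ws_man"
    echo "esxcli system wbem set --enable $enable"
}

# Show the commands for reverting to the 6.5 default; nothing is executed here.
wbem_commands partial
```

    On the ESXi host itself, `wbem_commands full | sh` would apply the fix; the host still needs to be rebooted separately afterwards.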

    Video walkthrough

    How to fix inaccurate Xeon D Volt and RPM Hardware Health sensor readings on VMware ESXi 6.5

    Before and after screenshots

    Best viewed on a monitor larger than 1920x1080. Click any of the images to start the picture gallery, then click the "Toggle fullscreen" option at top-right.


    Jul 28 2017 Update

    With this fix in place, when looking at the Hardware Health of your Storage, you will see the following warning:

    The Small Footprint CIM Broker Daemon (SFCBD) is running, but no data has been reported. You may need to install a CIM provider for your storage adapter.

    It's also worth noting that ESXi 6.5 Update 1 still seems to have the same wrong RPM/Voltage issue, so this workaround is still needed to get accurate readings.

    Nov 26 2017 Update

    I opened a VMware Service Request on this issue, and have been informed by VMware technical support that these inaccurate readings that I've reported should be fixed in the next release of ESXi. The exact timeframe of that release is TBD. See also related discussion below, where Bob mentions that his Xeon D hardware status is showing "No host data available."
