How to fix inaccurate Xeon D Hardware Health sensor readings on VMware ESXi 6.5.x, since sfcbd CIM service is off by default
Backstory
During the early test period of VMware's ESXi 6.5, I asked VMware about a problem I had noticed on my Supermicro SuperServer SYS-5028D-TN4T 8 Core Bundle 2, featuring the Xeon D-1541. With 6.0, all VMware GUIs showed accurate hardware health details, with no extra effort beyond just installing ESXi 6.0. Yep, no need to add OEM-specific CIM drivers here. You would expect this, since this is the only home-lab-friendly (quiet mini-tower) system that's on the ESXi VCG, aka the VMware Compatibility Guide.
Fan RPMs and Voltage readings were off
When I did a fresh install of ESXi 6.5, I noticed that my fan speeds (RPMs) and voltage readings were way off, and some health info was missing entirely, as seen in the screenshots below. I had been wondering if this was related to the postponed inclusion of the Xeon D-1541 on the VCG for 6.5, a potential issue I'm currently re-investigating very carefully. I'm also wondering when the rest of the Xeon D systems, like the Supermicro SuperServer SYS-E200-8D that William Lam has recently been writing great things about, will be on the VCG too. I will let you all know my findings soon; sign up to get notified.
Meanwhile, back to my issue, which seemed related to CIM, the Common Information Model. Did VMware change something related to CIM in their 6.5 release? I also noticed the issue was there with my SYS-5028D-TN4T Xeon D-1567 12 Core Bundle 2 system.
Honestly, I didn't worry about it much, as I could get around the issue by just setting up health alerts from within the IPMI/BMC interface itself, rather than from ESXi. But I certainly prefer to have things working correctly.
Turns out I later got my answer. The sfcbd CIM service is now off by default for new installs, which was exactly my circumstance. I was informed that this new default reduced memory and CPU footprint, and that hostd is now reporting hardware status from the /dev/ipmi driver, the same library that the CIM provider uses.
The workaround for the unfortunate side effects of this change is detailed below. It basically reverts to the same behavior as the 6.0.x releases, with the wbem services back on. The fix persists across reboots, and can be applied from the ESXi Shell or from ESXCLI.
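Before changing anything, you may want to confirm that your host really is in the "off by default" state described above. Here is a minimal sketch, assuming that `esxcli system wbem get` prints a line such as `Enabled: false`; that output layout is an assumption, so check it on your own host. The helper names `wbem_enabled_value` and `show_wbem_state` are hypothetical, not part of ESXi.

```shell
#!/bin/sh
# Sketch only: report whether the WBEM/CIM service is enabled on this host.
# Assumption: `esxcli system wbem get` prints a line like "   Enabled: false".

# wbem_enabled_value (hypothetical helper): read the output of
# `esxcli system wbem get` on stdin and print the value of the "Enabled" field.
wbem_enabled_value() {
    sed -n 's/^[[:space:]]*Enabled:[[:space:]]*//p'
}

# show_wbem_state (hypothetical helper): query the host and print one summary line.
# Prints "unknown" if no "Enabled" line was found in the output.
show_wbem_state() {
    state=$(esxcli system wbem get | wbem_enabled_value)
    echo "WBEM enabled: ${state:-unknown}"
}
```

Running `show_wbem_state` in an SSH session before and after the fix should show the value change, provided the output format matches the assumption above.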
VMware has also recently updated the related KB 1025757:
- How to disable the CIM agent on the ESX/ESXi host (1025757)
https://kb.vmware.com/kb/1025757
While it explains a problem I've not experienced, the article includes many of the commands used in the fix below.
- you are performing these operations as root, at your own risk
- this fix was tested using VMware ESXi 6.5.0d; other 6.5.x versions haven't been as carefully tested
- this fix was tested using BIOS 1.1c and IPMI 3.46, with these exact BIOS settings
- this article is not official VMware documentation, it's merely a convenient work-around technique that may help in Xeon D labs.
- before you start, back up the ESXi 6.5.x you've already got! If it's on USB or SD, use one of the home-lab-friendly methods, such as USB Image Tools under Windows, as detailed by Florian Grehl
How to restore full Health Status monitoring on Xeon D
Shut down all your VMs or put your system in maintenance mode, then issue the following 3 commands in an SSH session on your affected ESXi host, as root:
esxcli system wbem set --ws-man false
esxcli system wbem set --enable true
reboot
That's it. Once the reboot is complete, you'll see normal values, as featured in the photo gallery below, best viewed fullscreen on a 1920x1080 monitor.
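If you'd rather have a safety check in front of those commands, here is a rough sketch of a wrapper that refuses to change anything while VMs are running outside of maintenance mode. It assumes that `esxcli system maintenanceMode get` prints `Enabled` or `Disabled`, and that `esxcli vm process list` prints nothing when no VMs are running; verify both on your own host. The function name `apply_cim_fix` is hypothetical, and the reboot is deliberately left for you to do by hand.

```shell
#!/bin/sh
# Sketch only: apply the two wbem settings from this article, with a pre-check.
# Assumptions: `esxcli system maintenanceMode get` prints "Enabled" or "Disabled",
# and `esxcli vm process list` prints nothing when no VMs are running.

# apply_cim_fix (hypothetical helper): returns non-zero without changing
# anything if VMs are running and the host is not in maintenance mode.
apply_cim_fix() {
    mode=$(esxcli system maintenanceMode get)
    vms=$(esxcli vm process list)
    if [ "$mode" != "Enabled" ] && [ -n "$vms" ]; then
        echo "VMs are still running and the host is not in maintenance mode; aborting." >&2
        return 1
    fi
    # Same two settings as the manual steps above; stop at the first failure.
    esxcli system wbem set --ws-man false || return 1
    esxcli system wbem set --enable true || return 1
    echo "WBEM settings applied; reboot the host to finish."
}
```

Call `apply_cim_fix` as root, then issue `reboot` yourself once it reports success.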
How to revert to partial Health Status monitoring on Xeon D
Shut down all your VMs or put your system in maintenance mode, then issue the following 3 commands in an SSH session on your affected ESXi host, as root:
esxcli system wbem set --ws-man true
esxcli system wbem set --enable false
reboot
Video - short loop
Video walkthrough
Before and after screenshots
Jul 28 2017 Update
With this fix in place, when looking at the Hardware Health of your Storage, you will see the following warning:
The Small Footprint CIM Broker Daemon (SFCBD) is running, but no data has been reported. You may need to install a CIM provider for your storage adapter.
It's also noteworthy that ESXi 6.5 Update 1 still seems to have this same wrong RPM/voltage issue, so this workaround is still needed to get accurate readings.
Nov 26 2017 Update
I opened a VMware Service Request on this issue, and have been informed by VMware technical support that these inaccurate readings that I've reported should be fixed in the next release of ESXi. The exact timeframe of that release is TBD. See also related discussion below, where Bob mentions that his Xeon D hardware status is showing "No host data available."
See also at TinkerTry
- How to easily update your VMware vCenter Server Appliance from VCSA 6.5.x to 6.5.0d
Apr 18 2017
- How to easily update your VMware Hypervisor from ESXi 6.5.x to 6.5.0d
Apr 18 2017
- vSAN 6.6 arrives, baked right into those VMware vSphere 6.5.0d bits that went GA today!
Apr 18 2017
- VMware ESXi 6.5 runs well on Xeon D Supermicro SuperServers, here's what you need to know
Nov 18 2016
- Easy fix for Supermicro and other Xeon D systems experiencing SATA3/AHCI slowdown on ESXi 6.5
Nov 16 2016
- New Supermicro BIOS 1.1c and IPMI 3.46 for Xeon D SuperServers features HTML5 iKVM
Nov 04 2016
- Recommended BIOS Settings for Supermicro SuperServer SYS-5028D-TN4T
Jan 15 2016
See also
- Curious if others have noticed problems with Hardware Health readings since sfcbd CIM service is now off for new 6.5 installs
Apr 24 2017 by Paul Braren at VMTN Communities
- VMWare ESXi 6.5 CIM Data Disabled by Default
Feb 06 2017 by Cube Dweller at Squidworks Network Systems Engineers Consortium
- ESXi 5.x host is disconnected from vCenter Server due to sfcbd exhausting inodes (2037798)
Jun 29 2016