VMware vSphere

VMware recently released a fairly important update to vCenter (details here:  https://blogs.vmware.com/vsphere/2021/05/vmsa-2021-0010.html).  Many folks are probably thinking: “Well, since I need to update vCenter anyway, I might as well take the vSphere 7 plunge”.  It’s a great thought but, as with any release, please read the release notes carefully, especially if you boot from an SD card, USB device, or other low-endurance media.  We’ve been seeing a significant increase in boot disk corruption and/or inaccessibility on hosts running vSphere 7, particularly 7.0 U1 and U2.

For background, vSphere 7 introduces a new disk layout and partition table for the OS/boot drive.  It was (likely) a long-overdue enhancement, allowing for greater flexibility and capability in ESXi / vSphere itself.  The details are in this VMware blog:  https://blogs.vmware.com/vsphere/2020/05/vsphere-7-esxi-system-storage-changes.html.

When vSphere 7.0 U1 (and U2) were released, the partition format was changed from a fairly typical (old-fashioned) FAT to VMFS-L.  Per VMware: “This new format allows much more and faster I/O to the partition.”  In addition, vSphere 7.0 U1/U2 is “no longer throttling I/O to local boot drives”.  This can easily cause lower-endurance media to “become overwhelmed” and possibly become corrupted (https://kb.vmware.com/s/article/83376).  VMware has been actively updating the KB article, even over the last few days.

In the field, we’ve seen this manifest as disconnected hosts, “hung” hosts, PSODs (purple screens of death), etc.  Unfortunately, a reboot only masks the problem and (if the server boots at all) the media continues to wear.  There is also no HCL/compatibility matrix for SD-type media to consult.

While every situation is different, WTG is suggesting:

  1. Apply the critical vCenter security patch (even if you are not upgrading to vSphere 7).
  2. Work with your hardware vendor (“OEM”) for specific guidance on this issue.
  3. Leverage suggested workarounds included in the KB (linked above).
  4. Consider acquiring SDXC / UHS-I / Class 10 SD cards (typically designed for professional video recording/production).
  5. If your server has a disk controller, look into appropriate SSD or HDD media (ideally mirrored).
  6. As with other releases: “If you install ESXi on M.2 or other non-USB low-end flash media, delete the VMFS datastore on the device immediately after installation to prevent the storage of virtual machine data” (p.19 “VMware ESXi Installation and Setup, Update 2 vSphere 7.0 / ESXi 7.0”)

Whenever you swap boot media, we suggest backing up the ESXi configuration, performing a clean install of ESXi on the new media, and then restoring the configuration from that backup.  From SSH/shell, this is how to generate a system backup (if you’re using host profiles and distributed switching, you might not need to do this):

vim-cmd hostsvc/firmware/sync_config
vim-cmd hostsvc/firmware/backup_config
(Download the backup bundle from the link that the CLI outputs.  You can use it to rebuild the host later, if needed.)
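
The link printed by the backup command typically has a `*` where the host address belongs, so it cannot be pasted into a browser as-is.  As a minimal sketch (meant to run on an admin workstation, not in the ESXi shell), the helper below fills in the host name.  The function name `bundle_url` is hypothetical, and the sketch assumes the output line looks like `Bundle can be downloaded at : http://*/downloads/<id>/configBundle-<host>.tgz`; check what your build actually prints before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): turn the line printed by
# "vim-cmd hostsvc/firmware/backup_config" into a usable download URL.
# Assumed output format:
#   Bundle can be downloaded at : http://*/downloads/<id>/configBundle-<host>.tgz
bundle_url() {
    output_line=$1   # the line printed by backup_config
    esxi_host=$2     # FQDN or IP address of the ESXi host

    # 1) Drop everything before "http://" (or "https://").
    # 2) Replace the first "*" placeholder with the host address.
    printf '%s\n' "$output_line" |
        sed -e 's|^.*\(https\{0,1\}://\)|\1|' -e "s|\*|$esxi_host|"
}
```

For example, `bundle_url "$line" esx01.example.com` prints a URL you can hand to a browser or download tool (the host name here is a placeholder).  After the clean install, VMware's documented restore path is to place the host in maintenance mode and run `vim-cmd hostsvc/firmware/restore_config` against the uploaded bundle; the exact arguments and build-matching requirements vary by release, so confirm them against the current VMware KB for your version.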

If you’ve already upgraded to vSphere 7.0 (especially U1 or U2), you will want to do this sooner rather than later, before you find a hung host (or hosts) in production.  WTG is happy to provide professional services to assist; just contact us for more details.

About the Author
Matthew Kozloski
WTG's Vice President of Professional Services