How to free up VMware ESXi local drive VMFS datastore space after deleting data in a thin provisioned Windows VM - SDeleting or PowerShelling and Hole Punching avoids vMotion workaround

Posted by Paul Braren on Nov 9 2015 (updated on Jul 6 2016)

This is one of those home lab housekeeping duties that I just hadn't gotten around to. It kind of gets your attention when your datastores start to fill up, or when you can't Storage vMotion VMs onto them. Eventually you grow weary of the warnings, and you certainly don't want to hit 100% full, when VMs get frozen.

So it was time to dig in and fix it.

Backstory

I have a VM that I occasionally use for Hamachi VPN Gateway duties, at least until I get my OpenVPN appliance wired straight to my cable modem again. This was the last of my home lab's VMs still stuck on Windows 7, and I was curious how an upgrade straight to Windows 10 would go. Of course, I also wanted to be able to roll back very easily if things went sideways, so I grabbed a snapshot of the VM before I began, as a very quick backup of sorts. I then mounted the Windows 10 windows.iso file and ran through the usual upgrade process. After I was comfortable that all was well with the VM, and had finished some minor tweaks to get Hamachi's gateway function going again, I deleted all the snapshots. This level of commitment also removed my instant way to revert to Windows 7.

I next looked at the C: drive in the Windows 10 VM after the upgrade, and it showed 42GB of data.

After running the "Disk Clean-up for C:" wizard, I had 21GB of data, halving my storage need. But guess what? The VMFS datastore space-in-use indicator didn't drop at all. And this was precious SSD storage, where I leave all my most-used VMs running.

The dirty secret here is that there is no automatic space reclamation in this scenario. Uh oh. That's right: a VMFS filesystem holding a bunch of thin provisioned VMs will invariably run into trouble, which is especially obvious with smaller SSDs.

All my VMs that I leave running on SSD storage are thin provisioned, even VCSA. That way, the VM "thinks" it has, say, a 750GB drive (it could go all the way up to 62TB with the EFI virtual BIOS type), while the physical SSD is actually much, much smaller, merely 256GB in this case. Using thin provisioning for everything, I don't have to figure out in advance how much storage each VM will ever need. Guessing in advance tends to waste space, or else leads to the hassle of having to grow that Windows VM's C: drive later. The trade-off with my approach is that I do have to monitor datastore usage carefully. Those are both topics for another day. For now, let's just say I'm done with thick provisioning in my home lab, even on spinning drives, with my whole storage strategy laid out here.
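To picture why deleting files in the guest gives nothing back to the datastore, here's a toy Python model of a thin provisioned disk. It's a hypothetical illustration, not how VMFS or NTFS actually work internally: the ThinDisk class, the 1 MiB block size, and the block counts are all made up, with 42 and 21 borrowed from the C: drive numbers above.

```python
from typing import Dict, Set

BLOCK_SIZE = 1 << 20  # hypothetical 1 MiB allocation unit for this toy


class ThinDisk:
    """Toy thin provisioned virtual disk (illustration only)."""

    def __init__(self, provisioned_blocks: int) -> None:
        self.provisioned_blocks = provisioned_blocks  # size the guest OS "thinks" it has
        self.backing: Dict[int, bytes] = {}           # blocks actually allocated on the datastore
        self.guest_used: Set[int] = set()             # blocks the guest filesystem counts as in use

    def guest_write(self, block: int, data: bytes) -> None:
        if not 0 <= block < self.provisioned_blocks:
            raise IndexError("write beyond the provisioned size")
        self.backing[block] = data  # the first write to a block allocates datastore space
        self.guest_used.add(block)

    def guest_delete(self, block: int) -> None:
        # The guest only updates its own metadata. The old bytes stay where they
        # are and the hypervisor is never told, so the datastore keeps the block.
        self.guest_used.discard(block)

    def guest_usage(self) -> int:
        return len(self.guest_used) * BLOCK_SIZE

    def datastore_usage(self) -> int:
        return len(self.backing) * BLOCK_SIZE


disk = ThinDisk(provisioned_blocks=750)  # stands in for the 750GB example
for b in range(42):
    disk.guest_write(b, b"data")         # contents abbreviated for the toy
for b in range(21):
    disk.guest_delete(b)                 # "Disk Clean-up" inside the guest
print(disk.guest_usage() // BLOCK_SIZE, disk.datastore_usage() // BLOCK_SIZE)  # prints: 21 42
```

The guest sees half its data gone while the datastore still holds every block ever written, which is exactly the gap the SDelete plus hole punching combo is meant to close.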

I don't want to set aside storage that I wind up never using, and I've noticed that SSDs are fast enough that the impact of not pre-allocating/thick provisioning isn't noticeable.

Technically, you could Storage vMotion to another drive, then Storage vMotion back again, as discussed by other folks in disbelief about local disk reclaim, over in the VMware Community forums here.

What if I don't want to do the double-vMotion workaround, or don't even have a drive with enough free room for such a time-consuming, multi-step workaround?

So the Googling began, followed by more Googling, turning up many complicated ways to reclaim space, along with many ways that simply don't apply. One example is the VAAI primitive called UNMAP, which just isn't available on local datastores; it's part of the "HW acceleration" you see on SAN LUNs and iSCSI/NFS NAS datastores. See also Synology VAAI and:

Using esxcli in vSphere 5.5 and 6.0 to reclaim VMFS deleted blocks on thin-provisioned LUNs (2057513)

Back to local disks. Another reclaim method I read about somewhere suggested turning off Changed Block Tracking (CBT) on the VM, which it seemed to me would give backup products such as Veeam or NAKIVO fits later on, especially for my huge Windows 2012 R2 Essentials VM, with several terabytes of VEB PC backups and video projects on it.

Can it really be this difficult? Isn't there an easier way? Yes, there is!

So the hunt for a better way continued, until I stumbled upon this article:

Reclaim disk space from thin provisioned VMDK files in ESX
Dec 12 2014 by lunarg at Black Manticore

After reading it, I was confident that the SDelete/vmkfstools combo was likely to work out. It also looked more appealing than my earlier tinkering with the much more complicated CBT method, which required many steps and tools for my much larger spinning disk (it took overnight). Read more about the implications of turning off CBT in VMware KBs here and here.

SDelete has built up many years of trust. But note that folks trying to shrink the disk usage of a modern Windows OS likely also have PowerShell. In that case, it might be better to use David Tan's PowerShell script, which is reportedly much faster at zeroing out those now-unused blocks. I haven't tested that yet.
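The basic idea behind zeroing free space is simple: write a big file full of zeros into the free space, then delete it, so the freed clusters now hold zeros that vmkfstools can later recognize. Here's a rough Python sketch of that zero-fill-then-delete idea. It is not SDelete's or David Tan's actual code, and the zero_free_space function, its max_bytes and reserve_bytes parameters, and the 1 MiB chunk size are hypothetical choices for illustration. One caveat, as I understand it: on a thin disk, zeros written into never-used blocks still get allocated, so the VMDK may temporarily grow before the hole punching step shrinks it.

```python
import errno
import os
import shutil
import tempfile
from typing import Optional

CHUNK = 1 << 20  # hypothetical: write zeros 1 MiB at a time


def zero_free_space(directory: str, max_bytes: Optional[int] = None,
                    reserve_bytes: int = 0) -> int:
    """Fill free space on the volume holding `directory` with zeros, then delete the file.

    max_bytes caps how much is written (handy for a dry run); reserve_bytes leaves
    some space untouched for the OS. Returns the number of zero bytes written.
    """
    target = shutil.disk_usage(directory).free - reserve_bytes
    if max_bytes is not None:
        target = min(target, max_bytes)
    zeros = bytes(CHUNK)
    written = 0
    fd, path = tempfile.mkstemp(prefix="zerofill-", dir=directory)
    try:
        with os.fdopen(fd, "wb", buffering=0) as f:  # unbuffered, so errors surface at write()
            while written < target:
                try:
                    written += f.write(zeros[: min(CHUNK, target - written)])
                except OSError as exc:
                    if exc.errno != errno.ENOSPC:
                        raise
                    break  # volume is full, which is the goal anyway
            os.fsync(f.fileno())  # push the zeros out to the (virtual) disk
    finally:
        os.remove(path)  # the clusters are free again, but now contain zeros
    return written
```

In a Windows VM this would be run from an elevated prompt against the C: drive, e.g. zero_free_space("C:\\", reserve_bytes=1 << 30), but for real use the long-trusted SDelete (or the PowerShell script) remains the safer bet.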

So with the new instructions handy, I began the video recording. That's right, you'll hear me share my thought process as I went about developing and testing a simpler process, for the first time, in the video below. In the end, it worked out nicely, and I was able to type up the instructions below from it. I learned a new thing or two along the way too, and I hope you do as well! If you have a much bigger job, say 4TB of stuff with 1TB recently deleted, you can expect each of the two steps to take hours on spinning drives, maybe even days. Also, your VM will need to be left powered off for the 2nd step, the hole punching with vmkfstools.
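For a back-of-the-envelope feel for "hours, maybe even days," here's a tiny Python helper. It assumes each step amounts to roughly one sequential pass over about 4TB and that the drive sustains a steady 150 MB/s; both numbers are assumptions, not measurements, and real runs with random I/O or other VMs competing for the disk can be considerably slower.

```python
def estimate_hours(bytes_to_process: float, mb_per_second: float) -> float:
    """Rough time for one sequential pass, assuming a steady throughput in MB/s."""
    seconds = bytes_to_process / (mb_per_second * 1_000_000)
    return seconds / 3600


# 4TB at an assumed ~150 MB/s sustained rate for a spinning drive
print(round(estimate_hours(4e12, 150), 1))  # 7.4
```

That's about 7.4 hours per pass under ideal conditions, so two steps can easily eat most of a day, and slower effective throughput stretches it further.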

The raw video footage from my October test has been on the cutting room floor for a few weeks. Then this amazingly timed tweet arrived, motivating me to finally get this October video published. Here are some excerpts from the conversation:

Q: On ESXi VMFS with thin provisioning, if the disk usage is 30gb and I reduce it to 25gb, is it possible to reclaim?

I believe your storage would have to support sdelete/unmap commands. Assuming local disk?

thanks, but our NVMe or 850 still do not have HW acceleration support, so I guess this will not work

Perhaps I'm missing something. I'd love for there to be an even easier/better/faster/GUI way to do this, ideally with no downtime. Couldn't VMware Tools "talk" to the Windows VM and ask it to effectively do an SDelete for you? Is there a licensing, API, or liability restriction at play here?

At least for now, I have the newly-simplified procedure I'm happy to finally share with you.

Please let me know if you have a better, faster, and/or easier method to reclaim VMFS datastore space on a local drive. Just drop a comment below, so we'll all know. Thank you!

Pre-requisites

a VM you thin provisioned
root access to your ESXi host (I tested with ESXi 6.0 Update 1a)
a willingness to endure downtime for this VM, and a mighty busy drive, for however many minutes (or hours) it takes

TinkerTry VMFS Storage Reclamation almost sounds fun, with SDeleting and Hole Punching, and even a little Drive Zapping!

These instructions assume an intermediate level of sysadmin and VMware ESXi computer skills. Your data is your responsibility, proceed at your own risk, and review the disclaimer at the far bottom left of all TinkerTry pages.

in the vSphere Client or vSphere Web Client, open Datastore Browser and have a look at the size of the files in the VM's directory
C'mon, you never typed "ls" when you meant "dir"?
clean up files in that Windows VM, such as Previous Windows installation(s)
download SDelete v1.61
open an Administrative Command Prompt and change to the directory where sdelete.exe resides
type
sdelete -z c:
and wait for it to be 100% complete
It's done! "Free space cleaned" sounds good, but perhaps better words could have been chosen than "1 drives zapped." Don't worry, your data is fine.
shut down the VM (it needs to stay powered off for the hole punching step), then open an SSH session (PuTTY) to your ESXi server.
I tested with ESXi 6.0 Update 1a, but this should work with any ESXi version.
(if you forgot to enable SSH, here's how)
cd your way into the VMFS datastore using tab for type-ahead, for example
cd /vmfs/volumes
ls -l
now you should be able to recognize the name of the datastore you wish to go to:
cd yourdatastorename
ls -l
now you can see the directory (folder) that the VM lives in
cd yourvmsfoldername
ls -l
vmkfstools -K WIN10-test1.vmdk
type-ahead made getting this syntax right much easier: I started typing WIN10, then hit Tab, and it typed the rest. In the video below, this "Hole Punching" process took about 5 minutes, as the roughly 20GB of shrinkage took place on my SATA3 SSD
back at the Datastore Browser, have a look at the size of the files in the VM's directory, if nothing changed, click the refresh icon (not F5)
now you can power your VM back up again
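vmkfstools -K does its hole punching in place on the VMDK, and its internals aren't documented here. As a rough analogy only, here's a Python sketch for an ordinary file on Linux or macOS: it copies the file and leaves every all-zero block as a hole instead of writing it, similar in spirit to GNU cp --sparse=always. It also separates the logical size (what ls -l shows) from the space actually allocated (what du shows), which should be the same distinction you can check on the ESXi shell by running ls -l and du -h against the VM's -flat.vmdk file. The function names and the 1 MiB block size are hypothetical.

```python
import os

BLOCK = 1 << 20  # hypothetical 1 MiB scan granularity for this sketch


def allocated_bytes(path: str) -> int:
    """Space actually allocated on disk (what `du` reports); Unix-like systems only."""
    return os.stat(path).st_blocks * 512  # st_blocks counts 512-byte units


def provisioned_bytes(path: str) -> int:
    """Logical file size (what `ls -l` reports)."""
    return os.stat(path).st_size


def copy_punching_zero_blocks(src: str, dst: str, block: int = BLOCK) -> int:
    """Copy src to dst, turning every all-zero block into a hole instead of writing it.

    Returns the number of zero blocks that were skipped (left unallocated in dst).
    """
    skipped = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(block):
            if chunk.count(0) == len(chunk):  # every byte is zero
                fout.seek(len(chunk), os.SEEK_CUR)  # skip ahead: nothing gets allocated
                skipped += 1
            else:
                fout.write(chunk)
        fout.truncate()  # sets the final size even if the file ends in a hole
    return skipped
```

Comparing provisioned_bytes and allocated_bytes on the source and the copy shows the same "before and after" picture the Datastore Browser gives you once you hit refresh.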

You're good to go, with more space, and less worry!

Before the 2 steps were followed.

After both steps were complete, I needed to click on Refresh to see the happy results. Now I have room for the bigger VM I was originally trying to move onto this SSD when I noticed space was not sufficient. There was obviously other stuff I had cleared out too, besides just the 21GB.

Thorough video walk-through of SDeleting and Hole Punching, with detailed narration.

Bonus Tip

If you have a VM that you thick provisioned back when you made it, and now you regret that decision because it's way too small or too big, no problem! Just Storage vMotion it to another drive, choosing "Change storage only," then Storage vMotion it back again, choosing "Thin Provision." It will now take up the smaller amount of space, without needing vmkfstools. The catch is that you need enough swing space to move it to, and a lot of time for a multi-terabyte scenario. Hence the value of the in-place SDelete/vmkfstools method.

Jul 06 2016 Update

I was listening to Virtually Speaking: Episode 18: Veeam ON with Rickatron recently, and my ears perked up when I got to this exact spot, 23 minutes and 35 seconds into the excellent podcast:

pca.st/dAij#t=1415
where Veeam's Rick Vanover explains:

The BitLooker is pretty cool...how many times have you seen like a VMDK grow...with like a thin provisioned one especially...nobody wants to get into the PowerShell or whatever and do like a thin provision reclaim? You've seen these, the disks get big, but you know that they're not really using all of that. Well, BitLooker, when we backup, we're actually looking at the filesystem table to ignore what we call dirty blocks. So, it's the simplest thing. We started this actually in 2011 by skipping the page file, because we would just look at the file table, determine the coordinates of the page file and the disk geometry, and then skip that on the data mover. But we're expanding that to actually read parts of the file table that include deleted blocks, which have still claimed space on the VMDK, but we're actually now able to skip those, and here's a little tip for any listeners out there, and you can actually even do this with a free trial of Veeam Backup & Replication. You can use BitLooker even in a replication job. So if you have a virtual disk, and you don't have an array, or you don't have the PowerShell or PowerCLI prowess to do any thin provision reclaim, you can actually do a replication job and do what I would call a planned failover. So replicate something to something new, and when you fail it over, the disk that you'll have left has all those dirty blocks removed. So if you have some VM that you had to hold a terabyte of stuff for, and then you deleted that terabyte of stuff, if you replicated it and failed it over, that terabyte of stuff is gone...Little things like that can be a nice little pro-tip that people can have. The right thing that I should do is make a script that would like do that for you.
There's a small amount of downtime, but it can really put you in a better spot, because...an end all be all thin provision reclaim is really hard to come by...I was in a pinch with one of my little things at home, and I'm like I could use that, and I did, and it was awesome.
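The core trick Rick describes, copying only the blocks the guest's file table still considers in use, can be shown with a toy Python model. This is not Veeam's code: how BitLooker actually parses NTFS is not covered here, the replicate_in_use_blocks function is hypothetical, and the in-use bitmap is simply handed in rather than read from a real file table.

```python
from typing import Dict, List


def replicate_in_use_blocks(source: List[bytes], in_use: List[bool]) -> Dict[int, bytes]:
    """Toy model: copy only blocks the guest's file table marks as in use.

    Deleted-but-still-allocated blocks (the "dirty blocks" in the quote) are
    skipped, so the replica is a sparse map of block index -> data.
    """
    if len(source) != len(in_use):
        raise ValueError("need exactly one in-use flag per block")
    return {i: data for i, (data, used) in enumerate(zip(source, in_use)) if used}


# Block 1 still holds a deleted file's bytes on the source VMDK, but the
# (hypothetical) file-table bitmap says it is free, so it is not copied.
source = [b"boot", b"deleted-video", b"docs"]
replica = replicate_in_use_blocks(source, [True, False, True])
print(sorted(replica))  # [0, 2]
```

After a planned failover to such a replica, the deleted data simply never made the trip, which is why the resulting thin disk comes out smaller.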

See also Vaughn Stewart's April 9th conversation about space reclamation at: