When VMware says it will support a maximum partition size of 2TB – 512B, they’re not kidding. And when you somehow create a NetApp LUN of exactly 2TB, and somehow get VMware to create a datastore on it, and somehow manage to operate production servers on it for a couple of years without a problem, you will likely review the configuration at some point and realize that you’re sitting on a ticking time bomb.
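To make the "2TB – 512B" figure concrete, here is a minimal sketch of the arithmetic in Python. It assumes that "TB" in the limit means binary terabytes (2^40 bytes); the names `MAX_LUN_BYTES` and `fits_limit` are made up for illustration and are not part of any VMware or NetApp tool.

```python
# Assumption: "TB" in the stated limit is a binary terabyte (2**40 bytes).
TIB = 2**40
SECTOR = 512  # the 512 bytes subtracted from the limit

# Largest size the post says VMware supports: 2TB - 512B.
MAX_LUN_BYTES = 2 * TIB - SECTOR


def fits_limit(lun_bytes: int) -> bool:
    """Return True if a LUN of this size is within the 2TB - 512B limit."""
    return lun_bytes <= MAX_LUN_BYTES


print(MAX_LUN_BYTES)                 # 2199023255040
print(fits_limit(2 * TIB))           # False: exactly 2TB is 512 bytes too big
print(fits_limit(2 * TIB - SECTOR))  # True
```

Under that assumption, a LUN of exactly 2TB overshoots the limit by a single 512-byte sector, which is how a configuration like this can look fine at a glance.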
Now, if at this point you cautiously move all your virtual machines off to another datastore, then you will probably suffer no ill consequences. If, however, you misguidedly resize the NetApp LUN to 1.99TB while VMware still thinks it’s a 2.00TB LUN, expect to stay up late trying desperately to salvage your servers from the death grip of I/O errors.
Luckily, in the situation that inspired this post, only two virtual machines were storing data on the far edge of the datastore, the part affected by the sudden change in LUN size. Neither of these servers was home to critical applications or data, and all services and data necessary to production were moved to a stable datastore without issue. After struggling for a day to recover data from the corrupt datastore, I decided to cut my losses and restore the two machines from backup.
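Why only some machines got hurt is easier to see with a toy model. The sketch below treats the datastore as a flat range of bytes and flags any machine with data past the new end of the LUN. Everything in it is hypothetical: the VM names, the offsets, and the idea that a datastore's layout can be read off this simply are assumptions for illustration, not how VMFS actually reports block placement.

```python
from typing import NamedTuple

GIB = 2**30
TIB = 2**40


class Extent(NamedTuple):
    owner: str   # hypothetical VM name
    start: int   # byte offset of the extent on the LUN
    length: int  # extent size in bytes


def truncated_owners(extents, new_size_bytes):
    """Return owners with at least one extent reaching past the new LUN end."""
    return sorted(
        {e.owner for e in extents if e.start + e.length > new_size_bytes}
    )


# Made-up layout on a 2TB LUN.
extents = [
    Extent("vm-a", 0, 500 * GIB),
    Extent("vm-b", 1 * TIB, 200 * GIB),
    Extent("vm-c", 2040 * GIB, 8 * GIB),
    Extent("vm-d", 2030 * GIB, 10 * GIB),
]

# LUN shrunk to 1.99TB, computed with integers to avoid float rounding.
new_size = 199 * TIB // 100

print(truncated_owners(extents, new_size))  # ['vm-c', 'vm-d']
```

In this made-up layout the shrink removes roughly the last 10GB of the LUN, so only the machines with blocks in that tail see I/O errors while everything stored lower down keeps working.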
All was well this time, but things could have been much worse. The moral of the story: never, ever reduce the size of a storage LUN while it holds data that you want to keep.
… and no, I was not the one who resized the LUN. I did, however, get to clean up the resulting mess!