A virtual machine (VM) snapshot can be a boon when it comes to recovering from a sudden system or VM failure. A virtual machine’s snapshot preserves the state of a VM and all its data. When data is lost due to a system or VM failure, the VM can be recovered from the snapshot such that all its attributes are restored to the original state. While all hypervisor technologies follow the same path, there are a few considerations for
Define a VMware snapshot policy
Most organizations with VMware virtual environments consider VMware snapshots the ideal way to back up data. While snapshots are a good backup tool, they are not suited to backing up every type of VM.
In the VMware snapshot process, a copy of the VM image file (.vmdk file) is created on the backup storage. Any subsequent changes to the VM are written to a delta file. To restore the VM, the data from the delta file is applied to the original .vmdk file.
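The relationship between the image file and its delta file can be pictured with a toy model. The sketch below is an illustration only: it treats a disk as a Python dictionary of block numbers, and the names `base_disk`, `delta`, `write_block`, `read_block` and `commit_delta` are hypothetical. It does not reflect VMware's actual file format or tooling.

```python
# Toy model of a base disk plus one delta file (not VMware's on-disk format).
base_disk = {0: "boot", 1: "config-v1", 2: "data-v1"}  # block number -> contents
delta = {}  # blocks changed after the snapshot was taken


def write_block(block, contents):
    """After a snapshot, writes land in the delta, not the base disk."""
    delta[block] = contents


def read_block(block):
    """Reads prefer the delta and fall back to the base disk."""
    return delta.get(block, base_disk[block])


def commit_delta():
    """Apply every changed block to the base disk, then empty the delta."""
    base_disk.update(delta)
    delta.clear()


write_block(1, "config-v2")
current_view = read_block(1)   # value seen through the delta
snapshot_view = base_disk[1]   # value still held by the base disk
commit_delta()                 # base disk now carries the change
```

Until `commit_delta()` runs, the base disk keeps the state captured at snapshot time while the delta accumulates every change made since.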
This approach is suitable for VMs that are not I/O intensive, such as a web or application server VM, where the server state does not change frequently. A mail server or database server VM, by contrast, is I/O intensive, with rapid data changes. Much of the data on such servers is locked during transactions and is not available to the VMware snapshot tool. This can result in data inconsistencies for users after a VM file restore.
Organizations need to clearly define which server VMs can be backed up with the VMware snapshot tool. For the rest, they can choose third-party snapshot tools that use the VMware storage application programming interfaces (APIs) and are application-aware. These tools can track and log data or metadata changes and copy data accordingly, allowing efficient snapshots of mail server or database server VMs.
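One way to make such a policy explicit is to record it as a simple lookup. The sketch below is a minimal illustration in Python, assuming each VM is tagged with a role. The `VmProfile` class, the role names, the VM names and the `backup_method` function are all hypothetical and are not part of any VMware product.

```python
from dataclasses import dataclass


@dataclass
class VmProfile:
    name: str
    role: str  # e.g. "web", "application", "mail", "database"


# Roles treated here as I/O intensive; adjust to the environment.
IO_INTENSIVE_ROLES = {"mail", "database"}


def backup_method(vm: VmProfile) -> str:
    """Return which kind of snapshot tool the policy assigns to this VM."""
    if vm.role in IO_INTENSIVE_ROLES:
        return "application-aware third-party tool"
    return "VMware snapshot tool"


inventory = [
    VmProfile("web01", "web"),
    VmProfile("app01", "application"),
    VmProfile("mail01", "mail"),
    VmProfile("db01", "database"),
]
policy = {vm.name: backup_method(vm) for vm in inventory}
```

Keeping the rule in one place means a newly provisioned VM is classified the same way as existing ones.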
Creating and storing VMware snapshots
Every VMware snapshot creates a new delta file, while the older one becomes read-only. Since changes to the VM are written directly to the active delta file, that file keeps growing. Multiple snapshots of a VM should be taken only if its composition changes frequently or when system-level changes such as patches or driver updates are applied.
A good practice is not to keep multiple snapshots of a VM beyond 72 hours. Snapshots should be used more as a versioning tool than for pure data backup. Unless an organization believes it might need to revert to an older snapshot after a few months, that snapshot should be deleted promptly.
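The 72-hour guideline is easy to check mechanically. The sketch below is a minimal illustration in Python, assuming snapshot names and creation times have already been collected by some other means. The snapshot names, the dates and the `overdue_snapshots` function are hypothetical.

```python
from datetime import datetime, timedelta

MAX_SNAPSHOT_AGE = timedelta(hours=72)


def overdue_snapshots(snapshots, now):
    """Return names of snapshots older than 72 hours, oldest first.

    `snapshots` maps a snapshot name to its creation time.
    """
    overdue = [
        (created, name)
        for name, created in snapshots.items()
        if now - created > MAX_SNAPSHOT_AGE
    ]
    return [name for created, name in sorted(overdue)]


now = datetime(2011, 5, 10, 12, 0)
snapshots = {
    "pre-patch": datetime(2011, 5, 5, 9, 0),
    "pre-driver-update": datetime(2011, 5, 6, 18, 0),
    "nightly": datetime(2011, 5, 9, 23, 0),
}
to_review = overdue_snapshots(snapshots, now)
```

The result is a review list rather than a delete list, since some older snapshots may be kept deliberately as long-term versions.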
Deleting VMware snapshots
Unless the snapshots are creating storage overhead, they should not be deleted in a bulk delete operation. A delete operation with the VMware snapshot tool commits all the changes stored in the delta file to the original VM.
If a VM file has three snapshots, deleting all will cause the changes from the third delta file to be copied to the second file. Data from the second delta file will be copied to the first and from the first to the original VM. The delta files are deleted only after the commit process is complete.
If each of the three delta files is 50 GB, an additional 150 GB of storage space would be required to accommodate the growing delta files during the commit. The commit process for a 100 GB snapshot typically takes three to five hours. This process therefore puts additional performance overhead on the original VMs and the virtual server infrastructure. Degraded system and application performance, especially in a production environment, would be unacceptable. If the commit process locks the original VM, it could result in timeouts for users sending requests to the virtual server.
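The storage figure follows from the cascading commit described earlier: each delta file is merged into the next older one, and nothing is removed until the whole chain has been committed. The sketch below works through that arithmetic in Python. It is a worst-case estimate that assumes no changed blocks overlap between delta files and that the original VM disk does not grow. The function name `extra_space_for_bulk_delete` is hypothetical.

```python
def extra_space_for_bulk_delete(delta_sizes_gb):
    """Worst-case extra storage (GB) when every snapshot is deleted at once.

    `delta_sizes_gb` lists delta file sizes from oldest to newest. Each delta
    is merged into the next older one, so an older file grows by the combined
    size of all newer deltas before any file is removed.
    """
    extra = 0
    carried = 0  # data pushed down the chain so far
    # Walk from the newest delta toward the oldest. The oldest delta merges
    # into the original VM disk, which is assumed not to grow.
    for size in reversed(delta_sizes_gb[1:]):
        carried += size
        extra += carried
    return extra


extra_gb = extra_space_for_bulk_delete([50, 50, 50])
```

With three 50 GB delta files, the second file grows by 50 GB and the first by 100 GB, which gives the 150 GB total.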
A good practice is to allocate 30% more storage to each VMware snapshot. Delta files should also be deleted one at a time, with the oldest deleted first. Bulk delete operations should be performed when I/O activity on the original VMs is at its lowest.
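These two rules of thumb can be expressed as a small planning helper. The sketch below is a minimal illustration in Python. The function names, the snapshot names and the ages are hypothetical, and the 30% headroom is simply the figure suggested above.

```python
def storage_with_headroom(snapshot_gb, headroom_percent=30):
    """Storage to allocate for a snapshot, including the extra headroom."""
    return snapshot_gb * (100 + headroom_percent) / 100


def deletion_order(snapshots):
    """Return snapshot names oldest first; each is deleted on its own.

    `snapshots` is a list of (name, age_in_hours) pairs.
    """
    ordered = sorted(snapshots, key=lambda item: item[1], reverse=True)
    return [name for name, _age in ordered]


allocated_gb = storage_with_headroom(50)
order = deletion_order([("snap-2", 30), ("snap-1", 60), ("snap-3", 6)])
```

A 50 GB snapshot would be given 65 GB under this rule, and the three sample snapshots would be deleted in the order snap-1, snap-2, snap-3.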
About the author: Pankaj Nath is the senior manager of solutions for virtual private clouds/availability services at Netmagic Solutions.
(As told to Harshal Kallyanpur)
This was first published in May 2011