Based on four years of experience providing VMware Cloud services, we have seen several situations where customers have had problems with their services due to a lack of knowledge about managing virtual machines. In this blog post we summarize these cases in a generalized form, both as a reminder and as a source of new knowledge. The main benefit for the customer is an overview of best practices and of the major mistakes to avoid on the VMware Cloud platform.
VM Virtual Hardware Version
It is best practice to use the latest hardware version. This ensures the best compatibility with WaveCom's latest vSphere platform and allows the use of corresponding newer features that significantly speed up the work of virtual machines.
VMware vSphere is constantly updated, and a new hardware version is released at least once a year. The version can be changed under the virtual machine's “All actions -> Upgrade Virtual Hardware Version” (the virtual machine must be stopped first).
With a very old hardware version (lower than v14), you can expect serious errors in the virtual machine's operating system (OS) and in running the virtual machine on the Cloud Director platform.
An outdated version can also affect Availability when backing up and restoring a virtual machine.
As of 05.03.2021, the default hardware version is v18, so it makes sense to upgrade virtual machines to the same version.
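The thresholds above (v14 as the lower bound, v18 as the current default) can be turned into a quick check. The following is only a minimal sketch: the label formats "vmx-13" and "v18" and the result wording are assumptions made for illustration, not something the platform is known to output in this form.

```python
import re

MIN_SAFE_VERSION = 14  # below this, the post warns of serious errors
DEFAULT_VERSION = 18   # platform default as of 05.03.2021


def parse_hw_version(label: str) -> int:
    """Extract the trailing number from a label such as 'vmx-13' or 'v18'.

    The label formats are assumed for this example.
    """
    match = re.search(r"(\d+)\s*$", label.strip())
    if match is None:
        raise ValueError(f"unrecognized hardware version label: {label!r}")
    return int(match.group(1))


def classify_hw_version(label: str) -> str:
    """Return 'critical', 'upgrade recommended' or 'ok' for a version label."""
    version = parse_hw_version(label)
    if version < MIN_SAFE_VERSION:
        return "critical"
    if version < DEFAULT_VERSION:
        return "upgrade recommended"
    return "ok"
```

For example, `classify_hw_version("vmx-13")` falls into the "critical" group, while a machine already on v18 is reported as "ok".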
Defining the Operating System
To avoid compatibility issues, you must always define the correct OS. If CentOS 6 or just Linux is defined and you are actually running CentOS 7 (or later), problems will arise sooner or later. You can set the OS at “Virtual Machine -> General -> Edit”.
It is mandatory to use VMware Tools on virtual machines. The package contains essential drivers (network, disk, graphics, etc.). VMware Tools is already installed on the templates provided by WaveCom.
Linux uses the open-vm-tools package, which is updated together with the OS. So all you have to do is keep your OS up to date and make sure the package is installed.
On Windows servers you need to upgrade VMware Tools yourself. This is an online activity that does not require a restart. To upgrade, select “Virtual Machine -> Actions -> Install VMware Tools”. A new installer is then mounted, and you can run the update inside the OS using the wizard.
For example, without VMware Tools, no VMXNET network will work on any virtual machine, and there is no mouse in the Windows console.
VMware vSphere is constantly updated, and new versions of Tools come several times a year.
It is strongly recommended that you use the same storage policy on the virtual machine for both virtual machine files and disks.
VMware Cloud allows you to store virtual machine configuration and swap files in separate locations, and assign a separate location to different disks. However, storing a virtual machine in different locations is the worst possible practice.
In the event of a single storage policy failure or overload, such a practice will cause the virtual machine to shut down, as well as potential data loss that would not otherwise occur.
The storage policy for virtual machine files and swap can be set at “Virtual Machine -> General -> Edit”. For disks, it can be set under “Virtual Machine -> Hard Disk -> Edit”. There we recommend using the "VM default policy", i.e. the one that is also specified for the configuration files under "General". This also makes it easier to move the virtual machine to another storage in real time, because the new policy only needs to be set under “General”.
In addition, we recommend that you never create multiple disks on a virtual machine, or keep different disks on different policies, without an urgent need.
Doing so significantly increases the latency of the virtual machine and can even cause OS errors.
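The single-policy recommendation is easy to verify once you know which policy each disk uses. Below is a minimal sketch under stated assumptions: the function name, the disk labels and the policy name "TIER1" are hypothetical, and the data is passed in by hand rather than read from Cloud Director.

```python
def disks_off_default_policy(vm_policy: str, disk_policies: dict) -> list:
    """Return the names of disks whose storage policy differs from the VM's.

    vm_policy is the policy set under "General"; disk_policies maps a disk
    name to the policy set for that disk. All names here are illustrative.
    """
    return sorted(
        name for name, policy in disk_policies.items() if policy != vm_policy
    )


# Hypothetical example data: one disk follows the VM policy, one does not.
mismatched = disks_off_default_policy(
    "TIER1",
    {"Hard disk 1": "TIER1", "Hard disk 2": "TIER2 storage"},
)
print(mismatched)
```

An empty result means every disk follows the policy of the virtual machine itself, which is the setup recommended above.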
The “TIER2 storage” offered by WaveCom is not intended for mounting on a running virtual machine, but primarily for storing templates, ISO images and, for example, virtual machines that are not in use at the moment.
This is not highly available storage. During maintenance, the policy becomes inaccessible and the virtual machine to which it is mounted may stop working.
Don't forget to unmount the ISO image. In the worst case, the virtual machine will not be able to start later and support intervention will be required.
Leaving the ISO mounted will cause the virtual machine to malfunction during “TIER2 storage” maintenance.
You can easily check the data of virtual machines by exporting their list to a .csv file. To do this, go to “Virtual Machines -> at the top right, select Show Virtual Machines in List (Picture) -> Export VMs ”.
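Once the list is exported, a short script can scan it for risky setups, for example powered-on machines that sit on “TIER2 storage”. This is only a sketch: the column names "Name", "Status" and "Storage Policy", the status text "Powered on" and the policy name "TIER1" are assumptions, so adjust them to match the header of your actual export.

```python
import csv
import io


def powered_on_tier2_vms(csv_text, name_col="Name", status_col="Status",
                         policy_col="Storage Policy"):
    """List VMs that are running while stored on a TIER2 policy.

    The column names are hypothetical defaults, not a documented format.
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        on_tier2 = "tier2" in row[policy_col].lower()
        running = row[status_col].strip().lower() == "powered on"
        if on_tier2 and running:
            flagged.append(row[name_col])
    return flagged


# Made-up sample in the assumed layout.
SAMPLE = """Name,Status,Storage Policy
web01,Powered on,TIER1
archive01,Powered off,TIER2 storage
test02,Powered on,TIER2 storage
"""

print(powered_on_tier2_vms(SAMPLE))
```

With a real export, read the file contents into a string first and pass the actual column names as arguments.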
Snapshot - this is not a backup
To create a snapshot, you must have twice as much disk space as the size of the virtual machine. Otherwise, the snapshot will not be created. Keep in mind that holding a snapshot for more than a few days slows down the disk system by about a factor of two. A month-old snapshot, for example, can make the disk system four times slower. At some point, this can result in a virtual machine crash and data loss.
Therefore, use snapshots at your own risk when you want to save the state of the virtual machine. We recommend using them only for the short term, for example around software updates that carry a risk of problems.
The same function is performed by Cloud Director Availability, where restore points can be fixed. However, the Availability saved point should be removed immediately when it is no longer needed.
Otherwise, the backup will become useless after a few weeks and can no longer be used for recovery.
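The advice to keep snapshots short-lived can be enforced with a simple age check. The sketch below assumes that "a few days" means three days; both that threshold and the function name are illustrative choices, and the creation time has to be supplied by you.

```python
from datetime import datetime, timedelta

# Assumption: "a few days" is read as three days for this example.
MAX_SNAPSHOT_AGE = timedelta(days=3)


def snapshot_is_stale(created_at: datetime, now: datetime,
                      max_age: timedelta = MAX_SNAPSHOT_AGE) -> bool:
    """Return True when the snapshot is older than the allowed age."""
    return now - created_at > max_age


# A snapshot taken on 1 March, checked on 5 March, is four days old.
print(snapshot_is_stale(datetime(2021, 3, 1), datetime(2021, 3, 5)))
```

Any snapshot reported as stale is a candidate for removal before it starts to affect disk performance.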
Our service allows the amount of vCPU and vRAM to be added hot when the “hot add” feature is activated.
When "Virtual CPU Hot add" is switched off, the virtual machine is about 7% faster. There are also several known bugs in Windows Server that affect performance if CPU hot add is enabled. To change the setting, turn off the machine and enable or disable “hot add” in the CPU and memory section under “Compute”. For memory, we recommend keeping "hot add" active.
Due to limitations of the Linux kernel, a running virtual machine that has less than 4GB of RAM cannot have its memory increased beyond 4GB.
To get rid of the limitation, the virtual machine must be turned off and the memory increased to more than 4GB.
After that, the memory can be increased without any problems on a running virtual machine.
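The memory hot-add rule described above can be written down as a small decision function. This is a sketch of our reading of the rule, not a statement about the kernel itself: the function name is hypothetical, and a machine with exactly 4GB is treated as already past the limit, which is an assumption.

```python
LIMIT_GB = 4  # threshold taken from the text above


def can_hot_add_memory(current_gb: float, target_gb: float) -> bool:
    """Return True if memory can be raised to target_gb without a shutdown.

    Assumption: machines below the limit can be raised up to the limit
    while running, but not beyond it.
    """
    if target_gb <= current_gb:
        return False  # hot add can only increase memory
    if current_gb < LIMIT_GB and target_gb > LIMIT_GB:
        return False  # needs a shutdown to cross the limit
    return True


print(can_hot_add_memory(2, 8))   # small VM, crossing the limit
print(can_hot_add_memory(8, 16))  # VM already above the limit
```

In the first case the machine has to be turned off once and resized; in the second case the increase can be done on the running machine.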
If you need to change, add or delete a network or IP address, we always recommend selecting “IP Pool”. In this case, the virtual machine is given the first free IP from the IP range assigned to you. You can also select “Manual IP” and set the address yourself. In both cases, the automation configures the network settings in the OS when you use the "force recustomization" option. We do not recommend using DHCP. If you do want to use it, you need to set up the appropriate services in advance. It is also very easy to get a Spoofguard block on a network if you change the IP address in the OS without Cloud Director being aware of it, i.e. there is no corresponding network and IP in Cloud Director.
The same happens if you configure an IP in the OS that is already in use on another machine. The block can only be removed by customer support.
Heavy loads or noisy neighborhood
WaveCom's cloud service is a shared environment designed to host low-latency business-critical applications. It is forbidden to place unreasonably large loads on CPU and IO resources in a virtual machine.
Heavy loads are caused by under-provisioned virtual machines, where CPU capacity is constantly exceeded because significantly fewer resources were selected than necessary. Take, for example, a server that has 4 CPU cores but a load of 8. This means that the number of CPUs is not enough and a queue forms for CPU time. If the load increases to 16 or more, the situation is already similar to a denial-of-service attack, and the services on this machine will stop working as well.
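The example above can be generalized into a load-per-core check. The sketch below derives its thresholds from the numbers in the text (a load of 8 on 4 cores means queueing, 16 on 4 cores is severe); scaling them per core and the category names are assumptions made for illustration.

```python
def classify_cpu_load(load_average: float, cpu_cores: int) -> str:
    """Classify a load value relative to the number of CPU cores.

    Thresholds are assumed: more than 1.0 per core means a queue for
    CPU time, 4.0 per core or more is treated as severe overload.
    """
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    ratio = load_average / cpu_cores
    if ratio >= 4.0:
        return "severe"
    if ratio > 1.0:
        return "under-provisioned"
    return "ok"


print(classify_cpu_load(8, 4))
print(classify_cpu_load(16, 4))
```

On a Linux guest, the inputs could come from `os.getloadavg()` and `os.cpu_count()` in the Python standard library; a result that stays at "under-provisioned" over time suggests adding vCPUs.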
Longer benchmark tests and scripts also cause a heavy load that might result in exceeding core and IO capacity. In particular, we are referring to tests that are not well thought out and reasonably performed, or that use tools not suited to a shared environment. If necessary, we can always enable longer and more voluminous tests by moving the test virtual machine to a suitable host, where disturbance to other customers is minimal.
Also, our platform is not designed to host large backup applications, even if they only run at night. We offer other options for hosting large-scale backup applications, such as a private server (or other solutions), which are mostly significantly cheaper.
To avoid serious problems, do not back up to the production virtual data center (vDC).
For this purpose, WaveCom offers a corresponding service with a "dedicated DR vDC". When using Availability, you should always back up to a backup / replication site. Otherwise, the backup puts a very heavy load on the storage and the network, which disrupts the virtual machines in the production vDC. If you back up to the production vDC anyway, and to the production storage policy as well, the vDC storage can easily fill up, after which serious anomalies begin.
Availability monitors the changed blocks of the virtual machine, from which deltas are made on the destination site according to the set recovery point objective (RPO).
If you have deleted content and created new content on a virtual machine with a larger disk, recovery to the last RPO is slow and errors may occur.
Errors occur because, in practice, a virtual machine cannot consist of a large disk plus an equally large delta. During recovery the data is consolidated so that the delta is merged into the disk, after which recovery is possible. This can take hours on a larger disk. If backups take too long, remove the replication and set it up again.
It is best practice to constantly check and test your backup solution.
It is wise to do a failover test once a month to assess the speed and consistency of recovery. During the test, the desired vApps or virtual machines are started in the DR data center, but their networks are isolated. The networks are available to the restored virtual machines at the same time, which in turn makes it possible to make sure that the services work in the restored environment.
Availability allows you to fix restore points. However, a fixed restore point should be removed immediately when it is no longer needed. Otherwise, the backup will become useless after a few weeks or months and can no longer be used for recovery.
We definitely recommend enabling Cloud Director Availability notifications, which can be done in the menu "Events -> Notifications". This will give you important email notifications about backup jobs and possible disruptions.