Is it Real or is it Virtual?
Virtualization and virtual machines have been getting a lot of attention lately. This is mainly due to the increased availability and reliability of commercial virtual machine software from VMware and Microsoft. There is nothing new under the sun, though – Gerald J. Popek and Robert P. Goldberg’s seminal 1974 paper, “Formal Requirements for Virtualizable Third Generation Architectures,” pretty much defined the playing field for virtual machines – even though the field they were playing on was big iron. We may be heading down the road of esoterica today – my spell checker doesn’t even recognize the word virtualizable.
So what is reality?
Just what do we mean when we use the term virtual? It is generally applied to something that is not conceived or perceived as real but that acts like a real thing. So what is reality? René Descartes’ greatest contribution to western culture was his work in mathematics, but he is still best known for his contention that his reality was defined by his ability to think or reason – Cogito ergo sum. His ideas provided the basis of western rational philosophy, which was in turn opposed by empiricism. The Irish philosopher Bishop George Berkeley took that idea to its extreme by asserting that the only real existence of anything is the perception we have of that thing in our mind. Thus nothing is real beyond its perception – and you know what that means for trees falling in the forest. Dr. Samuel Johnson, the 18th-century poet, essayist, and biographer, was particularly frustrated with Bishop Berkeley’s theories and is reputed to have kicked a heavy stone, saying “I refute it thus.”
Reality (as opposed to virtuality) has taken on new meanings in the world of digital computing and is still as difficult to define as ever. Virtual memory is something we are all familiar with. A digital computer operating system uses a persistent storage device (like a hard drive) to expand the volatile memory it has at its command. Random Access Memory, or RAM, is generally the working area – the area where data is manipulated and processed. When we use up all available RAM we can move portions of data to a persistent storage device and virtually expand main memory. This whole process also allows us to create multi-processing machines. Entire running processes can be swapped in and out of RAM, allowing us to time-share the central processor. Ideally we do all the swapping in RAM, thereby eliminating the latency of hard disk writes and reads.
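The swapping idea above can be sketched in a few lines. This is a toy model only – a real memory manager works on fixed-size pages with hardware support – and the class and eviction policy (least-recently-used) are my own choices for illustration:

```python
from collections import OrderedDict

class ToyVirtualMemory:
    """Toy model of demand paging: a small RAM holds a few pages;
    everything else lives on a (simulated) disk. The least-recently-used
    page is evicted to disk when RAM is full."""

    def __init__(self, ram_pages):
        self.ram = OrderedDict()   # page -> data, ordered by recency of use
        self.disk = {}             # pages that were swapped out
        self.ram_pages = ram_pages
        self.page_faults = 0

    def access(self, page):
        if page in self.ram:
            self.ram.move_to_end(page)       # hit: mark as recently used
            return self.ram[page]
        self.page_faults += 1                # miss: a "page fault"
        data = self.disk.pop(page, f"data-{page}")
        if len(self.ram) >= self.ram_pages:  # RAM full: evict LRU page
            victim, vdata = self.ram.popitem(last=False)
            self.disk[victim] = vdata
        self.ram[page] = data
        return data

vm = ToyVirtualMemory(ram_pages=2)
for p in [1, 2, 1, 3, 2]:
    vm.access(p)
print(vm.page_faults)   # -> 4
```

With only two RAM slots, the access pattern above faults four times; with enough RAM it would fault only three times (once per distinct page) – which is exactly why swapping entirely in RAM is the ideal case.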
Virtual Application Processes
Sun Microsystems coined the phrase “Write once, run anywhere” to describe the portability of Java bytecode – or more generally code written using the Java programming language. By portability we mean the ability to run on a wide variety of machines using diverse and often incompatible operating systems. That portability dictates that the Java code cannot be compiled to native code before distribution or it would not run on diverse systems. That little problem was solved by distributing Java virtual machines which interpret the code at run time, or compile it at run time using a Just In Time (JIT) compiler. A Java virtual machine is computer software that creates an isolated environment in which the Java bytecode can be interpreted, translating it into operations that can be accomplished by the host operating system. Sun “only” needs to distribute Java virtual machines for all major operating systems and then deliver on the “write once, run anywhere” claim. Of course we all know this got a little confusing when Microsoft decided to distribute its own Java VM… which broke a lot of Sun stuff. So we learned to code to the Microsoft Java VM. Then the justice system made us return to the Sun Java VM, which broke all our new code. So much for standards.
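The interpretation step is easier to see with a toy stack machine. The opcodes below are invented for illustration – the real JVM instruction set is far richer – but the principle is the same: the same “bytecode” runs anywhere the interpreter runs:

```python
# A toy stack-machine interpreter. The portable artifact is the bytecode;
# only the interpreter is platform-specific -- the essence of
# "write once, run anywhere". (Opcodes are invented, not real JVM ones.)

def interpret(bytecode):
    stack = []
    for op, *args in bytecode:
        if op == "PUSH":
            stack.append(args[0])        # push a constant
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack.pop()

# (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
print(interpret(program))   # -> 20
```

A JIT compiler takes the same bytecode but translates hot sections into native machine code instead of dispatching one opcode at a time.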
Our present interest in virtual computing does not really jibe with what we have been discussing so far. VMware and Virtual PC are products that allow us to run entire computer systems within the context of another computer system. Just what is it that makes a digital computer a digital computer? Consider these three things, or three layers:
Hardware – the CPU and associated input, output and storage devices.
The Operating System – this is a stretch, but for our purposes let’s call an operating system computer software that provides a link between the hardware and productivity software.
Application software – the reason we are using computers in the first place – software that can only function within the operating system.
So for our purposes right now an example of a “real” machine would be an Intel P4 box running Windows XP and whatever productivity applications we want to assume. Virtual machine software like Microsoft’s Virtual PC runs in layer three above. In fact it can be considered as just another application – an application that emulates a separately running operating system. This virtual machine is hosted in a real OS – and that real OS is the software that is actually interacting with the hardware. On that fact hinge both the advantages and disadvantages of hosted virtual machines.
Let’s assume you need to distribute policy management software to thousands of independent agents. Helpdesk support for this group is already a nightmare. Individual hand-holding for each office is out of the question. What if you could distribute your software not just as an application but as a fully configured application running in its own virtual machine? Your users would only need to install the OS-specific VM and the configured application. A few years ago I spent a few days in the data center of a major insurance carrier installing an application we had licensed to them. Installation of the actual software was a snap, but there were specific configurations that had to be manually accomplished after installation. That process took hours because I was not permitted to touch a keyboard. How much easier the whole thing would have been had I been able to install a virtual machine with the software installed and preconfigured.
Here is another scenario. You are still going to distribute policy management software to thousands of independent agents. You know that your customers are running operating systems ranging from Windows 98 to XP Professional. So your QA team decides they are going to test on hosted VMs. They set up multiple virtual test environments with different operating systems and different configurations (available RAM, processor speed, etc.). They intend to load up a virtual machine, then install, run and load test the product in that VM. If the product passes, it is released for that operating system. That doesn’t sound like such a bad idea, but I don’t like it. A hosted virtual machine is just that – an operating system running within the context of another operating system. If my OS isn’t actually touching the hardware, I have no guarantees that a “real” system running that OS is going to act in exactly the same way.
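The QA team’s plan amounts to a cross-product of guest operating systems and VM configurations. A sketch of how that matrix might be enumerated – all names and numbers here are hypothetical:

```python
from itertools import product

# Hypothetical test matrix: every supported guest OS crossed with
# every hardware configuration the VM can simulate.
guest_oses = ["Windows 98", "Windows 2000", "Windows XP Pro"]
vm_configs = [
    {"ram_mb": 128, "cpu_mhz": 500},    # low-end agent desktop
    {"ram_mb": 512, "cpu_mhz": 1500},   # well-equipped office machine
]

# One VM build-and-load-test run per combination.
matrix = list(product(guest_oses, vm_configs))
print(len(matrix))   # -> 6 test environments
```

The matrix makes the coverage explicit – and also makes my objection concrete: each cell is a *hosted* environment, so a pass in the matrix is evidence about the VM, not a guarantee about the real hardware it imitates.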
Running virtual machines in a hosted environment is a kludgy process. Operating systems are already so bloated – and manufacturers try to force so much proprietary stuff on us that system resources are already pushed to the limits. My very nice laptop computer with a well-known TLA nameplate came from the factory with so much preloaded “productivity” software that I was considering installing a DOS VM so I could run WordPerfect 5.1 and get some work done. The registry entry for HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run had a list of keys so long I needed to scroll the page. No way am I going to be able to run a hosted VM on this box. But even clean, well-maintained machines bog down when running VMs. We have a demo box running Windows 2K3 Server with SharePoint Server, SQL Server and Content Management Server in a VM on XP, which accesses the data served up by the hosted process. This thing has a fast processor and a ton of RAM, but when I fire it off I have time to drive to Starbucks and get some Java before it is finished booting. And my office is pretty far from the nearest barista.
Hardware Level Virtual Machines
If we really want to create efficient virtual machines we need to drop down to the machine level. Let’s face it – once we get too far beyond the processor’s basic instruction set we are virtualized anyway. The heart of the machine is just a series of NAND and NOR gates… assuming the smallest possible instruction set. By the time we get to pushing and popping data off the stack we are out of binary switch world and already virtualized. Still, the reality is that an operating system will expect certain behavior from the hardware. Systems that allow hardware-level virtualization are often known as hypervisors. Hypervisors allow different operating systems to be run on the same machine at the same time. They emerged in the 1970s – most particularly with IBM 360 systems, which were designed and built specifically to allow virtualization. I cut my teeth on VM/CMS (Virtual Machine/Conversational Monitor System) machines and I can attest to their versatility. On the other hand, x86 systems were not designed for virtualization. That makes them rather difficult to fully virtualize. One way around this is called paravirtualization – a scheme whereby a relatively simple hypervisor, which delivers a reduced virtualized hardware layer, is matched up with an operating system specifically designed to operate with that reduced layer. The reduced virtualization is generally based upon a more trusted security level or protection ring afforded by the actual processor. In essence, paravirtualization provides a machine-level API that a modified OS must utilize.
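That “machine-level API” idea can be sketched in miniature. Real hypercalls are privileged CPU transitions, not method calls, and every name below is invented for illustration – this only shows the shape of the contract between a paravirtualized guest and its hypervisor:

```python
# Sketch of the paravirtualization contract: the guest kernel is modified
# so that, instead of executing privileged instructions directly, it asks
# the hypervisor via a narrow "hypercall" API. All names are hypothetical.

class Hypervisor:
    """Owns the real hardware; exposes a reduced, trusted interface."""
    def __init__(self):
        self.page_tables = {}   # (guest_id, virtual addr) -> physical addr

    def hypercall_map_page(self, guest_id, virt, phys):
        # The hypervisor validates the request and performs the privileged
        # page-table update on the guest's behalf.
        self.page_tables[(guest_id, virt)] = phys
        return True

class ParavirtGuestKernel:
    """A guest OS ported to run against the reduced hardware layer."""
    def __init__(self, guest_id, hypervisor):
        self.guest_id = guest_id
        self.hv = hypervisor

    def map_page(self, virt, phys):
        # A stock kernel would write its own page tables here; the
        # paravirtualized kernel must go through the hypercall instead.
        return self.hv.hypercall_map_page(self.guest_id, virt, phys)

hv = Hypervisor()
guest = ParavirtGuestKernel("guest-0", hv)
guest.map_page(0x1000, 0x9F000)
print(hv.page_tables)
```

The point of the structure is that the guest never touches the privileged state directly – which is exactly why the guest OS must be modified, and why paravirtualization falls short of running an unmodified OS.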
Another scheme that falls short of full virtualization is called native virtualization. This is a hardware virtual machine that only virtualizes enough of the hardware to allow an unmodified operating system to run on the VM. The limitation here is that the operating system must be designed to run on the underlying hardware (CPU). Boot Camp for the Macintosh provides the ability to install and run Windows XP on an Intel-based Macintosh. I have not personally tested this product, but the beta appears to provide some layer of hardware virtualization – not necessarily for the processor, but for peripheral devices.
Our ultimate goal should be full hardware virtualization. Virtual machines that will allow any operating system to run on any hardware platform will provide the greatest flexibility. Full virtualization will create another paradigm shift. Right now if I need to recreate a failed server, or bring another online in a farm, I need to restore an image of the machine and then use my latest backup to recreate the most recent state of the machine. If I need to accomplish this with a totally different machine I may need to rebuild the server starting with the OS and then layering on all the various application software.
If I face the same problem but have virtualized all my machines I can bring another server online in a heartbeat. In an emergency I could drop my virtualized middleware server on the same (virtualized) physical machine running my presentation software and minimize downtime – a performance hit is always preferable to downtime. I kind of like the picture – I have a 6U rack of blade servers all running hardware level virtual machines. These guys are accessing my terabyte SAN. I can swap servers in and out of actual physical machines as needs and loads dictate. The only “machine” I need to concern myself with is a virtual one. And if I want to play Dr. Johnson I can always go kick my 200 pound Netfinity server that died last week.
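The blade-rack picture above is, at bottom, a placement problem: virtual servers become entries that can be reassigned to whichever physical machine has capacity. A minimal sketch, with all names and sizes invented:

```python
# Sketch of the failover picture: a virtual server is just a record that
# can be placed on whichever blade has room. Names/sizes are hypothetical.

def place(vm, hosts):
    """Assign vm to the host with the most free RAM that can hold it."""
    candidates = [h for h in hosts if h["free_mb"] >= vm["ram_mb"]]
    if not candidates:
        raise RuntimeError(f"no host can fit {vm['name']}")
    best = max(candidates, key=lambda h: h["free_mb"])
    best["free_mb"] -= vm["ram_mb"]
    best["vms"].append(vm["name"])
    return best["name"]

hosts = [
    {"name": "blade1", "free_mb": 2048, "vms": ["presentation"]},
    {"name": "blade2", "free_mb": 4096, "vms": []},
]
middleware = {"name": "middleware", "ram_mb": 1024}
print(place(middleware, hosts))   # -> blade2 (the emptier blade)
```

In the emergency case from the text, the same call with only blade1 available would still succeed – the middleware server simply lands beside the presentation software and everyone takes the performance hit instead of the downtime.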