Behind the Scenes of the Java 1.1 Virtual Machine

Ronald Crawford II is an information systems contractor (expertise includes: C, C++, Java, Windows, UNIX, Web, and others) based just outside Philadelphia. He can be contacted at [email protected].

"We must maintain our credibility at the highest level." —Chuck Lehman

AS YOU ARE well aware, Java is a hot topic in today's computer world. It enables a software developer to have the ease of "Write Once, Run Anywhere" as coined by Sun Microsystems, Inc. The Java technology professes the ability to run compiled Java code on any operating system, network computer, or hardware device that supports Java. We will explore the mechanism that is responsible for making this happen, so let us take a journey into the Java virtual machine (JVM), shall we?

The JVM is an abstract software-based machine that resides harmoniously with an actual microprocessor hosted machine, hence the name virtual machine (VM). Programs developed for the JVM are created using the Java programming language. This language, like other high-level languages, encompasses a language syntax that is not targeted for any specific microprocessor. Like other high-level languages, the Java source code is compiled using a compiler tool but its output differs from the output of traditional compilers.

Compilers for most other high-level languages produce an executable program for the underlying microprocessor or operating system. The Java compiler, on the other hand, produces an executable program designed for the JVM. "Wait a second, why do Java programs contain compiled code created exclusively for the JVM when the JVM is itself hosted on a microprocessor with its own machine language instruction set?" That's a very good question, and the answer will show you how Sun resolved the portability obstacle.

Because the JVM is a VM, it shares the same characteristics as microprocessors with respect to having registers, stacks, instruction pointers, an instruction set, and so forth. This design creates a VM that has the characteristics of an actual hardware machine, and like a machine, the JVM can execute programs in its own language.

This means that if software developers wrote programs for the JVM, the programs would be certain to behave the same, regardless of the host microprocessor where the JVM resided.

"How do Java programs work if they are written for the JVM and not the microprocessor?" Designers of a JVM for a given microprocessor or operating system must comply with the JVM specification and build the necessary bridge, behind the scenes, to the underlying microprocessor. This behind-the-scenes bridge allows software developers to "Write Once, Run Anywhere" because, according to Sun's JVM specification, the JVM must behave the same regardless of the underlying microprocessor. If a program is developed for the JVM, then it will execute on Microsoft Windows, Macintosh, UNIX, Webphones, Network Computers, Internet browsers, Web TV, and countless hardware devices, as long as they have a JVM within them.

A complete technical description of the JVM, like that of a microprocessor or operating system, would fill many pages of a book. We will therefore look at selected areas of the JVM to get a better feel for how it operates under the covers.

The JVM, like a real microprocessor, uses internal registers and memory areas for processing and storing data. Java executable programs are stored in files with a .class extension and contain JVM instructions (bytecodes), a symbol table, and other pertinent data. Unlike a real microprocessor, the JVM imposes strict format and structural constraints on the contents of the class file for security reasons. These constraints protect against malicious applications, which makes the Java technology appealing to software developers who write for the Internet as well as to those who write stand-alone applications.

The JVM operates on two kinds of datatypes: reference types and primitive types. Values of both kinds can be passed as arguments, returned by methods, and stored in variables. Reference types consist of class types, interface types, and array types, whose values are references to dynamically created objects of the respective type. Listing 1 shows examples of reference types in Java source code.
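The three kinds of reference types might look like the following minimal sketch. The names Shape, Circle, and ReferenceTypesSketch are hypothetical and chosen only for illustration; they do not come from the JVM specification or from the article's own listing.

```java
// Hypothetical interface type, for illustration only.
interface Shape {
    double area();
}

// Hypothetical class type implementing the interface.
class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    public double area() {
        return Math.PI * radius * radius;
    }
}

class ReferenceTypesSketch {
    // Returns a class-type reference to a dynamically created object.
    static Circle makeCircle() {
        return new Circle(1.0);
    }

    public static void main(String[] args) {
        Circle circle = makeCircle();   // class type
        Shape shape = circle;           // interface type, refers to the same object
        int[] numbers = new int[3];     // array type
        Shape[] shapes = { shape };     // array whose elements are references

        System.out.println("same object: " + (shape == circle));
        System.out.println("array length: " + numbers.length);
        System.out.println("area via array element: " + shapes[0].area());
    }
}
```

In each case the variable holds a reference to an object, never the object itself.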

Table 1. JVM primitive types.
byte - 8-bit signed two's-complement integers. Range: -128 to 127 (-2^7 to 2^7-1), inclusive
short - 16-bit signed two's-complement integers. Range: -32768 to 32767 (-2^15 to 2^15-1), inclusive
int - 32-bit signed two's-complement integers. Range: -2147483648 to 2147483647 (-2^31 to 2^31-1), inclusive
long - 64-bit signed two's-complement integers. Range: -9223372036854775808 to 9223372036854775807 (-2^63 to 2^63-1), inclusive
char - 16-bit unsigned integers representing Unicode characters. Range: '\u0000' to '\uffff'; char is unsigned, so '\uffff' represents 65535 when used in expressions, not -1
float - 32-bit IEEE 754 floating-point numbers
double - 64-bit IEEE 754 floating-point numbers

The primitive types supported by the JVM are shown in Table 1.
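The integer ranges in Table 1 can be checked from Java source with the constants on the standard wrapper classes. This is a small sketch; the class name PrimitiveRangesSketch and its helper methods are hypothetical.

```java
class PrimitiveRangesSketch {
    // Widens the largest char value to int; char is unsigned, so no sign extension occurs.
    static int maxCharAsInt() {
        char c = Character.MAX_VALUE;
        return c;
    }

    // Adds one to the largest byte value and narrows back to byte (two's-complement wrap-around).
    static byte wrapByte() {
        return (byte) (Byte.MAX_VALUE + 1);
    }

    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        System.out.println("char max as int: " + maxCharAsInt());
        System.out.println("byte 127 + 1 wraps to: " + wrapByte());
    }
}
```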

References enable the JVM to contain a controlled subsystem. A controlled subsystem is safeguarded because memory locations are hidden from the software developer. If an application requests a block of memory, the JVM returns a reference that the application uses for all future access to that memory area. Unlike a pointer in languages such as C and C++, the reference is not a memory address but a handle to a memory area hidden from applications. This provides a security feature that prevents poorly written or malicious applications from directly accessing sensitive areas of the JVM subsystem.

Primitive types, unlike reference types, contain actual data and not a reference to some class, interface, or array. Applications cannot obtain the memory address of a primitive type; hence, the subsystem remains secure. Applications are allowed to read and write to primitive types but cannot access any pointers based on a primitive type.
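The difference between a reference and a primitive value shows up on assignment: copying a reference yields a second handle to the same object, while copying a primitive yields an independent value. A minimal sketch, with the hypothetical class name ValueVersusReferenceSketch:

```java
class ValueVersusReferenceSketch {
    // Copying a reference: both variables name the same array object.
    static int[] aliasDemo() {
        int[] first = new int[] { 1, 2, 3 };
        int[] second = first;   // copies the reference, not the array
        second[0] = 99;         // the change is visible through both references
        return first;
    }

    // Copying a primitive: the two variables hold independent values.
    static int copyDemo() {
        int a = 5;
        int b = a;   // copies the value
        b = 7;       // does not affect a
        return a;
    }

    public static void main(String[] args) {
        System.out.println("first[0] after write through second: " + aliasDemo()[0]);
        System.out.println("a after b was changed: " + copyDemo());
    }
}
```

Neither case exposes a memory address; the language offers no address-of operator for either kind of type.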

The JVM does not have an explicit memory-releasing mechanism such as free() in the C language or delete in the C++ language. Instead, it reclaims memory automatically through a process called "Garbage Collection." Each JVM vendor can supply its own algorithm for collecting garbage (freeing resources that are no longer used) as long as it collects garbage properly. This freedom lets each vendor exploit the underlying machine language constructs for optimum performance. The Java 1.1 API (Application Programming Interface) specification does provide a method for requesting a garbage collection run from within an application at runtime:

System.gc();   // Asks the JVM to run the garbage collector

Another way to make an object eligible for garbage collection is to assign null to its reference after use:
int array[] = new int[100];
for (int index = 0; index < 100; index++)
    array[index] = index;
calculatorRoutine(array);
array = null;   // Notify garbage collector to release this memory
…

The general approach to garbage collecting is "collect during idle time." Garbage collection within the JVM normally occurs while the application is waiting on the user or during idle periods within a running application. As long as any live class, interface, or array reference still refers to an object, the memory for that object will not be garbage collected. As soon as every reference to an object has been released, the object becomes eligible for garbage collection.
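The reachability rule can be observed with a weak reference, which lets a program watch an object without keeping it alive. Note the assumptions in this sketch: java.lang.ref.WeakReference and generics were added to Java after version 1.1, the class name ReachabilitySketch is hypothetical, and only the first observation is guaranteed, since System.gc() is a request that a JVM is free to ignore.

```java
import java.lang.ref.WeakReference;

class ReachabilitySketch {
    // A static field is a strong reference that keeps the array reachable.
    static int[] held;

    // While 'held' refers to the array, a collection must not reclaim it.
    static boolean aliveWhileReferenced() {
        held = new int[100];
        WeakReference<int[]> watcher = new WeakReference<int[]>(held);
        System.gc();
        return watcher.get() != null;
    }

    public static void main(String[] args) {
        System.out.println("alive while referenced: " + aliveWhileReferenced());

        WeakReference<int[]> watcher = new WeakReference<int[]>(held);
        held = null;   // release the last strong reference
        System.gc();   // request a collection; the JVM may or may not comply
        // The array is now eligible for collection, but when it is actually
        // reclaimed depends on the JVM, so this result is not guaranteed.
        System.out.println("alive after release: " + (watcher.get() != null));
    }
}
```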

Multi-threaded support is included within the JVM. It is facilitated by giving each concurrently running thread of execution its own PC (program counter) register. The PC keeps track of where the thread is in its code while it runs. When the JVM issues a thread switch, the currently executing thread is suspended along with its pertinent data, and the position of its current instruction is retained in the PC until it's time for the thread to continue execution.

Although the JVM is a virtual machine, each of its threads has a private stack, just as on real hardware; the stack is created at the same time as the thread. The JVM stack stores JVM frames, which hold local variables and their data and serve as a repository for method invocation data.

A JVM frame is a data structure used to store data and partial results, as well as to perform dynamic linking, return values for methods, and route exceptions. A frame shares the life of a method invocation: it is created when the method is entered and destroyed when the method exits. Frames belong to one thread exclusively and cannot be shared between threads.
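Frames are not directly visible from Java source, but their effect is. In the sketch below, every recursive invocation of sumTo gets its own copy of the locals n and partial, and a stack trace taken one call deeper contains one more entry. The class and method names are hypothetical; counting stack trace elements is used here only as an approximate window onto the frames.

```java
class FrameSketch {
    // Each invocation has its own frame, hence its own 'n' and 'partial'.
    static int sumTo(int n) {
        if (n == 0) {
            return 0;
        }
        int partial = sumTo(n - 1);   // a new frame is pushed for this call
        return partial + n;           // this frame is popped on return
    }

    // Number of stack trace entries at the point of the call.
    static int depth() {
        return new Throwable().getStackTrace().length;
    }

    // Same measurement, taken from one method call deeper.
    static int depthOneCallDeeper() {
        return depth();
    }

    public static void main(String[] args) {
        System.out.println("sumTo(4) = " + sumTo(4));
        System.out.println("extra entries one call deeper: "
                + (depthOneCallDeeper() - depth()));
    }
}
```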

The JVM also has a heap (memory area) that, unlike the stacks, is shared among all threads. This heap is the memory source for objects dynamically created at runtime. Alongside the heap, the JVM has a memory area called the method area, which houses methods that may be used by any number of threads. Included in the method area is the JVM constant pool. The constant pool serves as a symbol table similar to that of conventional programming languages, except that it contains a wider range of data: it holds constants (non-changing data) ranging from numeric literals to method and field references that are resolved at runtime.
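Because the heap is shared, an object created by one thread can be read and written by others, while each thread's locals stay private to its own stack. A minimal sketch, with the hypothetical class name SharedHeapSketch:

```java
class SharedHeapSketch {
    // Two threads write to different slots of one heap-allocated array.
    static int[] fillFromTwoThreads() {
        final int[] shared = new int[2];   // a single object on the shared heap

        Thread first = new Thread(new Runnable() {
            public void run() {
                shared[0] = 10;
            }
        });
        Thread second = new Thread(new Runnable() {
            public void run() {
                shared[1] = 20;
            }
        });

        first.start();
        second.start();
        try {
            // join() waits for each thread and makes its writes visible here.
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared;
    }

    public static void main(String[] args) {
        int[] result = fillFromTwoThreads();
        System.out.println(result[0] + ", " + result[1]);
    }
}
```

The two threads write to separate array slots, so no locking is needed in this particular case; threads that update the same data would need synchronization.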

The JVM supports native methods, which are machine-dependent code areas that can be called for execution from within the JVM. Because a call to a native method transfers control, it too has a stack called the Native Method Stack. The PC does not change while the native method is executing because it is not running from within the JVM. When the native method returns, the PC continues with execution and is updated accordingly.
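Many core library methods are themselves native. As a sketch, the reflection API (introduced in Java 1.1) can report whether a method carries the native modifier. System.arraycopy is declared native in the JDK implementations I am aware of, but that is an implementation detail rather than a guarantee; the class name NativeMethodSketch is hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class NativeMethodSketch {
    // Reports whether System.arraycopy is declared with the 'native' modifier.
    static boolean isArraycopyNative() {
        try {
            Method method = System.class.getMethod("arraycopy",
                    Object.class, int.class, Object.class, int.class, int.class);
            return Modifier.isNative(method.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("System.arraycopy not found", e);
        }
    }

    // Calls the method like any other; the JVM handles the transfer of control.
    static int[] copyOf(int[] source) {
        int[] target = new int[source.length];
        System.arraycopy(source, 0, target, 0, source.length);
        return target;
    }

    public static void main(String[] args) {
        System.out.println("arraycopy is native: " + isArraycopyNative());
        System.out.println("copied element: " + copyOf(new int[] { 7, 8, 9 })[2]);
    }
}
```

From the caller's side a native method looks like any other method call.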

The JVM initiates exception handling in response to an explicit throw statement in the application or an error that occurs at runtime. If an exception is not caught in the current method, that method's operand stack and local variables are discarded and its frame is popped, reinstating the frame of the calling method.

This process continues up through the callers until the appropriate catch statement is located. If the exception reaches the top of the method-calling list, the thread is abnormally terminated due to an unresolved exception. If the exception is caught, execution continues from the point of the method catching the exception and the PC and all stacks are adjusted accordingly.
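The unwinding described above can be traced with a short sketch. The class UnwindSketch and its methods are hypothetical; the exception thrown in inner() passes through middle(), whose frame is popped without running the rest of its body, and is caught in outer().

```java
class UnwindSketch {
    static final StringBuilder trace = new StringBuilder();

    static void inner() {
        trace.append("inner;");
        throw new IllegalStateException("boom");
    }

    static void middle() {
        trace.append("middle;");
        inner();
        trace.append("unreached;");   // skipped: this frame is popped during unwinding
    }

    // Runs the call chain and returns a record of what executed.
    static String outer() {
        trace.setLength(0);
        try {
            middle();
        } catch (IllegalStateException e) {
            // Execution resumes here, in the method that catches the exception.
            trace.append("caught:").append(e.getMessage());
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(outer());   // prints "middle;inner;caught:boom"
    }
}
```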

The instructions that make up bytecodes are very compact when compared to those of existing microprocessors. A JVM instruction consists of a one-byte opcode specifying the action to be performed, followed by zero or more operands providing arguments or data that are used by the operation. Several instructions have no operands and consist only of an opcode.

With the exclusion of exceptions, the JVM core engine looks similar to the following pseudo-code fragment:

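do {
         fetchOpcode();                    // read the next one-byte opcode
         if (operands() == true)
                  fetchOperands();         // read any operands that follow it
         executeOpcode();                  // perform the action for the opcode
} while (moreProcessingNeeded());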
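To make the fetch-and-execute loop concrete, here is a toy interpreter for four real JVM opcodes: bipush (0x10), iadd (0x60), imul (0x68), and ireturn (0xac). It is only a sketch under heavy simplifying assumptions: the class TinyInterpreterSketch is hypothetical, there are no local variables, frames, or verification, and a real JVM is far more involved.

```java
class TinyInterpreterSketch {
    // Opcode values as defined by the JVM instruction set.
    static final int BIPUSH = 0x10;    // push the following byte as an int
    static final int IADD = 0x60;      // pop two ints, push their sum
    static final int IMUL = 0x68;      // pop two ints, push their product
    static final int IRETURN = 0xac;   // pop an int and return it

    static int run(byte[] code) {
        int[] stack = new int[16];   // operand stack (fixed size for this sketch)
        int sp = 0;                  // index of the next free stack slot
        int pc = 0;                  // program counter: index of the next byte

        while (true) {
            int opcode = code[pc++] & 0xff;   // fetch the one-byte opcode
            switch (opcode) {
                case BIPUSH:
                    stack[sp++] = code[pc++];   // fetch the single operand byte
                    break;
                case IADD: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    break;
                }
                case IMUL: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left * right;
                    break;
                }
                case IRETURN:
                    return stack[--sp];
                default:
                    throw new IllegalArgumentException("unsupported opcode " + opcode);
            }
        }
    }

    // Bytecode for (2 + 3) * 4.
    static byte[] sampleProgram() {
        return new byte[] {
            (byte) BIPUSH, 2, (byte) BIPUSH, 3, (byte) IADD,
            (byte) BIPUSH, 4, (byte) IMUL, (byte) IRETURN
        };
    }

    public static void main(String[] args) {
        System.out.println(run(sampleProgram()));   // prints 20
    }
}
```

Note that bipush carries one operand byte, while iadd, imul, and ireturn consist of the opcode alone and take their inputs from the operand stack.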

There are roughly 200 opcodes for the JVM (for the technically curious); a one-byte opcode allows at most 256. Many of these opcodes are similar to existing opcodes for microprocessor units with respect to loading, storing, arithmetic, comparison, branching, bit operations, stack manipulation, conversion, etc. The breadth of the instruction set shows why the JVM is indeed a true VM. There are several microprocessors that don't have an instruction set as rich as the JVM's.

You should now have a better understanding of the JVM and why it's called a VM and why Java is commanding the attention of software developers worldwide as well as major corporations, industries, and universities. In a future article we will delve into the inner workings of the JVM within areas such as the security manager, class file format, threads and locks, how loading and linking work, and other fun areas of the JVM.