Behind the Scenes of the Java 1.1 Virtual Machine
- By Ronald Crawford II
Ronald Crawford II is an information systems contractor (expertise includes: C, C++,
Java, Windows, UNIX, Web, and others) based just outside Philadelphia. He can be
contacted at Ronald@Crawford.net.
"We must maintain our credibility at the highest level." —Chuck Lehman
AS YOU ARE well aware, Java is a hot topic in today's computer world. It offers software developers
the convenience of "Write Once, Run Anywhere," a phrase coined by Sun Microsystems, Inc. Java
technology promises the ability to run compiled Java code on any operating system, network computer,
or hardware device that supports Java. We will explore the mechanism that makes this possible, so let
us take a journey into the Java virtual machine (JVM), shall we?
The JVM is an abstract, software-based machine that runs on top of a real, microprocessor-based host
machine; hence the name virtual machine (VM). Programs developed for the JVM are written in the Java
programming language. Like other high-level languages, Java has a syntax that is not targeted at any
specific microprocessor, and Java source code is compiled with a compiler tool. Its output, however,
differs from the output of other high-level language compilers. Those compilers create an executable
program for the underlying microprocessor or operating system. Java, on the other hand, creates an
executable program designed for the JVM. "Wait a second, why do Java programs contain compiled code
created exclusively for the JVM when the JVM runs on a host microprocessor that has its own machine
language instruction set?"
That's a very good question, and the answer will show you how Sun resolved the portability obstacle.
Because the JVM is a VM, it has many of the same features as a microprocessor: registers, stacks,
an instruction pointer, an instruction set, and so forth. This design gives the JVM the characteristics
of an actual hardware machine, and like such a machine, the JVM can execute programs written in its
own language.
This means that if software developers wrote programs for the JVM, the programs would be certain to
behave the same, regardless of the host microprocessor where the JVM resided.
"How do Java programs work if they are written for the JVM and not the microprocessor?" Implementers
of the JVM for a given microprocessor or operating system must comply with the JVM specification
and build the necessary bridge, behind the scenes, to the underlying microprocessor. This bridge is
what allows software developers to "Write Once, Run Anywhere": according to Sun's JVM specification,
the JVM must behave the same regardless of the underlying microprocessor. If a program is developed
for the JVM, it will execute on Microsoft Windows, Macintosh, UNIX, Webphones, network computers,
Internet browsers, Web TV, and countless other hardware devices, as long as they contain a JVM.
The JVM, like a microprocessor or an operating system, is complex enough that a full technical
description would fill many pages of a book. Instead, we are going to look at selected areas of the
JVM to get a better feel for how it operates under the covers.
THE JVM INTERNALS
The JVM, like a real microprocessor, uses internal registers and memory areas for processing and storing
data. Java executable programs are stored in files with a .class extension, which contain JVM instructions
(bytecodes), a symbol table, and other pertinent data. Unlike a physical microprocessor, the JVM enforces
strict format and structural constraints on the class file for security reasons. These constraints make
Java technology appealing to developers of Internet software as well as stand-alone applications, because
they help protect against malicious applications.
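To make the class file's format constraints a little more concrete, here is a small sketch of the very first check the JVM performs when it parses a class file: the file must begin with the magic number 0xCAFEBABE, followed by a two-byte minor version and a two-byte major version (major version 45 for JDK 1.1-era compilers). The names ClassFileMagic, hasClassFileMagic, and majorVersion are our own, and the main method assumes Java 9 or later for InputStream.readAllBytes. Real class file checking and verification go far beyond this single test.

```java
import java.io.IOException;
import java.io.InputStream;

public class ClassFileMagic {
    // Every valid class file starts with the four bytes CA FE BA BE.
    static final int MAGIC = 0xCAFEBABE;

    /** Returns true if the first four bytes of data are the class-file magic number. */
    static boolean hasClassFileMagic(byte[] data) {
        if (data == null || data.length < 4) {
            return false; // too short to be a class file
        }
        // Assemble a big-endian unsigned 4-byte value from the first four bytes.
        int magic = ((data[0] & 0xFF) << 24)
                  | ((data[1] & 0xFF) << 16)
                  | ((data[2] & 0xFF) << 8)
                  |  (data[3] & 0xFF);
        return magic == MAGIC;
    }

    /** Reads the big-endian major version at bytes 6 and 7 (after magic and minor version). */
    static int majorVersion(byte[] data) {
        return ((data[6] & 0xFF) << 8) | (data[7] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        // Inspect this program's own compiled class file (assumes it is on the classpath).
        try (InputStream in = ClassFileMagic.class.getResourceAsStream("ClassFileMagic.class")) {
            if (in == null) {
                System.out.println("class file not found on the classpath");
                return;
            }
            byte[] bytes = in.readAllBytes(); // Java 9+
            System.out.println("magic ok: " + hasClassFileMagic(bytes));
            System.out.println("major version: " + majorVersion(bytes));
        }
    }
}
```

A class file produced by a modern compiler will report a much higher major version than 45; the magic number, however, has not changed.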
The JVM operates on two kinds of types, reference types and primitive types, whose values can be passed
as arguments, returned by methods, and stored in variables. Reference types consist of class types, interface
types, and array types, whose values are references to dynamically created objects of the respective type.
shows examples of reference types in Java source code.
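The three kinds of reference types can be illustrated with a short program. This is our own sketch, not the article's original listing; Shape and Square are hypothetical names chosen for the example.

```java
import java.util.Arrays;

public class ReferenceTypesDemo {
    // An interface type.
    interface Shape {
        double area();
    }

    // A class type that implements the interface.
    static class Square implements Shape {
        private final double side;

        Square(double side) {
            this.side = side;
        }

        public double area() {
            return side * side;
        }
    }

    public static void main(String[] args) {
        Square square = new Square(3.0);             // class-type reference
        Shape shape = square;                        // interface-type reference to the same object
        int[] numbers = {1, 2, 3};                   // array-type reference (array of primitives)
        Shape[] shapes = {shape, new Square(2.0)};   // array-type reference (array of references)
        Shape nothing = null;                        // null is a value of every reference type

        System.out.println(shape == square);          // true: both refer to one object
        System.out.println(shape.area());             // 9.0
        System.out.println(Arrays.toString(numbers)); // [1, 2, 3]
        System.out.println(shapes.length);            // 2
        System.out.println(nothing == null);          // true
    }
}
```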
Table 1. JVM primitive types.
| Type | Description | Range (inclusive) |
|---|---|---|
| byte | 8-bit signed two's-complement integers | -128 to 127 (-2^7 to 2^7 - 1) |
| short | 16-bit signed two's-complement integers | -32768 to 32767 (-2^15 to 2^15 - 1) |
| int | 32-bit signed two's-complement integers | -2147483648 to 2147483647 (-2^31 to 2^31 - 1) |
| long | 64-bit signed two's-complement integers | -9223372036854775808 to 9223372036854775807 (-2^63 to 2^63 - 1) |
| char | 16-bit unsigned integers representing Unicode characters | '\u0000' to '\uffff' (0 to 65535); char is unsigned, so '\uffff' represents 65535 when used in expressions, not -1 |
| float | 32-bit IEEE 754 floating-point numbers | as defined by IEEE 754 single precision |
| double | 64-bit IEEE 754 floating-point numbers | as defined by IEEE 754 double precision |
The primitive types supported by the JVM are shown in Table 1.
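The ranges in Table 1 can be checked directly against the wrapper-class constants in the standard library. This sketch (PrimitiveRanges, charAsInt, and incrementByte are our own names) prints each integral range and demonstrates two details from the table: char widens without sign extension, and two's-complement arithmetic wraps from the maximum to the minimum.

```java
public class PrimitiveRanges {
    /** Widens a char to int; because char is unsigned, no sign extension occurs. */
    static int charAsInt(char c) {
        return c;
    }

    /** Adds 1 to a byte, narrowing the int result back to byte (two's-complement wraparound). */
    static byte incrementByte(byte b) {
        return (byte) (b + 1);
    }

    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        System.out.println("char:  " + charAsInt(Character.MIN_VALUE)
                + " to " + charAsInt(Character.MAX_VALUE));              // 0 to 65535
        System.out.println("byte 127 + 1 wraps to " + incrementByte(Byte.MAX_VALUE)); // -128
    }
}
```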
References enable the JVM to maintain a controlled subsystem, one that is safeguarded because memory
locations are hidden from the software developer. When an application requests a block of memory (for
example, by creating an object or an array), the JVM returns a reference that the application uses for
all future access to that memory area. Unlike a pointer in languages such as C and C++, a reference is
not a memory address the program can inspect or manipulate; it is a handle to a memory area hidden from
applications. This prevents poorly written or malicious applications from directly accessing sensitive
areas of the JVM subsystem.
Primitive types, unlike reference types, hold actual data rather than a reference to a class instance,
interface, or array. Applications cannot obtain the memory address of a primitive value; hence, the
subsystem remains secure. Applications can read and write primitive values but cannot obtain a pointer
to one.
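The difference between holding data and holding a reference shows up whenever a value is copied, for example when it is passed to a method. In this sketch (ValueVsReference and tryToChange are our own names), the callee receives a copy of the int and a copy of the array reference; writing through the copied reference changes the one shared array, while the copied int is independent. Note that at no point does the program see or compute an address.

```java
public class ValueVsReference {
    /** Reassigns a primitive parameter and writes through a reference parameter. */
    static void tryToChange(int value, int[] array) {
        value = 99;    // changes only this invocation's local copy of the int
        array[0] = 99; // changes the single array object that both callers' references point to
    }

    public static void main(String[] args) {
        int n = 1;
        int[] data = {1};
        tryToChange(n, data);
        System.out.println(n);       // 1: the primitive value was copied
        System.out.println(data[0]); // 99: the reference was copied, not the array

        int[] alias = data;          // a second reference to the same array
        alias[0] = 7;
        System.out.println(data[0]); // 7
    }
}
```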
TAKING OUT THE GARBAGE
The JVM has no explicit memory-release mechanism such as free() in C or delete in C++. Instead, it
reclaims memory automatically through garbage collection. Each JVM vendor can supply its own algorithm
for collecting garbage (freeing unused resources) as long as the algorithm collects garbage correctly.
This freedom lets each JVM vendor use the underlying machine language constructs for optimum
performance. The Java 1.1 API (Application Programming Interface) specification does provide a method
for requesting garbage collection from within an application at runtime:
System.gc(); // Suggests that the JVM run the garbage collector now
Another way to make an object eligible for garbage collection is to assign null to the variable that
references it once the object is no longer needed:
int[] array = new int[100];
for (int index = 0; index < 100; index++)
    array[index] = index;
calculatorRoutine(array);
array = null; // drop the reference so the garbage collector can release this memory
…
The general approach to garbage collection is "collect during idle time." The garbage-collection
functions within the JVM normally run while the user of the application is idle or during idle
periods within a running application. As long as an object (a class instance or an array) is still
referenced, its memory will not be garbage collected. As soon as the object can no longer be reached
through any reference held by the running program, it becomes eligible for garbage collection.
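The JVM specification leaves the collection algorithm to the vendor, so the following is only a toy mark-and-sweep model of the reachability rule, not any real JVM's collector. The heap is a hypothetical map from object names to the names they reference, and ToyMarkSweep with its methods is our own invention. Marking starts from the roots (think local variables), so objects that refer only to each other are still collected.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ToyMarkSweep {
    // Hypothetical heap: object id -> ids of the objects it references.
    private final Map<String, List<String>> heap = new LinkedHashMap<>();
    // Roots: references held directly by the running program.
    private final Set<String> roots = new LinkedHashSet<>();

    void allocate(String id, String... refs) { heap.put(id, new ArrayList<>(Arrays.asList(refs))); }
    void addRoot(String id) { roots.add(id); }
    void removeRoot(String id) { roots.remove(id); } // analogous to assigning null to a variable

    /** Marks everything reachable from the roots, sweeps the rest, and returns the swept ids. */
    List<String> collect() {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {                    // mark phase
            String id = work.pop();
            if (heap.containsKey(id) && marked.add(id)) {
                work.addAll(heap.get(id));           // visit what this object references
            }
        }
        List<String> swept = new ArrayList<>();
        Iterator<String> it = heap.keySet().iterator();
        while (it.hasNext()) {                       // sweep phase
            String id = it.next();
            if (!marked.contains(id)) {
                swept.add(id);
                it.remove();                         // "free" the unreachable object
            }
        }
        return swept;
    }

    public static void main(String[] args) {
        ToyMarkSweep gc = new ToyMarkSweep();
        gc.allocate("array");
        gc.allocate("list", "node");
        gc.allocate("node");
        gc.allocate("x", "y");            // x and y refer to each other...
        gc.allocate("y", "x");            // ...but nothing reachable refers to them
        gc.addRoot("array");
        gc.addRoot("list");
        System.out.println(gc.collect()); // [x, y]
        gc.removeRoot("array");           // like array = null;
        System.out.println(gc.collect()); // [array]
    }
}
```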
THE JVM RUNTIME DATA AREAS
The JVM supports multiple threads of execution, and this multi-threaded support relies on a PC
(program counter) register for each thread. A thread's PC records which JVM instruction that thread
is currently executing. When the JVM switches threads, the thread that was executing is suspended
along with its pertinent data, and its PC keeps its place until it is time for the thread to
continue execution.
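The idea that every thread carries its own PC across a switch can be shown with a toy scheduler. This is a hypothetical model, not how a real JVM schedules threads: ToyThread, run, and the string "instructions" are invented for the illustration, switching happens after a fixed time slice rather than preemptively, and List.of assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class PcPerThreadDemo {
    // Hypothetical toy thread: just a name and its own program counter.
    static final class ToyThread {
        final String name;
        int pc = 0; // index of the next instruction this thread will execute

        ToyThread(String name) {
            this.name = name;
        }
    }

    /**
     * Runs the shared program on every thread, switching to the next thread after
     * `slice` instructions. Each thread resumes from its own saved pc.
     */
    static List<String> run(String[] program, List<ToyThread> threads, int slice) {
        List<String> trace = new ArrayList<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (ToyThread t : threads) {
                for (int i = 0; i < slice && t.pc < program.length; i++) {
                    trace.add(t.name + ":" + program[t.pc]);
                    t.pc++;          // only this thread's pc advances
                    progress = true;
                }
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        String[] program = {"load", "add", "store"};
        List<ToyThread> threads = List.of(new ToyThread("A"), new ToyThread("B"));
        System.out.println(run(program, threads, 2));
        // [A:load, A:add, B:load, B:add, A:store, B:store]
    }
}
```

Thread A is switched out after "add" and later resumes at "store" because its own PC was preserved while B ran.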
Even though the JVM is a virtual machine, it too uses stacks: each JVM thread has a private JVM stack,
created at the same time as the thread. The JVM stack stores JVM frames, which hold local variables and
partial results and take part in method invocation and return.
A JVM frame is a data structure used to store data and partial results, as well as to perform dynamic
linking, return values for methods, and dispatch exceptions. Frames share the life of a method
invocation: a frame is created when a method is entered and destroyed when the method exits. Frames
belong to one thread exclusively and cannot be shared between threads.
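Frames themselves are not visible to Java code, but their lifetime can be traced. In this sketch (FramePerInvocation and the log list are our own), each recursive call logs when its invocation begins and ends; every invocation has its own n and result, and the last frame created is the first destroyed.

```java
import java.util.ArrayList;
import java.util.List;

public class FramePerInvocation {
    /** Computes n! recursively, logging when each invocation (and so its frame) begins and ends. */
    static int factorial(int n, List<String> log) {
        log.add("enter " + n);   // a new frame with its own n and result
        int result = (n <= 1) ? 1 : n * factorial(n - 1, log);
        log.add("exit " + n);    // this frame is discarded when the method returns
        return result;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(factorial(3, log)); // 6
        System.out.println(log); // [enter 3, enter 2, enter 1, exit 1, exit 2, exit 3]
    }
}
```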
The JVM heap (memory area), however, is shared among all threads. The heap is the memory source for
objects dynamically created at runtime. Like the heap, the JVM's method area is shared among threads;
it holds per-class data, including the code for methods, that may be used by a number of threads.
Included in the method area is a constant pool for each class and interface. The constant pool serves
as a symbol table similar to that of conventional programming languages, except that it contains a
richer set of data: constants (non-changing data) such as numeric literals, as well as method and
field references that are resolved at runtime.
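String literals offer an easy way to observe the constant pool from Java code. javac compiles a string literal into an ldc instruction that refers to a CONSTANT_String entry in the class's constant pool, and the Java Language Specification guarantees that equal string literals evaluate to the same String object. The class and method names below are our own.

```java
public class ConstantPoolDemo {
    /** Returns a string literal, loaded via the class's constant pool. */
    static String literal() {
        return "hello";
    }

    /** Returns a newly allocated heap object with the same characters. */
    static String fresh() {
        return new String("hello");
    }

    public static void main(String[] args) {
        System.out.println(literal() == literal());        // true: same constant, same object
        System.out.println(literal() == fresh());          // false: new String creates a new object
        System.out.println(literal() == fresh().intern()); // true: intern returns the shared instance
    }
}
```

Comparing strings with == is used here only to expose object identity; ordinary code should compare contents with equals.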
The JVM supports native methods, which are methods implemented in machine-dependent code rather than
in JVM bytecodes and which can be called from within the JVM. Because a call to a native method
transfers control out of the JVM, native code uses its own stack, called the native method stack.
While a native method is executing, the PC is not used (the JVM specification leaves its value
undefined), because the code is not running within the JVM. When the native method returns, the PC
resumes tracking execution and is updated accordingly.
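From the Java side, a native method is simply a method declared with the native modifier and no body; its implementation would come from a platform library loaded at runtime with System.loadLibrary. In this sketch, nativeChecksum is hypothetical and no library for it exists. That is harmless here because the method is only inspected through reflection and never called; calling it would throw UnsatisfiedLinkError.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeDeclarationDemo {
    // Hypothetical native method: its body would live in a machine-dependent library.
    static native int nativeChecksum(byte[] data);

    // Ordinary bytecode method with the same signature, for comparison.
    static int javaChecksum(byte[] data) {
        int sum = 0;
        for (byte b : data) {
            sum += b & 0xFF; // treat each byte as unsigned
        }
        return sum;
    }

    /** Reports whether the named method (taking a byte[]) is declared native. */
    static boolean isNative(String name) throws NoSuchMethodException {
        Method m = NativeDeclarationDemo.class.getDeclaredMethod(name, byte[].class);
        return Modifier.isNative(m.getModifiers());
    }

    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println(isNative("nativeChecksum"));             // true
        System.out.println(isNative("javaChecksum"));               // false
        System.out.println(javaChecksum(new byte[]{1, 2, (byte) 255})); // 258
    }
}
```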
An exception is thrown either explicitly, by a throw statement in the application, or implicitly, when
the JVM detects an error at runtime. If the current method does not catch the exception, the method's
operand stack and local variables are discarded and its frame is popped, reinstating the frame of the
calling method.
This process continues up through the callers until a matching catch clause is found.
If no method on the thread's call stack catches the exception, the thread terminates abnormally
because of the uncaught exception. If the exception is caught, execution continues at the catch
handler in the method that caught the exception, and the PC and all stacks are adjusted accordingly.
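The unwinding sequence can be traced with three nested calls. In this sketch (UnwindDemo and its methods are our own), level3 throws, neither level3 nor level2 has a handler so both frames are popped, and execution resumes in level1's catch clause. The line after the call in level2 never runs.

```java
import java.util.ArrayList;
import java.util.List;

public class UnwindDemo {
    static void level3(List<String> trace) {
        trace.add("level3 throws");
        throw new IllegalStateException("boom"); // no handler here: level3's frame is popped
    }

    static void level2(List<String> trace) {
        level3(trace);
        trace.add("level2 after call");          // skipped: level2's frame is popped too
    }

    static List<String> level1() {
        List<String> trace = new ArrayList<>();
        try {
            level2(trace);
        } catch (IllegalStateException e) {       // first matching handler found up the stack
            trace.add("level1 caught " + e.getMessage());
        }
        trace.add("level1 continues");            // normal execution resumes after the handler
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(level1());
        // [level3 throws, level1 caught boom, level1 continues]
    }
}
```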
JVM INSTRUCTION SET
The instructions that make up bytecode are very compact compared with the instructions of typical
microprocessors. A JVM instruction consists of a one-byte opcode specifying the operation to perform,
followed by zero or more operands supplying arguments or data used by the operation. Many instructions
have no operands and consist of an opcode alone.
Ignoring exceptions, the core of the JVM interpreter looks similar to the following pseudo-code fragment:
do {
    fetchOpcode();
    if (operands() == true)
        fetchOperands();
    executeActionForOpcode();
} while (moreProcessingNeeded());
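The fetch, decode, and execute loop above can be sketched as a tiny interpreter. The opcode values for bipush (0x10), iadd (0x60), imul (0x68), and ireturn (0xac) are the real ones from the JVM specification, but everything else is a simplifying assumption: TinyBytecodeInterpreter and run are our own names, a Deque stands in for the operand stack, and there are no local variables, no verification, and no exception handling.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyBytecodeInterpreter {
    // Real JVM opcode values for a few int instructions.
    static final int BIPUSH = 0x10;  // push a sign-extended one-byte operand
    static final int IADD = 0x60;    // pop two ints, push their sum
    static final int IMUL = 0x68;    // pop two ints, push their product
    static final int IRETURN = 0xAC; // pop an int and return it

    /** Executes code starting at pc 0 and returns the value popped by ireturn. */
    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        int pc = 0;                                // program counter
        while (true) {
            int opcode = code[pc++] & 0xFF;        // fetch the one-byte opcode
            switch (opcode) {
                case BIPUSH:
                    stack.push((int) code[pc++]);  // this opcode has one operand: fetch it
                    break;
                case IADD: {
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                case IMUL: {
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a * b);
                    break;
                }
                case IRETURN:
                    return stack.pop();
                default:
                    throw new IllegalArgumentException(
                            "unsupported opcode 0x" + Integer.toHexString(opcode) + " at pc " + (pc - 1));
            }
        }
    }

    public static void main(String[] args) {
        byte[] code = {
            BIPUSH, 6, BIPUSH, 7, IMUL, // 6 * 7
            BIPUSH, 2, IADD,            // + 2
            (byte) IRETURN              // 0xAC does not fit in a signed byte without a cast
        };
        System.out.println(run(code)); // 44
    }
}
```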
The JVM specification defines roughly 200 opcodes (for the technically curious, a one-byte opcode leaves
room for at most 256). Many of these opcodes are similar to existing opcodes for microprocessors with
respect to loading, storing, arithmetic, comparison, branching, bit operations, stack manipulation,
conversion, etc. This rich instruction set shows why the JVM is indeed a true VM; several
microprocessors do not have an instruction set as rich as the JVM's.
You should now have a better understanding of the JVM and why it's called a VM and why Java is
commanding the attention of software developers worldwide as well as major corporations, industries,
and universities. In a future article we will delve into the inner workings of the JVM within areas
such as the security manager, class file format, threads and locks, how loading and linking work,
and other fun areas of the JVM.