Monday, May 28, 2012

Under the Hood: Classloading

I'm kind of a tinkerer... I like exploring new aspects of things I already know, or new things altogether, even if there is no apparent gain in it. And I bet many of you (well, if there were many readers in the first place...) share this trait with me, because it is one of the things that most engineers share.

The "thing" that I explored last week is Java itself, and the new aspect is Java bytecode, along with two intricacies of Java that most programmers never have to worry about: bytecode manipulation and class loading.

Just in case you are not aware of it, I'll first describe the lifecycle of a HelloWorld program, from writing it up to its execution. It all starts, of course, with a source file. In the standard case, that's a Java file, but there are other languages, like Groovy, Scala and JRuby, that all run on the JVM.
This leads us directly to our next step, compilation. The JVM is, as the name suggests, a machine, just like any computer (except it's "virtual"), and has an instruction set it can execute. The Java compiler javac translates the more or less human-readable source code into a .class file that contains code executable by the JVM. Other compilers like gcj or scalac do the same, except that in the case of scalac the source is, of course, written in a different language.
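To make this concrete, here is a minimal sketch of such a source file. The greeting() helper is only there for illustration, and the commands in the comments assume a JDK with javac, javap and java on the path.

```java
// HelloWorld.java
//
// Compile:           javac HelloWorld.java   (produces HelloWorld.class)
// Inspect bytecode:  javap -c HelloWorld
// Run:               java HelloWorld
public class HelloWorld {

    // Illustrative helper, so the text is produced in one place.
    static String greeting() {
        return "Hello, World!";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

Running javap -c on the result is a nice way to see the JVM instructions that javac generated for each method.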

One aside about gcj: there are a few Java compiler vendors around, but far fewer than for other languages like C. The reason is, again, the JVM: while a C compiler creates code for a specific architecture and OS, and therefore each new platform needs a new compiler (although much logic can be reused, of course), Java targets only a single platform, the JVM. Therefore, there's no inherent need for numerous compilers and Java doesn't have the problem of incompatible dialects. Still, there is some competition; for example, gcj is an open source alternative to javac. More importantly, the part of Java that is platform-specific is the JVM, so that is where alternative implementations tend to show up. In Java's early days, Microsoft had its own Java implementation, which was kind of crappy compared to Sun's and was thankfully soon discontinued. Additionally, there are VMs implemented in Java, ones that target platforms where Oracle does not provide support, ones with smaller memory requirements, etc.

So, now we have a class file consisting of Java bytecode, independent of the language we started with. Using the java command, we can launch the main method of that class, but this involves more steps than are apparent!
Every Java application has a system class loader. A ClassLoader is responsible for finding class files, reading them, and loading the classes into memory. In the most basic case, the classloader has a list of URLs (the classpath) that contains locations where classes are found. By default, the classpath is just the current directory, which is where e.g. HelloWorld.class is found; the JRE classes themselves come from parent loaders that the system class loader delegates to. Other classloaders likewise delegate to a parent classloader, but may add new search locations or other means of getting the bytes that make up a class.
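If you want to poke at this yourself, here is a minimal sketch that prints the loaders involved. LoaderInfo and describe() are names I made up for this example; the calls themselves are standard java.lang API. Note that getClassLoader() returns null for classes loaded by the built-in bootstrap loader, which is the case for core classes like String.

```java
public class LoaderInfo {

    // Returns a printable name for the loader that defined the given class.
    // A null loader means the JVM's built-in bootstrap loader.
    static String describe(Class<?> c) {
        ClassLoader loader = c.getClassLoader();
        return loader == null ? "bootstrap" : loader.toString();
    }

    public static void main(String[] args) {
        ClassLoader system = ClassLoader.getSystemClassLoader();
        System.out.println("system class loader: " + system);
        System.out.println("its parent:          " + system.getParent());
        System.out.println("classpath:           " + System.getProperty("java.class.path"));
        System.out.println("LoaderInfo loaded by " + describe(LoaderInfo.class));
        System.out.println("String loaded by     " + describe(String.class));
    }
}
```

The exact loader names differ between Java versions, so I'm not showing any output here; just run it and see what your JVM reports.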
Now, the classloader loads the main class of the application, and the main thread starts, executing the main method. This method may in turn require other classes, which are then loaded by the same classloader that loaded the current class. By extension, this means that normally all classes are loaded by the same classloader that loaded the main class. However, it's also possible to create and invoke classloaders directly, creating a class that was loaded by a different classloader than the current class. This is important for two reasons. Firstly, as I said, the new classloader can add new ways of loading classes (the main reason for using a separate classloader). Secondly, it can introduce subtle problems when multiple classloaders define classes with the same name: the JVM treats them as different classes, so they are not compatible, and you get a ClassCastException when you try to mix them. Don't worry, there's an easy solution: define an interface that is loaded by a common parent classloader, and only cast to the interface.
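Here is a sketch of both the problem and the interface trick. Everything in it (LoaderDemo, Greeter, GreeterImpl, IsolatingLoader, newImpl) is hypothetical and made up for this example. It assumes the compiled classes can be read as resources from the parent loader, which is the case when they sit in a directory or jar on the classpath. IsolatingLoader deliberately breaks the usual parent-first delegation for exactly one class, so that each loader instance defines its own copy of GreeterImpl.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class LoaderDemo {

    /** Shared interface: always resolved through the common parent loader. */
    public interface Greeter {
        String greet();
    }

    /** Implementation that every IsolatingLoader defines again on its own. */
    public static class GreeterImpl implements Greeter {
        public String greet() {
            return "hello";
        }
    }

    /** Defines GreeterImpl itself and delegates every other class to the parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String isolated = GreeterImpl.class.getName();

        IsolatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolated)) {
                return super.loadClass(name, resolve); // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        // Reads the raw .class file through the parent loader's resource lookup.
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Creates a GreeterImpl instance whose class was defined by a fresh loader. */
    public static Object newImpl(ClassLoader parent) throws Exception {
        Class<?> c = new IsolatingLoader(parent).loadClass(GreeterImpl.class.getName());
        return c.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = LoaderDemo.class.getClassLoader();
        Object a = newImpl(parent);
        Object b = newImpl(parent);

        System.out.println("same name:  " + a.getClass().getName().equals(b.getClass().getName()));
        System.out.println("same class: " + (a.getClass() == b.getClass()));

        try {
            // GreeterImpl here means the copy defined by LoaderDemo's own loader.
            GreeterImpl impl = (GreeterImpl) a;
            System.out.println("cast to GreeterImpl worked: " + impl);
        } catch (ClassCastException e) {
            System.out.println("cast to GreeterImpl failed: " + e.getMessage());
        }

        // The interface comes from the common parent, so this cast is fine.
        Greeter g = (Greeter) a;
        System.out.println("via interface: " + g.greet());
    }
}
```

The two objects report the same class name but different Class objects, the cast to GreeterImpl should fail with a ClassCastException, and the cast to Greeter works because both copies implement the one interface that the shared parent loaded.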

And finally, our main method can execute. And since this post is already long enough, I'll tell the story about bytecode manipulation with ASM next time ;)

1 comment:

Anonymous said...

What a cliffhanger! grr