Now, let's get serious! We know that ClassLoaders are responsible for obtaining the binary class data needed to load Java classes, and we know what that binary class data actually looks like. What's missing is a way to effectively modify that data without worrying too much about the low-level details of how a class file stores it.
And that's where ASM comes into play. ASM is one library (BCEL is another) for bytecode manipulation, and it seems to be the "tool of choice" out there. It's still actively developed, yet mature and usable. Let's dive straight into a code example for creating (not modifying) a class with ASM:
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class AsmBuilder implements Opcodes {
    // the bytes of the generated class net.slightlymagic.asm.test.HelloWorld
    public static final byte[] HelloWorld = HelloWorld();

    private static byte[] HelloWorld() {
        // flags = 0: ASM computes nothing itself, so the visitMaxs() values must be right
        ClassWriter cw = new ClassWriter(0);
        cw.visit(V1_6, ACC_PUBLIC | ACC_SUPER, "net/slightlymagic/asm/test/HelloWorld", null, "java/lang/Object",
                null);
        cw.visitSource("<generated>", null);
        HelloWorld_init(cw);
        HelloWorld_main(cw);
        HelloWorld_hello(cw);
        cw.visitEnd(); // the ClassVisitor call order ends every class with visitEnd()
        return cw.toByteArray();
    }

    // constructor <init>()V: calls Object's constructor on this
    private static void HelloWorld_init(ClassWriter cw) {
        MethodVisitor mv = cw.visitMethod(ACC_PUBLIC, "<init>", "()V", null, null);
        mv.visitCode();
        mv.visitVarInsn(ALOAD, 0);
        // four-argument form from older ASM versions; ASM 5+ deprecates it in favor of an
        // overload with an extra isInterface flag, but still accepts it
        mv.visitMethodInsn(INVOKESPECIAL, "java/lang/Object", "<init>", "()V");
        mv.visitInsn(RETURN);
        mv.visitMaxs(1, 1);
        mv.visitEnd();
    }

    // static main: creates an instance and calls hello() on it
    private static void HelloWorld_main(ClassWriter cw) {
        MethodVisitor mv = cw.visitMethod(ACC_PUBLIC | ACC_STATIC, "main", "([Ljava/lang/String;)V", null, null);
        mv.visitCode();
        mv.visitTypeInsn(NEW, "net/slightlymagic/asm/test/HelloWorld");
        mv.visitInsn(DUP);
        mv.visitMethodInsn(INVOKESPECIAL, "net/slightlymagic/asm/test/HelloWorld", "<init>", "()V");
        mv.visitMethodInsn(INVOKEVIRTUAL, "net/slightlymagic/asm/test/HelloWorld", "hello", "()V");
        mv.visitInsn(RETURN);
        mv.visitMaxs(2, 1);
        mv.visitEnd();
    }

    // hello(): prints a greeting via System.out
    private static void HelloWorld_hello(ClassWriter cw) {
        MethodVisitor mv = cw.visitMethod(ACC_PUBLIC, "hello", "()V", null, null);
        mv.visitCode();
        mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
        mv.visitLdcInsn("Hello, World");
        mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V");
        mv.visitInsn(RETURN);
        mv.visitMaxs(2, 1);
        mv.visitEnd();
    }
}
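For orientation, here is roughly the Java source that these ASM calls describe. Compiling it and running javap -c on the result should show essentially the same instructions (javac chooses its own constant pool layout and source file name). The package line for net.slightlymagic.asm.test and the public modifier are left out so the sketch compiles as a single file:

```java
// Roughly equivalent Java source for the generated class (package omitted).
class HelloWorld {
    // javac would generate this constructor implicitly:
    // ALOAD 0, INVOKESPECIAL java/lang/Object.<init>()V, RETURN
    public HelloWorld() {
        super();
    }

    // NEW, DUP, INVOKESPECIAL <init>, INVOKEVIRTUAL hello, RETURN
    public static void main(String[] args) {
        new HelloWorld().hello();
    }

    // GETSTATIC System.out, LDC "Hello, World", INVOKEVIRTUAL println, RETURN
    public void hello() {
        System.out.println("Hello, World");
    }
}
```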
What you see here is a series of visitSomething() calls on a ClassWriter (which is a ClassVisitor) and on the MethodVisitors created from it. The Opcodes interface that AsmBuilder implements just gives us easy access to some useful constants. I'll elaborate on the Visitor pattern next time; for now, let's go through the calls:
ClassVisitor.visit() receives the data that appears once per class definition: the class file version, access flags like public, the class name, the signature (only needed for generic classes like List<T>), the superclass name and an array of implemented interfaces. Class names are given in internal form, with slashes instead of dots.
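visitSource() records optional debug information: the source file name, plus a debug string that ends up in the class file's SourceDebugExtension attribute (tools like JSP compilers use it to map generated code back to the original source). We don't need it, so we pass null.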
Now to the interesting method instructions:
visitMethod() likewise takes the data that appears once per method (access flags, name, method descriptor, signature and declared exceptions) and returns a MethodVisitor that receives the rest of that method.
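Descriptors such as ()V and ([Ljava/lang/String;)V follow a small grammar: one letter per primitive type (J for long, Z for boolean), L followed by the internal name and a semicolon for classes, one leading [ per array dimension, and for methods the parameter types in parentheses followed by the return type. The sketch below builds them from Class objects; Descriptors is a hypothetical helper name, and ASM's own Type.getDescriptor(Class) covers the same ground:

```java
// Hypothetical helper that builds JVM descriptors from Class objects.
final class Descriptors {
    private Descriptors() {}

    /** Field descriptor, e.g. I, [I or Ljava/lang/String; */
    static String of(Class<?> type) {
        if (type.isArray()) {
            // one '[' per dimension, followed by the component type
            return "[" + of(type.getComponentType());
        }
        if (type.isPrimitive()) {
            if (type == int.class) return "I";
            if (type == long.class) return "J";
            if (type == boolean.class) return "Z";
            if (type == byte.class) return "B";
            if (type == char.class) return "C";
            if (type == short.class) return "S";
            if (type == float.class) return "F";
            if (type == double.class) return "D";
            if (type == void.class) return "V"; // only valid as a return type
            throw new IllegalArgumentException(type.getName());
        }
        // reference types: 'L' + internal name (dots become slashes) + ';'
        return "L" + type.getName().replace('.', '/') + ";";
    }

    /** Method descriptor: parameter descriptors in parentheses, then the return type. */
    static String method(Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : params) {
            sb.append(of(p));
        }
        return sb.append(')').append(of(returnType)).toString();
    }
}
```

For example, Descriptors.method(void.class, String[].class) yields the descriptor of main.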
visitCode() simply indicates that the actual instructions follow, as opposed to the optional annotations.
visitVarInsn(ALOAD, 0) takes an opcode that works with local variables (either a LOAD or a STORE) of a specific type (a reference in this case; don't ask me why the prefix is A), plus the index of a local variable; as I said earlier, 0 is the implicit this reference in nonstatic methods. So, in plain words, this instruction pushes this onto the stack.
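The type letters are the same for all local variable opcodes: I, L, F, D and A (for any reference, including arrays). The sketch below maps a Java type to its load or store mnemonic; LocalOpcodes is a hypothetical name. Note that boolean, byte, char and short locals all use the int variants. (In ASM, calling getOpcode(ILOAD) on a Type performs this kind of mapping.)

```java
// Hypothetical helper: the xLOAD/xSTORE mnemonic for a local of a given type.
final class LocalOpcodes {
    private LocalOpcodes() {}

    static String prefix(Class<?> type) {
        if (!type.isPrimitive()) return "A"; // objects and arrays alike
        if (type == long.class) return "L";
        if (type == float.class) return "F";
        if (type == double.class) return "D";
        if (type == void.class) throw new IllegalArgumentException("no void locals");
        return "I"; // int, and also boolean, byte, char and short
    }

    static String load(Class<?> type) {
        return prefix(type) + "LOAD";
    }

    static String store(Class<?> type) {
        return prefix(type) + "STORE";
    }
}
```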
visitMethodInsn(INVOKESPECIAL, "java/lang/Object", "<init>", "()V") invokes a method. Because this one is a constructor (see the name), it uses the INVOKESPECIAL opcode. There are four other INVOKE opcodes, but this example uses only one of them, INVOKEVIRTUAL. The next argument is the owner, i.e. the class containing the called method, followed by the name and descriptor. If you haven't guessed it, this is a super() call, invoked on this. (Ironically, writing this.super(); in Java is illegal, yet the implicit reference is of course needed.)
Last but not least, there's visitInsn(RETURN). Unlike the other visit*Insn() methods, visitInsn() takes no additional arguments, because the opcodes it accepts simply don't need any. RETURN is one of several return opcodes (IRETURN, LRETURN, FRETURN, DRETURN and ARETURN are the others); it's the one for void methods, including constructors.
Finally, there are visitMaxs(1, 1) and visitEnd(), the latter closing the method. The two maxs are maxStack and maxLocals. maxStack is 1, because there is at most one entry on the stack, namely this after ALOAD 0. And the sole local variable is, again, this, amounting to 1 again.
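The maxStack values in this post can be checked by hand: walk the instructions in order, subtract what each one pops, add what it pushes, and remember the highest depth. Here's a minimal sketch for straight-line code without jumps; StackDepth and its Insn class are hypothetical names, and you supply the pop/push counts yourself, in stack slots (long and double values would count twice):

```java
import java.util.List;

// Sketch: maxStack of straight-line code (no jumps) from per-instruction stack effects.
final class StackDepth {
    private StackDepth() {}

    /** One instruction's stack effect; the name is only used in error messages. */
    static final class Insn {
        final String name;
        final int pops;
        final int pushes;

        Insn(String name, int pops, int pushes) {
            this.name = name;
            this.pops = pops;
            this.pushes = pushes;
        }
    }

    static int maxStack(List<Insn> code) {
        int depth = 0;
        int max = 0;
        for (Insn insn : code) {
            depth -= insn.pops; // operands are consumed first...
            if (depth < 0) {
                throw new IllegalStateException("stack underflow at " + insn.name);
            }
            depth += insn.pushes; // ...then results are pushed
            max = Math.max(max, depth);
        }
        return max;
    }
}
```

For <init>, the sequence ALOAD 0 (pops 0, pushes 1), INVOKESPECIAL (pops 1, pushes 0) and RETURN gives 1; for main, NEW, DUP (modeled as pops 1, pushes 2), INVOKESPECIAL and INVOKEVIRTUAL give 2.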
A look at main([Ljava/lang/String;)V: we have a new type of instruction again!
visitTypeInsn(NEW, "net/slightlymagic/asm/test/HelloWorld") takes an opcode that needs a class name (ANEWARRAY, CHECKCAST and INSTANCEOF are the others). In this case, we allocate a new object of our class on the heap and push a reference to it onto the stack.
visitInsn(DUP) duplicates the top element on the stack, i.e. the reference.
visitMethodInsn(INVOKESPECIAL, "net/slightlymagic/asm/test/HelloWorld", "<init>", "()V") invokes the constructor on our new object. Think about it: the constructor has an implicit argument (the new reference) that must be on the stack before the call, and any additional parameters have to be pushed after that reference, so there's essentially no way to fold the constructor call into the NEW opcode. (Well, of course you could, but why complicate the VM with a different argument order for constructor calls when you can make them entirely regular INVOKESPECIAL calls simply by splitting object creation into two opcodes?) And that is the reason for the preceding DUP: <init> returns void and consumes the reference it was invoked on, but we want to use the object afterwards, so we have to keep a second copy of the reference on the stack.
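To make the DUP argument concrete, here's a toy model of main's operand stack in plain Java, with an ArrayDeque standing in for the stack and a string standing in for the reference. NewDupDemo is just an illustrative name; nothing here touches real bytecode:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of NEW [DUP] INVOKESPECIAL <init> INVOKEVIRTUAL hello.
final class NewDupDemo {
    private NewDupDemo() {}

    /** Returns true if hello() can be called and the stack ends up balanced. */
    static boolean canCallHello(boolean withDup) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("new HelloWorld");     // NEW: reference to the not-yet-initialized object
        if (withDup) {
            stack.push(stack.peek());     // DUP: a second copy of the same reference
        }
        stack.pop();                      // INVOKESPECIAL <init>: consumes one reference, returns nothing
        if (stack.isEmpty()) {
            return false;                 // no reference left to call hello() on
        }
        stack.pop();                      // INVOKEVIRTUAL hello: consumes the other copy
        return stack.isEmpty();           // balanced again before RETURN
    }
}
```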
visitMethodInsn(INVOKEVIRTUAL, "net/slightlymagic/asm/test/HelloWorld", "hello", "()V") invokes hello(). INVOKEVIRTUAL is probably the most common INVOKE opcode. It's used for ordinary instance methods of classes: constructors, super calls and (traditionally) private methods use INVOKESPECIAL, static methods use INVOKESTATIC, and interface methods use INVOKEINTERFACE. If you're wondering about the fifth variant hinted at above: it's INVOKEDYNAMIC, a feature originally targeted at dynamic JVM languages rather than Java itself (javac has since adopted it for lambdas and string concatenation). The big point about virtual method invocation is overriding: the VM looks up the method starting from the runtime class of the receiver, so the implementation that runs is not necessarily the one in the owner class given in the instruction.
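Overriding is easiest to see from the Java side. In the sketch below (Base, Derived and DispatchDemo are made-up names), javac compiles b.greet() to an INVOKEVIRTUAL whose owner is Base, the static type of b, yet the override in Derived runs whenever b refers to a Derived instance:

```java
// The call site names Base as the owner; the receiver's runtime class decides what runs.
class Base {
    String greet() {
        return "Base";
    }
}

class Derived extends Base {
    @Override
    String greet() {
        return "Derived";
    }
}

final class DispatchDemo {
    private DispatchDemo() {}

    static String callThroughBase(Base b) {
        return b.greet(); // compiled as INVOKEVIRTUAL Base.greet()Ljava/lang/String;
    }
}
```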
visitInsn(RETURN); visitMaxs(2, 1); visitEnd(): there were two entries on the stack after DUP, and since main is static there is no this, so the only local variable is the [Ljava/lang/String; parameter.
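The locals count follows the same rule in every method: slot 0 holds this in instance methods, the parameters follow in order, and long and double parameters take two slots each. A hypothetical helper computing the minimum maxLocals from a method's parameters could look like this (methods that declare further local variables need more):

```java
// Hypothetical helper: local variable slots occupied by 'this' and the parameters.
final class LocalSlots {
    private LocalSlots() {}

    static int parameterSlots(boolean isStatic, Class<?>... params) {
        int slots = isStatic ? 0 : 1; // 'this' occupies slot 0 in instance methods
        for (Class<?> p : params) {
            // long and double take two slots, everything else one
            slots += (p == long.class || p == double.class) ? 2 : 1;
        }
        return slots;
    }
}
```

For main, parameterSlots(true, String[].class) gives 1; for hello() and the constructor, parameterSlots(false) also gives 1.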
Now hello()V:
visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;") is yet another type of instruction. visitFieldInsn() supports the four logically named opcodes GETFIELD, GETSTATIC, PUTFIELD and PUTSTATIC. The nonstatic variants pop an object reference from the stack, for obvious reasons, and the PUT variants also pop the value to store. Then there's an owner, a field name, and a field descriptor. This is unlike LOAD and STORE, which take no descriptor but have a separate opcode for each type instead.
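To connect the four opcodes to Java source, here is a small made-up class (Counter) with comments noting which field instructions javac emits for each access. Note the ALOAD 0 that provides the object reference before GETFIELD and PUTFIELD:

```java
// Illustrative class; comments show the field instructions javac emits.
class Counter {
    static int instances; // static field: GETSTATIC / PUTSTATIC
    int value;            // instance field: GETFIELD / PUTFIELD

    Counter() {
        instances = instances + 1; // GETSTATIC Counter.instances:I, ICONST_1, IADD, PUTSTATIC Counter.instances:I
    }

    void set(int v) {
        this.value = v;            // ALOAD 0, ILOAD 1, PUTFIELD Counter.value:I
    }

    int get() {
        return this.value;         // ALOAD 0, GETFIELD Counter.value:I, IRETURN
    }
}
```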
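visitLdcInsn("Hello, World"), where LDC stands for "load constant". The argument is of type Object, but must be an Integer, Float, Long, Double or String; depending on the class file version, ASM also accepts a Type (for class literals such as String.class) and a few other special constants.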
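In practice, you'll mostly see LDC with an Integer for larger numbers: javac pushes small int constants with dedicated opcodes instead, and as far as I know ASM's GeneratorAdapter.push(int) makes the same choice. A sketch of that selection logic, where IntPush is a hypothetical name and the result is a mnemonic string rather than an emitted instruction:

```java
// Hypothetical helper: which instruction pushes a given int constant most compactly.
final class IntPush {
    private IntPush() {}

    static String instructionFor(int v) {
        if (v >= -1 && v <= 5) {
            return v == -1 ? "ICONST_M1" : "ICONST_" + v; // dedicated one-byte opcodes
        }
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) {
            return "BIPUSH " + v;                          // operand stored as one byte
        }
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            return "SIPUSH " + v;                          // operand stored as two bytes
        }
        return "LDC " + v;                                 // Integer in the constant pool
    }
}
```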
visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V") is our first method invocation with real arguments! I hope you can follow what's currently on the stack: there's a PrintStream (System.out) and a String ("Hello, World"), and now we're invoking a nonstatic method of PrintStream that takes a String as its argument. Knowing the order of elements on the stack is important when modifying bytecode: when you insert instructions, you have to leave the stack as you found it so that the following code still works correctly (and don't forget to adjust maxStack afterwards!)
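Here's the maxStack bookkeeping for such an insertion, as a minimal sketch assuming straight-line code and a stack-neutral snippet (net effect zero): the new maximum is the larger of the old one and the depth at the insertion point plus the snippet's own peak. InsertCheck is a hypothetical name; each delta is pushes minus pops for one inserted instruction. Alternatively, a ClassWriter created with ClassWriter.COMPUTE_MAXS computes these values itself and ignores the arguments passed to visitMaxs.

```java
// Sketch: new maxStack after inserting a stack-neutral snippet into straight-line code.
final class InsertCheck {
    private InsertCheck() {}

    /**
     * oldMax: the method's current maxStack; depthAtInsertion: stack depth where the
     * snippet goes; deltas: pushes minus pops of each inserted instruction, in order.
     * Checking the depth after each instruction suffices, since every instruction
     * pops its operands before pushing its results.
     */
    static int newMaxStack(int oldMax, int depthAtInsertion, int... deltas) {
        int depth = 0; // depth relative to the insertion point
        int peak = 0;
        for (int delta : deltas) {
            depth += delta;
            if (depthAtInsertion + depth < 0) {
                throw new IllegalStateException("snippet pops more than is on the stack");
            }
            peak = Math.max(peak, depth);
        }
        if (depth != 0) {
            throw new IllegalArgumentException("snippet changes the stack by " + depth);
        }
        return Math.max(oldMax, depthAtInsertion + peak);
    }
}
```

Inserting the three printing instructions of hello() (deltas +1, +1, -2) at the start of main leaves maxStack at 2, but inserting them right after the DUP, where the depth is already 2, raises it to 4.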
visitInsn(RETURN); visitMaxs(2, 1); visitEnd(): there were two entries on the stack after LDC, and since the method is not static, its only local variable is this.
Now, to make this post longer, and so as not to keep you waiting on how to invoke your new class, let's look at the classloading part. We have a byte[] containing the class and need a Class from it. Take a look at this simple custom ClassLoader:
import java.util.HashMap;

public class DynamicClassLoader extends ClassLoader {
    // binary class name (with dots) -> class file bytes that haven't been defined yet
    private final HashMap<String, byte[]> bytecodes = new HashMap<String, byte[]>();

    public DynamicClassLoader() {
        super();
    }

    public DynamicClassLoader(ClassLoader parent) {
        super(parent);
    }

    public void putClass(String name, byte[] bytecode) {
        bytecodes.put(name, bytecode);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // remove here: loadClass() checks findLoadedClass() first, so findClass()
        // won't be asked for this class another time anyway
        byte[] bytecode = bytecodes.remove(name);
        if (bytecode != null) {
            return defineClass(name, bytecode, 0, bytecode.length);
        } else {
            return super.findClass(name);
        }
    }
}
... and take a look at that simple main method:
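public static void main(String[] args) throws Exception {
    DynamicClassLoader l = new DynamicClassLoader();
    l.putClass("net.slightlymagic.asm.test.HelloWorld", AsmBuilder.HelloWorld);
    // the parent (system) class loader can't find HelloWorld, so loadClass() ends up in our findClass()
    Class<?> HelloWorld = l.loadClass("net.slightlymagic.asm.test.HelloWorld");
    // the cast passes args as the single String[] argument instead of spreading it as varargs
    HelloWorld.getMethod("main", String[].class).invoke(null, (Object) args);
}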
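If you'd like to try the loader without ASM, you can also build the bytes yourself; after all, we know what a class file looks like. The sketch below hand-assembles a minimal class file (an empty public class extending Object, with no fields, methods or even a constructor, so it can be loaded but not instantiated) and loads it through a condensed copy of DynamicClassLoader. TinyClassFile and the class name demo.Empty are made up for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Condensed copy of DynamicClassLoader so this sketch compiles on its own.
class DynamicClassLoader extends ClassLoader {
    private final Map<String, byte[]> bytecodes = new HashMap<>();

    void putClass(String name, byte[] bytecode) {
        bytecodes.put(name, bytecode);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytecode = bytecodes.remove(name);
        if (bytecode != null) {
            return defineClass(name, bytecode, 0, bytecode.length);
        }
        return super.findClass(name);
    }
}

// Hypothetical helper that writes a class file by hand, field by field.
final class TinyClassFile {
    private TinyClassFile() {}

    /** An empty public class extending Object; internalName uses slashes, e.g. demo/Empty. */
    static byte[] emptyClass(String internalName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(0xCAFEBABE);           // magic
        out.writeShort(0);                  // minor version
        out.writeShort(50);                 // major version 50 = Java 6, like V1_6
        out.writeShort(5);                  // constant pool count = number of entries + 1
        out.writeByte(1);                   // #1 CONSTANT_Utf8 (writeUTF adds the u2 length)
        out.writeUTF(internalName);
        out.writeByte(7);                   // #2 CONSTANT_Class -> #1
        out.writeShort(1);
        out.writeByte(1);                   // #3 CONSTANT_Utf8
        out.writeUTF("java/lang/Object");
        out.writeByte(7);                   // #4 CONSTANT_Class -> #3
        out.writeShort(3);
        out.writeShort(0x0021);             // ACC_PUBLIC | ACC_SUPER
        out.writeShort(2);                  // this_class
        out.writeShort(4);                  // super_class
        out.writeShort(0);                  // interfaces_count
        out.writeShort(0);                  // fields_count
        out.writeShort(0);                  // methods_count
        out.writeShort(0);                  // attributes_count
        out.flush();
        return bytes.toByteArray();
    }
}
```

Usage mirrors the main method above: loader.putClass("demo.Empty", TinyClassFile.emptyClass("demo/Empty")) followed by loader.loadClass("demo.Empty").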
Since the class doesn't exist at compile time, we can't reference it directly in our source code, but we can use reflection to test it. Normally, of course, the generated class would implement some interface, so we could store an instance in a variable of that interface type rather than Object and make regular method calls through it.
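A sketch of that interface-based approach, under some assumptions: Greeter is a made-up interface, GeneratedGreeter is an ordinary class standing in for one produced by ASM, and Instantiator is a hypothetical helper. Reflection is only needed to load and instantiate the class; afterwards, calls go through the interface. The interface has to come from a class loader that both sides share (for DynamicClassLoader, its parent); if the generated class resolved its own copy of Greeter, the cast would fail.

```java
// Made-up interface that the "generated" class implements.
interface Greeter {
    String greet();
}

// Stand-in for a class whose bytes would really come from ASM.
class GeneratedGreeter implements Greeter {
    @Override
    public String greet() {
        return "Hello, World";
    }
}

// Hypothetical helper: reflection only for loading and instantiating.
final class Instantiator {
    private Instantiator() {}

    static <T> T create(ClassLoader loader, String className, Class<T> iface)
            throws ReflectiveOperationException {
        // asSubclass throws ClassCastException if the class doesn't implement iface
        Class<? extends T> cls = loader.loadClass(className).asSubclass(iface);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

After Greeter g = Instantiator.create(loader, name, Greeter.class), g.greet() is a plain interface call with no reflection involved.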