Saturday, June 9, 2012

Under the Hood: Transforming Bytecode

This will be more or less a "what I've already done" post, although without much Java code. Last time got pretty verbose, as I explained how to use the visitors to create a new class. This time I'll focus on the bytecode transformations; once those are clear, the Java code is pretty straightforward.

So, my goal was to make coding easier when working with properties. Basically, I want to have this:

public class Test {
    private int a;
   
    public void setA(int a) {
        this.a = a;
    }
   
    public int getA() {
        return a;
    }
}

transformed into this:

public class Test {
    private Properties properties = new BasicProperties();
   
    private final Property<Integer> a = properties.cfg(Properties.NAME, "a").property(0);
   
    public void setA(int a) {
        this.a.setValue(a);
    }
   
    public int getA() {
        return a.getValue();
    }
}
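The post doesn't show the Property/Properties API itself. To make the target class concrete, here is a minimal, self-contained sketch of it. Only the method shapes are taken from the javap listings below (cfg(Object,Object) returning Properties, property(Object) returning Property, getValue() and setValue(Object)). The map-backed bodies are assumptions, the NAME value "name" is inferred from the ldc "name" in the constructor bytecode, and the class is renamed TransformedTest to avoid clashing with other Test types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the wrapper API. Method shapes follow the javap
// descriptors in the post; the bodies are assumptions for illustration.
interface Property<T> {
    T getValue();
    void setValue(T value); // erases to setValue(Ljava/lang/Object;)V
}

interface Properties {
    String NAME = "name"; // the constructor bytecode loads the string "name"

    // Returns a Properties view carrying one extra configuration entry.
    Properties cfg(Object key, Object value);

    // Creates a wrapper holding the given initial value.
    <T> Property<T> property(T initialValue);
}

class BasicProperties implements Properties {
    private final Map<Object, Object> config;

    BasicProperties() {
        this(new HashMap<Object, Object>());
    }

    private BasicProperties(Map<Object, Object> config) {
        this.config = config;
    }

    @Override
    public Properties cfg(Object key, Object value) {
        Map<Object, Object> copy = new HashMap<Object, Object>(config);
        copy.put(key, value);
        return new BasicProperties(copy); // this instance stays unconfigured
    }

    @Override
    public <T> Property<T> property(final T initialValue) {
        final Object name = config.get(NAME);
        return new Property<T>() {
            private T value = initialValue;

            @Override public T getValue() { return value; }
            @Override public void setValue(T newValue) { value = newValue; }
            @Override public String toString() { return name + "=" + value; }
        };
    }
}

// The post's target class, renamed.
class TransformedTest {
    private Properties properties = new BasicProperties();

    private final Property<Integer> a = properties.cfg(Properties.NAME, "a").property(0);

    public void setA(int a) { this.a.setValue(a); } // autoboxing via Integer.valueOf
    public int getA() { return a.getValue(); }      // checkcast Integer, then intValue()
}
```

With this sketch, getA() starts at 0 and returns whatever setA() stored last. Compiling setA and getA with javac should yield the same instruction sequences as the goal listings below, including the Integer.valueOf and intValue calls.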

Of course, we need a few small additions to control the transformation, but the code would still be much simpler to read:

public class Test {
    private Properties properties = new BasicProperties();
   
    @Property
    private int a;
   
    public void setA(int a) {
        this.a = a;
    }
   
    public int getA() {
        return a;
    }
}
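The annotation's declaration isn't shown in the post either. A hypothetical version could look like this. A bytecode transformer only needs the annotation to survive into the class file, which the default CLASS retention already guarantees (in ASM it would then show up through FieldVisitor.visitAnnotation with visible set to false); RUNTIME is used here only so this self-contained demo can find the annotated fields via reflection. The class is renamed AnnotatedTest, and the Properties field is left out because its type isn't modelled here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation. CLASS retention would suffice for a bytecode
// tool; RUNTIME is chosen only so this demo can see it through reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Property {
}

// The annotated source class from above, minus the Properties field.
class AnnotatedTest {
    @Property
    private int a;

    private int b; // not annotated: the transformer would leave it alone

    public void setA(int a) { this.a = a; }
    public int getA() { return a; }
}

class PropertyFieldFinder {
    // Names of the fields a transformer would rewrite.
    static List<String> annotatedFields(Class<?> type) {
        List<String> names = new ArrayList<String>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(Property.class)) {
                names.add(field.getName());
            }
        }
        return names;
    }
}
```

Note that an annotation named Property clashes with the wrapper interface of the same simple name unless the two live in different packages.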

That's enough for my simplistic version. I'll skip changing the field's type and concentrate instead on how the code accesses the field. Now let's look at the source and target bytecode, with the original on top and the goal below:


public void setA(int);
  Signature: (I)V
  Code:
   0:    aload_0
   1:    iload_1
   2:    putfield    #5; //Field a:I
   5:    return

public void setA(int);
  Signature: (I)V
  Code:
   0:    aload_0
   1:    getfield    #10; //Field a:Lnet/slightlymagic/beans/properties/Property;
   4:    iload_1
   5:    invokestatic    #8; //Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
   8:    invokeinterface    #11,  2; //InterfaceMethod net/slightlymagic/beans/properties/Property.setValue:(Ljava/lang/Object;)V
   13:    return


So the original has a simple putfield, whereas our goal is a getfield that fetches the wrapper object, followed by a call to setValue(Object) with our value. Having to box the int into an Integer ourselves at the bytecode level is a complication, but not a hard one.
The real problem is the point at which we realize that we're doing the important stuff: only at 2: putfield do we know that the field is being set. However, the target bytecode already differs before that, because it has to fetch the wrapper first! Don't be tempted to buffer some instructions and then insert the getfield a few entries earlier; imagine we're doing this.a = 2*a instead, where the value is computed by several instructions. The positions simply wouldn't line up. Instead, we use the swap opcode the JVM provides to bring the stack into order, even though the resulting bytecode will differ from what javac produces.
First, look at the stack in the original:

[ Test I ]
putfield
[ ]

and how we could replace the putfield with a sequence of instructions that begins and ends with the same stack:

[ Test I ]
invokestatic Integer Integer.valueOf(int)
[ Test Integer ]
swap
[ Integer Test ]
getfield a Property
[ Integer Property ]
swap
[ Property Integer ]
invokeinterface void Property.setValue(Object)
[ ]

Not so hard at all. Note that I do the boxing first, because swap can't handle long and double values: they occupy two stack slots, so I wrap them into references (one slot each) before doing anything else. (In case you wonder about 64 bit JVMs: the two-slot rule belongs to the JVM's abstract operand stack, and since bytecode is platform independent, it behaves the same everywhere.)
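To double-check the stack pictures, here is a toy operand-stack model (hypothetical, not part of the post's code) that counts slots the way the JVM does, with long ("J") and double ("D") taking two. Replaying the setter sequence leaves an empty stack for both int and long, and a swap on an unboxed long is rejected, which is exactly why the boxing comes first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Toy model of the operand stack (illustration only): values are type names,
// "J" (long) and "D" (double) take two slots, everything else one.
class ToyStack {
    private final Deque<String> values = new ArrayDeque<String>(); // first element = top
    private int slots;

    static int size(String type) {
        return type.equals("J") || type.equals("D") ? 2 : 1;
    }

    void push(String type) {
        values.push(type);
        slots += size(type);
    }

    String pop() {
        String type = values.pop();
        slots -= size(type);
        return type;
    }

    // Like the swap opcode: only valid for two single-slot values.
    void swap() {
        String top = pop();
        String below = pop();
        if (size(top) != 1 || size(below) != 1) {
            throw new IllegalStateException("swap cannot move a long/double: " + below + " " + top);
        }
        push(top);
        push(below);
    }

    // Bottom to top, matching the [ ... ] notation in the post.
    List<String> contents() {
        List<String> list = new ArrayList<String>(values);
        Collections.reverse(list);
        return list;
    }

    int slots() { return slots; }
}

class SetterRewrite {
    // Replays the replacement for "putfield a" starting from [ Test value ].
    static List<String> replay(String primitive, String box) {
        ToyStack stack = new ToyStack();
        stack.push("Test");
        stack.push(primitive);
        stack.pop();             // invokestatic Box.valueOf(prim): pops the primitive...
        stack.push(box);         // ...and pushes the box        [ Test Box ]
        stack.swap();            //                              [ Box Test ]
        stack.pop();             // getfield a Property: pops Test...
        stack.push("Property");  // ...and pushes the wrapper    [ Box Property ]
        stack.swap();            //                              [ Property Box ]
        stack.pop();             // invokeinterface setValue(Object) pops both
        stack.pop();
        return stack.contents(); // expected: [ ]
    }
}
```

In real bytecode nobody throws an exception for you, of course: a swap over a long would simply be rejected by the verifier when the class is loaded.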

public int getA();
  Signature: ()I
  Code:
   0:    aload_0
   1:    getfield    #5; //Field a:I
   4:    ireturn


public int getA();
  Signature: ()I
  Code:
   0:    aload_0
   1:    getfield    #10; //Field a:Lnet/slightlymagic/beans/properties/Property;
   4:    invokeinterface    #12,  1; //InterfaceMethod net/slightlymagic/beans/properties/Property.getValue:()Ljava/lang/Object;
   9:    checkcast    #13; //class java/lang/Integer
   12:    invokevirtual    #14; //Method java/lang/Integer.intValue:()I
   15:    ireturn
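Both goal listings hard-code Integer.valueOf and intValue because a is an int. A transformer that handles other field types needs the same information for each primitive. The following hypothetical table covers all eight; the wrapper classes, method names and descriptors are the standard java.lang ones, and boxCall()/unboxCall() print them in the owner.name:descriptor form javap uses above. It also records the opcode that pushes each type's default value, which the constructor part will need.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-primitive lookup table. Wrapper classes, method names and
// descriptors are the standard java.lang ones; the table itself is illustrative.
final class Boxing {
    private static final Map<String, Boxing> TABLE = new HashMap<String, Boxing>();

    static {
        add("Z", "java/lang/Boolean", "booleanValue", "iconst_0");
        add("B", "java/lang/Byte", "byteValue", "iconst_0");
        add("C", "java/lang/Character", "charValue", "iconst_0");
        add("S", "java/lang/Short", "shortValue", "iconst_0");
        add("I", "java/lang/Integer", "intValue", "iconst_0");
        add("J", "java/lang/Long", "longValue", "lconst_0");
        add("F", "java/lang/Float", "floatValue", "fconst_0");
        add("D", "java/lang/Double", "doubleValue", "dconst_0");
    }

    final String primitive;     // field descriptor, e.g. "I"
    final String wrapper;       // internal name of the box class
    final String unboxMethod;   // instance method that unboxes, e.g. "intValue"
    final String defaultOpcode; // pushes the default value (false, 0, 0L, 0.0f, 0.0)

    private Boxing(String primitive, String wrapper, String unboxMethod, String defaultOpcode) {
        this.primitive = primitive;
        this.wrapper = wrapper;
        this.unboxMethod = unboxMethod;
        this.defaultOpcode = defaultOpcode;
    }

    private static void add(String primitive, String wrapper, String unbox, String dflt) {
        TABLE.put(primitive, new Boxing(primitive, wrapper, unbox, dflt));
    }

    static Boxing forDescriptor(String descriptor) {
        Boxing boxing = TABLE.get(descriptor);
        if (boxing == null) {
            throw new IllegalArgumentException("not a primitive descriptor: " + descriptor);
        }
        return boxing;
    }

    // Static boxing call used by the setter rewrite.
    String boxCall() {
        return wrapper + ".valueOf:(" + primitive + ")L" + wrapper + ";";
    }

    // Unboxing call used after "checkcast <wrapper>" in the getter rewrite.
    String unboxCall() {
        return wrapper + "." + unboxMethod + ":()" + primitive;
    }
}
```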


The same strategy here:

[ Test ]
getfield
[ I ]

becomes:

[ Test ]
getfield a Property
[ Property ]
invokeinterface Object Property.getValue()
[ Object ]
checkcast Integer
[ Integer ]
invokevirtual int Integer.intValue()
[ I ]

That is even more straightforward than the setter. But here comes the constructor:

public net.slightlymagic.beans.test.Test();
  Signature: ()V
  Code:
   0:    aload_0
   1:    invokespecial    #1; //Method java/lang/Object."<init>":()V
   4:    aload_0
   5:    new    #2; //class net/slightlymagic/beans/properties/basic/BasicProperties
   8:    dup
   9:    invokespecial    #3; //Method net/slightlymagic/beans/properties/basic/BasicProperties."<init>":()V
   12:    putfield    #4; //Field properties:Lnet/slightlymagic/beans/properties/Properties;
   15:    return


public net.slightlymagic.beans.test.Test();
  Signature: ()V
  Code:
   0:    aload_0
   1:    invokespecial    #1; //Method java/lang/Object."<init>":()V
   4:    aload_0
   5:    new    #2; //class net/slightlymagic/beans/properties/basic/BasicProperties
   8:    dup
   9:    invokespecial    #3; //Method net/slightlymagic/beans/properties/basic/BasicProperties."<init>":()V
   12:    putfield    #4; //Field properties:Lnet/slightlymagic/beans/properties/Properties;
   15:    aload_0
   16:    aload_0
   17:    getfield    #4; //Field properties:Lnet/slightlymagic/beans/properties/Properties;
   20:    ldc    #5; //String name
   22:    ldc    #6; //String a
   24:    invokeinterface    #7,  3; //InterfaceMethod net/slightlymagic/beans/properties/Properties.cfg:(Ljava/lang/Object;Ljava/lang/Object;)Lnet/slightlymagic/beans/properties/Properties;
   29:    iconst_0
   30:    invokestatic    #8; //Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
   33:    invokeinterface    #9,  2; //InterfaceMethod net/slightlymagic/beans/properties/Properties.property:(Ljava/lang/Object;)Lnet/slightlymagic/beans/properties/Property;
   38:    putfield    #10; //Field a:Lnet/slightlymagic/beans/properties/Property;
   41:    return


A constructor basically has three parts: the super constructor call, the field initializers, and the explicit constructor code. If a isn't explicitly initialized in its declaration, it doesn't appear in the constructor at all, which is bad if we're waiting for some bytecode to tell us that it's time for action. So we have to find a place where it's fine to insert our own initialization, and that place is NOT directly after the super constructor call. Rather, it's after the Properties object, our factory for the property wrappers, has been created and stored, i.e. after its putfield instruction. (This is actually where I stopped, so my code doesn't do this right yet.)
Another thing to note is that there might be multiple putfields to a in the constructor, but only the first one is the initialization that needs our special treatment.
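The two rules above (insert only after the factory has been stored; treat only the first store to a as the initialization) can be sketched as a small state machine that sees one instruction at a time, the way a MethodVisitor does. This is a hypothetical simplification, not the post's code: instructions are plain strings, and it only records a plan instead of emitting bytecode. It also rejects a store to a that arrives before the factory is stored, since properties would still be null at that point (javac runs field initializers in declaration order). How to combine the two marks when a has an explicit initializer is the part the post leaves open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified planner for the constructor rewrite. Instructions are
// plain strings fed one at a time; it records decisions, it does not emit bytecode.
class ConstructorPlanner {
    static final String INSERT_HERE = "<insert property initialization>";

    private final String factoryField;   // e.g. "properties"
    private final String propertyField;  // e.g. "a"
    private final List<String> plan = new ArrayList<String>();
    private boolean factoryStored;
    private boolean fieldInitialized;

    ConstructorPlanner(String factoryField, String propertyField) {
        this.factoryField = factoryField;
        this.propertyField = propertyField;
    }

    void visit(String insn) {
        if (insn.equals("putfield " + propertyField)) {
            if (!factoryStored) {
                // getfield <factoryField> would still yield null here.
                throw new IllegalStateException(propertyField + " is stored before " + factoryField);
            }
            // Only the first store is the initialization; later ones are plain assignments.
            plan.add((fieldInitialized ? "SET " : "INIT ") + insn);
            fieldInitialized = true;
            return;
        }
        plan.add(insn);
        if (!factoryStored && insn.equals("putfield " + factoryField)) {
            factoryStored = true;
            plan.add(INSERT_HERE); // earliest safe point for our own initialization
        }
    }

    List<String> plan() {
        return plan;
    }
}
```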
And now to the bytecode of our modification. For now, let's pretend that there is an aload_0; iconst_0; putfield a sequence here. There might really be one (if a is initialized in its declaration), so we have to be prepared for something already being on the stack:

[ Test I ]
invokestatic Integer Integer.valueOf(int)
[ Test Integer ]
swap
[ Integer Test ]
dup_x1
[ Test Integer Test ]
getfield properties Properties
[ Test Integer Properties ]
ldc "name"
[ Test Integer Properties String ]
ldc "a"
[ Test Integer Properties String String ]
invokeinterface Properties Properties.cfg(Object,Object)
[ Test Integer Properties ]
swap
[ Test Properties Integer ]
invokeinterface Property Properties.property(Object)
[ Test Property ]
putfield a Property
[ ]

And the last thing: look at the stack. It begins with two elements but holds up to five in the middle, so maxStack must be increased by three (or by two if the value is a long or double, which already takes two slots at the start). It's as simple as storing that number in a field of the method visitor and overriding visitMaxs() to add it to maxStack.
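The maxStack arithmetic can be checked mechanically. This sketch (hypothetical helper names) lists the net slot change of each instruction in the initialization sequence above and tracks the peak depth. With ASM, the resulting extra would be added to maxStack in an overridden visitMaxs(int maxStack, int maxLocals) before passing the call on.

```java
// Hypothetical check of the maxStack arithmetic for the constructor rewrite.
// Each entry is the net change in operand-stack slots of one instruction.
class MaxStackCheck {
    static int[] initSequence(int valueSlots) {
        return new int[] {
            1 - valueSlots, // invokestatic valueOf: pops the primitive, pushes the box
            0,              // swap
            1,              // dup_x1
            0,              // getfield properties: pops Test, pushes Properties
            1,              // ldc "name"
            1,              // ldc "a"
            -2,             // invokeinterface cfg(Object,Object): pops 3, pushes 1
            0,              // swap
            -1,             // invokeinterface property(Object): pops 2, pushes 1
            -2              // putfield a: pops Test and the Property
        };
    }

    // Extra maxStack needed beyond the slots already used at the start.
    static int extra(int valueSlots) {
        int start = 1 + valueSlots; // Test reference + the value
        int depth = start;
        int peak = start;
        for (int delta : initSequence(valueSlots)) {
            depth += delta;
            peak = Math.max(peak, depth);
        }
        return peak - start;
    }

    // Depth after the whole sequence; should be 0, like the original putfield.
    static int endDepth(int valueSlots) {
        int depth = 1 + valueSlots;
        for (int delta : initSequence(valueSlots)) {
            depth += delta;
        }
        return depth;
    }
}
```

MaxStackCheck.extra(1) gives 3 for int and MaxStackCheck.extra(2) gives 2 for long or double, matching the numbers above. Alternatively, ASM's ClassWriter can recompute maxStack on its own when created with the ClassWriter.COMPUTE_MAXS flag, at the cost of some extra work while writing the class.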

Thanks! If you want to look at the code, you can find it here.
