Sunday, June 3, 2012

The Visitor pattern

A quick interruption from our Bytecode study for the Visitor pattern. A programming pattern is basically a technique which can be applied in different situations for some benefit, and at first I failed to see the benefit behind the visitor pattern. The visitor pattern has four core concepts:
  • There is a SomethingVisitor class or interface.
  • That class has many visitSomeAspect() methods that are either void or return another Visitor for that aspect.
  • A class can form an "entry point" by providing an accept(SomethingVisitor) method.
  • That accept() method calls the visit() methods in an order somehow defined in the SomethingVisitor class.
Let's take ASM as an example: There is the ClassReader class that provides an accept(ClassVisitor) method. This method calls the various visit() methods, starting with visit(), visitSource(), visitOuterClass(), followed by sequences of visitAnnotation(), visitAttribute(), visitInnerClass(), visitField(), visitMethod(), and ended with visitEnd(). In turn, visitAnnotation(), visitField() and visitMethod() return visitors on their own that get the details about the annotation, field or method. What we're actually describing is a tree structure, so why don't we simply create some ClassFile class that has some fields for name, superclass, interfaces and so on? What's the benefit of the pattern?

If you realized it straight on, you're probably smarter than me, but after working for a few days with ASM, I realized it myself. Unlike data structures, methods are inherently chainable. Let me show you some lines from ClassVisitor:

public abstract class ClassVisitor {
    protected final int api;
    protected ClassVisitor cv;

    public ClassVisitor(final int api) {
        this(api, null);
    }

    public ClassVisitor(final int api, final ClassVisitor cv) {
        this.api = api;
        this.cv = cv;
    }

    public void visit(
        int version,
        int access,
        String name,
        String signature,
        String superName,
        String[] interfaces)
    {
        if (cv != null) {
            cv.visit(version, access, name, signature, superName, interfaces);
        }
    }
}


Just ignore the api, which is for future compatibility, and look at cv and visit(). A ClassVisitor can be constructed with another delegate ClassVisitor, and the visit() methods delegate by default; if there is no delegate, the methods are no-ops. That's a really nice behavior, because you can stick ClassVisitors that do only some small thing together to get a more complex behavior. And they probably work right by doing so, because what you expect from one visitor is that it transforms one correct class into another, so each visitor gets a correct input, and you expect it to get a correct output back from it.

Now let's speak about performance. The most obvious thing is that, with the visitor pattern, you don't have to actually build your tree structure in order to process it. Basically, you can think of the visitor pattern as a stream-approach to tree processing. Think of a binary stream: you can chain different kinds of streams together, each operating on the output of the former one, yielding a result that is the composition of all the transformation. The analogue to a tree structure would be a byte array. Instead of chaining streams, each transform() method takes and returns a byte[], which is then passed to the next transform() method. This is a lot of excessive object creation, plus that each transformation processes the array from front to back. In a stream, there is only one pass that does all of the processing in a chain, and it certainly needs less memory. Of course, it's possible that the class providing accept() operates on a real tree structure, but not even that is a requirement. Thinking of binary again, there's no need to read a file into a byte[] before processing it, and then writing it back to a file again.

Let's talk Magic shortly: I repeatedly find myself thinking about the right way to represent cards. There is much you can store about a card: Name, types, abilities, power/toughness. There is also the Multiverse ID, but that is printing- and language dependent. Abilities have translated text, and the printed English text is not the same as the current oracle wording. Abilities can be present in a software-specific, machine-readable language. Images of different resolutions might be present, as well as flavor text for specific printings, artist name, copyright notice, and so on.
To make one data structure for all needs is virtually impossible. The visitor pattern is a tool that might make it possible to make all the data accessible without actually requiring all of it, and a visitor can also be used to transform one tree representation into another one with a different focus, without polluting them with explicit conversion methods featuring major duplication of code.

So the visitor pattern is, after a deeper analysis, nothing of that that you see on the surface: a method- and class name convention, combined with a specified order of method calls. It is a technique for processing complex data structures without the intermediate creation of a huge in-memory tree, and it can even boost performance. It makes you independent of the underlying tree structure, be it in memory, on disk or anywhere else. Instead, you have to conform to a single interface and be able to process the method arguments. And that's something I do call a benefit!

No comments: