Wednesday, April 24, 2013

F%$*ing Lambdas, How Do They Work?

Update: A friend was reading over this and noted that I should have linked to my two previous posts on Project Lambda, to provide more context:

On my Twitter feed, I recently got a link to this document by Brian Goetz explaining how JDK8 translates lambdas into bytecode. I was surprised, since I had assumed that lambdas were simply syntactic sugar for anonymous inner classes that implement one of the functional interfaces, much as anonymous functions in Scala currently work. That is, I thought that Project Lambda was basically doing the same thing as I've been doing with my functional types, but eliminating the annoying boilerplate that goes along with an anonymous class and using better type inference.

It turns out that it's considerably more interesting than that, and makes use of the InvokeDynamic bytecode instruction added in JDK7. It inspired me to dig a little deeper with javap and try writing up an explanation in simpler terms (and only covering the simple case), since that article is pretty heavy (though I have to admit that I spent a solid hour on a Saturday night going through it with glee — I'll claim that my standards for a fun Saturday night have lowered considerably with a small child at home, rather than admitting that I'm a huge nerd).

How don't lambdas work?

I think it's clearer to understand how lambdas work by first considering how they don't work. That is, they're not just a more succinct way of writing anonymous inner classes.

Throughout the rest of this post, I'll use the following example:

While only a few lines in total, this example could desugar to something a fair bit messier. It's useful to note that our simple lambda is capturing the variable i from its scope.

Suppose lambdas were simply a nicer way to write anonymous classes. The above example would be equivalent to:

If you've ever looked at the output directory from compiling code with anonymous functions, you would notice that the above code would compile to AnonymousFunction.class and AnonymousFunction$1.class. If we run javap AnonymousFunction\$1.class (escaping the $ so the shell doesn't think I'm referencing a variable), we get the following output:

So, we have a class that holds the variable i as a field (passed to the constructor), and two versions of the apply method, with and without type erasure. (The Object version just casts its input to an Integer and invokes the Integer version.)

More explicitly, anonymous functions are also a kind of syntactic sugar. In effect, AnonymousFunction.java compiles down to something like the following:

The output of running javap AnonymousFunction\$1.class looks nearly identical (save the One vs. 1 difference) to javap InnerClassFunction\$One.class.

So, how do lambdas actually work?

It's worth noting that the contents of my compiled output directory (with file sizes) are as follows:

Interesting... no secondary classes are produced by compiling the original SimpleLambda code. So, how does it work?

The code of the lambda is moved to a private static method called (in my case) lambda$6, that takes n and i as parameters and does the addition. That's the easy part. What does main actually compile to? Let's ask javap (using the -v flag):

I think of this bytecode as divided into three parts:

  1. Bytecode lines (distinct from GitHub gist lines) 0 through 6 take care of assigning local variable i and putting it on the stack.
  2. Lines 7, 8, and 13 load i from the stack, do an invokedynamic to actually create the Function, then push it onto the stack.
  3. Lines 14 through 27 take care of retrieving the constant System.out, loading the Function from the stack, boxing 5 into an Integer, calling our Function.apply method, and calling println.

We're most interested in line 8, which triggers the creation of the Function. Let's see what #3 points to (near the top of the javap output):

#3 = InvokeDynamic #0:#40 // #0:lambda$:(Ljava/lang/Integer;)Ljava/util/function/Function;

So, it's pointing to #0 with the expectation of a CallSite that takes an Integer and returns a Function (effectively a Function factory). Let's look at #0:

This is basically binding the last three arguments for a call to LambdaMetaFactory.metaFactory (samMethod, implMethod, and instantiatedMethodType) to our desired "single abstract method" (#37 Function.apply), the implementation method (#38, the previously-mentioned private static lambda$6), and the type arguments for the Function.apply method (#39 — it takes an Integer (within parentheses) and returns an Integer (outside the parentheses)). The first three arguments are supplied by the invokedynamic call itself. (I believe arguments 2 and 3 are lambda$ and (Ljava/lang/Integer;)Ljava/util/function/Function are bound by #3, above).

Right now, if I look at the code for LambdaMetaFactory.metaFactory, it's basically constructing a new InnerClassLambdaMetafactory and asking it to generate a CallSite. So, at runtime, we're generating an inner class that implements Function<Integer, Integer>, takes an Integer as a constructor parameter, and implements apply by calling lambda$6 with its field and the argument to apply.

Let's confirm this by creating a class that outputs this stuff at runtime:

And here is the output from running that:

In this case, our static method got called lambda$0, but you can see that it's called from the apply method of an inner class that holds an Integer field, which it probably received through its constructor (since it's a final field). Effectively, at runtime, we're getting the behaviour from InnerClassFunction.java from above (except that the actual code makes one more method call to the static method), which in turn is the same as AnonymousFunction.java, which is not what lambdas become. The one "major" difference is that the generate class doesn't implement the Integer version of apply, sticking purely to the erased version (which makes sense, since the checked cast is already required before calling lambda$0).

Wait... what?

In a very roundabout way, I hope that I've shown you that lambdas are not equivalent to anonymous classes (which are equivalent to inner classes that capture local variables via constructor arguments), but they are actually equivalent at runtime... for now. The truth is that this somewhat convoluted invokedynamic approach leaves things open for improvements in the future.

The current implementation of Project Lambda (at the time of writing) does, in fact, generate inner classes at runtime. However, by using the invokedynamic call to LambdaMetadataFactory, future runtimes will be able to provide a different implementation that may have better performance, and still work with code compiled now (or at least after JDK8 eventually ships). As a secondary bonus, by not performing this expansion at compile time, compile time should be shorter and the generated bytecode will be smaller. (Even with this trivial example, SimpleLambda compiled to 20% less bytecode than InnerClassFunction. I believe the difference would be more significant with more lambdas.)

Conclusions

I hope that this post has made you think more about how lambdas will work, along with how anonymous classes and inner classes already work.

It's worth noting that Scala currently compiles functions to inner classes, though there has been some discussion about using the invokedynamic approach there as well.