On my Twitter feed, I recently got a link to this document by Brian Goetz explaining how JDK8 translates lambdas into bytecode. I was surprised, since I had assumed that lambdas were simply syntactic sugar for anonymous inner classes that implement one of the functional interfaces, much as anonymous functions in Scala currently work. That is, I thought that Project Lambda was basically doing the same thing as I've been doing with my functional types, but eliminating the annoying boilerplate that goes along with an anonymous class and using better type inference.
It turns out that it's considerably more interesting than that, and makes use of the InvokeDynamic bytecode instruction added in JDK7. It inspired me to dig a little deeper with javap
and try writing up an explanation in simpler terms (and only covering the simple case), since that article is pretty heavy (though I have to admit that I spent a solid hour on a Saturday night going through it with glee — I'll claim that my standards for a fun Saturday night have lowered considerably with a small child at home, rather than admitting that I'm a huge nerd).
How don't lambdas work?
I think it's clearer to understand how lambdas work by first considering how they don't work. That is, they're not just a more succinct way of writing anonymous inner classes.
Throughout the rest of this post, I'll use the following example:
While only a few lines in total, this example could desugar to something a fair bit messier. It's useful to note that our simple lambda is capturing the variable i
from its scope.
Suppose lambdas were simply a nicer way to write anonymous classes. The above example would be equivalent to:
If you've ever looked at the output directory from compiling code with anonymous functions, you would notice that the above code would compile to AnonymousFunction.class
and AnonymousFunction$1.class
. If we run javap AnonymousFunction\$1.class
(escaping the $
so the shell doesn't think I'm referencing a variable), we get the following output:
So, we have a class that holds the variable i
as a field (passed to the constructor), and two versions of the apply
method, with and without type erasure. (The Object
version just casts its input to an Integer
and invokes the Integer
version.)
More explicitly, anonymous functions are also a kind of syntactic sugar. In effect, AnonymousFunction.java
compiles down to something like the following:
The output of running javap AnonymousFunction\$1.class
looks nearly identical (save the One
vs. 1
difference) to javap InnerClassFunction\$One.class
.
So, how do lambdas actually work?
It's worth noting that the contents of my compiled output directory (with file sizes) are as follows:
Interesting... no secondary classes are produced by compiling the original SimpleLambda
code. So, how does it work?
The code of the lambda is moved to a private static method called (in my case) lambda$6
, that takes n
and i
as parameters and does the addition. That's the easy part. What does main
actually compile to? Let's ask javap
(using the -v
flag):
I think of this bytecode as divided into three parts:
- Bytecode lines (distinct from GitHub gist lines) 0 through 6 take care of assigning local variable
i
and putting it on the stack. - Lines 7, 8, and 13 load
i
from the stack, do aninvokedynamic
to actually create theFunction
, then push it onto the stack. - Lines 14 through 27 take care of retrieving the constant
System.out
, loading theFunction
from the stack, boxing5
into anInteger
, calling ourFunction.apply
method, and callingprintln
.
We're most interested in line 8, which triggers the creation of the Function
. Let's see what #3
points to (near the top of the javap
output):
#3 = InvokeDynamic #0:#40 // #0:lambda$:(Ljava/lang/Integer;)Ljava/util/function/Function;
So, it's pointing to #0
with the expectation of a CallSite
that takes an Integer
and returns a Function
(effectively a Function
factory). Let's look at #0
:
This is basically binding the last three arguments for a call to LambdaMetaFactory.metaFactory
(samMethod
, implMethod
, and instantiatedMethodType
) to our desired "single abstract method" (#37 Function.apply
), the implementation method (#38
, the previously-mentioned private static lambda$6
), and the type arguments for the Function.apply
method (#39
— it takes an Integer
(within parentheses) and returns an Integer
(outside the parentheses)). The first three arguments are supplied by the invokedynamic
call itself. (I believe arguments 2 and 3 are lambda$
and (Ljava/lang/Integer;)Ljava/util/function/Function
are bound by #3
, above).
Right now, if I look at the code for LambdaMetaFactory.metaFactory
, it's basically constructing a new InnerClassLambdaMetafactory
and asking it to generate a CallSite
. So, at runtime, we're generating an inner class that implements Function<Integer, Integer>
, takes an Integer
as a constructor parameter, and implements apply
by calling lambda$6
with its field and the argument to apply
.
Let's confirm this by creating a class that outputs this stuff at runtime:
And here is the output from running that:
In this case, our static method got called lambda$0
, but you can see that it's called from the apply
method of an inner class that holds an Integer
field, which it probably received through its constructor (since it's a final
field). Effectively, at runtime, we're getting the behaviour from InnerClassFunction.java
from above (except that the actual code makes one more method call to the static method), which in turn is the same as AnonymousFunction.java
, which is not what lambdas become. The one "major" difference is that the generate class doesn't implement the Integer
version of apply
, sticking purely to the erased version (which makes sense, since the checked cast is already required before calling lambda$0
).
Wait... what?
In a very roundabout way, I hope that I've shown you that lambdas are not equivalent to anonymous classes (which are equivalent to inner classes that capture local variables via constructor arguments), but they are actually equivalent at runtime... for now. The truth is that this somewhat convoluted invokedynamic
approach leaves things open for improvements in the future.
The current implementation of Project Lambda (at the time of writing) does, in fact, generate inner classes at runtime. However, by using the invokedynamic
call to LambdaMetadataFactory
, future runtimes will be able to provide a different implementation that may have better performance, and still work with code compiled now (or at least after JDK8 eventually ships). As a secondary bonus, by not performing this expansion at compile time, compile time should be shorter and the generated bytecode will be smaller. (Even with this trivial example, SimpleLambda
compiled to 20% less bytecode than InnerClassFunction
. I believe the difference would be more significant with more lambdas.)
Conclusions
I hope that this post has made you think more about how lambdas will work, along with how anonymous classes and inner classes already work.
It's worth noting that Scala currently compiles functions to inner classes, though there has been some discussion about using the invokedynamic
approach there as well.