Tuesday, January 29, 2013

Project Lambda - First Impressions

Before launching into the actual post, I would like to make a quick plug for Martin Odersky's "Functional Programming Principles in Scala" course on Coursera. It will begin March 25th and run for 7 weeks. I took the course during its previous run (September-November last year), and really enjoyed it. No prior knowledge of Scala is required. The material is well-presented and quite interesting. Sign up now!

Moving on to today's topic

I created an impromptu poll on Google+ to see what I should write about next, not really expecting any suggestions (especially since most of my Google+ friends only seem to check in roughly once per month or so). As it happens, I got a vote for Project Lambda, the upcoming support for functional programming features planned for JDK8. So, with that unanimous support, I downloaded a prerelease (b74) and started porting the functional programming framework I've been developing in earlier posts.

Mapping concepts

Taking a look at that new java.util.function package, you find a fair number of interfaces (I count 38 in b74), but they're mostly specializations of the following few core interfaces:

  • Supplier: Declares a get() method that returns a value. Effectively interface-equivalent to Function0 from my previous posts. There are specialized versions (BooleanSupplier, DoubleSupplier, IntSupplier, LongSupplier, but no love for floats) to return unboxed primitive values.
  • Function: Has an apply method that takes an input of type T and returns an output of type R. Analogous to Function1 from my previous posts. Has subinterface UnaryOperator (for functions where the return type is the same as the input type), and many primitive type specializations (to avoid boxing), including Predicate (for functions that return boolean), IntFunction, LongFunction, DoubleFunction, plus many combinations of single-input functions for different input/output types (where at least one is primitive).
  • BiFunction: Analogous to the Function2 type I described in earlier posts, these take two arguments and return a value (where, in the most general case, each input and output type may be different). Has a subinterface BinaryOperator for the special case when all three types are the same, plus specialized versions of BinaryOperator for each of the primitive types.
  • Block and BiBlock: Like Function and BiFunction, but covering the case where nothing is returned (so the code must produce a side-effect). I handled these cases in my previous post by using functions that return the empty tuple (Tuple0). Again, there are several specialized interfaces to handle primitive inputs to the block.

Lambda syntax

By far, the biggest change you'll find in JDK8 from Project Lambda is the new syntax added to make working with the above interfaces much less verbose. (My previous posts serve as an example of how ugly it can be to create anonymous classes to wrap up a little bit of logic in a function object.) Here is a "before and after" example, using an anonymous class versus the lambda syntax:

I think we can all agree that the second conveys just as much useful information without the noise of the anonymous class boilerplate. IntelliJ even gives me a warning, telling me that the first version can be replaced with a lambda. This syntax can be applied to any anonymous implementation of an interface where there's a single method to implement (which includes a number of AWT or Swing EventListeners, as well as Runnable and Callable). I would argue that the lambda syntax will be pretty useful even if you're not interested in functional programming.

Since many existing codebases probably have static utility methods that behave as functions, there is also syntax to use these methods as function objects:

Similarly, an instance method without arguments can implicitly be treated as a function where the instance is the input and the result of invoking the method on the instance is the output (note that IntFunction<String> takes a String as input and returns an int):

Default methods

If you paid close attention to my previous posts (or looked at the code in GitHub), you may have noticed that most of my function types were classes. This was because I wanted to be able to implement certain helper methods that would work if you provided the actual logic of a function. For example, Function1 has a compose method that would allow you to define composition of functions. Similarly, the function classes have the logic for partial application and currying.

This is kind of annoying, as you can only inherit from a single class, whereas you can implement multiple interfaces. It would be much nicer to be able to define some logic that comes "for free" once the method(s) of an interface are defined. In particular, this is how Scala's traits work (and was one of the features that made my Why I like Scala list). As it happens, this is another feature coming to Java with Project Lambda.

There is a new keyword, default, that can be added before an interface method that allows you to define a body for the method. From what I can tell, while this introduces some multiple inheritance of behaviour to Java, it still isn't quite as flexible as Scala's multiple inheritance of traits (which avoids the diamond problem by defining a linearization of inheritance). For example, the following doesn't compile for me:

The compiler error is

java: class DefaultTest.HelloWorld inherits unrelated defaults for print() from types DefaultTest.Hello and DefaultTest.World

That said, default methods still allow some rich inheritance of interfaces, and could be used to significantly improve the collections API (much as the use of traits has given Scala a much better collections API). Based on this FAQ answer, which refers to new interfaces java.util.Sized and java.util.Traversable, it looks like this cleanup was planned at some point, but those interfaces don't exist in the b74 build of Project Lambda. I don't know if they backtracked on improving collections, or if those changes are excluded from current builds.

New collection types

In a previous post, I covered the Option type (taking the name from Scala). As it happens, Project Lambda includes the java.util.Optional type, which covers pretty much the same niche.

They've also added a hefty java.util.stream package that handles lazily-evaluated sequences, complete with evaluation pipelines (designed to accommodate concurrent evaluation). I won't go into details on streams, for the following reasons:

  • I want to discuss streams (as a type) in a later post, since they're pretty easy to implement.
  • This post is already going to be too long, and streams would roughly double the length.
  • I haven't had time to fully investigate pipelines, so anything I write wouldn't be doing them justice.

What's missing?

So far, Project Lambda doesn't seem to include helpers for partial application of functions, or lazy evaluation. Luckily, those are both pretty easy to implement, and the syntax is much nicer with the new syntax.

Lazy Evaluation

To implement lazy evaluation, let's revisit the Function0 idea, but this time, we'll make it an interface. Of course, interfaces still can't hold data, so we'll "remember" earlier evaluations using a static WeakHashMap (relying on identity equality) and take advantage of the hot new computeIfAbsent default method added to Map.

Next, let's create a Function1 that extends Function and adds the ability to take a Supplier (which may or may not be a Function0) as input to apply. This will allow us to eagerly evaluate a previous lazy result. Whereas previously, I made the return value always lazy (where you could then force evaluation with get()), the built-in Function interface we're extending performs eager evaluation. So, we'll expose lazy evaluation via the applyLazy methods. I'll use the T,R ordering of type arguments, since my earlier R,T ordering was silly and inconsistent with what everyone else does.

Partial Application

In my previous implementation, I had an abstract base class called Curryable which implemented partial application in terms of currying. Then, it was the responsibility of each FunctionN class to implement curry to produce a Function1 that binds the first argument and returns a FunctionN-1. Unfortunately, while writing this post, I discovered a fatal flaw in that implementation that actually invalidates one of my examples in my previous post. Specifically, that implementation of partial evaluation always evaluated the partial arguments eagerly, so one of the page downloads started in the middle of constructing the lazy result of the Function2. (In retrospect, I should have been suspicious when I found that my asynchronous implementation was consistently faster than my earlier concurrent implementation. Of course it was, it had already downloaded a page before the timer started.) We'll do partial application more directly this time, so that laziness works.

Anyway, let's define Function2 in terms of BiFunction, with the additional partial application methods. Function3 and up are analogous, but need to be defined from scratch, since there are no built-in interfaces for functions of more than two arguments.

Let's confirm that this works, and play some more with lambda syntax by implementing foldLeft as a static method, binding it to a function object (a Function3, specifically), and then using partial application to produce a Function1 that concatenates strings:

That's considerably easier than the hoops I jumped through to implement a higher-order version of foldLeft previously.

Conclusions

So far, Project Lambda looks like it could actually make functional programming much more viable on Java. While my examples here haven't done anything that I haven't been able to do with my existing toy framework, the syntax alone is a massive improvement. As an example of the impact of good syntax, I immediately spotted my mistake with my earlier Curryable implementation when I started to write it with lambdas. Previously, I had missed the mistake amidst all of the anonymous function boilerplate.

I'm still not convinced that it's on par with Scala, though, since it largely seems to be catching up to where Scala was a few years ago, and you're still stuck with the old "everything is mutable" collections framework. Various features of the Scala compiler (e.g. type inference, special handling of apply) allow you to write even less boilerplate than what you get with Project Lambda. Still, if Java developers get used to this stuff (and like it), it might make it easier to convince more of them to drop the uglier legacy parts of Java and move to Scala (which still keeps the good old JVM and JIT compiler).

That said, since I imagine that I'll be stuck mostly working in Java for the foreseeable future, I can only hope that JDK8 comes out soon, so I can use these goodies in my professional life.

Finally, remember that what I've covered here is only scratching the surface. This is what I've been able to pick up playing with Project Lambda in about a week, between when my daughter falls asleep and heading to bed myself. For more details, check out the Project Lambda homepage, and the resources linked from there.


Sunday, January 13, 2013

Asynchronous Evaluation

As promised at the end of my previous post, I've added (what I think is) a fairly simple asynchronous execution structure.

It's worth noting that this post actually has nothing to do with functional programming (especially since I will be producing side-effects). It just happens that what I've written works because of lazy evaluation, which itself is not a functional programming concept, so much as something that just happens to come up in some functional programming environments (most notably Haskell). The beauty of programming without side-effects is that your logic remains the same, regardless of the actual evaluation semantics. In turn, this makes it easier to "make it work, make it right, make it fast". This post largely deals with the last part.

Adding callbacks to a concurrent evaluation

Originally, when I wrote this, I had planned to write an AsyncFunction0 class, which would decorate a Function0 with asynchronous callback evaluation, and still support Function0 semantics. Upon further thought, I figured it would be easier to keep the asynchronous evaluation behaviour totally separate. Instead, I opted to create an AsyncEvaluator class which would:

  • be based on a Function0 and an Executor, and could be explicitly invoked to trigger evaluation of the result lazily defined by the Function0 on the Executor.
  • be able to accept callbacks, which will be called (on the same thread as evaluation) once the Function0 has been evaluated.
  • support convenient chaining of Function0 evaluations, so evaluation of one result could trigger evaluation of later results.

For callbacks, rather than defining a new type, we'll use a function that takes the result of evaluation as input, but it doesn't really make sense to return something -- to whom will we be returning it? Instead, the callback should "do" something. That is, it should have a side-effect. So, the callback should return void, or something to that effect. Since we already have tuples, we can depict the lack of a value by the empty tuple, or Tuple0. This is a singleton:

With that, we can move on to the core code of AsyncEvaluator:

This is simple enough, and still useful. You could write an asynchronous event handler that would respond to a UI event (like a button click) by creating a Function0 that returns a result, wrapping it in an AsyncEvaluator, and then adding a callback that writes the result to the UI. Before your event handler returns, it would call invoke to trigger the background processing.

Chaining function evaluations

Suppose you've written a nice, working program that runs well on a single thread, but you see various pieces that could run concurrently. Assuming you have lazy values representing certain "checkpoints" in the process, it would be nice to be able to say, "When result A has been calculated, start evaluating B and C (which depend on A), then when B finishes, kick off D. When C and D are both done, invoke E." We can do that, without changing any actual logic. Effectively, we just need to trigger evaluation of the Function0s based on the completed evaluation of earlier values.

First, we'll look at creating simple or branching chains. That is, from a single Function0 that has been wrapped in an AsyncEvaluator, we can trigger one or more later Function0 evaluations:

We may not want all evaluations to occur on the same Executor, so we allow an alternative Executor to be specified for the chained evaluation. If no Executor is specified, we'll use the same one used to evaluate the current result.

Here is a silly unit test that counts to ten asynchronously:

What if we have a value that is constructed based on multiple earlier asynchronously computed values? In that case, we want the new value to be computed as soon as the last of the previous values is available:

With that, we can write an asynchronous version of our example from the previous post, where we fetch two webpages and sum their byte counts:

As mentioned in the code comments, neither of these tests really reflects what you would actually want to do with an asynchronous execution path. In the real world, your final callback would push the result as a network response or output it to the user, such that you would never actually need to block while waiting for an answer.

Conclusion

Asynchronous processing is a pretty powerful tool, made quite popular (and certainly more familiar) thanks to AJAX calls from the browser. Now that many of us are comfortable with this approach, it's not a bad idea to add more asynchronous processing to our server-side or desktop software. In particular, while a blocked thread uses fewer resources than a busy-waiting thread (or a thread that periodically polls for a result), it's better to only use a thread when there's stuff to be done. In the second unit test example above, we still have two threads blocking on the HttpURLConnection responses. A better (more scalable) program would use Java NIO to fire off the requests and react to the responses asynchronously. Unfortunately, that's outside the scope of the current post.