Saturday, December 29, 2012

Adding concurrency to our functional framework

In previous posts, I hinted that someday I would discuss concurrent evaluation of functions. That time has finally come. Also, I've made the switch to Java 1.7 at home, so don't be shocked by some of the syntax here (like multi-catch or try-with-resources).

First, let's acknowledge that Function0 is effectively a Callable with result caching. Let's make this explicit, by making Function0 implement Callable:
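(A condensed sketch; the caching details and exact method names here are illustrative rather than an exact copy of my class.)

import java.util.concurrent.Callable;

public abstract class Function0<T> implements Callable<T> {
    private T result;
    private boolean evaluated = false;

    // The lazy computation supplied by subclasses.
    protected abstract T evaluate();

    // Returns the cached result, computing it on first use.
    public synchronized T get() {
        if (!evaluated) {
            result = evaluate();
            evaluated = true;
        }
        return result;
    }

    // Callable's contract: evaluating the function is the "call".
    @Override
    public T call() {
        return get();
    }
}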

Using that, it's quite easy to dispatch evaluation of a Function0 to an ExecutorService using a helper class:
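(A sketch; the name ConcurrentUtils and the wrapping approach are illustrative, and it builds on the Function0 sketch above.) The helper submits the Function0 as a Callable and wraps the resulting Future back up as a Function0, so downstream code doesn't care whether a value was computed locally or on another thread.

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public final class ConcurrentUtils {
    private ConcurrentUtils() {}

    // Submits the given Function0 for evaluation on the executor, and returns
    // a Function0 whose get() blocks on the Future if the work isn't done yet.
    public static <T> Function0<T> dispatch(final ExecutorService executor,
                                            final Function0<T> function) {
        final Future<T> future = executor.submit(function);
        return new Function0<T>() {
            @Override
            protected T evaluate() {
                try {
                    return future.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
        };
    }
}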

This is a pretty powerful (but simple) pattern, since a Function0 can be built up at runtime based on successive lazy applications of functions. (Effectively, the Function0 type is a form of thunk.) Once you have several work units of decent size, you can dispatch them to background threads for evaluation, and then pass the resulting Function0s (which are probably still running) to other lazy functions. Once you need a result (e.g. for output), you invoke get() and (if necessary), your thread will block until the required background tasks have finished.

Let's see this in action:

These test cases download the two previous posts on this blog, count their bytes (using a foldLeft), and output the sum of the byte counts, along with the time taken to compute the result. The only difference is that testConcurrentPageDownload kicks off evaluation of the two calls to getPageBytes on an ExecutorService. On my computer, the serial test typically takes about 1100ms, while the concurrent test takes about 600ms, suggesting that it takes about 500ms to download each page (they're pretty similar in size) and 100ms to count the bytes (since we're not using a particularly efficient data structure).

That's probably enough for now. I've got some code written that extends Function0 to include asynchronous evaluation of callbacks when the result is ready. Unfortunately, I still haven't developed a good example that makes it look useful yet. Once that's ready, though, I'll write it up. Stay tuned.


Saturday, November 24, 2012

What is Functional?

It occurred to me that in all of my writing about functional programming so far, I haven't really talked about the basic idea of what a function really is, at a fundamental level, and how this informs functional programming.

What is a function?

The idea of a function really comes from mathematics. It's just been implemented (in various ways) in computer languages since then. Essentially, a function is a transformation -- given an input value drawn from some set, it produces an output in some set (which could be the same as the input set, but in many useful cases, it's not).

In math, we usually write something like "Let f be a function ℝ ⟶ ℝ, given by f(x) = x." In this case, we're declaring the domain and codomain of the function as the set of real numbers, and then defining the "implementation" of the function.

To define a similar function in C, we could write:

double f(double x) { return x; }

Of course, this isn't exactly the same as the mathematical definition, since doubles can't represent all of the real numbers, but it is usually "close enough". That said, we've declared a domain (double) and a codomain (double), and defined the means by which values are transformed. Generalizing, we can say that types effectively describe sets. (On any actual computer (without infinite memory), types describe finite sets.)

At some point, math students become comfortable with functions over multidimensional sets. For example, the function to determine the magnitude of a 2d vector over the real numbers is f:ℝ × ℝ ⟶ ℝ given by f(x,y) = sqrt(x^2 + y^2), where the domain ℝ × ℝ denotes the Cartesian product between the real numbers and themselves. (Since the magnitude function only returns non-negative values, we could declare the codomain to be ℝ+ or some such notation. Usually, we'll just leave the more general codomain, despite the fact that the function only hits a subset of it. This is the difference between the codomain and image of the function.)

Similarly, in software, we have functions of multiple parameters. That magnitude function can be (approximately) defined in C as:

double magnitude(double x, double y) { return sqrt(x*x + y*y); }

Equivalently, we can consider the function as taking a single tuple (double, double).

One area where mathematics and imperative languages have diverged is in the ease of defining Cartesian product codomains. For example, we can define a function that reflects through the line y=x, as f: ℝ × ℝ ⟶ ℝ × ℝ, where f(x,y) = (y,x). To denote the two return values, we generally need to define a new type that has two fields (with each field representing an operand of the Cartesian product). Typically, this would be a struct in C, a record in Pascal, or a class in an object-oriented language. We can make this somewhat generic by defining a set of tuple typeclasses.
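Sticking with code for a moment (in Java this time, since a class is the natural carrier; the names Vec2 and reflect are purely illustrative), the reflection function needs a small product type for its two results:

public final class Vec2 {
    public final double x;
    public final double y;

    public Vec2(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // f: R x R -> R x R, given by f(x, y) = (y, x)
    public static Vec2 reflect(Vec2 v) {
        return new Vec2(v.y, v.x);
    }
}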

Composition

In math, functions can be composed. That is, given a function f: A ⟶ B and a function g: B ⟶ C, we can produce the composed function g ◦ f : A ⟶ C. In traditional imperative programming, we tend to do the equivalent by defining a new function that invokes f then passes the result as input to g, returning the result. Functional languages or libraries often have more convenient syntax for composing functions. (While I haven't included an example in my "Functional programming with examples in Java" series, it is not hard to add a "compose" method to Function1.)
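As a sketch (assuming a Function1<R, T> type with a single apply method, as in that series; the real interface there may differ in the details), compose can be written once and reused for any pair of compatible functions:

public abstract class Function1<R, T> {
    public abstract R apply(T argument);

    // Returns this composed with f: a function that applies f first, then this.
    public <S> Function1<R, S> compose(final Function1<T, S> f) {
        final Function1<R, T> g = this;
        return new Function1<R, S>() {
            @Override
            public R apply(S argument) {
                return g.apply(f.apply(argument));
            }
        };
    }
}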

Partial functions

Many functions that we encounter in programming aren't "true" functions, but rather partial functions. That is, they don't actually produce values over the full range of possible input. Instead, they may throw an exception, crash, or otherwise reject a value that is "acceptable" given the input type.

In mathematics, we usually deal with this by restricting the input domain to valid values. For example, the function f(x) = 1/x would have domain ℝ∖{0} -- that is, the real numbers, excluding zero. Unfortunately, in a statically-typed programming language, we don't have a type that represents the non-zero real numbers (at least in any language with which I'm familiar).

Another approach you sometimes see in math is to define a function piecewise, where you define a "total" function over a domain by specifying the existing partial function over the values for which it's valid, and then "something else" for the other values. In particular, we could augment the codomain to include a special "undefined" value, which could propagate through other functions that "accept" an undefined value. So, the function above becomes f: ℝ ⟶ ℝ ∪ {undefined} where f(x) = 1/x, if x ≠ 0, and undefined otherwise. Then, any function that composes with f needs to do something with the undefined value. We can accomplish roughly the same thing in programming by returning null (for non-primitive types), but we risk losing track of it. A better alternative is to use an option typeclass, since it provides a compile-time guarantee that composed functions expect an Option.
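As a quick sketch in Java (using the Option type from an earlier post, with assumed some/none factory methods):

// f: R -> R ∪ {undefined}, expressed as an Option.
public static Option<Double> reciprocal(double x) {
    if (x == 0.0) {
        return Option.<Double>none();   // the "undefined" case
    }
    return Option.some(1.0 / x);
}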

Functions in disguise

A number of data structures can also be thought of as functions. An array int[10] is effectively a function from the natural numbers between 0 and 9 to the set of integers. (Similarly, a List<T> is a partial function from the integers to T.) A Set<T> in Java can be thought of as function T ⟶ boolean, via the contains method. A Map<K,V> is a function K ⟶ V, where Java makes it a total function by returning null for missing keys, while Scala treats Maps as functions K ⟶ Option[V].

Independent evaluation and substitution

Mathematical functions can be evaluated in any order and can substitute their results arbitrarily. That is, if we know f(x) = 5, we can write 5 anywhere that we see f(x) and vice-versa. This is because mathematical functions are defined in terms of equivalences, independent of side-effects. Evaluating a mathematical function doesn't log output to a console, update some state, or strangle a kitten.

Unfortunately, the same cannot generally be said for functions in a programming context (especially if you're using the Apache Commons Kittenstrangler library). In particular, for a program to be useful, it must perform some I/O, which is inherently a side-effect. (Even the most mathematical program eventually needs to update state -- mutating a user who doesn't know the answer into one who does.)

Fortunately, we can isolate our "impure" side-effecting code from our "pure" functional code. In fact, this is generally how programs in Haskell are structured. They are spawned within the impure IO monad, and dive into the pure functional logic, only coming back up to the impure world to interact with the user.

That said, even the IO monad has a functional feel, if you twist your mind just right. The way I think of it is that your program takes "the world" as input. Your pure functional code is mapped over the IO monad and can be used to produce a "new world" as output. Thus, a "Hello, world" program in Haskell takes as input the universe and outputs a new, slightly different universe: one where "Hello, world" is written one more time on your screen.

Generally, you want to keep the impure part as small as possible. In traditionally imperative languages, we can theoretically structure our programs in a similar way -- writing side-effect-free functions wherever possible, and keeping side-effects within a specific layer. Unfortunately, most imperative languages also have standard libraries that encourage updating of mutable state.

What do we get as programmers for following a purely functional approach? Like mathematical functions, it is safe to evaluate pure functions in an arbitrary order, or even all at once. Effectively, we can have near-arbitrary concurrency of our functional code, which is quite useful in today's multi-core world.

Obviously, even with pure functional code, we need to be careful about how/when we parallelize, since the overhead of context-switching could quickly dominate the gains obtained by parallel evaluation. That is, we as developers still need to find appropriately-sized work units. That said, pure functional code can be parallelized without worrying about locking shared state. When you have mutable state, you need to worry about finding appropriately-sized work units as well as locking shared state. Picking the wrong work units leads to degraded performance. Making mistakes with locking can lead to degraded performance, deadlocks, or race conditions. Worst of all, deadlocks and race conditions often only show up under heavy load (which tend to involve production scenarios).

Conclusions

In this post, I've tried to make a more mathematical argument in favour of functional programming. Computer science is a comparatively young field of study. Unfortunately, it's also very useful. This has resulted in a lot of unproven (and unprovable) solutions running our world. As a step toward maturity and stability, I believe we would benefit from adhering to the boring, predictable logic of mathematics where possible, isolating those pieces that cannot be represented by pure functions.

While Haskell (more-or-less) enforces this separation between pure functions and side effects, we can move toward purity in traditionally imperative languages like C/C++, Java, Pascal, Objective C, and the .NET languages. It's largely just a matter of changing your coding style a bit. If you find that you're bumping against limits of the language to write elegant pure code (as in my previous blog posts, where there's a lot of boilerplate involved with making Java act functional, and I dropped purity in a few cases -- e.g. memoization and lazy evaluation), you could also try a hybrid OO/functional language like Scala, or a functional language that still gives you easy mutability and side-effects when needed, like F# or most Lisps. Additionally, Python and Ruby have some convenient functional constructs in their standard libraries/syntax. Finally, while the language itself can be pretty ugly, remember that Javascript is basically Lisp disguised as a C-family language.


Friday, July 20, 2012

Why I like Scala

In my past few blog posts (actually, almost all of my blog posts), I've been trying to present ways of implementing stuff built into Scala, using old-school Java syntax. I should probably explain some reasons why I think it makes more sense to just make the jump to Scala.

Type inference

Whenever you declare an object variable in Java, you tend to need to specify the type twice: once when declaring the compile-time type for the variable (the thing that goes to the left of the variable name) and once when specifying the object constructor (the thing that goes after new). Java has added some inference of generic type parameters in 1.7, so you can now write Map<String, Integer> myMap = new HashMap<>(), but it's still a fair bit of code noise compared to Scala's var myMap = new HashMap[String, Int](). You still need to specify types on method/constructor parameters, of course, and you can explicitly specify types for variables when you want (e.g. when you're making a mistake and the type inferencer is being too helpful).

Some people unfamiliar with Scala see the type inference and assume that Scala is a dynamically typed language, since the conciseness of variable declarations looks more like Ruby, Python, or Javascript. Scala is statically typed, though. The compiler just looks at the type of the expression on the right side of the =, and maps the type to the left side (and will generate a compiler error right there if you've explicitly specified a type on the left side that is not satisfied by the type of the expression on the right). In my opinion, having the short, unobtrusive code typically associated with dynamic languages, but still keeping the compile-time type-safety (and IDE autocompletion) of a static language, is awesome.

Each line in this toy example that involves a variable declaration is shorter in Scala than in Java (even ignoring the semicolon inference). Plus, as described in the next section, to make it more fair, the first and fourth line in the Java example should have final prepended.

Immutability

While I've shown in a few posts that it's possible to implement immutable data structures in Java (and you can use the Functional Java library), it's not the same as having core library support. The Java Collections API has mutability built in from the ground up. Sure, you can use Collections.unmodifiable{Set,List,Map,Collection} to produce immutable collections, but then you're left with runtime exceptions if you do try modifying their contents, and no way to create a new collection based on a previous one (besides an O(n) copy to create a new collection, versus the O(1) prepend/tail of an immutable list or the O(log n) insert/delete of an immutable trie map/set). In my humble opinion, to push the unmodifiability constraint to compile time, it would have been ideal to split the collections interfaces, adding e.g. ReadableCollection, ReadableList, ReadableIterator, etc., and making the existing interfaces extend the new "read-only" interfaces, adding only the modification methods. That's not how they did it, though.

Furthermore, while in Java, it's a good idea to minimize mutability (it's even Item 15 in Effective Java), I personally spend a decent amount of time and screen real estate writing final everywhere. (It's largely become habit for me, so I'm not wasting brain cycles when typing final, but it's confusing for my colleagues who aren't familiar with seeing the final keyword.)

In Scala, immutability is the default for collections, and it takes the same number of characters to create an immutable variable as it does a mutable one: it's var for mutable and val for immutable. That's it.

Uniform access principle

Most Java (or C++) developers are taught early on to hide data in private fields, and provide public accessors and mutators. This has the advantage that you can someday add validation on the mutators, log use of the accessors, delegate to some proxy object, etc. Unfortunately, 90+% of the time, you're just providing get/set access to a field. Yes, for that 10%, the flexibility is crucial, but it leads to a ridiculous amount of code noise (made less painful for the initial developer by the fact that the "fix" has been to add "Generate getter/setter methods" functionality to IDEs -- unfortunately, that code noise is still present for anyone maintaining the codebase).
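For the 90% case, the result is boilerplate like this:

public class Person {
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    // ...and again for every other field.
}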

Languages like Ruby allow you to first write public fields, and later drop in custom accessors or mutators, without changing the calling code. The earliest language that I know of to provide this functionality was Delphi, with its notion of properties (which were later included in the initial version of C#, since Anders Hejlsberg designed both). In general, this ends up being a form of the uniform access principle.

Scala provides support for the uniform access principle with a simple, built-in compiler trick. Effectively, whenever you create a field that is visible outside the current object, the Scala compiler outputs bytecode for the field and accessor/mutator methods (excluding the mutator method if the field is a val).

Consider the following Scala code:

Compiling that and running it through javap -c -private UAPExample1, we get the following output:

public class UAP.UAPExample1 extends java.lang.Object implements scala.ScalaObject{
private java.lang.String name;

public java.lang.String name();
  Code:
   0: aload_0
   1: getfield #11; //Field name:Ljava/lang/String;
   4: areturn

public void name_$eq(java.lang.String);
  Code:
   0: aload_0
   1: aload_1
   2: putfield #11; //Field name:Ljava/lang/String;
   5: return

public UAP.UAPExample1();
  Code:
   0: aload_0
   1: invokespecial #19; //Method java/lang/Object."<init>":()V
   4: aload_0
   5: ldc #21; //String Michael
   7: putfield #11; //Field name:Ljava/lang/String;
   10: return

}

Our public, mutable name field was pushed into a private field called name, and methods name() and name_$eq were created to provide an accessor and mutator. We can produce (more or less) the same bytecode, following the traditional Java bean pattern, as follows:

The only difference from the previous javap output is that the private field is called _name instead of name. Of course, this code is also doing the exact same thing as UAPExample1. Let's consider a more interesting case, where the accessor/mutator are doing something more complicated, like providing access to a value stored on disk:

This example is functionally equivalent to the first two, but keeps the values on disk. In particular, the following code exercises all three. The output from the first two examples is always "Michael" followed by "Blah". The third example will output "Michael" followed by "Blah" on the first run, but will output "Blah" followed by "Blah" on subsequent runs in the same working directory (since the previously set "Blah" value is still accessible):

Traits

Traits sit somewhere between Java's interfaces and abstract classes. In particular, traits can define methods, like abstract classes, but cannot take constructor parameters. Like interfaces, a class can implement multiple traits. Since you can define behaviour in traits, this gives you a form of mixin inheritance.

Since the JVM itself does not support multiple inheritance, this mixin behaviour is accomplished by linearizing the class hierarchy. That is, you can imagine that a linear class hierarchy has been created where each class has one parent (with one fewer trait mixed in). The implementation at the bytecode level doesn't exactly work that way, from what I can see in the output from javap -- rather, one class is created that delegates to static methods for the implemented trait methods (passing this to the static method).

Let's consider an example using traits. I'm going to create a base CSSStyle trait, which will have subtraits for various CSS attributes. Then, we'll have two classes -- one representing an HTML element with inline style, and the other representing a CSS style rule (as you would find in a stylesheet file or within a style element). We'll add CSS attributes to each of these classes by mixing them in to new anonymous classes.

The output from running RunCSSStyle.main is:

<span style="font-weight: bold; font-style: italic; font-family: Helvetica, Arial, sans-serif"></span>
.important {
    font-weight: bold;
    font-style: italic;
    font-family: Helvetica, Arial, sans-serif
}

More refined scoping

In my example on the uniform access principle, I made use of private[this], and also stuck some imports within a method declaration. Both of these are examples of Scala's tighter compile-time scoping.

Using private[this] specifies that the method or field can only be accessed by the current object -- other instances of the same class cannot access it. (If I had simply specified private, the compiler would have generated an accessor/mutator pair for use by other instances of the object, to be consistent with the UAP.)

Scoping imports within a method (or even a nested scope), allows you to clearly specify where you're using something. Compare that to the typical Java source file, where the IDE collapses your mountain of imports, so you're not overwhelmed.

Similarly, there are times in Java where you want to invoke the same chunk of code multiple times within a method (but not in a loop), or you just want to break a method apart into logical chunks that solve independent problems, specific to the current method. So, you create a private method scoped to the class, even though the new method is only used in that one statement. Scala, on the other hand, supports nested methods. Once it's compiled, the nested method is pushed up to the class scope (since the JVM doesn't support nested methods), but your code is organized in a cleaner way, and the compiler will prevent you from using the method outside of its scope.

Incidentally, Scala also supports import aliasing, for when you're going to be using conflicting imports within the same scope. I believe this was mostly done to avoid developer anger when trying to bridge between (e.g.) a java.util.List and a scala.collections.immutable.List. Having to fully qualify one or the other would bulk up your code. Instead, you can write:

import java.util.{List => JavaList}

Then, every time you say JavaList, the compiler knows that you mean java.util.List. Lacking this in Java bugs me regularly, since my job involves working with Lucene documents (org.apache.lucene.document.Document) and occasionally with XML DOM documents (org.w3c.dom.Document). On those rare occasions where I would like to use both in one file, only one can have the name Document. The other must be fully qualified. It would be so much nicer if I could import them as LuceneDocument and DOMDocument, regardless of what the third parties decided to call them.

Off-hand, I'm not sure why locally-scoped imports and import aliases haven't made it into Java yet. Looking at the output of javap, it looks like the bytecode doesn't actually contain imports. Instead, the bytecode references fully-qualified classes (that is, the imports are resolved statically). As a result, adding support for locally-scoped imports and import aliases would not break bytecode compatibility, and would give Java developers another (albeit minor) reason to like the language.

In my opinion, Scala's scoping rules are the logical next step to the scoping transition from C to C++, where C required that you list all of your local variables at the beginning of a function (and any for (i = 0; i < ...; i++) within the function was using the same i), but C++ allows you to declare variables in the scope in which they would be used (but ultimately compiles to pretty much the same machine code).

Functional Programming

Obviously, one of the main benefits of Scala is functional programming. I made use of the map method on the Map type in the trait example to transform the key/value entries into a collection of strings, without needing to write a for loop. Note that I mapped a static method directly (technically, methods on a Scala object aren't really static -- they're methods on a singleton), without needing to create an anonymous Function instance, as I've done in my previous posts in Java. (Behind the scenes, the Scala compiler did create an anonymous Function1[Tuple2[String,String], String] for me, but I don't have to care about that.)

I won't bother writing a bunch of examples of functional programming in Scala, since they would largely come down to showing how the stuff I implemented (verbosely) in Java in my previous posts already exists in Scala, and has much nicer syntax. For example, I believe all of the immutable data structures I've implemented in Java exist in some form in Scala (and have a much better, more unified API). Assuming I stick with writing Scala in future posts, there will be more functional programming examples.

Conclusions

These are a few of the features I like in Scala that I miss in my day-to-day work programming in Java. I have barely touched the surface of what's available in Scala (having ignored pattern matching, case classes, the REPL, implicits, parallel collections, and more). I chose these features to begin with, since they're not that foreign to Java developers. Indeed, just about everything here could be done in Java, but with a lot more code.

Learning Scala has reinforced my love of the JVM as a platform and my disdain for Java as a language (and the Java standard library -- I have the utmost respect for Joshua Bloch as the author of Effective Java and Java Puzzlers, but the Java collections API angers me regularly in terms of things I believe they could have done better, while retaining backwards-compatibility). That said, I am also hoping that the .NET port of Scala matures, as a lot of code tends to target the compiler and the standard library, and should be portable across underlying runtimes.

If you're interested in learning more about Scala, I suggest reading Scala for Java Refugees (which is probably a little outdated now -- two major versions of Scala have been released since then) or the free chapters of Scala for the Impatient. There is a "Java to Scala" page on the scala-lang website with more resources for moving to Scala.

In September, there will be a free course given by Martin Odersky (the creator of Scala, and the guy who introduced generics to Java). I'm signed up and would love more classmates.

In terms of books, my favourite is Programming in Scala (I bought it, downloaded the .mobi version, and converted it to ePub using Calibre). A less-good, but still enjoyable book is Dean Wampler's Programming Scala (also a little outdated, but still relevant). As a followup book (once you're pretty comfortable with Scala), I recommend Josh Suereth's Scala in Depth. The preprint PDF I read had a lot of grammar/spelling/typographical mistakes, but the content was solid and understandable (and hopefully the mistakes will get fixed in a subsequent printing).

Finally, if you're a little warped in the head (like me) and you develop a taste for the hardcore functional programming features that Scala enables (and have been implemented in the Scalaz library), I suggest you learn some Haskell (via Real World Haskell or Learn You a Haskell for Great Good -- both great books, which I encourage you to buy, but which you can also read online). While Scala is an object-oriented/functional hybrid language (by design), where you can always fall back to mutable data structures and side-effects, Haskell feels (to me at least, with my predominantly imperative, object-oriented background) more like playing the piano with one hand tied behind your back. That said, I feel that programming in Haskell makes me a better Scala programmer (much as switching between C++ and Java tends to make me better at both than if I stuck with one or the other).

Saturday, June 30, 2012

Immutable Hash Trie Maps in Java

In this post, I aim to introduce another immutable data structure, following up on my posts on immutable binary trees ([1] and [2]) and immutable lists. We'll be looking at hash tries, which provide an efficient way of implementing an immutable HashMap-like structure. Like a HashMap, we can provide more or less constant time operations in the absence of hash collisions, and linear time when collisions occur. The catch is that, in practice, our constants will be considerably larger. That said, we'll have a data structure that requires no synchronization when accessed from multiple threads, since it cannot be modified in-place.

Our hash trie map will be constructed as a tree with a high branching factor. If you haven't already read the posts on immutable binary trees, I suggest you do so, especially the second one. In particular, we're going to be relying heavily on specialized node types to save memory and split our logic into different cases based on polymorphic behaviour.

What is a trie?

Tries are a kind of tree where the path from the root to a given node encodes some earlier information. A nice use for tries is to store dictionaries in a relatively efficient way. Your root node represents the empty string, and it has 26 children, for the letters of the alphabet. Each node could be implemented as an array of 26 (possibly null) elements, plus a boolean to indicate whether that node represents a valid word. The word 'cat' would be encoded by the third-level node reached by going to 'c', then 'a', then 't', from the root. Since 'cat' is a word, the boolean value at that node would be true. By comparison, the node for 'coug' would be false, but would need to exist as a parent for 'cough'.

Our hash trie map will work on a somewhat similar principle, except that our valid "words" will be hash codes for the keys contained in the map.

An overview of what our tree will look like

To implement our trie structure, we'll have nodes that branch over part of the hash code of the elements (BranchedArrayHashNode), a singleton EmptyHashNode that represents a lack of children for a given partial hash, an EntryHashNode that represents a single key/value pair, and a ListHashNode that contains a linked list of key/value pairs whose keys' hashes collide. As a useful optimization, we'll implement a specialized SingleArrayHashNode for internal nodes who only have one child, so we don't unnecessarily allocate an entire array.

Like with the immutable binary trees, an add or remove operation will descend the tree until it reaches a (possibly empty) leaf node, which will provide its own replacement. Then, as the recursive calls return, the parents will provide their replacements (which will reuse the other children), until we get a new root node. In this way, we can guarantee (in the absence of a collision) running time that is logarithmic over the possible hash codes. Since the space of possible hash codes is fixed, the maximum depth of the tree is log(2^32) (which would be 32 with a branching factor of 2, but we'll use a branching factor of 2^5 = 32, guaranteeing a maximum depth of 7). This is where we get the claim of constant running time in the absence of collisions. (In the event of a collision, we hit a ListHashNode and we add a linear traversal of the list.)

An example of how we'll store our data

The above image depicts a hash trie map with 5 values stored in it (represented by the five EntryHashNode leaves). We break up the 32-bit hash codes of the keys into 5-bit chunks (with a 2-bit chunk at the deepest level), and use the sequence of 5/10/15/20/25/30-bit prefixes in our trie. It's probably clearest if I describe the five entries' hash codes:

  • The (1) entry has a hash code that begins with 11001. It is the only node with this prefix, so it can be stored directly below the root.
  • Entries (2) and (3) both have hash codes that begin with 10110, but entry (2)'s hash code starts with 10110 00011, while entry (3)'s hash code starts with 10110 10101.
  • Entries (4) and (5) both have hash codes that start with 01101 00101. Since we have at most 32 branches at each level (so each level can only encode 5 bits of the hash code), we stick in a SingleArrayHashNode to avoid wasting 31 empty array elements on the shared 01101 prefix. You can see that the first 15 bits of entry (4)'s hash code are therefore 01101 00101 01011, while entry (5)'s hash code begins with 01101 00101 11101.

I hope this example explains the basic principle a little more clearly. (I've read a number of explanations of hash tries, but none have struck me as particularly good. In particular, several of them try to "simplify" down to 1 bit per level, creating a binary tree. I actually found that more confusing, personally.)

The interface and implementation

The basic ImmutableMap interface is similar to that of java.util.Map, except that the put and remove operations need to return the new map. Also, rather than having get return null (when no entry is found), I've decided to make it return an Option (described in a previous post).
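A sketch of that interface (the complete version is linked at the end of the post):

public interface ImmutableMap<K, V> {
    // Returns a new map with the given key/value pair added (or replaced).
    ImmutableMap<K, V> put(K key, V value);

    // Returns a new map with the given key removed (or this map, if the key is absent).
    ImmutableMap<K, V> remove(K key);

    // Returns Some(value) if the key is present, None otherwise.
    Option<V> get(K key);
}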

My implementation starts with an abstract base class, ImmutableHashTrieMap, which implements these operations based on our sliding 5-bit window looking at hash codes:
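(A condensed sketch; the constant name is mine, and the real class on GitHub has a bit more to it.) The base class threads a shift offset through the recursive calls, so each node only looks at its own 5-bit window of the hash code:

public abstract class ImmutableHashTrieMap<K, V> implements ImmutableMap<K, V> {
    // Number of hash bits consumed at each level; 2^5 = 32 children per branch.
    protected static final int BITS_PER_LEVEL = 5;

    public ImmutableHashTrieMap<K, V> put(K key, V value) {
        return put(0, key, value);
    }

    public ImmutableHashTrieMap<K, V> remove(K key) {
        return remove(0, key);
    }

    public Option<V> get(K key) {
        return get(0, key);
    }

    // Node implementations see the key plus the current shift into its hash code.
    protected abstract ImmutableHashTrieMap<K, V> put(int shift, K key, V value);
    protected abstract ImmutableHashTrieMap<K, V> remove(int shift, K key);
    protected abstract Option<V> get(int shift, K key);
}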

As you can see, the window size is hardcoded as a constant, so we can experiment to find the optimal branch factor. (Smaller branch factors mean deeper trees, while bigger branch factors lead to larger array copies when replacing a branched node.)

Then we just need to implement each of the different concrete node types, with an understanding of how they get replaced when adding and removing nodes.

EmptyHashNode

The simplest node type is EmptyHashNode, which is a singleton (since all empty nodes are the same after type erasure).

The operations on EmptyHashNode are as follows:

  • put: Returns a new EntryHashNode containing the given key/value pair.
  • remove: Returns this, since there is nothing to remove.
  • get: Returns None, since no entry may be found here.

EntryHashNode

Each EntryHashNode holds a single key/value pair. It defines the operations as follows:

  • put:
    • If the given key is equal to this entry's key, return a new EntryHashNode, replacing this node.
    • If the given key is not equal to this entry's key, but they have the same hash code, return a new ListHashNode containing this entry and the new entry.
    • Otherwise, return a new ArrayHashNode (Branched or Single, depending on whether the current hash code windows collide) with this entry and the new entry as children.
  • remove: If the given key matches this entry's key, return the EmptyHashNode. Otherwise, the given key was not found, so return this.
  • get: If the given key equals this entry's key, return this entry's value. Otherwise, return None.

ListHashNode

ListHashNode is used in the unfortunate event of a hash collision (that is, different keys, same hash code). It is implemented as an immutable list of key/value pairs (using the Tuple2 class from a previous post).

The operations on ListHashNode are as follows:

  • put:
    • If the given key's hash code is different from that of the entries in this node (since they all have the same hash code), return a new ArrayHashNode (Branched or Single, depending on whether the current hash code windows collide) with this node and a new EntryHashNode (for the new entry) as children.
    • If the given key is equal to the key of an entry in this node, return a new ListHashNode of the same size, with the new entry replacing the old entry.
    • If the given key is not equal to the key of any entry in this node, but the hash code is the same as that of the entries in this node, return a new ListHashNode with the new entry prepended to this node's entries.
  • remove:
    • If the given key is not equal to the key of any entries in this node, return this.
    • If the given key is equal to the key of an entry in this node, return a new ListHashNode containing the list of entries minus the removed one. However, if the new list would have size 1, return a new EntryHashNode instead.
  • get: If the given key matches the key of an entry in this node, return that node's value. Otherwise, return None.

This is the last of the leaf nodes. Next, we get into the internal nodes, BranchedArrayHashNode and SingleArrayHashNode. Neither of these actually hold data. They simply hold references to deeper nodes, grouped by partial hash codes. I'm going to leave off the diagrams for those nodes, since they get pretty complex (since the internal nodes have children).

ArrayHashNode

Both SingleArrayHashNode and BranchedArrayHashNode inherit from ArrayHashNode, a marker abstract class that indicates that they are non-leaf nodes. ArrayHashNodes need to know how to compute a partial hash code from a full 32-bit hash code and a shift parameter. The formula I used is hash >> shift & 0x1F, which actually takes the lowest 5 bits at each shift level. (So, in my example tree above, I actually lied -- we're not actually looking at prefixes, but rather a sequence of 5-bit suffixes. This is usually more efficient for many of Java's builtin hashCode functions, since they tend to vary heavily on the low bits and less on the high bits. This makes sense when your hash-based structures rely on modular arithmetic, like the builtin implementation of HashMap. For more randomly-distributed hashCode implementations, the choice of how you slice your bits makes no difference.)

SingleArrayHashNode

This represents an internal node that has a single child (so we don't waste a 32-element array on that one child). That child is in one of 32 possible branches, so we need to keep track of which branch it's in, given by the partial hash code that it and all of its children share. Therefore, SingleArrayHashNode stores the child index (in the 0-31 range) and the child node. The child of a SingleArrayHashNode should always be an ArrayHashNode, whether another SingleArrayHashNode or a BranchedArrayHashNode. If a SingleArrayHashNode had a leaf node child, that child could simply be promoted toward the root, replacing the SingleArrayHashNode. By this axiom, we can guarantee that a chain of SingleArrayHashNodes ends in a BranchedArrayHashNode.

Here are the operations on a SingleArrayHashNode:

  • put:
    • If the 5-bit partial hash code of the given key matches this node's partial hash code, then delegate the put to this node's child and return a new SingleArrayHashNode with the result of that operation as a child.
    • Otherwise return a new BranchedArrayHashNode with this node's child and a new EntryHashNode (to hold the new entry) as children.
  • remove:
    • If the 5-bit partial hash code of the given key does not match this node's partial hash, return this. There is nothing to remove.
    • Otherwise, delegate the remove call to this node's child. If the resulting node is not an instance of ArrayHashNode, return it (since it's a leaf node and doesn't need a SingleArrayHashNode as a parent). Otherwise, return a new SingleArrayHashNode with the new child node (and the same partial hash).
  • get: If the 5-bit partial hash code of the given key does not match this node's partial hash, return None. Otherwise, return the result of delegating the call to this node's child.

BranchedArrayHashNode

This is the node type that actually makes the whole structure work. It doesn't store any data itself, but it provides a dispatch mechanism to zero in on the subtree that contains the relevant (partial) hash code. The BranchedArrayHashNode has an array of 32 ImmutableHashTrieMaps (meaning nodes), where unoccupied branches point to the EmptyHashNode. Since we have SingleArrayHashNode, we guarantee that BranchedArrayHashNode has at least two non-empty children.

The operations are comparatively expensive, since puts and removes usually involve creating a copy of the 32-element array and replacing a single child.

Here are the operations:

  • put: Delegate the put call to the relevant child, based on the 5-bit partial hash code of the given key. Return a new BranchedArrayHashNode with that child replaced with the outcome of the delegated call. If the relevant child was previously empty, the size of the new BranchedArrayHashNode is incremented by 1.
  • remove: Determine the relevant child based on the 5-bit partial hash code of the given key. If the relevant child is already empty, return this. Otherwise, delegate the call to the relevant child. If the returned node is empty (i.e. the child was an EntryNode that was removed), the new size will be 1 less than the current size, otherwise the new size will be the old size. If the new size is 1, return a new SingleArrayHashNode containing the remaining child. Otherwise, return a new BranchedArrayHashNode with the relevant child replaced by the outcome of the delegated remove call.
  • get: Return the result of delegating the call to the relevant child, given by the 5-bit partial hash code. (If the relevant child is the EmptyHashNode, it will take care of returning None.)

Conclusions

You can check out the complete implementation on GitHub, here.

I ran some microbenchmarks, adding and removing the integers from 0 to 1 million (in random order), comparing against the builtin Java HashMap. I found that my ImmutableHashTrieMap was about 2.5 times slower. Increasing to 10 million, I was 5 times slower.

I think I might be able to improve on my time by adding classes to handle other internal nodes with a small number of branches (like what I did with SingleArrayHashNode). I may try implementing specialized internal node classes with up to four child branches, since I believe the cost of checking each child would be less expensive than the additional array copies involved in working with BranchedArrayHashNode.


Monday, June 11, 2012

Intro to Functional Programming (with examples in Java) - Part 5

Tuples

In this post, we're going to cover a less-exciting, but still surprisingly useful concept from the world of functional programming. Tuples are a way of producing "Cartesian products" of types. Thinking about it mathematically, types can be characterized as sets of valid values. Like a mathematical Cartesian product, where we define a composite value that draws a component from each of the underlying sets, a tuple is an object that encapsulates objects of each of the underlying types.

What are tuples?

Tuples are objects that group other objects together according to some logical need. They have very little in the way of functionality on their own. That is, you don't want a tuple that "does stuff". It's just data. In a way, Java beans can be considered a form of tuple -- they group together a fixed set of (heterogeneous) fields so they can be passed around. I kind of think of them like structs in C. They have fields but no "behaviour". (Our tuples will have methods, but they will be convenience things, like equals, hashCode, and toString.)

What are tuples not?

Tuples are not general-purpose collections. They are not lists of objects of some type, since they are made up of a fixed-sized set of heterogeneous components.

What are tuples good for?

Tuples are useful for returning a (fixed-sized, related) group of values without having to create a new class. Suppose you have a private method that needs to return a String and an int. Normally in Java, you would need to create a small wrapper class to encapsulate the two return values, which isn't necessarily a bad thing. If the two values go together, the "thing" that they represent together should probably have a name, to improve readability. That said, since this hypothetical method is private, you know it's only going to be used within the current class, so you can choose instead to return a Tuple2<String, Integer>. Similarly, there are times within a method body where you may end up constructing a pair of arrays of different types, and iterating through them with a common index (with checks and double-checks to make sure that the arrays really have the same length). Instead, if you can create a single array of tuples, your new "composite type" still doesn't escape the scope of the method.

A simple Tuple class

Here is a simple, immutable 2-dimensional Tuple class:
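(A condensed sketch; I'm using Scala-style _1/_2 names for the positional fields, and the equals/hashCode bodies are elided, but the real class defines them in terms of the two elements.)

public final class Tuple2<T1, T2> {
    public final T1 _1;
    public final T2 _2;

    public Tuple2(T1 _1, T2 _2) {
        this._1 = _1;
        this._2 = _2;
    }

    @Override
    public String toString() {
        return "(" + _1 + ", " + _2 + ")";
    }

    // equals and hashCode, defined in terms of _1 and _2, omitted here.
}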

It's pretty simple -- it's like an immutable struct in C, where the fields don't have names, but they have positions. Let's see an example of this tuple class in action:
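(Roughly like this; the particular values and the doubling operation are stand-ins, and the output format of describePerson follows the example mentioned below.)

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class Tuple2Test {
    @Test
    public void testTuples() {
        // A 2D integral vector, without an IntegerVector2D class.
        Tuple2<Integer, Integer> vector = new Tuple2<Integer, Integer>(3, 4);
        Tuple2<Integer, Integer> doubled =
                new Tuple2<Integer, Integer>(vector._1 * 2, vector._2 * 2);
        assertEquals(Integer.valueOf(6), doubled._1);
        assertEquals(Integer.valueOf(8), doubled._2);

        // A "person", without a Person class.
        Tuple2<String, Integer> person = new Tuple2<String, Integer>("Michael", 36);
        assertEquals("Michael is 36 years old", describePerson(person));
    }

    private static String describePerson(Tuple2<String, Integer> person) {
        return person._1 + " is " + person._2 + " years old";
    }
}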

In this test, we managed to do a simple operation on a 2D integral vector (returning a 2D integral vector) and operate on a "person", without defining an IntegerVector2D class or a Person class. That said, in practice, if you're frequently dealing with vectors or people, you should probably have classes. In particular, Tuple2<String, Integer> doesn't tell you what the String and Integer types represent. You could totally pass a product SKU and inventory count to describePerson and have it tell you "DV014533423H is 56 years old". As I mentioned above, tuples are most useful for those cases where you want to group values at a very local scope (i.e. as local variables, where a private static class would, by necessity, escape the method scope, or, worst-case, as a return value of a private method).

Making tuple construction less painful

The construction of tuples in the above test case is pretty verbose, partly thanks to the repeated type parameters (which are largely eliminated in Java 1.7), but also because of the new keyword and the specification of the exact size of tuple we want. Instead, we can make factory methods in a TupleUtils class as follows:
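(Showing only the 2-dimensional overload; the generated TupleUtils described at the end of the post has one per dimension.)

public final class TupleUtils {
    private TupleUtils() {}

    // Infers the type parameters from the arguments, so callers can write
    // TupleUtils.tuple("Michael", 36) instead of new Tuple2<String, Integer>(...).
    public static <T1, T2> Tuple2<T1, T2> tuple(T1 _1, T2 _2) {
        return new Tuple2<T1, T2>(_1, _2);
    }
}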

This allows us to rewrite the above test case a bit more succinctly:

Where are the functions?

Since this is a series on functional programming, we should actually see some functions, right? Well, there's an interesting analogue between tuples and function parameter lists. In particular, function parameter lists are heterogeneous, positional values grouped together for a common purpose. Instead of passing a list of parameters to a function, you could pass a single tuple value. In effect, every FunctionN<R, T1, ..., TN> is isomorphic to a Function1<R, TupleN<T1, ..., TN>>. Furthermore, this isomorphism can be described in code by decorating FunctionN with a tupled Function1 and vice-versa, as follows:
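(A sketch for the 2-parameter case, assuming Function1 and Function2 each expose an apply method, as in the earlier posts; these overloads would live in TupleUtils, with one pair per dimension.)

// In TupleUtils:
public static <R, T1, T2> Function1<R, Tuple2<T1, T2>> tupled(
        final Function2<R, T1, T2> f) {
    return new Function1<R, Tuple2<T1, T2>>() {
        @Override
        public R apply(Tuple2<T1, T2> t) {
            return f.apply(t._1, t._2);
        }
    };
}

public static <R, T1, T2> Function2<R, T1, T2> untupled(
        final Function1<R, Tuple2<T1, T2>> f) {
    return new Function2<R, T1, T2>() {
        @Override
        public R apply(T1 a, T2 b) {
            return f.apply(new Tuple2<T1, T2>(a, b));
        }
    };
}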

There are the functions! Now what?

Recall from my third post in this series, where I presented the MemoizedFunctionN subclasses of FunctionN, and pointed out that other functionality could be added by decorating existing functions. Unfortunately, for every new piece of functionality you want to add according to that pattern, you need to generate wrapped versions of all FunctionN, potentially making use of custom currying on the functions of 2 or more parameters.

Using tupled and untupled, you can define your decorator in a Function1 subclass, and tuple your FunctionN down to a Function1, decorate it, then untuple the decorated class.

As an example, let's create a decorator that logs the input and return values for every call to a given Function1. It will need a Function1 to wrap, a stream to which it will log, and a name for the function (so the log messages will have some context).
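A sketch of that decorator (again assuming Function1 has a single apply method):

import java.io.PrintStream;

public class LoggingFunction1<R, T> extends Function1<R, T> {
    private final Function1<R, T> wrapped;
    private final PrintStream log;
    private final String name;

    public LoggingFunction1(Function1<R, T> wrapped, PrintStream log, String name) {
        this.wrapped = wrapped;
        this.log = log;
        this.name = name;
    }

    @Override
    public R apply(T argument) {
        log.println(name + " called with: " + argument);
        final R result = wrapped.apply(argument);
        log.println(name + " returned: " + result);
        return result;
    }
}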

Let's confirm that this works with a Function1, before trying functions of more parameters.

Now, let's try it with a Function2 that is tupled and untupled:

I wanted to rewrite memoization in terms of tupling and untupling functions, but my previous work at implementing all functions of more than one parameter via currying gets in the way (along with the whole notion of needing a recursion target). That said, memoization is a bit of an anomaly in terms of the complexity of decorating functionality. Most other cases I can think of are like this logging example, where tupling and untupling "just work", and you don't need to create your own curried chain of LoggingFunctionN equivalents.

Conclusions

The major takeaway from this post is the fact that tuples provide another (arguably simpler) way of looking at functions of more than one parameter as being equivalent to Function1 (compared to currying). Tuples can also provide a convenient way of returning more than one value without having to create a new class to hold the composite value.

To avoid writing TupleN classes by hand, I created a TupleGenerator class. You give it a source directory, a target package, and maximum number of tuple dimensions (Scala supports up to 22), and it will generate for each dimension a reasonable TupleN class with hashCode, equals, and toString defined in terms of the underlying elements. It also creates the TupleUtils utility class with overloaded static methods for tuple (to instantiate new tuples), tupled (to turn a FunctionN<R, T1, ..., TN> into a Function1<R, TupleN<T1, ..., TN>>), and untupled (to turn a Function1<R, TupleN<T1, ..., TN>> into a FunctionN<R, T1, ..., TN>).

Saturday, April 21, 2012

Intro to Functional Programming (with examples in Java) - Part 4

Immutable data structures

In the past few posts, we've seen some interesting things that can be done with first-class and higher-order functions. However, I've glossed over the fact that functional programming also tends to rely on parameters being immutable. For example, in the previous post, memoization relies on an underlying HashMap, where the keys are function arguments. We assumed that the arguments implemented reasonable hashCode and equals methods, but things fall apart if those function arguments can be modified after they've been memoized. (In particular, their hashCode method will then return a new value, but the existing memoization HashMap has already placed them in a bucket based on their previous hashCode.) Since functional programming dictates that the same function called with the same arguments should always return the same value, one of the easiest ways to guarantee that is to ensure that inputs and outputs are immutable.

Immutability also makes it easier to guarantee thread safety. If a given value is immutable, then any thread can access it, and read from it, without fearing that another thread may modify it (since it cannot be modified).

In some of my first few posts, I described an implementation of immutable binary trees. In this post, we're going to look at some simpler immutable data structures, namely ImmutableLists and Options.

Immutable Lists

For lists, we're going to implement a singly linked list structure that supports only two "modification" operations, prepend and tail. These add a new element at the head of the list, and return the list without the head element, respectively. These are both O(1) operations, since prepend simply creates a new head node whose tail is the previous list, while tail returns the list pointed to by the tail of the head node. Neither operation modifies the existing list, but rather returns a "new" list reference (where prepend creates a new list reference, and tail returns a list reference that already exists). To extract the actual elements from the list, we'll have a single operation, head, that returns the element at the front of the list. Readers familiar with Lisp should recognize head and tail as car and cdr. For convenient list traversal, we'll use the familiar Java Iterable interface, so we can use the friendly Java 1.5 "foreach" syntax. Also, for convenience, we'll define an isEmpty method.

As with the immutable binary trees, we're going to take advantage of immutability to assert that there is only one empty list (regardless of type arguments), and create a specialized singleton subclass for it, called EmptyList. Everything else will be a NonEmptyList. Here is the code:
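(A condensed sketch of the classes; the real version also includes the factory methods and the size and reverse methods discussed below.)

import java.util.Iterator;
import java.util.NoSuchElementException;

public abstract class ImmutableList<T> implements Iterable<T> {
    public abstract T head();
    public abstract ImmutableList<T> tail();
    public abstract boolean isEmpty();

    // O(1): the new list shares this entire list as its tail.
    public ImmutableList<T> prepend(T element) {
        return new NonEmptyList<T>(element, this);
    }

    // The one and only empty list, cast to whatever element type is needed.
    @SuppressWarnings("unchecked")
    public static <T> ImmutableList<T> nil() {
        return (ImmutableList<T>) EmptyList.INSTANCE;
    }

    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private ImmutableList<T> current = ImmutableList.this;
            public boolean hasNext() { return !current.isEmpty(); }
            public T next() {
                final T head = current.head();
                current = current.tail();
                return head;
            }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }
}

final class EmptyList<T> extends ImmutableList<T> {
    static final EmptyList<Object> INSTANCE = new EmptyList<Object>();
    private EmptyList() {}
    public T head() { throw new NoSuchElementException("head of empty list"); }
    public ImmutableList<T> tail() { throw new NoSuchElementException("tail of empty list"); }
    public boolean isEmpty() { return true; }
}

final class NonEmptyList<T> extends ImmutableList<T> {
    private final T head;
    private final ImmutableList<T> tail;

    NonEmptyList(T head, ImmutableList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    public T head() { return head; }
    public ImmutableList<T> tail() { return tail; }
    public boolean isEmpty() { return false; }
}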

That's not terribly complicated, but also a little unpleasant to use. In particular, the only way we have of creating a list is via the nil() factory method, which returns an empty list, followed by a series of prepend() operations. Let's add a factory method that takes a varargs parameter to produce a list. Since we only have the prepend operation at our disposal, we'll need to iterate through the varargs array backwards:
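(A sketch, added to ImmutableList; the factory method name is illustrative.)

// Builds a list containing the given elements, in the given order.
// We iterate backwards because prepend adds to the front.
public static <T> ImmutableList<T> list(T... elements) {
    ImmutableList<T> result = nil();
    for (int i = elements.length - 1; i >= 0; i--) {
        result = result.prepend(elements[i]);
    }
    return result;
}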

Let's create a quick test to confirm that the list factory method and list iteration actually work:

Now, since it's a frequently used operation, let's add a size method. The size of a list can be defined recursively, as follows: the size of the empty list is 0, the size of a non-empty list is the size of its tail plus 1. As a first try, we'll define size exactly that way:
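(A sketch, written on the base class in terms of isEmpty and tail; it could equally be split across the two subclasses.)

// Naive recursive size: 0 for the empty list, 1 + size of the tail otherwise.
public int size() {
    if (isEmpty()) {
        return 0;
    }
    return 1 + tail().size();
}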

That's nice and simple. Let's create a unit test for it:

Uh-oh -- at least on my machine, that test triggered a StackOverflowError. If it doesn't trigger a StackOverflowError on your machine (which should be unlikely -- the Java stack usually overflows around 50k calls), try increasing the value of the size local variable in the test. The problem is that our recursive definition is triggering 100k recursive calls to size(). But "Michael," I hear you say, "aren't recursive calls a fundamental keystone of functional programming?" Well, yes, but even in Scheme, Haskell, or Erlang, I believe this particular implementation would blow the stack. For those languages, the problem is that our method is not tail-recursive. A tail-recursive method/function is one where any recursive call is the last operation that executes. Since we add 1 to the result of the recursive call to size, that addition is the last operation to execute. Many functional languages implement what's called tail-call optimization, which basically turns tail-calls into a goto that returns to the beginning of the method. To clarify, let's first implement a tail-recursive version of size:
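(A sketch, with the count threaded through a private helper.)

public int size() {
    return sizeHelper(this, 0);
}

// Tail-recursive: the recursive call is the last thing that happens, with the
// count so far carried along in the accumulator.
private static <T> int sizeHelper(ImmutableList<T> list, int accumulator) {
    if (list.isEmpty()) {
        return accumulator;
    }
    return sizeHelper(list.tail(), accumulator + 1);
}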

In this case, we've implemented a helper method that keeps a so-called accumulator, that keeps track of the size up to now, in the recursive call stack. The recursive call to sizeHelper is simply modifying the input parameters, and jumping back to the beginning of the method (which is why tail-call optimization is so easy to implement). Unfortunately, Java compilers do not traditionally implement tail-call optimization, so this code will still cause a StackOverflowError on our unit test. Instead, we can simulate tail-call optimization by optimizing by hand:
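(The same helper, with its tail call rewritten as a loop.)

// Effectively what tail-call optimization would have produced: reassign the
// "parameters" and jump back to the top, instead of recursing.
private static <T> int sizeHelper(ImmutableList<T> list, int accumulator) {
    while (!list.isEmpty()) {
        list = list.tail();
        accumulator = accumulator + 1;
    }
    return accumulator;
}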

If Java did support tail-call optimization, it would produce roughly the same bytecode as the code above. The code above passes the unit test with flying colours. That said, because Java does not support tail-call optimization, we are probably better off inlining sizeHelper to produce the following:
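(The inlined version, as a sketch.)

public int size() {
    int size = 0;
    ImmutableList<T> list = this;
    while (!list.isEmpty()) {
        list = list.tail();
        size++;
    }
    return size;
}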

Note that Scala, a JVM language, would actually optimize the previous tail-call (by producing the goto bytecode to return to the start of the helper method), so we wouldn't have to optimize it into a while loop. That said, if you come from an object-oriented background, the optimized version may actually be more intuitive. Where Scala is not able to match "traditional" functional programming languages is with "mutual tail-calls", where function a tail-calls function b, which tail-calls function a. I believe you could implement this by hand in C by longjmping to the beginning of the body of the other function/procedure, but my understanding is that the JVM's builtin security requirements prevent you from gotoing to code outside the current method. Basically, the goto bytecode is only designed to make control statements (if, while, for, switch, etc.) work within the current method, I think. If I recall correctly, Clojure (another JVM language) is able to produce optimized mutual tail-calls if you use "trampolines", but I have no idea how those work at the bytecode level.

For another example of a function that we can implement with tail-recursion (and would be implemented with tail-recursion in languages that lack while loops), consider reversing a list. When we reverse a list, the first element becomes the last element, then we prepend the second element to that first element, then prepend the third element to that, etc. As a recursive function, we pass in the original list and an empty list as the initial accumulator value. We prepend the head of the list to the accumulator, and recurse with the tail of the original list and the new accumulator value:
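(A sketch of the recursive version.)

public ImmutableList<T> reverse() {
    return reverseHelper(this, ImmutableList.<T>nil());
}

// Prepends each head onto the accumulator, so the first element ends up last.
private static <T> ImmutableList<T> reverseHelper(ImmutableList<T> list,
                                                  ImmutableList<T> accumulator) {
    if (list.isEmpty()) {
        return accumulator;
    }
    return reverseHelper(list.tail(), accumulator.prepend(list.head()));
}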

As with size, this implementation of reverse will overflow the stack on large lists. Also, like with size, we can perform the tail-call optimization by hand in almost exactly the same way:
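(The hand-optimized version, as a sketch.)

public ImmutableList<T> reverse() {
    ImmutableList<T> list = this;
    ImmutableList<T> accumulator = nil();
    while (!list.isEmpty()) {
        accumulator = accumulator.prepend(list.head());
        list = list.tail();
    }
    return accumulator;
}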

Below, we will make extensive use of reverse, and use while loops to implement several other methods that would traditionally (in the functional world) be written with tail-recursive functions.

Anyway, let's move on to an even simpler collection type, which essentially makes null obsolete.

Option

The Option type is effectively a collection of size zero or one, with the semantics of "maybe" having a value. (In fact, the Haskell equivalent of Option is the Maybe monad. I personally like the name Maybe better, since the idea of a function that returns Maybe int nicely expresses that it may return an int, but might not. That said, I've decided to go with Scala's naming in this case, and call it Option.)

Why would we want to return or keep an Option value instead of just returning or keeping an object reference? We do this partly to explicitly document the fact that a method may not return a valid value, or that a field may not be initialized yet. In traditional Java, this is accomplished by returning null or storing a null object reference. Unfortunately, any object reference could be null, so this leads to the common situation of developers adding defensive null-checks everywhere, including on return values from methods that will never return null (since they are guaranteed to produce a valid result in non-exceptional circumstances). With Option, you're able to explicitly say "This method may not have a defined return value for all inputs." Assuming you and your team settle on a coding standard where you never return null from a method, you should be able to guarantee that a method that returns String returns a well-defined (not null) String, while a method that returns Option<String> might not.

As with our other immutable collections, we'll follow the same pattern: abstract base class (in this case Option), with a singleton subclass for the empty collection (called None) and a subclass for non-empty collections (called Some). As with immutable lists, there will be no public constructors, but instances will be made available through factory methods on the base Option class.
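A minimal sketch of that shape (the classes are shown together for brevity, and the exception thrown by None.get() is an assumption):

import java.util.NoSuchElementException;

public abstract class Option<T> {
    private static final Option<?> NONE = new None<Object>();

    Option() {}  // package-private constructor: no public construction

    public abstract boolean isDefined();
    public abstract T get();

    public static <T> Option<T> some(T value) {
        return new Some<T>(value);
    }

    @SuppressWarnings("unchecked")
    public static <T> Option<T> none() {
        return (Option<T>) NONE;  // safe: None holds no value of type T
    }
}

final class Some<T> extends Option<T> {
    private final T value;

    Some(T value) {
        this.value = value;
    }

    @Override
    public boolean isDefined() {
        return true;
    }

    @Override
    public T get() {
        return value;
    }
}

final class None<T> extends Option<T> {
    @Override
    public boolean isDefined() {
        return false;
    }

    @Override
    public T get() {
        throw new NoSuchElementException("get() called on None");
    }
}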

Now, if we have a method that returns an Option, we can call isDefined() before calling get() to avoid having an exception thrown. Of course, this is only marginally better than just using null pointers, since the Option warns us that we should call isDefined(), but we're still using the same clunky system of checks. It would be nicer if we could just say "Do something to the value held in this Option, if it is a Some, or propagate the None otherwise", or simply "Do something with a Some, and nothing with a None". Fortunately, we can do both of these. Let's do the second one first, by making Option iterable:
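One way to do that, building on the sketch above: Option declares implements Iterable<T>, and the subclasses return either a one-element or an empty iterator (java.util.Iterator and java.util.Collections imports assumed).

// In Some<T>:
@Override
public Iterator<T> iterator() {
    return Collections.singletonList(value).iterator();
}

// In None<T>:
@Override
public Iterator<T> iterator() {
    return Collections.<T>emptyList().iterator();
}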

Here is a test showing how we can iterate over Options to execute code only if there is an actual value:
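Something like this (JUnit 4 assumed):

@Test
public void testIteratingOverOption() {
    StringBuilder sb = new StringBuilder();
    for (String s : Option.some("hello")) {
        sb.append(s);
    }
    for (String s : Option.<String>none()) {
        sb.append(s);  // never executed: None has nothing to iterate over
    }
    assertEquals("hello", sb.toString());
}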

Using this idiom, we can wrap the None check and the extraction of the value into the for statement. It's a little more elegant than writing if (a.isDefined()) a.get(), and ties in nicely with the idea of thinking of Option as a collection of size 0 or 1.

We can also replace a common Java null idiom, where we use a default value if a variable is null, using the getOrElse method:
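Sketched against the hierarchy above:

// In Option<T> (abstract):
public abstract T getOrElse(T defaultValue);

// In Some<T>: the default is ignored.
@Override
public T getOrElse(T defaultValue) {
    return value;
}

// In None<T>: the default is all we have.
@Override
public T getOrElse(T defaultValue) {
    return defaultValue;
}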

Here is a test that shows how that works (and verifies the standard get() behaviour):
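Roughly (still assuming None.get() throws NoSuchElementException):

@Test
public void testGetAndGetOrElse() {
    Option<String> some = Option.some("value");
    Option<String> none = Option.none();

    assertEquals("value", some.get());
    assertEquals("value", some.getOrElse("default"));
    assertEquals("default", none.getOrElse("default"));
}

@Test(expected = NoSuchElementException.class)
public void testGetOnNone() {
    Option.<String>none().get();
}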

Before getting into some other collection operations that we'll add to ImmutableList and Option, let's add a convenient factory method that wraps an object reference in Some or returns None, depending on whether the reference is null.
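A sketch of the factory method, added to Option:

public static <T> Option<T> option(T value) {
    // null becomes None; anything else is wrapped in a Some.
    return (value == null) ? Option.<T>none() : some(value);
}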

And the test:
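Something along these lines (JUnit 4 assertions assumed):

@Test
public void testOptionFactory() {
    assertTrue(Option.option("present").isDefined());
    assertEquals("present", Option.option("present").get());
    assertFalse(Option.option(null).isDefined());
}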

Higher-order Collection Methods

In the second post in this series, we saw the map and fold higher-order functions. Since those versions required a List parameter anyway (and, specifically, in my previous implementation, one of those nasty mutable java.util.Lists), it's convenient to add map and fold operations to our immutable collections as methods. While we're at it, we'll add two more higher-order functions, filter and flatMap. Since all of our collections should support for-each syntax, we'll specify the contract for our methods as an interface extending Iterable.
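A sketch of what that contract might look like; the interface name and the exact generic signatures are assumptions, and Function1/Function2 are the function types built earlier in this series (treated here as returning their results directly):

public interface FunctionalCollection<T> extends Iterable<T> {
    <R> FunctionalCollection<R> map(Function1<T, R> f);

    <R> R foldLeft(R initial, Function2<R, T, R> f);

    <R> R foldRight(R initial, Function2<T, R, R> f);

    FunctionalCollection<T> filter(Function1<T, Boolean> predicate);

    <R> FunctionalCollection<R> flatMap(Function1<T, ? extends Iterable<R>> f);
}

ImmutableList and Option can then narrow the return types covariantly (ImmutableList<R> map(...), Option<R> map(...), and so on).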

We'll implement these methods on ImmutableList and Option. Let's look at map first:
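Sketches for both (assuming, as above, that apply returns its result directly; with the lazy Function1 you would call get() on the returned Function0):

// In ImmutableList<T>: build up the mapped list by prepending, then reverse
// to restore the original order.
public <R> ImmutableList<R> map(Function1<T, R> f) {
    ImmutableList<R> accumulator = ImmutableList.nil();
    ImmutableList<T> current = this;
    while (!current.isEmpty()) {
        accumulator = accumulator.prepend(f.apply(current.head()));
        current = current.tail();
    }
    return accumulator.reverse();
}

// In Option<T>: map the wrapped value if there is one; None stays None.
public <R> Option<R> map(Function1<T, R> f) {
    return isDefined() ? Option.some(f.apply(get())) : Option.<R>none();
}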

To establish that these map methods work, let's try mapping a function that doubles integers:
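For instance (list(...) is an assumed factory method, and apply is again treated as returning its result directly):

@Test
public void testMap() {
    Function1<Integer, Integer> doubler = new Function1<Integer, Integer>() {
        @Override
        public Integer apply(Integer x) {
            return x * 2;
        }
    };

    StringBuilder sb = new StringBuilder();
    for (Integer i : ImmutableList.list(1, 2, 3).map(doubler)) {
        sb.append(i).append(' ');
    }
    assertEquals("2 4 6 ", sb.toString());

    assertEquals(Integer.valueOf(10), Option.some(5).map(doubler).get());
    assertFalse(Option.<Integer>none().map(doubler).isDefined());
}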

Next, we'll move on to the two folds. For an ImmutableList, the foldLeft operation traverses the list in the natural order, while a foldRight is easier if we reverse the list first. For Option, since there is (at most) one element, the only difference between the folds is the order in which parameters are passed to the function argument. To distinguish between foldLeft and foldRight, we need to use a non-commutative operation, so we'll use string concatenation in the tests.
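Sketches (same eager-apply assumption), including the Option versions, where the "fold" just combines the single value, if any, with the initial value:

// In ImmutableList<T>:
public <R> R foldLeft(R initial, Function2<R, T, R> f) {
    R accumulator = initial;
    ImmutableList<T> current = this;
    while (!current.isEmpty()) {
        accumulator = f.apply(accumulator, current.head());
        current = current.tail();
    }
    return accumulator;
}

public <R> R foldRight(R initial, Function2<T, R, R> f) {
    R accumulator = initial;
    ImmutableList<T> current = this.reverse();  // rightmost element first
    while (!current.isEmpty()) {
        accumulator = f.apply(current.head(), accumulator);
        current = current.tail();
    }
    return accumulator;
}

// In Option<T>:
public <R> R foldLeft(R initial, Function2<R, T, R> f) {
    return isDefined() ? f.apply(initial, get()) : initial;
}

public <R> R foldRight(R initial, Function2<T, R, R> f) {
    return isDefined() ? f.apply(get(), initial) : initial;
}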

The filter method returns a sub-collection, based on a predicate that gets evaluated on every element of the original collection. If the predicate returns true for a given element, that element is included in the output collection. For Option, filter will return a Some if and only if the original Option was a Some and the predicate returns true for the wrapped value. For an ImmutableList, the code for filter is similar to map, but we only copy into the accumulator the values that satisfy the predicate.
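Sketches:

// In ImmutableList<T>: same shape as map, but only keep elements that
// satisfy the predicate.
public ImmutableList<T> filter(Function1<T, Boolean> predicate) {
    ImmutableList<T> accumulator = ImmutableList.nil();
    ImmutableList<T> current = this;
    while (!current.isEmpty()) {
        if (predicate.apply(current.head())) {
            accumulator = accumulator.prepend(current.head());
        }
        current = current.tail();
    }
    return accumulator.reverse();
}

// In Option<T>: keep the Some only if the predicate holds.
public Option<T> filter(Function1<T, Boolean> predicate) {
    return isDefined() && predicate.apply(get()) ? this : Option.<T>none();
}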

Our last higher-order method is a little more complicated. flatMap takes a function that returns collections, maps it across all elements in our collection, and then "flattens" the result down to a collection of elements. Say you have a function that takes an Integer and returns an ImmutableList<Integer>. If you map that function over an ImmutableList<Integer>, the result is an ImmutableList<ImmutableList<Integer>> -- a list of lists of integers. However, if you flatMap it, you get back an ImmutableList<Integer> consisting of the concatenation of all of the lists you would get using map. Interestingly, both map and filter can be implemented in terms of flatMap. For map, we simply wrap the return value from the passed function in a single-element list (or Some), while for filter we return a single-element list (or Some) for elements where the predicate evaluates to true and an empty list (or None) for elements where the predicate is false. Here are implementations and tests for flatMap:
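Sketches (the list version accepts any Iterable-returning function, so it also works with Option-returning functions, as described below; the Option version is shown with a narrower, Option-specific signature for readability):

// In ImmutableList<T>: map each element to an Iterable, splice the results
// together by prepending them all, then reverse to fix the order.
public <R> ImmutableList<R> flatMap(Function1<T, ? extends Iterable<R>> f) {
    ImmutableList<R> accumulator = ImmutableList.nil();
    ImmutableList<T> current = this;
    while (!current.isEmpty()) {
        for (R r : f.apply(current.head())) {
            accumulator = accumulator.prepend(r);
        }
        current = current.tail();
    }
    return accumulator.reverse();
}

// In Option<T>: flatMap just avoids the nested Option that map would create.
public <R> Option<R> flatMap(Function1<T, Option<R>> f) {
    return isDefined() ? f.apply(get()) : Option.<R>none();
}

@Test
public void testFlatMap() {
    // Maps each integer n to the two-element list (n, n), then flattens.
    Function1<Integer, ImmutableList<Integer>> duplicate =
            new Function1<Integer, ImmutableList<Integer>>() {
        @Override
        public ImmutableList<Integer> apply(Integer n) {
            return ImmutableList.list(n, n);
        }
    };

    StringBuilder sb = new StringBuilder();
    for (Integer i : ImmutableList.list(1, 2).flatMap(duplicate)) {
        sb.append(i);
    }
    assertEquals("1122", sb.toString());
}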

A nice way of combining these two collection types is to flatMap a function that returns Option across an ImmutableList. The elements that evaluate to None simply disappear, while the Some values are fully realized (as in, you don't need to invoke get() on them).
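For instance, flatMapping a "safe parse" function over a list of strings drops the entries that don't parse (the helper here is purely for illustration):

Function1<String, Option<Integer>> parseInt = new Function1<String, Option<Integer>>() {
    @Override
    public Option<Integer> apply(String s) {
        try {
            return Option.some(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            return Option.none();
        }
    }
};

// "two" doesn't parse, so it simply disappears from the result.
ImmutableList<Integer> parsed = ImmutableList.list("1", "two", "3").flatMap(parseInt);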

Summary

In this post, we got to see a couple of immutable collection types.

While ImmutableList as I've implemented it takes a performance hit by using its own interface (since it tends to have to call reverse to provide consistent ordering), there are a couple of improvements that could be made. One would be to provide additional, unordered implementations of the methods containing all of the logic up to the call to reverse, for the cases where order doesn't matter, and then implement the ordered versions as a call to the unordered version followed by a call to reverse. The other would be to make NonEmptyList's tail member non-final, and make use of our private access to implement efficient appends in the higher-order methods. The code for that would be a fair bit more verbose (since we would have to hold a reference to the element preceding the terminal EmptyList). That said, it's possible that the additional operations per iteration might outweigh the cost of traversing (and creating) the list twice.

I think Option is one of my favourite types from Scala. The idea that Option gives you a warning through the type system that a value may not be initialized is fantastic. Sure, you're wrapping your object in another object (at a cost of 8 bytes on a 32-bit JVM, I believe), but (with some discipline) you have some assurance that things that don't return Option will not return null. With the option() factory method, you can even wrap the return values from "old-fashioned" Java APIs. Unfortunately, I haven't yet seen a language where NullPointerExceptions can be avoided completely (or aren't masked by treating uninitialized values as some default). Since Scala works on the JVM, null still exists. Haskell has "bottom" (which actually corresponds more closely to Scala's Nothing), which effectively takes the place of null (as I understand it -- I'm a dabbler at best in Haskell), though its use is heavily discouraged, from what I've read.

In between these collection types, I managed to sneak in an explanation of tail-call optimization, and how it can be used to avoid blowing your stack on recursive calls over a list. Tail-calls are not restricted to recursive functions (or even mutually-recursive sets of functions), but the stack-saving nature of tail-call optimization tends to be most apparent when recursion is involved. The other bonus of TCO (with apologies for my vague memories of some form of assembly) is that the tail-call gets turned into a JMP ("jump" or "goto") rather than a JSR ("jump to subroutine"), so the final evaluation can immediately return to the last spot where a tail-call didn't occur. That said, I should remind you that tail-call optimization only works when you're delegating your return to another method/function/procedure (that is, when the compiler doesn't need to push your current location onto the stack in order to return). When you're traversing a binary tree and need to visit the left and right children via recursion, at least the first recursive call is not a tail-call. Of course, in Java with default JVM settings, the stack seems (in my experience) to overflow at around 50k calls. If you have a balanced binary tree with 50k levels, then you have enough elements to make traversal outlast the earth being consumed by the sun, in which case a StackOverflowError is the least of your worries, since you and everyone you love will be long dead. (Actually, since the traversal is depth-first, you're lucky -- the stack will overflow quite quickly, and you can call your mom to remind her that you love her and you're glad that she didn't die while your program ran. Moms love that kind of thing.)

For some really interesting information about tail-call optimization on the JVM at the bytecode level, I suggest you read John Rose's blog post on the subject from 2007.

I think this may be the longest post in the series so far. In particular, I've gone whole-hog on using unit tests to both establish that my code actually works (give or take copy-paste errors between my IDE and GitHub gists) and show examples of how to use these methods.

I must confess that this post was also more comfortable to write. Java doesn't make writing functions as objects terribly pleasant (since it wasn't really designed for it). I think the code in the last couple of posts involved more type arguments than actual logic. Working with data types is much easier, by comparison.

Where do we go from here?

I had been thinking about trying to implement a mechanism that combines lazy loading of Function0 values with parallel execution, by replacing uninitialized values with Futures linked to a running thread on a ForkJoinPool. Unfortunately, the more I think about it, the more I realize that "parallelize everything" is a really bad idea (which is why nobody actually does it). I might be able to handle it the way I handled memoization, where you selectively memoize certain functions that are frequently used on a small set of inputs -- that way, you would be able to parallelize only your "hot spot" functions. Unfortunately, my memoization implementation is kind of an ugly hack, requiring a parallel MemoizedFunctionN hierarchy, and I'm not particularly enthusiastic about creating a separate ParallelizedFunctionN hierarchy. Sure, I can write code generators for these different function specializations, but it's still not particularly elegant.

The last remaining "standard" functional programming concept I can think of that I haven't covered (besides pattern matching, which I don't think I can implement nicely in Java) is tuples. If I dedicate a post exclusively to tuples, I think it should be fairly short. After these last couple of lengthy posts, that may be a good way to go.

Saturday, March 10, 2012

Intro to Functional Programming (with examples in Java) - Part 3

Let's recap the last two entries in this series, and what we learned:
  • In the first post, we learned that we can implement functions as objects in Java and pass them as arguments to other functions or more traditional methods. Functions that can be passed around are called first-class functions. First-class functions can be implemented in C/C++ as function pointers, while languages that are more conventionally "functional", like Lisp, Haskell, Scala, or Javascript, simply treat functions as another "thing" that can be passed around. Functions that take other functions as parameters are called higher-order functions. This is similar to passing an object that implements a callback interface. Listeners in Swing are like function objects, though they may have multiple entry-points. Our function objects have a single "apply" method that basically means "call me".
  • In the second post, we learned about currying and partial application, which effectively say that any function with n parameters can be thought of as a function with one parameter that returns a function with n-1 parameters. Once we hit zero parameters, we actually invoke the logic we've specified. (In this post, we're going to look at not invoking the logic right away.) We also looked at two of the more important higher-order functions, map and fold. The map function transforms a collection into another collection by applying a function of one parameter to each element in the original collection and accumulating the results in the new collection. The fold function aggregates values from a collection, producing a single value, by applying a binary function that takes the output so far and the next value from the original collection. In some contexts, fold is also called reduce. Thus, these two higher-order functions form the basis of so-called map-reduce algorithms, where you would typically apply the map step in parallel on multiple nodes, and aggregate the results via a fold on a single node. (Sometimes the fold can be parallelized, though it's not generally as trivial as parallelizing a map operation. As an example of a parallelizable fold, consider computing the sum of a list of integers: partition the list into sublists, fold the sum over each sublist in parallel, then fold the sum function over the resulting list of partial sums. Anyway, this is just a teaser for a future post on parallelizing function evaluation.)

Lazy Evaluation

In the second post, I introduced partial application, giving the example of a Function2 that returns a Function1, and explained that this can be generalized to any FunctionN returning a Function(N-1) after binding the first parameter.

Similarly, we can apply the same logic to a Function1, and have it return a Function0, instead of a result.

What is a Function0? It's a function that takes no parameters, and returns a value. Since one of the tenets of functional programming is consistency (that is, a function, given the same arguments, should always return the same value), a Function0 (taking no arguments) produces a constant value. That is, a Function0 effectively is a constant value. For example, you could define a Function0 called Five, whose apply() method returns the Integer 5. However, if a Function0 is produced from a Function1 (or a FunctionN), it doesn't evaluate that value until you ask for it. If it retains the value from the first evaluation, and returns it rather than reevaluating on subsequent calls to apply() (since every evaluation should produce the same value), then you've effectively implemented lazy evaluation.

Recall the definition of Function1, from the first post in this series:

We have a function that takes a T and returns an R, for some types T and R, via the apply method. We can change this so that, instead of returning an R directly, we return a Function0<R>. That Function0<R> will produce the R once we actually ask for the value (via its get() method).

Here is a simple definition for a Function0 with lazy evaluation, along with a modified version of Function1, where we've renamed the previous abstract apply method to evaluate, to highlight the fact that when you call it, the code actually gets evaluated. When you simply apply the function, you get back something that can eventually be evaluated (ideally via Function0's get method).
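Roughly (the abstract method name evaluate on Function0 is an assumption; get caches the result of the first evaluation):

public abstract class Function0<R> {
    private R value;
    private boolean evaluated = false;

    // Subclasses provide the actual computation here.
    public abstract R evaluate();

    // Lazy access: evaluate at most once, then return the cached value.
    public R get() {
        if (!evaluated) {
            value = evaluate();
            evaluated = true;
        }
        return value;
    }
}

public abstract class Function1<T, R> {
    // The renamed abstract method: calling this actually runs the code.
    public abstract R evaluate(T input);

    // apply no longer returns an R; it returns a lazily evaluated Function0<R>.
    public Function0<R> apply(final T input) {
        return new Function0<R>() {
            @Override
            public R evaluate() {
                return Function1.this.evaluate(input);
            }
        };
    }
}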

We can also extend this to any FunctionN, by redefining their total application methods in terms of partial application (taking advantage of currying to treat them as Function1s):
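For example, a sketch of Function2's total application defined in terms of its partial application, consistent with the Function1 above:

public abstract class Function2<T1, T2, R> {
    public abstract R evaluate(T1 a, T2 b);

    // Partial application: bind the first argument, get back a Function1.
    public Function1<T2, R> apply(final T1 a) {
        return new Function1<T2, R>() {
            @Override
            public R evaluate(T2 b) {
                return Function2.this.evaluate(a, b);
            }
        };
    }

    // Total application, via currying: the result is a lazy Function0.
    public Function0<R> apply(T1 a, T2 b) {
        return apply(a).apply(b);
    }
}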

I've chosen to make evaluate public throughout to allow the option of strict evaluation, meaning "I want you to actually compute this now, and not store up the things that you are going to do when I eventually call get()". While lazy evaluation with functional programming kind of allows you to build up a planned execution path at runtime and then trigger it with one call to get(), sometimes you do need to fully compute something before returning from the current function.

Memoization

If you were paying close attention on the currying section of the previous post, you may have noticed that we are creating new function objects every time we do a partial function application. This same problem extends to our creation of Function0 objects above. This is expensive, since each of these function objects consumes some memory (which may be short-lived, but there is still some overhead), and it kind of defeats the purpose of lazy loading. In particular, if we're creating a new Function0 every time we finish passing all of our parameters, then we would need to retain that specific Function0 to ensure that we don't reevaluate the logic the next time we need the result.

We can extend Function1 to remember the value passed as the parameter to apply, and hold onto the resulting Function0, to return it the next time we're called with that same parameter. This process of returning the same value based on remembered arguments is known as memoization.

I am making the assumption that the parameter passed is of a type T that implements hashCode and equals in a reasonable way.
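A sketch of MemoizedFunction1, building on the Function1 sketch above: the Function0 produced for each distinct input is cached, so evaluate runs at most once per input (java.util.Map and java.util.HashMap imports assumed).

public abstract class MemoizedFunction1<T, R> extends Function1<T, R> {
    private final Map<T, Function0<R>> cache = new HashMap<T, Function0<R>>();

    @Override
    public Function0<R> apply(T input) {
        Function0<R> result = cache.get(input);
        if (result == null) {
            // First time we've seen this input: remember the lazy Function0,
            // whose get() will in turn remember the evaluated value.
            result = super.apply(input);
            cache.put(input, result);
        }
        return result;
    }
}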

Using an instance of this MemoizedFunction1 class, we know that its logic will be evaluated exactly once for each parameter value. Of course, this comes at a cost: for each parameter value, we retain the evaluation result object in memory (wrapped in a Function0, which itself adds a few more bytes). If you have a function that will be applied over a significant number of inputs, or is rarely applied to the same value twice (e.g. string processing functions operating on user input sent to a server have a decent chance of seeing billions of different strings), then the overhead of memoization likely would not pay off. This is why I chose to create a separate class, rather than simply adding memoization to Function1 directly. You can use a regular Function1 when you don't need memoization (which is likely to be most of the time), and a MemoizedFunction1 when you do. Since they use the same API (you just override evaluate), there's no additional thinking cost.

Let's confirm that the memoization works with a unit test, using everyone's favourite memoization and dynamic programming example, the Fibonacci function:
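A sketch of such a test (JUnit 4 assumed); the recursive definition is deliberately the naive, exponential one, so memoization has something to save:

@Test
public void testMemoizedFibonacci() {
    Function1<Integer, Long> fibonacci = new MemoizedFunction1<Integer, Long>() {
        @Override
        public Long evaluate(Integer n) {
            if (n <= 1) {
                return n.longValue();
            }
            // Recursive calls go through apply(), so they hit the cache.
            return apply(n - 1).get() + apply(n - 2).get();
        }
    };

    long start = System.currentTimeMillis();
    long result = fibonacci.apply(40).get();
    long elapsed = System.currentTimeMillis() - start;

    assertEquals(102334155L, result);
    // Without memoization this takes seconds; with it, effectively no time.
    assertTrue(elapsed < 1000);
}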

I selected the expected timing thresholds to give consistent results on machines somewhat faster or slower than mine. On my machine, the unmemoized version takes around 3.5 seconds, while the memoized version takes less than 1 millisecond.

Runtime memoization and recursive memoization

The previous example shows that, when writing your code, the memoized and unmemoized versions of a function are identical, except that one is instantiated as a MemoizedFunction1 and the other just a regular Function1. What if you want to dynamically memoize a function at runtime? We can implement a utility method to "upgrade" a regular Function1 as follows:
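A sketch of that utility (MemoizationUtils is the class named in the text; everything else is assumed):

public final class MemoizationUtils {
    private MemoizationUtils() {}

    public static <T, R> Function1<T, R> memoize(final Function1<T, R> f) {
        return new MemoizedFunction1<T, R>() {
            @Override
            public R evaluate(T input) {
                // Delegate to the wrapped function's logic. Note that any
                // recursive calls inside f still go through f's own,
                // unmemoized apply -- which is exactly the problem below.
                return f.evaluate(input);
            }
        };
    }
}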

Let's create a test for it:

Unfortunately, that test will not pass (yet). The problem is with our method call targets, and the fact that we're creating a MemoizedFunction1 and wrapping it around a regular Function1. How do the method calls work out?

What we really want is something more like this (since MemoizedFunction1's apply method is where the memoization happens):

In order to route the call back to the outer MemoizedFunction1, it would be ideal if we could override this in the inner Function1. Unfortunately, we can't do that. Really, though, object-oriented programming can be mapped to procedural programming (and, indeed, I believe this is more or less how the first implementation of C++ worked) roughly as follows:

  • Every method is converted to a procedure with an additional this parameter, which is basically like a pointer to a C struct.
  • Every method call is transformed into a procedure call where the address of the object (or rather, its struct representation) on the left-hand side of the ., or the pointer on the left-hand side of a ->, is passed as the this parameter.
  • Unqualified calls to methods or references to fields within the current object can be thought of as using an explicit this->.

So, while we can't redefine this, we can make Function1's evaluate method take an extra parameter which will act in place of this for recursive calls. We'll call it self.

Note that for the base Function1, self really is this, as far as apply is concerned.
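A sketch of that change: evaluate gains a self parameter, the base Function1 passes itself, and memoize passes the memoizing wrapper instead.

public abstract class Function1<T, R> {
    // self is the function that recursive calls should go through.
    public abstract R evaluate(Function1<T, R> self, T input);

    public Function0<R> apply(final T input) {
        return new Function0<R>() {
            @Override
            public R evaluate() {
                // For a plain Function1, self really is this.
                return Function1.this.evaluate(Function1.this, input);
            }
        };
    }
}

// In MemoizationUtils:
public static <T, R> Function1<T, R> memoize(final Function1<T, R> f) {
    return new MemoizedFunction1<T, R>() {
        @Override
        public R evaluate(Function1<T, R> self, T input) {
            // Pass the memoized wrapper (this) as self, so recursive calls
            // route back through the memoizing apply().
            return f.evaluate(this, input);
        }
    };
}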

Now, we can redefine our Fibonacci function to use self.apply instead of just apply, and change MemoizationUtils to pass the MemoizedFunction1's this as the self parameter.
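For instance:

Function1<Integer, Long> naiveFibonacci = new Function1<Integer, Long>() {
    @Override
    public Long evaluate(Function1<Integer, Long> self, Integer n) {
        if (n <= 1) {
            return n.longValue();
        }
        // Recurse through self, not this, so a memoizing wrapper can intercept.
        return self.apply(n - 1).get() + self.apply(n - 2).get();
    }
};

Function1<Integer, Long> memoizedFibonacci = MemoizationUtils.memoize(naiveFibonacci);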

With these changes, the testRuntimeMemoization test shown earlier now passes.

Memoizing functions of N parameters

Recall that because of currying, every function of N parameters can be thought of as a Function1 that returns a function of N - 1 parameters. This means that by implementing MemoizedFunction1, we get MemoizedFunctionN for free (almost). We just need to override the way that we curry, to return a MemoizedFunction1 that returns a MemoizedFunctionN-1:
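A sketch of MemoizedFunction2, building on the Function2 sketch earlier (for simplicity, this sketch ignores the self parameter introduced above): the override of the partial application caches a memoized Function1 per first argument.

public abstract class MemoizedFunction2<T1, T2, R> extends Function2<T1, T2, R> {
    private final Map<T1, Function1<T2, R>> partials =
            new HashMap<T1, Function1<T2, R>>();

    @Override
    public Function1<T2, R> apply(final T1 a) {
        Function1<T2, R> partial = partials.get(a);
        if (partial == null) {
            // Curry as usual, then wrap the partial application so the second
            // argument is memoized too.
            final Function1<T2, R> plain = super.apply(a);
            partial = new MemoizedFunction1<T2, R>() {
                @Override
                public R evaluate(T2 b) {
                    return plain.evaluate(b);
                }
            };
            partials.put(a, partial);
        }
        return partial;
    }
}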

And MemoizationUtils can have convenience methods to memoize them at runtime:
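For example (with the same simplification as above):

public static <T1, T2, R> Function2<T1, T2, R> memoize(final Function2<T1, T2, R> f) {
    return new MemoizedFunction2<T1, T2, R>() {
        @Override
        public R evaluate(T1 a, T2 b) {
            return f.evaluate(a, b);
        }
    };
}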

Summary

This is a post that gets into some of the "meat" of functional programming. Lazy evaluation can be quite powerful, since we defer evaluation of results until they are needed. Combined with memoization, we can ensure that we never reevaluate a function for the same parameters; we reuse the previously computed value instead. Memoization is not something you want to use everywhere, though. It's a form of optimization, and, as Knuth said, "Premature optimization is the root of all evil". In a general program, if you memoize everything, you'll quickly exhaust your available memory. That said, because the functional paradigm allows considerable separation of concerns, we don't need to build memoization into our functions at all. The functions themselves can focus on their logic, and memoization can be tacked on as a cross-cutting concern (even at runtime, as we showed).

Where is memoization useful? In dynamic programming problems, you define the problem in terms of subproblems (the same problem on smaller inputs), with an eventual base case. If you can write a function (or a collection of functions) that encapsulates the problem logic in a naive way, you can probably just stick on a call to memoize and guarantee reuse of subproblem solutions. Up to a constant factor, it won't be as efficient as maintaining results in arrays in a hand-coded solution, but it's so much easier to write. If you need to solve dynamic programming problems as part of your job, and not just to compete in online contests against C programmers, writing your memoization logic once and transparently attaching it where needed may be a good approach.

The more interesting thing, in my opinion, about memoization is that it demonstrates how easy it is to attach a cross-cutting or orthogonal concern to your logic (without needing to resort to aspect-oriented programming). The same way that I wrote MemoizationUtils to attach memoization to a function could be used for other purposes. For example, we could attach a tracer that collects a capacity-bounded set of the parameters passed to a function, along with the number of times the function is called. If a function is usually called with fewer than, say, 1000 different parameters (of reasonable size to be used as keys) but is called millions or billions of times during a typical execution of the program, that function may be a good candidate for memoization. Similarly, I believe we could wrap each of our functions in a tracer that keeps track of the function's summed execution time and call count in a ThreadLocal variable to implement a basic in-process profiler.

This ties in with the higher-order functions in the previous post. Whereas a typical object-oriented programmer will again and again write code to turn a list into another list or compute some sort of aggregate value from a list (whether it be a max, min, sum, or whatever), functional programming allows you to separate what you are doing (transforming/aggregating a list) from how you are doing it (the problem-specific logic).

In the next post, I think I will introduce concurrency via parallel/asynchronous evaluation. Assuming that's where I go (since I tend to get sidetracked between posts), Function0 will be very prominent, as it's effectively a simplified version of a java.util.concurrent.Future. If I don't feel like diving into concurrency, I will probably end up covering immutability and immutable data structures. Immutable data structures are more traditional (and fundamental) to functional programming, but concurrency is pretty cool. It's always an adventure guessing where I'm going to go next.