Wednesday, May 1, 2013

Pattern Matching and Functional Structure

Taking a break (or possibly moving on) from Project Lambda posts, I've decided that it's a good time to look at pattern matching, a common approach in functional programming that leads to an end result similar to object-oriented inheritance polymorphism. That said, I'll use Project Lambda (JDK8) features when writing Java code, since it's much nicer. I'll also diverge from most of my previous posts by using Scala here, since Java doesn't support (elegant) pattern matching. (I've just discovered JMatch, which apparently extends Java with pattern matching support, and I may want to play around with it.)

An object-oriented example

For comparison, here is an object-oriented implementation of an immutable list in Java. An immutable list is either an EmptyList (a singleton that holds nothing) or a NonEmptyList (that holds a value and a reference to the rest of the list). To hide the implementation details from calling code, I've implemented the two cases as private static classes and exposed empty() as a factory method to get the EmptyList. The other operations (head, tail, prepend, and map) are instance methods of ImmutableList.
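As a rough sketch of that shape (not the original Java listing), the same object-oriented structure can be written in Scala, which keeps it directly comparable with the Scala code later in this post. The wrapper object name OoListSketch is made up for illustration, and since Scala has no static nested classes, private members of the enclosing object stand in for the private static classes.

```scala
object OoListSketch {
  // Abstract base type: calling code only ever sees ImmutableList.
  abstract class ImmutableList[+T] {
    def head: T
    def tail: ImmutableList[T]
    def map[R](f: T => R): ImmutableList[R]

    // Identical for both cases, so it is implemented once in the base class.
    def prepend[U >: T](value: U): ImmutableList[U] =
      new NonEmptyList[U](value, this)
  }

  object ImmutableList {
    // Factory method: the only way for callers to get hold of the empty list.
    def empty[T]: ImmutableList[T] = EmptyList
  }

  // Singleton that holds nothing.
  private object EmptyList extends ImmutableList[Nothing] {
    def head: Nothing = throw new NoSuchElementException("head of empty list")
    def tail: ImmutableList[Nothing] =
      throw new NoSuchElementException("tail of empty list")
    def map[R](f: Nothing => R): ImmutableList[R] = this
  }

  // Holds a value and a reference to the rest of the list.
  private final class NonEmptyList[+T](val head: T, val tail: ImmutableList[T])
      extends ImmutableList[T] {
    def map[R](f: T => R): ImmutableList[R] =
      new NonEmptyList[R](f(head), tail.map(f))
  }
}
```

Each concrete class carries its own answer for head, tail, and map, and the caller never needs to know which one it is holding.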

Algebraic Data Types

This linked list is a form of algebraic data type. Algebraic data types are built from "products" (groups of fields, like records or structs) and "sums" (disjoint unions) of other types. In particular, the ImmutableList type is the sum of EmptyList and NonEmptyList (since an instance comes from one set or the other), while NonEmptyList is the product of its field types (the generic type T for head and ImmutableList<T> for tail). EmptyList is a single-element set. Note that "product" in this context effectively means Cartesian product. Really, an algebraic data type describes a set of possible values (in the mathematical sense of "set").
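To make the "sets of values" reading concrete, here is a small Scala sketch that counts the inhabitants of a product and of a sum. The Light and Signal types are invented purely for this illustration; with small finite sets, the product has as many values as the Cartesian product, and the sum has as many as the two sets added together.

```scala
object AdtCardinality {
  // A sum of three single-element sets (hypothetical example type).
  sealed trait Light
  case object Red extends Light
  case object Amber extends Light
  case object Green extends Light

  val lights: List[Light] = List(Red, Amber, Green)
  val booleans: List[Boolean] = List(false, true)

  // Product type: one Light AND one Boolean, i.e. the Cartesian product.
  final case class Signal(light: Light, flashing: Boolean)
  val signals: List[Signal] =
    for (l <- lights; f <- booleans) yield Signal(l, f)

  // Sum type: a Light OR a Boolean, tagged so the two sets stay disjoint.
  val lightOrBoolean: List[Either[Light, Boolean]] =
    lights.map(l => Left[Light, Boolean](l)) ++
      booleans.map(b => Right[Light, Boolean](b))
}
```

Here signals has 3 × 2 = 6 elements and lightOrBoolean has 3 + 2 = 5.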

In mathematics, we tend not to talk about elements of a set "doing" things. Instead, we have functions and operators that act on elements of a set and produce other elements (either from the same set or from some other set). Thus, the object-oriented notion of member functions or methods doesn't really map well. Instead, we might define a function over a sum type by specifying partial functions over the underlying types and producing a resulting total function. We basically say, "If you live in this bucket, return something, whereas if you live in this other disjoint bucket, return something else".
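Scala happens to have a PartialFunction type, so the "one partial function per bucket" idea can be sketched quite literally. The list type below borrows the names used for the Scala version later in this post; the describe operation and the wrapper object are assumptions made up for this sketch.

```scala
object PartialToTotal {
  sealed trait MyImmutableList[+T]
  case object EmptyList extends MyImmutableList[Nothing]
  final case class NonEmptyList[+T](head: T, tail: MyImmutableList[T])
      extends MyImmutableList[T]

  // Defined only for values living in the EmptyList bucket.
  val onEmpty: PartialFunction[MyImmutableList[Any], String] = {
    case EmptyList => "nothing here"
  }

  // Defined only for values living in the NonEmptyList bucket.
  val onNonEmpty: PartialFunction[MyImmutableList[Any], String] = {
    case NonEmptyList(h, _) => "starts with " + h
  }

  // Gluing the two partial functions together covers the whole sum type,
  // so the result can be used as an ordinary (total) function.
  val describe: MyImmutableList[Any] => String = onEmpty orElse onNonEmpty
}
```

Neither partial function is defined on the other's bucket (isDefinedAt returns false there), but their combination has an answer for every list.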

Pattern Matching

In the orthodox functional programming world, this matching by partial functions is exactly what happens, by matching the "patterns" that describe the underlying types of a sum type. Before digging deeper, I think it helps to look at the same example above, written in Scala in a more purely functional way. (Since Scala was built to compile to JVM bytecode, it still has classes and objects, and our types are still effectively Java classes.)
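A sketch of what such a Scala version could look like, reconstructed from the description in this post rather than copied from the original listing. The wrapper object FunctionalList, the parameter order, and the choice to throw NoSuchElementException on an empty list are all assumptions.

```scala
object FunctionalList {
  // "sealed": every implementation of this trait lives in this file.
  sealed trait MyImmutableList[+T]

  // "object": there is exactly one empty list.
  case object EmptyList extends MyImmutableList[Nothing]

  // Product of a value and the rest of the list.
  final case class NonEmptyList[+T](head: T, tail: MyImmutableList[T])
      extends MyImmutableList[T]

  def head[T](list: MyImmutableList[T]): T = list match {
    case EmptyList => throw new NoSuchElementException("head of empty list")
    case NonEmptyList(h, _) => h // _ : we don't care about the tail here
  }

  def tail[T](list: MyImmutableList[T]): MyImmutableList[T] = list match {
    case EmptyList => throw new NoSuchElementException("tail of empty list")
    case NonEmptyList(_, t) => t
  }

  // Same logic for both cases, so no pattern match is needed.
  def prepend[T](value: T, list: MyImmutableList[T]): MyImmutableList[T] =
    NonEmptyList(value, list)

  def map[T, R](list: MyImmutableList[T])(f: T => R): MyImmutableList[R] =
    list match {
      case EmptyList => EmptyList
      case NonEmptyList(h, t) => NonEmptyList(f(h), map(t)(f))
    }
}
```

With these definitions, prepend(1, prepend(2, EmptyList)) builds a two-element list, and map(xs)(_ * 10) walks it by matching on each case in turn.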

There are several things to note in comparison to the Java code above:

  • Neither EmptyList nor NonEmptyList has any methods. They don't "do" anything.
  • EmptyList is a singleton by virtue of the Scala object keyword. It really is a single-element set.
  • The sealed trait keyword pair basically means "Everything that implements this interface is defined in this file". Since the MyImmutableList interface defines no methods, you can consider it purely a marker interface. Thus, we have the same level of implementation-hiding as we had with our private inner classes above. As a bonus, it's illegal for a class in another Scala file to announce that it implements MyImmutableList, which cannot be said for our abstract class in the Java example above.
  • The operations head, tail, prepend, and map are defined separately, as functions that take a MyImmutableList as input.
  • For head, tail, and map, the implementations need to decide which member of the sum type applies. They do this by pattern matching against the underlying types. This is logically equivalent to inheritance polymorphism in the object-oriented example.
  • For prepend, no pattern match is necessary. The logic is the same in both cases. This is equivalent to defining the implementation in the abstract base class in the object-oriented example.
  • Pattern matching captures variables against the fields of the product type NonEmptyList, such that we can use them on the right-hand side of the case expression. By convention, if we don't care about a particular field, we use _ as the variable name.

In the end, it's another way of saying the same thing. That said, let's look at what each approach gives us. For this next bit, I'm totally stealing from Professor Dan Grossman and material I learned from his Programming Languages course on Coursera. As a "thank you" for that knowledge he gave to me for free, I encourage you to take his future courses and use up more of his time. I'm sure he will appreciate it.

First, let's consider the types and the operations, since that's what we want to implement:

             | head | tail | prepend | map
EmptyList    |  ?   |  ?   |    ?    |  ?
NonEmptyList |  ?   |  ?   |    ?    |  ?

In the object-oriented case, we basically implement the question marks by rows. That is, we implement one of EmptyList or NonEmptyList, filling in all of the methods, and then implement the other. If a particular column (or most of it) shares code, we implement it in the base class (overriding when needed).

In the functional case (using pattern matching), we implement the question marks by columns. For each operation, we describe how it will work on each of the underlying types. If a particular column doesn't need to distinguish between types, we don't.

Another way of looking at this table is that the columns are verbs, and the rows are nouns.


Which approach is better? Given that my blog is about little other than functional programming, you might expect me to say "the functional approach". The answer, of course, is, "It depends". Given that we spend way more time modifying, refactoring, and updating code than writing it in the first place, you need to ask yourself, looking at the above table, "Will I be adding more rows, or will I be adding more columns?". In most cases, you'll be adding more columns (that is, describing more operations on your data). There are, however, cases where you add more rows. In particular, I believe that it's no mistake that object-oriented programming took over at the same time as the "GUI revolution". Specifically, most graphical elements (widgets) can be summed up by two operations: paint and handleEvent. In that case, you are more likely to be adding rows, as you define different kinds of widgets. That said, I consider it unfortunate that today's "server revolution" and service-oriented architectures continue to shoehorn things into object-orientation, even when better approaches exist (and predate object-oriented programming).
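To see what "adding a column" costs in the pattern matching style, here is a small sketch. The length and contains operations are hypothetical additions (they are not part of the list discussed in this post), and the list type is repeated only so that the snippet stands alone.

```scala
object AddingAColumn {
  // The existing "rows". Nothing in this part changes when a column is added.
  sealed trait MyImmutableList[+T]
  case object EmptyList extends MyImmutableList[Nothing]
  final case class NonEmptyList[+T](head: T, tail: MyImmutableList[T])
      extends MyImmutableList[T]

  // New column #1: one new function, one case per row.
  def length(list: MyImmutableList[Any]): Int = list match {
    case EmptyList => 0
    case NonEmptyList(_, rest) => 1 + length(rest)
  }

  // New column #2: again, no existing code is touched.
  def contains[T](list: MyImmutableList[T], wanted: T): Boolean = list match {
    case EmptyList => false
    case NonEmptyList(h, rest) => h == wanted || contains(rest, wanted)
  }
}
```

The flip side is that a new row (say, a third kind of list node) would mean revisiting every one of these match expressions, which is exactly where the object-oriented layout is cheaper.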

In short, what would you rather your code do? Make things (create nouns)? Or do things (create verbs)? In practice, I believe most of us would rather focus on building actions, but work in development environments where we develop objects.

For some fun reading, I suggest that anyone who works in an object-oriented language (and especially folks working in Java) read Execution in the Kingdom of Nouns, a nice tongue-in-cheek look at Java anti-patterns like AbstractSingletonProxyFactoryBean (though I believe that article predates that particular monstrosity).

In the end, I am being quite disingenuous, and downright unfair to object-oriented programming. There are cases where grouping types under a single hierarchy (as one does in object-oriented programming) makes sense, even in the functional world. Specifically, my purely-functional Scala example is not how the Scala standard library implements immutable lists. Instead, they take an object-oriented approach, since it means that map can be implemented in terms of base traits like Iterable, so the same code can be reused for other types. Similarly, Haskell (the purest of semi-mainstream functional programming languages) has typeclasses, which are not object-oriented classes by any stretch of the imagination, but they allow you to define functions that take an instance of any type, so long as the type has certain associated functions that are defined in the given typeclass. (Okay... they're kind of like saying, "I will take an instance of any type that implements this particular interface".) All that to say, while I'm gently suggesting that the object-oriented mindset is not always the correct one, it still definitely has its place, even in the functional programming world.
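Scala can approximate the typeclass idea with implicit parameters. The following is only a sketch of the pattern, with a made-up Show typeclass and made-up instances; it is not how Haskell implements typeclasses, nor anything taken from the Scala standard library.

```scala
object TypeclassSketch {
  // The "typeclass": the functions a type must supply to qualify.
  trait Show[A] {
    def show(value: A): String
  }

  // Instances are declared separately from the types they describe.
  implicit val showInt: Show[Int] = new Show[Int] {
    def show(value: Int): String = "Int(" + value + ")"
  }

  implicit val showBoolean: Show[Boolean] = new Show[Boolean] {
    def show(value: Boolean): String = if (value) "yes" else "no"
  }

  // Accepts a value of any type A, so long as a Show[A] instance is in scope.
  def describe[A](value: A)(implicit instance: Show[A]): String =
    instance.show(value)
}
```

Calling describe(42) or describe(true) compiles because instances exist for Int and Boolean; calling it with a type that has no Show instance is rejected at compile time.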


Are you curious about what the above Scala example compiles to? It turns out that it's not very exciting (using Scala 2.9). It's pretty much what you would hand-write in Java, except that the Scala compiler doesn't call Java 1.7's Objects.equals, but rather inlines similar logic.
