Monday, June 11, 2012

Intro to Functional Programming (with examples in Java) - Part 5

Tuples

In this post, we're going to cover a less-exciting, but still surprisingly useful concept from the world of functional programming. Tuples are a way of producing "Cartesian products" of types. Thinking about it mathematically, types can be characterized as sets of valid values. Like a mathematical Cartesian product, where we define a composite value that draws a component from each of the underlying sets, a tuple is an object that encapsulates objects of each of the underlying types.

What are tuples?

Tuples are objects that group other objects together according to some logical need. They have very little in the way of functionality on their own. That is, you don't want a tuple that "does stuff". It's just data. In a way, Java beans can be considered a form of tuple -- they group together a set of fixed set of (heterogeneous) fields so they can be passed around. I kind of think of them like structs in C. They have fields but no "behaviour". (Our tuples will have methods, but they will be convenience things, like equals, hashCode, and toString.)

What are tuples not?

Tuples are not general-purpose collections. They are not lists of objects of some type, since they are made up of a fixed-sized set of heterogeneous components.

What are tuples good for?

Tuples are useful for returning a (fixed-sized, related) group of values without having having to create a new class. Suppose you have a private method that needs to return a String and an int. Normally in Java, you would need to create a small wrapper class to encapsulate the two return values, which isn't necessarily a bad thing. If the two values go together, the "thing" that they represent together should probably have a name, to improve readability. That said, since this hypothetical method is private, you know it's only going to be used within the current class, so you can choose instead to return a Tuple%lt;String, Integer%gt;. Similarly, there are times within a method body where you may end up constructing a pair of arrays of different types, and iterating through them with a common index (with checks and double-checks to make sure that the arrays really have the same length). Instead, if you can create a single array of tuples, your new "composite type" still doesn't escape the scope of the method.

A simple Tuple class

Here is a simple, immutable 2-dimensional Tuple class:

It's pretty simple -- it's like an immutable struct in C, where the fields don't have names, but they have positions. Let's see an example of this tuple class in action:

In this test, we managed to do a simple operation on a 2D integral vector (returning a 2D integral vector) and operate on a "person", without defining an IntegerVector2D class or a Person class. That said, in practice, if you're frequently dealing with vectors or people, you should probably have classes. In particular, Tuple2%lt;String, Integer%gt; doesn't tell you what the String and Integer types represent. You could totally pass a product SKU and inventory count to describePerson and have it tell you "DV014533423H is 56 years old". As I mentioned above, tuples are most useful for those cases where you want to group values at a very local scope (i.e. as local variables, where a private static class would, by necessity, escape the method scope, or, worst-case, as a return value of a private method).

Making tuple construction less painful

The construction of tuples in the above test case is pretty verbose, partly thanks to the repeated type parameters (which are largely eliminated in Java 1.7), but also because of the new keyword and the specification of the exact size of tuple we want. Instead, we can make factory methods in a TupleUtils class as follows:

This allows us to rewrite the above test case a bit more succinctly:

Where are the functions?

Since this is a series on functional programming, we should actually see some functions, right? Well, there's an interesting analogue between tuples and function parameter lists. In particular, function parameter lists are heterogeneous, positional values grouped together for a common purpose. Instead of passing a list of parameters to a function, you could pass a single tuple value. In effect, every FunctionN<R, T1, ..., TN> is isomorphic to a Function1<R, TupleN<T1, ..., TN>>. Furthermore, this isomorphism can be described in code by decorating FunctionN with a tupled Function1 and vice-versa, as follows:

There are the functions! Now what?

Recall from my third post in this series, where I presented the MemoizedFunctionN subclasses of FunctionN, and pointed out that other functionality could be added by decorating existing functions. Unfortunately, for every new piece of functionality you want to add according to that pattern, you need to generate wrapped versions of all FunctionN, potentially making use of custom currying on the functions of 2 or more parameters.

Using tupled and untupled, you can define your decorator in a Function1 subclass, and tuple your FunctionN down to a Function1, decorate it, then untuple the decorated class.

As an example, let's create a decorator that logs the input and return values for every call to a given Function1. It will need a Function1 to wrap, a stream to which it will log, and a name for the function (so the log messages will have some context).

Let's confirm that this works with a Function1, before trying functions of more parameters.

Now, let's try it with a Function2 that is tupled and untupled:

I wanted to rewrite memoization in terms of tupling and untupling functions, but my previous work at implementing all functions of more than one parameter via currying gets in the way (along with the whole notion of needing a recursion target). That said, memoization is a bit of an anomaly in terms of the complexity of decorating functionality. Most other cases I can think of are like this logging example, where tupling and untupling "just work", and you don't need to create your own curried chain of LoggingFunctionN equivalents.

Conclusions

The major takeaway from this post is the fact that tuples provide another (arguably simpler) way of looking at functions of more than one parameter as being equivalent to Function1 (compared to currying). Tuples can also provide a convenient way of returning more than one value without having to create a new class to hold the composite value.

To avoid writing TupleN classes by hand, I created a TupleGenerator class. You give it a source directory, a target package, and maximum number of tuple dimensions (Scala supports up to 22), and it will generate for each dimension a reasonable TupleN class with hashCode, equals, and toString defined in terms of the underlying elements. It also creates the TupleUtils utility class with overloaded static methods for tuple (to instantiate new tuples), tupled (to turn a FunctionN<R, T1, ..., TN>> into a Function1<R, TupleN<T1, ..., TN>>), and untupled (to turn a Function1<R, TupleN<T1, ..., TN>> into a FunctionN<R, T1, ..., TN>> )