Monday, June 11, 2012

Intro to Functional Programming (with examples in Java) - Part 5

Tuples

In this post, we're going to cover a less-exciting, but still surprisingly useful concept from the world of functional programming. Tuples are a way of producing "Cartesian products" of types. Thinking about it mathematically, types can be characterized as sets of valid values. Like a mathematical Cartesian product, where we define a composite value that draws a component from each of the underlying sets, a tuple is an object that encapsulates objects of each of the underlying types.

What are tuples?

Tuples are objects that group other objects together according to some logical need. They have very little in the way of functionality on their own. That is, you don't want a tuple that "does stuff". It's just data. In a way, Java beans can be considered a form of tuple -- they group together a set of fixed set of (heterogeneous) fields so they can be passed around. I kind of think of them like structs in C. They have fields but no "behaviour". (Our tuples will have methods, but they will be convenience things, like equals, hashCode, and toString.)

What are tuples not?

Tuples are not general-purpose collections. They are not lists of objects of some type, since they are made up of a fixed-sized set of heterogeneous components.

What are tuples good for?

Tuples are useful for returning a (fixed-sized, related) group of values without having having to create a new class. Suppose you have a private method that needs to return a String and an int. Normally in Java, you would need to create a small wrapper class to encapsulate the two return values, which isn't necessarily a bad thing. If the two values go together, the "thing" that they represent together should probably have a name, to improve readability. That said, since this hypothetical method is private, you know it's only going to be used within the current class, so you can choose instead to return a Tuple%lt;String, Integer%gt;. Similarly, there are times within a method body where you may end up constructing a pair of arrays of different types, and iterating through them with a common index (with checks and double-checks to make sure that the arrays really have the same length). Instead, if you can create a single array of tuples, your new "composite type" still doesn't escape the scope of the method.

A simple Tuple class

Here is a simple, immutable 2-dimensional Tuple class:

public final class Tuple2<T1,T2> {
public final T1 _1;
public final T2 _2;
public Tuple2(final T1 v1, final T2 v2) {
_1 = v1;
_2 = v2;
}
}

It's pretty simple -- it's like an immutable struct in C, where the fields don't have names, but they have positions. Let's see an example of this tuple class in action:

public class Tuple2Test {
private Tuple2<Integer, Integer> vectorAdd(Tuple2<Integer, Integer> v1,
Tuple2<Integer, Integer> v2) {
return new Tuple2<Integer, Integer>(v1._1 + v2._1, v1._2 + v2._2);
}
private String describePerson(Tuple2<String, Integer> nameAge) {
return nameAge._1 + " is " + nameAge._2 + " years old";
}
@Test
public void testTuple2() throws Exception {
Tuple2<Integer, Integer> v1 = new Tuple2<Integer, Integer>(1, 1);
Tuple2<Integer, Integer> v2 = new Tuple2<Integer, Integer>(3, 4);
Tuple2<Integer, Integer> vsum = vectorAdd(v1, v2);
assertEquals(Integer.valueOf(4), vsum._1);
assertEquals(Integer.valueOf(5), vsum._2);
assertEquals("Michael is 32 years old",
describePerson(new Tuple2<String, Integer>("Michael", 32)));
}
}

In this test, we managed to do a simple operation on a 2D integral vector (returning a 2D integral vector) and operate on a "person", without defining an IntegerVector2D class or a Person class. That said, in practice, if you're frequently dealing with vectors or people, you should probably have classes. In particular, Tuple2%lt;String, Integer%gt; doesn't tell you what the String and Integer types represent. You could totally pass a product SKU and inventory count to describePerson and have it tell you "DV014533423H is 56 years old". As I mentioned above, tuples are most useful for those cases where you want to group values at a very local scope (i.e. as local variables, where a private static class would, by necessity, escape the method scope, or, worst-case, as a return value of a private method).

Making tuple construction less painful

The construction of tuples in the above test case is pretty verbose, partly thanks to the repeated type parameters (which are largely eliminated in Java 1.7), but also because of the new keyword and the specification of the exact size of tuple we want. Instead, we can make factory methods in a TupleUtils class as follows:

public final class TupleUtils {
public static <T1,T2> Tuple2<T1,T2> tuple(final T1 v1, final T2 v2) {
return new Tuple2<T1,T2>(v1, v2);
}
public static <T1,T2,T3> Tuple3<T1,T2,T3> tuple(final T1 v1, final T2 v2, final T3 v3) {
return new Tuple3<T1,T2,T3>(v1, v2, v3);
}
// ... more tuple factory methods, up to highest-needed dimensionality ...
}

This allows us to rewrite the above test case a bit more succinctly:

public class Tuple2Test {
private Tuple2<Integer, Integer> vectorAdd(Tuple2<Integer, Integer> v1,
Tuple2<Integer, Integer> v2) {
return tuple(v1._1 + v2._1, v1._2 + v2._2);
}
private String describePerson(Tuple2<String, Integer> nameAge) {
return nameAge._1 + " is " + nameAge._2 + " years old";
}
@Test
public void testTuple2() throws Exception {
Tuple2<Integer, Integer> v1 = tuple(1, 1);
Tuple2<Integer, Integer> v2 = tuple(3, 4);
Tuple2<Integer, Integer> vsum = vectorAdd(v1, v2);
assertEquals(Integer.valueOf(4), vsum._1);
assertEquals(Integer.valueOf(5), vsum._2);
assertEquals("Michael is 32 years old",
describePerson(tuple("Michael", 32)));
}
}

Where are the functions?

Since this is a series on functional programming, we should actually see some functions, right? Well, there's an interesting analogue between tuples and function parameter lists. In particular, function parameter lists are heterogeneous, positional values grouped together for a common purpose. Instead of passing a list of parameters to a function, you could pass a single tuple value. In effect, every FunctionN<R, T1, ..., TN> is isomorphic to a Function1<R, TupleN<T1, ..., TN>>. Furthermore, this isomorphism can be described in code by decorating FunctionN with a tupled Function1 and vice-versa, as follows:

public final class TupleUtils {
// ... previous tuple factory methods ...
public static <R, T1,T2> Function1<R, Tuple2<T1,T2>>
tupled(final Function2<R,T1,T2> f) {
return new Function1<R, Tuple2<T1,T2>>() {
@Override
public R evaluate(Function1<R, Tuple2<T1,T2>> self,
Tuple2<T1,T2> i) {
return f.evaluate(f, i._1, i._2);
}
};
}
public static <R, T1, T2> Function2<R, T1, T2>
untupled(final Function1<R, Tuple2<T1, T2>> f) {
return new Function2<R, T1, T2>() {
@Override
public R evaluate(final Function2<R, T1, T2> self,
final T1 i1, final T2 i2) {
return f.evaluate(f, tuple(i1, i2));
}
};
}
public static <R,T1,T2,T3> Function1<R, Tuple3<T1,T2,T3>>
tupled(final Function3<R,T1,T2,T3> f) {
return new Function1<R, Tuple3<T1,T2,T3>>() {
@Override
public R evaluate(Function1<R, Tuple3<T1,T2,T3>> self,
Tuple3<T1,T2,T3> i) {
return f.evaluate(f, i._1, i._2, i._3);
}
};
}
public static <R,T1,T2,T3> Function3<R,T1,T2,T3>
untupled(final Function1<R, Tuple3<T1,T2,T3>> f) {
return new Function3<R, T1, T2, T3>() {
@Override
public R evaluate(final Function3<R, T1, T2, T3> self,
final T1 i1, final T2 i2, final T3 i3) {
return f.evaluate(f, tuple(i1, i2, i3));
}
};
}
// ... more tupling/untupling methods, up to highest-needed dimensionality ...
}

There are the functions! Now what?

Recall from my third post in this series, where I presented the MemoizedFunctionN subclasses of FunctionN, and pointed out that other functionality could be added by decorating existing functions. Unfortunately, for every new piece of functionality you want to add according to that pattern, you need to generate wrapped versions of all FunctionN, potentially making use of custom currying on the functions of 2 or more parameters.

Using tupled and untupled, you can define your decorator in a Function1 subclass, and tuple your FunctionN down to a Function1, decorate it, then untuple the decorated class.

As an example, let's create a decorator that logs the input and return values for every call to a given Function1. It will need a Function1 to wrap, a stream to which it will log, and a name for the function (so the log messages will have some context).

public class LoggingFunction1<R, T1> extends Function1<R, T1>{
private final Function1<R, T1> wrappedFunction;
private final PrintStream logStream;
private final String name;
private LoggingFunction1(final Function1<R, T1> wrappedFunction,
final PrintStream logStream, final String name) {
this.wrappedFunction = wrappedFunction;
this.logStream = logStream;
this.name = name;
}
@Override
public R evaluate(final Function1<R, T1> self, final T1 i1) {
logStream.println(name + " called with " + i1);
final R returnValue = wrappedFunction.evaluate(wrappedFunction, i1);
logStream.println(name + " returned " + returnValue);
return returnValue;
}
public static <R, T1> Function1<R, T1>
addLogging(final Function1<R, T1> wrappedFunction,
final PrintStream logStream, final String name) {
return new LoggingFunction1<R, T1>(wrappedFunction, logStream, name);
}
}

Let's confirm that this works with a Function1, before trying functions of more parameters.

import static functions.LoggingFunction1.addLogging;
public class LoggingFunction1Test {
@Test
public void testFunction1() throws Exception {
Function1<Integer, String> getLength =
new Function1<Integer, String>() {
@Override
public Integer evaluate(final Function1<Integer, String> self,
final String i1) {
return i1.length();
}
};
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
PrintStream outputStream = new PrintStream(bytes);
Function1<Integer, String> loggingGetLength =
addLogging(getLength, outputStream, "getLength");
loggingGetLength.apply("hello").get();
assertEquals("getLength called with hello\n" +
"getLength returned 5\n",
new String(bytes.toByteArray()));
}
}

Now, let's try it with a Function2 that is tupled and untupled:

import static functions.LoggingFunction1.addLogging;
import static tuples.TupleUtils.tupled;
import static tuples.TupleUtils.untupled;
public class LoggingFunction1Test {
// ... previous test case ...
@Test
public void testFunction2() throws Exception {
Function2<Integer, Integer, Integer> add =
new Function2<Integer, Integer, Integer>() {
@Override
public Integer evaluate(
final Function2<Integer, Integer, Integer> self,
final Integer i1, final Integer i2) {
return i1 + i2;
}
};
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
PrintStream outputStream = new PrintStream(bytes);
Function2<Integer, Integer, Integer> loggingAdd =
untupled(addLogging(tupled(add), outputStream, "add"));
loggingAdd.apply(4, 5).get();
assertEquals("add called with (4,5)\n" +
"add returned 9\n",
new String(bytes.toByteArray()));
}
}

I wanted to rewrite memoization in terms of tupling and untupling functions, but my previous work at implementing all functions of more than one parameter via currying gets in the way (along with the whole notion of needing a recursion target). That said, memoization is a bit of an anomaly in terms of the complexity of decorating functionality. Most other cases I can think of are like this logging example, where tupling and untupling "just work", and you don't need to create your own curried chain of LoggingFunctionN equivalents.

Conclusions

The major takeaway from this post is the fact that tuples provide another (arguably simpler) way of looking at functions of more than one parameter as being equivalent to Function1 (compared to currying). Tuples can also provide a convenient way of returning more than one value without having to create a new class to hold the composite value.

To avoid writing TupleN classes by hand, I created a TupleGenerator class. You give it a source directory, a target package, and maximum number of tuple dimensions (Scala supports up to 22), and it will generate for each dimension a reasonable TupleN class with hashCode, equals, and toString defined in terms of the underlying elements. It also creates the TupleUtils utility class with overloaded static methods for tuple (to instantiate new tuples), tupled (to turn a FunctionN<R, T1, ..., TN>> into a Function1<R, TupleN<T1, ..., TN>>), and untupled (to turn a Function1<R, TupleN<T1, ..., TN>> into a FunctionN<R, T1, ..., TN>> )

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.