There are several problems with this implementation that I plan to address in a future post:
- The remove method creates a new node at each level, even if the element to remove does not exist.
- The use of null to indicate the lack of a left or right child opens us up to the potential of NullPointerExceptions and wastes space from those fields. (In particular, in a balanced binary tree, half the elements will be leaf nodes.)
- Currently, there is no way to create an empty tree, since every node has a value. If you look at the ImmutableTreeRunner implementation above, I had to generate my first random value before I could create the ImmutableBinaryTree.
This is that future post. I'll present a much cleaner, more memory-efficient implementation that solves these problems. That said, this is still not a good binary tree implementation to use. I'm not implementing any balancing, so it's generally going to be quite slow. It's just a more elegant unusable solution.
The first problem was largely laziness on my part, I suppose. (It would have been possible to better handle remove in the old implementation.) The other two problems can be addressed by accepting that not all immutable binary trees should be created equal, and establishing a class hierarchy:
That may look overly complicated, but all of the logic will actually degenerate into handling empty trees and handling non-empty trees. The
NonEmptyTree
subclasses only add fields and provide one-line implementations for NonEmptyTree
's abstract methods.Note that I gave the
EmptyTree
rounded edges in the diagram to distinguish it -- this will be implemented as a singleton. Conceptually, there is only one empty immutable tree). Also, this makes for some simpler code, as we can use ==
to see if a child is the empty tree.First, let's look at the implementation of the
ImmutableBinaryTree
abstract base class:We specialize the declared return type of
add
and remove
, while still satisfying the ImmutableSet
interface I put up on Monday. We declare an internal addAll
helper method (though this could also be added to the ImmutableSet
interface). Finally, we create two factory methods for immutable binary trees -- one that returns the empty tree and another that constructs a series of trees until all the given elements have been added.Next we implement all of the abstract methods for the empty tree. These are all one-liners:
We also store the (untyped) singleton
EmptyTree
instance and provide a (typed) static accessor for it. In this case, working without the T
type is perfectly legitimate -- there is no state influenced by T
in instances of this class. That said, the Java compiler apparently can't tell that this is safe (which is fair, since usually it's not safe). So, I've used the @SuppressWarnings
annotations to reassure the compiler that I know what I'm doing. (Note that a similar approach is used at least in the Apache Harmony implementation of Collections.emptyList()
, which is solving almost exactly the same problem.)Before thinking about the behaviour of non-empty trees, I think it makes sense to think about their state. Firstly, all non-empty trees represent a node with a value. So,
NonEmptyTree
will have a field of type T
for that value. Additionally, non-empty trees have 0, 1, or 2 children. If they have 1 child, it is either a left or right child. This motivates the NonEmptyTree
hierarchy presented above. By isolating the state (that is, the fields) into subclasses, we ensure that memory is not wasted on fields that are unused (that is, are permanently null
). It also makes it easier to reason about the state being used in each subclass.Fortunately, by ensuring that all of the subclasses provide a consistent exposure of state, that is, they supply a left and right child on demand (where they may be the singleton
EmptyTree
), we can safely implement all of the logic in NonEmptyTree
:Why did I declare
emptyTree
local variables to capture the empty tree instance before checking equality against passed parameters? It turns out that the Java type inferenceer (or at least the one used in my local version of Eclipse on a Mac) wouldn't allow me to check e.g. tree == EmptyTree.instance()
. It's possible that I messed up with the covariance of the type parameter. Regardless, adding the local variable was an easy of making my intentions clear to the type inferencer.The general flow is as follows:
add
is a special case ofaddAll
, unless the current node already holds the value being added.remove
either removes the current node (at which point, we merge the left and right children), or replaces the relevant child with the result of the remove operation applied to it.addAll
either merges the newly added tree's children with this node's children (if they have the same root value), or replaces the relevant child after applying the addAll operation to it.replaceChildren
runs through the five relevant cases for replacing the children of a given node:- If both children are equal to the current value, then nothing has changed. Return
this
. - If both new children are the
EmptyTree
singleton (and weren't before, or they would have been caught by the previous check), then we return a newLeaf
. - If one new child is the
EmptyTree
, we return a newLeftBranch
orRightBranch
. - Otherwise, both children are non-empty so we return a new
DualBranch
, which is guaranteed to be different from the current node (by the first check).
- If both children are equal to the current value, then nothing has changed. Return
- The
toList
andcontains
methods are fairly self-explanatory.
Finally, we have all of the specializations of
NonEmptyTree
, which are a series of constructors and one-line method implementations:There are a couple of points that I found interesting working on this code:
- There is not a single
null
check in this code. Instead, that role has been absorbed byEmptyTree
, which I like to think of as a sort of "type-safe"null
. Instead of checking fornull
and deciding how to react in every situation, I've defined a "null-like" element, and specified how it should implement my interface in a consistent way. I did still check forEmptyTree
inreplaceChildren
, to ensure that the most efficient concreteNonEmptyTree
is returned. I could have just returned a newDualBranch
, and the correctness of the algorithms would still hold. - A Java object, as I understand it, occupies enough memory to hold a reference to its runtime class, and the space occupied by its fields. Since
EmptyTree
is a singleton, its memory consumption doesn't really matter, but should effectively be this single reference to its class.Leaf
is a concrete class with a reference to its class and a reference to the contained value of the node. On the other extreme,DualBranch
represents four references to: the class, the value, a left child, and a right child. Thanks to the immutable nature of this code, we are required to return new objects, which don't necessarily need to be of the same type as our current (runtime) class. We can make those objects as efficient as possible. If I had implemented a proper balancing strategy, half the nodes would beLeaf
s, and would occupy roughly half the memory of correspondingDualBranch
es (disregarding the space occupied by the values themselves, aside from their references).
While there's nothing new or clever in these posts, I've found them to be a good learning exercise, and have helped me reinforce some functional programming ideas. In particular, while the whole "type-safe
null
" idea expresseed by EmtpyTree
is pretty standard fare in Haskell, it's still pretty novel to me as a (mostly) Java developer. Also, the particular refactoring in this post was motivated by the elegant simplicity of Scala's immutable List
, which I feel that I now understand more clearly.Of course, I did need to add those two
@SuppressWarnings
annotations in EmptyTree
, which I believe Scala avoids by having Nil
(the singleton empty list) have a type parameter of Nothing
(which inherits from everything -- if this sounds weird but intriguing, read up on some Scala, it actually all makes sense). I don't believe it's possible to do the same in Java, so I'm happy to at least be consistent with Apache Harmony.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.