Rusty Tree
The idea of this post came from a conversation with a friend about the Advent of Code: which language would be interesting to try next time ?
I was hesitating between going for a known one (like Scala3 or Haskell) where I could gain a deeper understanding of it or try something new to me like Go.
My friend pointed out that Go, even if new in comparison to Haskell or Scala, didn’t include any new concepts. And on top of that, the community was still discussing how to add generics. He suggested I look at Rust instead. And I did !
Before committing to use Rust for the Advent of code I thought I would try it with a known example and see how it fares. Quite predictably I decided to see my tree example as I already know quite a few ways to implement it in Scala3 and various versions of Java.
In a nutshell, the goal of this algorithm is to add a unique integer identifier to the nodes of a binary tree.
The full code of the project is here.
Modelling the tree
The first design choice is how to model the tree. Rust does borrow a lot from functional programming languages so it has great support for algebraic data types so I decided to use one:
#[derive(Debug)]
pub enum Node<T> {
Leaf,
Branch(Rc<T>, Rc<Node<T>>, Rc<Node<T>>)
}
Immediately when designing the data structure, the main difference between Rust and other recent languages is apparent: Rust memory management is manual, it does not have a garbage collector to manage the heap.
This is a conscious design choice for the language as having a garbage collector makes it more difficult to have predictable performance.
It is interesting that in Go the designers tried to simplify the implementation of the garbage collector to avoid this issues but that introduces other problems.
So that means I have to do a couple of choices for the tree data structure:
- First, regarding the
value
on the branch I have to choose between an actual value or a reference.
Using an actual value would have to constraint the generic type T
to implement a ‘Copy’ type class. Type classes in Rust are very similar to their Scala or Haskell counterparts.
So I decide to use a reference instead and here there is quite a choice:
- the default reference is ‘scoped’ and that makes it difficult to construct a new tree (with ids) reusing the values of the previous tree so I didn’t use it. This is clearly a great feature of Rust to use the compiler help with the memory management, this is very well thought out and the error messages are fairly clear.
- so instead I look at how to allocate memory on the head. There are two type that can be used here, Box and Rc. Box only allow one reference which would prevent us from reusing values so I decide to use ‘Rc’ instead. ‘Rc’ stands for Reference count.
- As the data structure is recursive I also use a reference to store the left and right branches. To be able to reuse nodes I also use a ‘Rc’ type here.
Finally, to be able to display the tree I could have implemented the Rust Display trait but it turns out to be easier to simply add the [derive[Debug]]
statement as it adds a method to display the content with no code. Also using the Display trait I would need to constraint the generic type T
to implement Display.
Implementing the algorithm
As rust has pattern matching on the Enum type, the implementation is very straightforward and has a very functional feel:
pub fn add_id<T> (tree: Rc<Node<T>>, i: isize ) -> (Rc<Node<(Rc<T>, isize)>>, isize) {
match &*tree {
Node::Leaf => (Rc::new(Node::Leaf), i),
Node::Branch(value, left, right) => {
let new_left = add_id(Rc::clone(left), i);
let new_right = add_id(Rc::clone(right), new_left.1+1);
(Rc::new(Node::Branch(Rc::new((Rc::clone(value),new_left.1)), new_left.0, new_right.0)), new_right.1)
}
}
}
The function returns a tuple which members can be accessed with .0
, .1
, etc.
- The first member of the tuple is the tree which has the same structure as the input tree but the leaves now contain a tuple with the first member being the value from the input tree and the second the integer identifier. Here I could have used a named struct but as there are only 2 values a tuple is easy. As a side node, Rust also supports named tuples.
- The second member from the return value is the value of the next available identifier (to allow the function to be called recursively).
This is very similar to the Scala implementation, the main differences are caused by Rust memory management:
- the match is on
&*tree
.
Like in C++, &
returns the reference of a value and *
returns the value referred by a reference.
So why are they here present, why not simply do the match on tree
?
This is because tree
is of type Rc<Node<T>>
so to match on Node::Leaf
which is of type Node< reT>, we need to take the value of tree *tree
.
Value which we need then to ‘borrow’ to use it, hence &*tree
!
That may seem complicated but the compiler here is a good guide to get it right.
- to re-use
left
,right
andvalue
we need to ‘clone’ the references.
Again the compiler would prevent us from reusing them if they were not cloned.
A last point about the function is the use of the type isize
. It contains an integer which corresponds to the architecture of the machine running the program, for example on a 32 bits processor that would be a 32 bits integer and it would be 64 bits on a 64 bits processor.
I used it instead of the known size integers because it is such an oddity ! Only a language targetting low level code would include it. And the designers of Rust have clearly learnt from C !
Monitoring the memory usage
I wanted to make sure I didn’t increment the reference counter more than necessary when cloning the values, so I implemented a function that displays it:
pub fn add_ref_count<T:std::fmt::Debug>(tree: Rc<Node<T>>) -> String {
let s = match &*tree {
Node::Leaf => "".to_string(),
Node::Branch(value, left, right) => format!("([{:?},{:?}],{:?},{:?})",
value, Rc::strong_count(value),
add_ref_count(Rc::clone(&left)),
add_ref_count(Rc::clone(&right)))
};
s.replace("\"","")
}
Rc::strong_count(value)
returns the number of references of value
.
Conclusion
Using Rust was a refreshing experience. The tooling is very easy to learn, the compiler is fast and its messages are clear. The Visual Code support is more than adequate for such a simple example.
The main learning curve was understanding the manual memory management. It is clearly more complicated than with a garbage collector based language, we really need to think of the scope of the values and whether we want to share them.
That probably makes writing common libraries quite challenging, as like in the tree implementation, we need to make assumptions on how the values will be used and shared.
Having C++ experience, how the Rust compiler helps to make sure memory is properly managed is very impressive and well thought. The concept of lifetime for values is particularly innovative.
The combination of modern languages features like algebraic data types, good generic support and predictable memory management makes it a great system language as Linus has also noticed.
To go back to my initial question, would I use it for the advent of code ? I’m not sure.
When I do the advent of code there is some time pressure, not just to get points but also because of life ;-).
So I’d rather use a language where I’m not going to spend time having to think of memory management. Actually that may be an interesting challenge in itself, I may reconsider using Rust then ;-).
For most professional projects, JVM level of performance is enough, especially if we spend a bit of time optimizing the critical parts of the program as usually there are only a few of them. Most of the code is not performance critical so the overhead of having to manually manage the memory for all of it is not worth it.
Still if an opportunity comes where using Rust makes sense, I’d glad to give it a go.