Quicksort in FP is a benign lazy binary tree traversal

Quicksort is a staple of coding interviews and language tutorials. A classic and smart algorithm, that can really test and teach you.

It seems there is some controversy out there about wether you can properly implement Quicksort in functional languages, with immutable data structures. Take for instance this Stackoverflow question, where you may also find this link to an Alexandrescu article.


My personal opinion is that this is a bit pointless. It is of course important to understand the difference between the mutable version and what you’ll have to do with immutable data structures. But to say the immutable version is not the “true” algorithm sounds crazy to me. If it has a double recursion partitioning a list by a pivot, this looks enough like Quicksot to me. Algorithms are more like the “soul” of whatever instruction sequence runs in your computer in the end, flipping around the transistor states. They live in a more mathematical universe…

The reason I wanted to write this article is to take note of something I realized once while thinking about all this. “Immutable Quicksort”, in FP, demonstrates that this algorithm is actually very close to building a binary tree out of the input data, and then traversing it in-order. You don’t see that mentioned often, but there are some references, like this blog.

Consider the following very pleasant implementation of Quicksort in Scala

Procedural programmers not familiar with FP look at this and get a little crazy about the fact we are “appending” stuff in the last call. They think the partition must be doing lots of evil malloc() underneath, and then the recursive calls are duplicating all the data needlessly, what would result in a n×log(n) memory occupancy, like we had duplicated the original input at each pass. Some people may even think you have to duplicate the data a second time when you make the recursive calls…

What should be made clear, though, is that FP people are hippies, and they find strength in lazyness. This is not clear in Haskell code for a beginner, but the Haskell list is actually lazy. It is more like Scala’s Stream used here, what is not only explicit in the type annotations but also in the Stream-specific operators #:: and #:::.

That means that by no means you are performing those “costly appends” to the end of a linked list in that code. Unless you used linked lists, but why would you?? That would be as much of a bad idea in Scala and Haskell as it is in C. You gotta know your data structures whatever the language you are working with.

Laziness means when we sort the input we don’t really have to wait until we finish reading and rewriting the whole vector a bunch of times like in C. What happens is that you first pick the pivot (the first element as usual, although known to be sub-optimal). After you select the pivot you pick the remaining list elements (the tail), and apply the partition function, which produces two new streams. The first stream has the values up to the pivot, and the second stream has elements above the pivot. You don’t have the elements in your hands yet, but you can read them from these streams. They will be calculated on the go.

What happens when you run this code is that as you start reading values from the sorted stream, only then we’ll start looking for the pivots, and then for the elements of the two streams. More specifically, we’ll look for the first element from the first stream, because this is the second pivot. What will happen then for the first element is that we’ll construct in memory this list of pivots, and when we finally reach the end of the input list then the last pivot will pop out.

While at this point we have traversed the whole input once, what we have in memory are only pivots that are smaller or equal than previous ones. This is because larger values have been ignored, delegated to the “second” stream at each call. As we move on, and the “first” streams end, we can then process the “second” ones, and effectively traverse in-order a binary tree containing all the input data.

What should be made clear here is that there is no binary tree really, we don’t explicitly copy the whole data into some intermediate tree… The only use of memory is the overhead necessary for data structures such as the intermediate streams returned by the partition, and the pivots. This only takes O(log(n)) memory, since it happens inside each recursive call.

Does that sound costly compared to the in-place Quicksort?… It really isn’t! Remember this: Quicksort is actually a double-recursive algorithm, there is no tail-call optimization possible. When you do all those recursive calls you actually have to store something every time, which are the pointers to the “second list” from each call.

The FP version of Quicksort thus only stores in memory the necessary data to be able to tell you what is the “next” sorted value, while it traverses the whole input about O(log(n)) times, just like the mutable version. The difference is that in the mutable version values are getting written again and again in memory, you actually have to “rewrite” you whole data vector, dats jumping around here and there like crazy, while in the FP version all you have are temporary information that we may even avoid ever having to write at the heap.

So the “true” Quicksort is like a dumb idiot that cannot think very well so he actually has to separate some stuff physically, and in stages, in order to see what he has to deliver you, instead of a more clever individual who might just take some notes of some data to follow a criterion to decide what is the next item he’ll provide you. What looks really more elegant here?

Do notice this: the FP Quicksort can actually be implemented just with iterators. It does not mean you can read very big data like with Mergesort, but it’s a quite interesting fact that is not very clear in the traditional mutable, passé version of Quicksort.

To provide some more insight, consider this verbose version of Quicksort made with Lists.

And the output for [4,2,1,3,6,5,7]:

List(1, 2, 3)
List(5, 6, 7)
List(1, 2, 3, 4, 5, 6, 7)

Now the output with streams:

Stream(1, ?)
Stream(1, ?)
Stream(1, ?)
Stream(3, ?)
Stream(5, ?)
Stream(5, ?)
Stream(7, ?)

That demonstrates the different execution order, and the fact the streams are not fully evaluated in these intermediate passes. The “?” printed in Scala indicates that the tail of the stream is still unknown. It is still a “thunk” to be evaluated, to loan the Haskell term.

I hope this post can help dispel some myths about Quicksort and about FP in general that many procedural coders still have out there. Expand your minds people, there is a lot going on in the FP world! Even if you don’t actually pick up these languages, it is important to have a better understanding of tools such as streams. It is perfectly possible to make a Quicksort implementation in C++ using iterators and even garbage collection. People don’t do it just because, who knows. This really can be useful sometimes, I think every programmer can really benefit from studying these techniques instead of closing their minds immediately to the sight of a “suspicious” list append or whatever.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s