Quicksort in FP is a benign lazy binary tree traversal

Quicksort is a staple of coding interviews and language tutorials. A classic and smart algorithm, that can really test and teach you.

It seems there is some controversy out there about wether you can properly implement Quicksort in functional languages, with immutable data structures. Take for instance this Stackoverflow question, where you may also find this link to an Alexandrescu article.


My personal opinion is that this is a bit pointless. It is of course important to understand the difference between the mutable version and what you’ll have to do with immutable data structures. But to say the immutable version is not the “true” algorithm sounds crazy to me. If it has a double recursion partitioning a list by a pivot, this looks enough like Quicksot to me. Algorithms are more like the “soul” of whatever instruction sequence runs in your computer in the end, flipping around the transistor states. They live in a more mathematical universe…

The reason I wanted to write this article is to take note of something I realized once while thinking about all this. “Immutable Quicksort”, in FP, demonstrates that this algorithm is actually very close to building a binary tree out of the input data, and then traversing it in-order. You don’t see that mentioned often, but there are some references, like this blog.

Consider the following very pleasant implementation of Quicksort in Scala

Procedural programmers not familiar with FP look at this and get a little crazy about the fact we are “appending” stuff in the last call. They think the partition must be doing lots of evil malloc() underneath, and then the recursive calls are duplicating all the data needlessly, what would result in a n×log(n) memory occupancy, like we had duplicated the original input at each pass. Some people may even think you have to duplicate the data a second time when you make the recursive calls…

What should be made clear, though, is that FP people are hippies, and they find strength in lazyness. This is not clear in Haskell code for a beginner, but the Haskell list is actually lazy. It is more like Scala’s Stream used here, what is not only explicit in the type annotations but also in the Stream-specific operators #:: and #:::.

That means that by no means you are performing those “costly appends” to the end of a linked list in that code. Unless you used linked lists, but why would you?? That would be as much of a bad idea in Scala and Haskell as it is in C. You gotta know your data structures whatever the language you are working with.

Laziness means when we sort the input we don’t really have to wait until we finish reading and rewriting the whole vector a bunch of times like in C. What happens is that you first pick the pivot (the first element as usual, although known to be sub-optimal). After you select the pivot you pick the remaining list elements (the tail), and apply the partition function, which produces two new streams. The first stream has the values up to the pivot, and the second stream has elements above the pivot. You don’t have the elements in your hands yet, but you can read them from these streams. They will be calculated on the go.

What happens when you run this code is that as you start reading values from the sorted stream, only then we’ll start looking for the pivots, and then for the elements of the two streams. More specifically, we’ll look for the first element from the first stream, because this is the second pivot. What will happen then for the first element is that we’ll construct in memory this list of pivots, and when we finally reach the end of the input list then the last pivot will pop out.

While at this point we have traversed the whole input once, what we have in memory are only pivots that are smaller or equal than previous ones. This is because larger values have been ignored, delegated to the “second” stream at each call. As we move on, and the “first” streams end, we can then process the “second” ones, and effectively traverse in-order a binary tree containing all the input data.

What should be made clear here is that there is no binary tree really, we don’t explicitly copy the whole data into some intermediate tree… The only use of memory is the overhead necessary for data structures such as the intermediate streams returned by the partition, and the pivots. This only takes O(log(n)) memory, since it happens inside each recursive call.

Does that sound costly compared to the in-place Quicksort?… It really isn’t! Remember this: Quicksort is actually a double-recursive algorithm, there is no tail-call optimization possible. When you do all those recursive calls you actually have to store something every time, which are the pointers to the “second list” from each call.

The FP version of Quicksort thus only stores in memory the necessary data to be able to tell you what is the “next” sorted value, while it traverses the whole input about O(log(n)) times, just like the mutable version. The difference is that in the mutable version values are getting written again and again in memory, you actually have to “rewrite” you whole data vector, dats jumping around here and there like crazy, while in the FP version all you have are temporary information that we may even avoid ever having to write at the heap.

So the “true” Quicksort is like a dumb idiot that cannot think very well so he actually has to separate some stuff physically, and in stages, in order to see what he has to deliver you, instead of a more clever individual who might just take some notes of some data to follow a criterion to decide what is the next item he’ll provide you. What looks really more elegant here?

Do notice this: the FP Quicksort can actually be implemented just with iterators. It does not mean you can read very big data like with Mergesort, but it’s a quite interesting fact that is not very clear in the traditional mutable, passé version of Quicksort.

To provide some more insight, consider this verbose version of Quicksort made with Lists.

And the output for [4,2,1,3,6,5,7]:

List(1, 2, 3)
List(5, 6, 7)
List(1, 2, 3, 4, 5, 6, 7)

Now the output with streams:

Stream(1, ?)
Stream(1, ?)
Stream(1, ?)
Stream(3, ?)
Stream(5, ?)
Stream(5, ?)
Stream(7, ?)

That demonstrates the different execution order, and the fact the streams are not fully evaluated in these intermediate passes. The “?” printed in Scala indicates that the tail of the stream is still unknown. It is still a “thunk” to be evaluated, to loan the Haskell term.

I hope this post can help dispel some myths about Quicksort and about FP in general that many procedural coders still have out there. Expand your minds people, there is a lot going on in the FP world! Even if you don’t actually pick up these languages, it is important to have a better understanding of tools such as streams. It is perfectly possible to make a Quicksort implementation in C++ using iterators and even garbage collection. People don’t do it just because, who knows. This really can be useful sometimes, I think every programmer can really benefit from studying these techniques instead of closing their minds immediately to the sight of a “suspicious” list append or whatever.

You need the Option class

Having studied a lot of Scala, and now moving back to C++, there are many things that you find in code “in the wild” that I used to find acceptable, but today I find a real eye sore. Many of those are things that experienced C++ programmers can easily agree to, such as avoiding functions that just modify object state, and try otherwise to write mostly functions that take all information it needs explicitly as parameters, and puts all its outputs in the actual return value. This is greatly praised in functional programming. But another eye-opener to me was to learn about monads, and specifically about the Option class.

Monad theory is very interesting, but you don’t really need any of that to get the Option class. This class can be understood as a container that can hold a single value. It is either empty, or it has a value that you can read.

It becomes extremely handy as a way to dead with computations that can fail. Lazy or inexperienced programmers, or even yourself, can deal with that sometimes by just returning a “Null” or “None” instead of a value in your program. This is specially common in dynamic languages. Or even worse, you might decide to raise and exception, what is something awful… Or return “-1” also instead of a natural. Ideally what you should do is to return an Option<int> containing either a value, or the actual object from the class that represents the absence of a “valid” value.

C++ programmers do that a lot using pointers. You set a pointer to NULL if the object doesn’t exist, otherwise you can work with it… Here is one example of actual code with anonymized variables that does that:

bool bX = true;
    bX = false;
if(bX) {...}

It is an interesting example because it deals with a pointer to Boolean, that we would like to see changed to an Option<bool>, probably. The code is weird because apart from the disguised Option logic, it is hard to imagine why to have this pointer to bool instead of just taking bool parameters. And then there is a new bool declared to store the result of the expression

pbY && *pbY

or explicityly

pbY != NULL && *pbY == true

Why didn’t that turn into a bool earlier on? We can’t really say, we are thinking too locally maybe. But if this extra information about the Boolean being “valid” or not is really necessary here, then there is no other way: either you have two Booleans, one to mark “valid” and the other with the value, or you have something like this based on pointers, or finally you explicitly use an Option<bool> object.

One interesting aspect from Option class is that you can chain these computations that can fail. You don’t think too much about this kind of stuff when you are working with pointers like this, but it becomes more clear when you move into a proper class… Here is another example of a situation that happens sometimes in code, that doesn’t actually have to do with “failure”, but that might be improved with an Option modeling. Original code is:

bool wasOK;
if (testA() && testB()) {
  wasOK = processA();
} else {
  wasOK = processB();
  if (!wasOK)
    wasOK = processA();

The program is returning a single boolean to indicate whether the “process” was successful. The order of execution does not reveal something important, though. processB actually happens more frequently than processA. My nomenclature also didn’t help… It is easier to understand the code if I write in plain english:

“If we can run process B, do it. If we can’t run it, or if we can but it failed, then run processA.”

A clearer code would be something like:

wasOK = !(testA() && testB()) && processB() || processA()

Here we exploit the “short-circuit” feature from logic operators, by the way, so processA doesn’t run if processB is successful, and processB doesn’t run if not possible.

But really, what would be even nicer was if we had

Option<bool> processBopt() {
  return (testA() && testB()) ? None : Option<int>(processB());

Option<bool> processAopt() {
  return Option<int>(processA());

And now we can do, in Scala lingo:

processBopt() orElse processAopt()

And much better yet would be to return the actual result from the process, and not just a signal about whether it was successful and the results are sitting somewhere “you should know where”…

I have no idea how the “orElse” operator is implemented in existing C++ classes. But in Scala it looks great, and it is one of the main reasons I love to use Scala today. It is just much better to deal with this kind of slightly-more-complicated logic that you find in real life. Actually working with the Option class in C++ will probably never be as good as it is in Scala – apparently Boost has it, but I’m not sure how to use it. But I do believe I would still find a way to avoid this kind of pointer-based code I have shown here if I had written this program, and would have stick to some of the alternatives. Once a programmer finally learns about this Option “design pattern”, everything becomes easier if you stick to it when the situation requires it.

Primes below N, the sieve of Eratosthenes in Scala

I have been practicing coding tests again. One nice exercise I saw the other day was a sample from a HackerRank test: find the number of prime numbers less than N. Or tell how many primes are below a given limit. You can easily change the problem to pick the list of the said primes.

It is such a classic problem that I decided to dedicate myself to write a nice solution, because I couldn’t find a really good “functional style” implementation in Scala, strictly with immutable collections, and using `foldLeft` if possible. And there is also an important detail about this problem that is worth talking about. The most well known solution is not actually the “best” one, and the difference isn’t even that that big. So it really is a subject we should all be studying a little bit more carefully.

This article discusses four different solutions to the problem. First is the most usual solution, the so-called unfaithful sieve. Then we will show the most “cult” solution to this problem, the sieve of Eratosthenes. These both can be written recursively, but the first can actually be implemented with a fold. We will then show a quite different implementation of the sieve, using a map to store known composite numbers, that I came up with myself, even though I doubt is new. We finish by looking at implementations of the two algorithms using mutable collections.

Continue reading

A Scala JSON parsing benchmark experiment

I have just published a project on GitHub with an experiment to compare the speed from different Scala libraries to de-serialize JSON data into case classes. I have been using JSON reading and writing not only for web, but also data analysis in general, and I was afraid some of the libraries might still have performance issues, what apparently was usual in the past couple of years.

The experiment has showed that all libraries are pretty mature today, with no obvious winner or loser in the experiment. The most interesting is that apparently the performance might be very dependent on the specific model you are reading, what indicates there may be really little room for speed improvements left.

Move to my GitHub project to look at the code and see the results!