Sorting algorithms in Scala

I have been studying Scala, and I’m reviewing basic algorithms too. I thought it would be nice to mix the two of them, so here is an implementation of three sorting algorithms in Scala.

Here is the code straight up, and some comments after.

import scala.annotation.tailrec

def stripChars(s: String, ch: String) = s filterNot (ch contains _)
def cleanWord(word: String) = stripChars(word, ",.;:/=*[]()<>#@!'`ยด").toLowerCase

object ScalaSorts extends App {

  // Quicksort
  def quicksort[A <% Ordered[A]](input: Stream[A]): Stream[A] =
    input match {
      case Stream() => Stream()
      case pivot #:: tail => {
        val sublists = tail.partition(_ < pivot)
        quicksort(sublists._1) #:::
        pivot #::

  // Insertion sort
  def insertionsort[A <% Ordered[A]](
    input: Stream[A], sorted: Stream[A]=Stream()
  ): Stream[A] =
    input match {
      case head #:: tail =>
        insertionsort(tail, stream_insert(head, sorted))
      case Stream() => sorted

  def stream_insert[A <% Ordered[A]](
    value: A, input: Stream[A], acc: Stream[A]=Stream()
  ): Stream[A] =
    input match {
      case head #:: tail if head < value =>
        stream_insert(value, tail, acc :+ head)
      case _ => acc #::: value #:: input

  // Mergesort
  def mergesort[A <% Ordered[A]](input: List[A], j: Int=1): List[A] =
    if (j < input.size) {
      def pp = list_chunk_pairs_stream(j, input).map(ll => sorted_merge(ll._1, ll._2))
      mergesort(pp.toList.flatten, j * 2)
    else input

  def list_chunk_pairs_stream[A](chunk_size: Int, input: List[A]): Stream[(List[A], List[A])] =
    pegapares(list_chunkenize(chunk_size, input), List())

  def pegapares[A](ss: Stream[A], z: A): Stream[(A, A)] =
    ss match {
      case Stream() => Stream()
      case a #:: Stream() => Stream((a, z))
      case a #:: b #:: tail => (a, b) #:: pegapares(tail, z)

  def list_chunkenize[A](chunk_size: Int, input: List[A]): Stream[List[A]] =
    input match {
      case Nil => Stream()
      case _ => input.slice(0, chunk_size) #:: list_chunkenize(chunk_size, input.drop(chunk_size))

  def sorted_merge[A <% Ordered[A]](la: List[A], lb: List[A], acc: List[A]=Nil): List[A] =
    (la, lb) match {
      case (ha :: ta, hb :: tb) if (ha < hb) =>
        sorted_merge(ta, lb, acc :+ ha)
      case (ha :: ta, hb :: tb) if (ha >= hb) =>
        sorted_merge(la, tb, acc :+ hb)
      case (_, Nil) => acc ::: la
      case (Nil, _) => acc ::: lb

  // Main body
  val filename = args(0)
  val fileStream = Source.fromFile(filename).getLines.toStream

  def wordsStream = fileStream.flatMap(_ split &quot; &quot;).map(cleanWord).filter(_ != &quot;&quot;)

  val quicksort_solution = quicksort(wordsStream).toList
  val insertion_solution = insertionsort(wordsStream).toList
  val merge_solution = mergesort(wordsStream.toList).toList
  val solution = wordsStream.toList.sorted



Quicksort: Perhaps the most important algorithm. I managed to write it using a single pattern-matching, and two recursive calls. So it’s not tail-recursive, but shouldn’t be a problem. It’s interesting to note that the algorithm actually sweeps the input only once, with multiple streams in the nested calls taking care of the re-ordering. There would be no intermediary results stored if I could use some kind of Iterator instead of Stream. I don’t know why Scala doesn’t have an Iterator type that I can use in pattern-matching, I hope I just didn’t learn about it yet.

The use of “partition” is probably the most interesting thing about this implementation. It avoids the separate declaration of two different Streams filtering the input twice, instead creating the two necessary sub-lists in a single sweep. (Or at least it looks like that.)

Insertion sort: Perhaps the most important of the inefficient sorting algorithms. I used two tail-recursive functions here, one of them takes care of “inserting” a value in a list at the right place. The higher function just picks the values from the input and “insert” them at the output list one at a time, starting with an empty list. Tail recursion is important here because the number of calls is proportional to the size of the list, and not log(N) like in Quicksort.

Merge-sort: For this I used two tail-recursive functions, and a couple of stream-processing ones. The lowest-level function merges two lists picking the smallest elements first. It’s the fundamental operation from merge-sort. Two other functions are responsible for splitting an input list it in sub-lists with the specified chunk size, and returning them in pairs. It iterates on the input creating an iterator of iterator pairs… It’s pretty complicated, but it’s exactly what we need in merge-sort. The final, most external function is the one that calls the merger function to the sublist pairs, and them join them together as the input for a recursive call, with the sublist size increasing according to the necessary rule, double the size at each iteration. This function doesn’t have to be tail-recursive, but it came out like that naturally.

That was fun! I think I managed to really explore the “functional thinking”, and learn a thing or two about Scala. But I’m still a bit confused about when I should use Lists, Streams or something else. But I definitely think it’s way cooler than writing a procedural in-place Quicksort or insertion sort, for instance, where we having to take care of element indices and stuff, and be constantly afraid of overwriting something I shouldn’t, etc. I also think that even if I could try to write code like this in Python, for instance, it wouldn’t look that cool. The pattern matching is probably the main reason for that.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s