Quicksort in C

Today I wrote this implementation of binary Quicksort in C. I tested this code in my Intel Q9450 2.666 GHz CPU, and this graphic shows the mean calculation time as a function of the array size, in loglog scales. As we can see, it is pretty much linear — the curve inclination is 45 degrees.

The regular Quicksort is recursive comparison sort. At each iteration a pivot number is chosen from the data, and the list is split in two, each half of the list contains values that are smaller or larger than the pivot. This procedure that splits the list is then called again for each of the two halves. The choice of the pivot number is a big deal, and choosing a bad procedure can damage the performance of Quicksort. It usually archives O(n log n), but it can take O(n^2) if the pivot choices are bad.

In our case we are using Quicksort as a kind of radix sort. This is an integer sort algorithm, and its performance is O(nN), here N is the number of bits of the integers being sorted. This is one important modification, because it means the complexity can be even better than regular Quicksort, and the worst case is also much better. The modification means that the pivots are the powers of two, starting from the largest one, what actually  means we are looking at the bits of the numbers, from the most to the least significant bit.

It is not very difficult to understand how to implement Quicksort, as any other classic sorting algorithm. It is only a little more difficult to really grok why they work and why they have nice performances… It is also a little more difficult to really implement the thing in C or whatever, paying attention to how you consume machine resources. One thing is to say that “It’s a recursive algorithm that goes on analyzing sublists”, another thing is to actually implement that analyzing an array of integers in C, using pointers. And one thing that is very desirable in practice is to have an “in-place” implementation of the algorithm where we do not declare new arrays, and only store the necessary data in the original input array itself.

This is why I wrote this implementation today, to grok the practical aspect. So here is one way to do it. The arguments for the procedure call are a pointer to the beginning of a list, a pointer to the end, and the pivot. We are going to sweep the list, and move the smaller values to the first half. That means we need another pointer that will point to the first item of the second half of the list. Every time we find a small value in the sweep we replace the item  from this pointer, and increment it. So the small value is moved to the end of the first list, that first value from the second list moves to somewhere in the middle of it, and the second item from the second list becomes the first.

Once the weeping of the list is finished, the pointer is noe the end of the first list, and the beginning of the second list, and then we just make the two new separate calls to the procedure using the three pointers — the original beginning and ending pointers, and the third one we found — and also the new pivot value, moving it to the next less significant bit.

These recursive calls will grow the call stack in the number of bits, and the calls will proceed in a kind of in-order tree traversal. As we move deep into the recursive calls, this third pointer we found remain in the memory of each instance, and so the code “knows” what point of the list is being analyzed and what bit we are looking at. Because we are sweeping the list again and again in each recursive call, we have the “n” in the complexity. And because these sweeps happen for each bit, we have the “N”.

Here is the resulting code I implemented, reading random values from /dev/urandom, etc. One small detail is that I also sweep the list in the beginning until I find the first larger value, and this is where the actual swapping of values begins. (OBS: I did make some tests to be sure the time spent to read the arrays was not considered or relevant.)

// Binary Quicksort implementation in C.
// by Nicolau Werneck<nwerneck@gmail.com>, 2012-09-28

#include <stdio.h>
#include <stdlib.h>

void print_array(unsigned int* ar_ini, unsigned int size) {
  unsigned int k;
  for (k=0; k<size; k++) {
    printf("%05u % 16u\n", k, ar_ini[k]);
  }
}

unsigned int* quicksort(unsigned int* ar_ini, unsigned int* ar_end, unsigned int pivot) {

  // Return if there are no more bits to test or if the list is empty.
  if (pivot == 0 || ar_ini == ar_end){
    return;
  }

  unsigned int* k; // Loop variable
  unsigned int tmp; // Temp variable to switch values

  // This variable points to the first item that is larger or equal
  // than the pivot.
  unsigned int* sep;

  // Look for the first item that is larger or equal than the pivot.
  for (k = ar_ini; k < ar_end && !(*k & pivot); k++);
  sep = k;

  // Now iterate through the remaining list items, and move the
  // smaller ones to the first half of the list.
  for (; k<ar_end; k++) {
    // If the current item is smaller than the pivot,
    if (!(*k & pivot)) {
      // switch the current small value with head of second half
      tmp = *sep;
      *sep = *k;
      *k = tmp;
      // and increment head of the second half.
      sep++;
    }
  }

  // Make the recursive calls for the two sublists, and the next less
  // significant bit.
  quicksort(ar_ini, sep, pivot >> 1);
  quicksort(sep, ar_end, pivot >> 1);
}

int main(int argc, char* argv[]) {
  
  // Return if there is no argument.
  if (argc < 2) {
    return 1;
  }

  // Read array size from input argument.
  int size = atoi(argv[1]);

  // Return if conversion of first argument was not successful.
  if (size == 0) {
    return 1;
  }

  // Allocate memory for the array to be sorted
  unsigned int* arr = (unsigned int*) malloc(sizeof(int)*size);

  // Initialize array by reading random stuff from /dev/urandom.
  FILE * pf;
  pf = fopen ("/dev/urandom","r");
  if (pf == NULL)
  {
    return 1;
  }
  fread(arr, sizeof(unsigned int), size, pf);
  fclose (pf);

  ///////////////////////////////////////////////////////////////////
  // Where the real stuff begins...
  //

  // Print input array.
  /* print_array(arr, size); */
  /* printf("\nNow sorting...\n\n"); */

  // Initialize pivot variable.
  unsigned int pivot = 1 << (sizeof(int)*8-1);

  // Make the initial call to the quicksort procedure.
  quicksort(arr, arr+size, pivot);

  // Print array again, this time sorted.
  /* print_array(arr, size); */

  return 0;
}
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s