Codophilic: February 2013

Tuesday, February 26, 2013

DataStructures

AlgoMania

Even though computers can perform literally millions of mathematical computations per second, when a problem gets large and complicated, performance can nonetheless be an important consideration. One of the most crucial aspects to how quickly a problem can be solved is how the data is stored in memory.

To illustrate this point, consider going to the local library to find a book about a specific subject matter. Most likely, you will be able to use some kind of electronic reference or, in the worst case, a card catalog, to determine the title and author of the book you want. Since the books are typically shelved by category, and within each category sorted by author's name, it is a fairly straightforward and painless process to then physically select your book from the shelves.

Now, suppose instead you came to the library in search of a particular book, but instead of organized shelves, were greeted with large garbage bags lining both sides of the room, each arbitrarily filled with books that may or may not have anything to do with one another. It would take hours, or even days, to find the book you needed, a comparative eternity. This is how software runs when data is not stored in an efficient format appropriate to the application.

Simple Data Structures
The simplest data structures are primitive variables. They hold a single value, and beyond that, are of limited use. When many related values need to be stored, an array is used. It is assumed that the reader of this article has a solid understanding of variables and arrays.

A somewhat more difficult concept, though equally primitive, are pointers. Pointers, instead of holding an actual value, simply hold a memory address that, in theory, contains some useful piece of data. Most seasoned C++ coders have a solid understanding of how to use pointers, and many of the caveats, while fledgling programmers may find themselves a bit spoiled by more modern "managed" languages which, for better or worse, handle pointers implicitly. Either way, it should suffice to know that pointers "point" somewhere in memory, and do not actually store data themselves.

A less abstract way to think about pointers is in how the human mind remembers (or cannot remember) certain things. Many times, a good engineer may not necessarily know a particular formula/constant/equation, but when asked, they could tell you exactly which reference to check.

Arrays
Arrays are a very simple data structure, and may be thought of as a list of a fixed length. Arrays are nice because of their simplicity, and are well suited for situations where the number of data items is known (or can be programmatically determined). Suppose you need a piece of code to calculate the average of several numbers. An array is a perfect data structure to hold the individual values, since they have no specific order, and the required computations do not require any special handling other than to iterate through all of the values. The other big strength of arrays is that they can be accessed randomly, by index. For instance, if you have an array containing a list of names of students seated in a classroom, where each seat is numbered 1 through n, then studentName[i] is a trivial way to read or store the name of the student in seat i.

An array might also be thought of as a pre-bound pad of paper. It has a fixed number of pages, each page holds information, and is in a predefined location that never changes.

Linked Lists
A linked list is a data structure that can hold an arbitrary number of data items, and can easily change size to add or remove items. A linked list, at its simplest, is a pointer to a data node. Each data node is then composed of data (possibly a record with several data values), and a pointer to the next node. At the end of the list, the pointer is set to null.

By nature of its design, a linked list is great for storing data when the number of items is either unknown, or subject to change. However, it provides no way to access an arbitrary item from the list, short of starting at the beginning and traversing through every node until you reach the one you want. The same is true if you want to insert a new node at a specific location. It is not difficult to see the problem of inefficiency.

A typical linked list implementation would have code that defines a node, and looks something like this:

class ListNode {
   String data;
   ListNode nextNode;
}
ListNode firstNode;

You could then write a method to add new nodes by inserting them at the beginning of the list:

ListNode newNode = new ListNode();
NewNode.nextNode = firstNode;
firstNode = newNode;

Iterating through all of the items in the list is a simple task:

ListNode curNode = firstNode;
while (curNode != null) {
   ProcessData(curNode);
   curNode = curNode.nextNode;
}

A related data structure, the doubly linked list, helps this problem somewhat. The difference from a typical linked list is that the root data structure stores a pointer to both the first and last nodes. Each individual node then has a link to both the previous and next node in the list. This creates a more flexible structure that allows travel in both directions. Even still, however, this is rather limited.

Queues
A queue is a data structure that is best described as "first in, first out". A real world example of a queue is people waiting in line at the bank. As each person enters the bank, he or she is "enqueued" at the back of the line. When a teller becomes available, they are "dequeued" at the front of the line.

BFS means to first explore all states that can be reached in one step, then all states that can be reached in two steps, etc. A queue assists in implementing this solution because it stores a list of all state spaces that have been visited.

A common type of problem might be the shortest path through a maze. Starting with the point of origin, determine all possible locations that can be reached in a single step, and add them to the queue. Then, dequeue a position, and find all locations that can be reached in one more step, and enqueue those new positions. Continue this process until either a path is found, or the queue is empty (in which case there is no path). Whenever a "shortest path" or "least number of moves" is requested, there is a good chance that a BFS, using a queue, will lead to a successful solution.

Most standard libraries, such the Java API, and the .NET framework, provide a Queue class that provides these two basic interfaces for adding and removing items from a queue.

BFS type problems appear frequently on contests; on some problems, successful identification of BFS is simple and immediately, other times it is not so obvious.

A queue implementation may be as simple as an array, and a pointer to the current position within the array. For instance, if you know that you are trying to get from point A to point B on a 50x50 grid, and have determined that the direction you are facing (or any other details) are not relevant, then you know that there are no more than 2,500 "states" to visit. Thus, your queue is programmed like so:

class StateNode {
   int xPos;
   int yPos;
   int moveCount;
}

class MyQueue {
   StateNode[] queueData = new StateNode[2500];
   int queueFront = 0;
   int queueBack = 0;

   void Enqueue(StateNode node) {
      queueData[queueBack] = node;
      queueBack++;
   }

   StateNode Dequeue() {
      StateNode returnValue = null;
      if (queueBack > queueFront) {
      returnValue = queueData[queueFront];
      QueueFront++;
   }
   return returnValue;
   }

   boolean isNotEmpty() {
      return (queueBack > queueFront);
   }
}

Then, the main code of your solution looks something like this. (Note that if our queue runs out of possible states, and we still haven't reached our destination, then it must be impossible to get there, hence we return the typical "-1" value.)

MyQueue queue = new MyQueue();
queue.Enqueue(initialState);
while (queue.isNotEmpty()) {
   StateNode curState = queue.Dequeue();
   if (curState == destState)
return curState.moveCount;
   for (int dir = 0; dir < 3; dir++) {
      if (CanMove(curState, dir))
         queue.Enqueue(MoveState(curState, dir));
   }
}

Stacks
Stacks are, in a sense, the opposite of queues, in that they are described as "last in, first out". The classic example is the pile of plates at the local buffet. The workers can continue to add clean plates to the stack indefinitely, but every time, a visitor will remove from the stack the top plate, which is the last one that was added.

While it may seem that stacks are rarely implemented explicitly, a solid understanding of how they work, and how they are used implicitly, is worthwhile education. Those who have been programming for a while are intimately familiar with the way the stack is used every time a subroutine is called from within a program. Any parameters, and usually any local variables, are allocated out of space on the stack. Then, after the subroutine has finished, the local variables are removed, and the return address is "popped" from the stack, so that program execution can continue where it left off before calling the subroutine.

An understanding of what this implies becomes more important as functions call other functions, which in turn call other functions. Each function call increases the "nesting level" (the depth of function calls, if you will) of the execution, and uses increasingly more space on the stack. Of paramount importance is the case of a recursive function. When a recursive function continually calls itself, stack space is quickly used as the depth of recursion increases. Nearly every seasoned programmer has made the mistake of writing a recursive function that never properly returns, and calls itself until the system throws up an "out of stack space" type of error.

Nevertheless, all of this talk about the depth of recursion is important, because stacks, even when not used explicitly, are at the heart of a depth first search. A depth first search is typical when traversing through a tree, for instance looking for a particular node in an XML document. The stack is responsible for maintaining, in a sense, a trail of what path was taken to get to the current node, so that the program can "backtrack" (e.g. return from a recursive function call without having found the desired node) and proceed to the next adjacent node.

Trees
Trees are a data structure consisting of one or more data nodes. The first node is called the "root", and each node has zero or more "child nodes". The maximum number of children of a single node, and the maximum depth of children are limited in some cases by the exact type of data represented by the tree.

One of the most common examples of a tree is an XML document. The top-level document element is the root node, and each tag found within that is a child. Each of those tags may have children, and so on. At each node, the type of tag, and any attributes, constitutes the data for that node. In such a tree, the hierarchy and order of the nodes is well defined, and an important part of the data itself. Another good example of a tree is a written outline. The entire outline itself is a root node containing each of the top-level bullet points, each of which may contain one or more sub-bullets, and so on. The file storage system on most disks is also a tree structure.

Corporate structures also lend themselves well to trees. In a classical management hierarchy, a President may have one or more vice presidents, each of whom is in charge of several managers, each of whom presides over several employees.

Binary Trees
A special type of tree is a binary tree. A binary tree also happens to be one of the most efficient ways to store and read a set of records that can be indexed by a key value in some way. The idea behind a binary tree is that each node has, at most, two children.

In the most typical implementations, the key value of the left node is less than that of its parent, and the key value of the right node is greater than that of its parent. Thus, the data stored in a binary tree is always indexed by a key value. When traversing a binary tree, it is simple to determine which child node to traverse when looking for a given key value.

One might ask why a binary tree is preferable to an array of values that has been sorted. In either case, finding a given key value (by traversing a binary tree, or by performing a binary search on a sorted array) carries a time complexity of O(log n). However, adding a new item to a binary tree is an equally simple operation. In contrast, adding an arbitrary item to a sorted array requires some time-consuming reorganization of the existing data in order to maintain the desired ordering.

If you have ever used a field guide to attempt to identify a leaf that you find in the wild, then this is a good way to understand how data is found in a binary tree. To use a field guide, you start at the beginning, and answer a series of questions like "is the leaf jagged, or smooth?" that have only two possible answers. Based upon your answer, you are directed to another page, which asks another question, and so on. After several questions have sufficiently narrowed down the details, you are presented with the name, and perhaps some further information about your leaf. If one were the editor of such a field guide, newly cataloged species could be added to field guide in much the same manner, by traversing through the questions, and finally at the end, inserting a new question that differentiates the new leaf from any other similar leaves. In the case of a computer, the question asked at each node is simply "are you less than or greater than X?"

Priority Queues
In a typical breadth first search (BFS) algorithm, a simple queue works great for keeping track of what states have been visited. Since each new state is one more operational step than the current state, adding new locations to the end of the queue is sufficient to insure that the quickest path is found first. However, the assumption here is that each operation from one state to the next is a single step.

Let us consider another example where you are driving a car, and wish to get to your destination as quickly as possible. A typical problem statement might say that you can move one block up/down/left/right in one minute. In such a case, a simple queue-based BFS works perfectly, and is guaranteed to provide a correct result.

But what happens if we say that the car can move forward one block in two minute, but requires three minutes to make a turn and then move one block (in a direction different from how the car was originally facing)? Depending on what type of move operation we attempt, a new state is not simply one "step" from the current state, and the "in order" nature of a simple queue is lost.

This is where priority queues come in. Simply put, a priority queue accepts states, and internally stores them in a method such that it can quickly pull out the state that has the least cost. (Since, by the nature of a "shortest time/path" type of problem, we always want to explore the states of least cost first.)

A real world example of a priority queue might be waiting to board an airplane. Individuals arriving at their gate earlier will tend to sit closest to the door, so that they can get in line as soon as they are called. However, those individuals with a "gold card", or who travel first class, will always be called first, regardless of when they actually arrived.

One very simple implementation of a priority queue is just an array that searches (one by one) for the lowest cost state contained within, and appends new elements to the end. Such an implementation has a trivial time-complexity for insertions, but is painfully slow to pull objects out again.

A special type of binary tree called a heap is typically used for priority queues. In a heap, the root node is always less than (or greater than, depending on how your value of "priority" is implemented) either of its children. Furthermore, this tree is a "complete tree" from the left. A very simple definition of a complete tree is one where no branch is n + 1 levels deep until all other branches are n levels deep. Furthermore, it is always the leftmost node(s) that are filled first.

To extract a value from a heap, the root node (with the lowest cost or highest priority) is pulled. The deepest, rightmost leaf then becomes the new root node. If the new root node is larger than its left child, then the root is swapped with its left child, in order to maintain the property that the root is always less than its children. This continues as far as necessary down the left branch. Adding a value to the heap is the reverse. The new value is added as the next leaf, and swapped upward as many times as necessary to maintain the heap property.

A convenient property of trees that are complete from the left is that they can be stored very efficiently in a flat array. In general, element 0 of the array is the root, and elements k + 1 and k + 2 are the children of element k. The effect here is that adding the next leaf simply means appending to the array.

Hash Tables
Hash tables are a unique data structure, and are typically used to implement a "dictionary" interface, whereby a set of keys each has an associated value. The key is used as an index to locate the associated values. This is not unlike a classical dictionary, where someone can find a definition (value) of a given word (key).

Unfortunately, not every type of data is quite as easy to sort as a simple dictionary word, and this is where the "hash" comes into play. Hashing is the process of generating a key value (in this case, typically a 32 or 64 bit integer) from a piece of data. This hash value then becomes a basis for organizing and sorting the data. The hash value might be the first n bits of data, the last n bits of data, a modulus of the value, or in some cases, a more complicated function. Using the hash value, different "hash buckets" can be set up to store data. If the hash values are distributed evenly (which is the case for an ideal hash algorithm), then the buckets will tend to fill up evenly, and in many cases, most buckets will have no more than one or only a few objects in them. This makes the search even faster.

A hash bucket containing more than one value is known as a "collision". The exact nature of collision handling is implementation specific, and is crucial to the performance of the hash table. One of the simplest methods is to implement a structure like a linked list at the hash bucket level, so that elements with the same hash value can be chained together at the proper location. Other, more complicated schemes may involve utilizing adjacent, unused locations in the table, or re-hashing the hash value to obtain a new value. As always, there are good and bad performance considerations (regarding time, size, and complexity) with any approach.

Another good example of a hash table is the Dewey decimal system, used in many libraries. Every book is assigned a number, based upon its subject matterï¿½ the 500's are all science books, the 700's are all the arts, etc. Much like a real hash table, the speed at which a person could find a given book is based upon how well the hash buckets are evenly dividedï¿½ It will take longer to find a book about frogs in a library with many science materials than in a library consisting mostly of classical literature.

In applications development, hash tables are a convenient place to store reference data, like state abbreviations that link to full state names. In problem solving, hash tables are useful for implementing a divide-and-conquer approach to knapsack-type problems. In LongPipes, we are asked to find the minimum number of pipes needed to construct a single pipe of a given length, and we have up to 38 pieces of pipe. By dividing this into two sets of 19, and calculating all possible lengths from each set, we create hash tables linking the length of the pipe to the fewest number of segments used. Then, for each constructed pipe in one set, we can easily look up, whether or not we constructed a pipe of corresponding length in the other set, such that the two join to form a complete pipe of the desired length.

Conclusion
The larger picture to be seen from all of this is that data structures are just another set of tools that should be in the kit of a seasoned programmer. Comprehensive libraries and frameworks available with most languages nowadays preempt the need for a full understanding of how to implement each of these tools. The result is that developers are able to quickly produce quality solutions that take advantage of powerful ideas. The challenge lies in knowing which one to select.

Friday, February 22, 2013

Interview Cisco Sytems SDET


Cisco Systems Interview Experience : SDET, Pune, India

One fine day...i got a call from Cisco HR if i was availaible for an interview with Cisco Systems for an SDET position(call was correspoding to my refferal made by one of my friends there)in the coming week..? I agreed. Following is the narrative of my interview experience with Cisco.

First was a telephonic screening round. They were two guys and they started with my project i made in college. They asked me to write small pseudo code snippets of what i did and seeked some explanation for it. Then they said the interview would be split into two halves - networking and programming. In networking they started with the explanation of Basic OSI networking model and slowly started to go deeper and deeper.Some of the more specific questions involved explaining some protocols and examples where they are used like ARP,DHCP,BGP,OSPF etc. They were more interested to know which reference layer they work at. Then there was some discussion on multicasting , IPV6 addresses , MAC address.

First half went quite good and they started with the programming section. It started with questions like difference between the main () function of c and c++...i told them of the return types...they were not fully convinced though. Then they wanted me to write a linked list reversal program and dictate the code. They asked me how comfortable i was in OOPS... i told not much but the still continued with the standard OOPS questions. They grilled on virtual functions and their use. It was more of a theoretical interview where they were checking if a am aware of basic Computer science fundamentals. They also touched upon the basic Unix commands and wanted me to explain what happens when commands like 'ls' is typed on a shell.

This round lasted for a bit more than an hour and i thought i had done more than OK in this screen.

After a wait of 3 days i finally got a call from Cisco HR informing me that the inerview went OK but they want to have one more telephone screening round, this time with the Technical lead. I agreed for the next day afternoon interview. This time there was only one lady...who was a technical lead of the team i was interviewing for. This interview was purely programming based. She asked me some questions on linked lists..how to delete a node with only its pointer given. I knew this question before hand and skillfully gave its answer...considering  i was only 10 months out of college she seemed impressed:) She wanted me to write a program to convert a decimal number into binary and trust me i fumbled at this question ... She moved ahead with the discussion on complexities of algorithms...she wanted me to explain The Big O,theta notations and seeked some examples of sorting algorithms and wanted to know their complexities. Then she asked if i know of any scripting languages like Perl Python ...my answer was a NO but i told given the time i can always learn them with a good programming knowledge.

                                This round lasted for about 50 minutes and then i started a wait for a long time of 10 days before I finally got a call from HR that if i could come for an inHouse interview. I agreed and and had my first flight of the life for an interview in Cisco campus. This time the interviewers were 2 people from the team..One manager and the same team-lead who took my second phone screen. They started with the heapsort question and wanted to know the applications where minHeap can be used as a datastructructure. Second question consisted of finding the middle element of the linked list...i told we can maintain two pointers (fast and slow) , fast is incremented by two positions and slow by one and when fast reaches end then the slowere pointer will point to the middle position. They were quite satisfied with the explanation and wanted me to code it. And guess what the third question was..?WAP to convert decimal to binary....the exact same question i fumbled upon in the second screen...they are  interested to see if a person does some homework after the interview..thankfully i did and was able to answer that..and i could see a smile on the face of the TL:P

This round lasted for about 50 minutes and then I had lunch with the TL...

                                As a final round i had to face the Business unit head...and to my surprise it happened to be a Technical too...He asked me questions on multiCast Addresses. Wanted to know the difference between static and global variable and wanted me to to explain the process structure in memory.Followed with the normal discussion on why Cisco and if i was ready to relocate? Interview ended well and then i was asked that I am done for the day and HR will contact me.

I got a call a week later confirming my selection.

Things to lookout for a Cisco interview:

1) networking:- OSI model,specific protocols and their applications (IP,TCP,BGP,OSPF,ARP,RARP),addressing techniques (classfull and classless,v6,multicast,broadcast addresses).

2)Programming:-Basic Linked list,trees problems , basic searching sorting algorithms(merge quick heap among standard ones)...they wll not grill much especially for freshers,1-1.5 years.

3)Operating system: A decent knowledge of Unix and basic OS fundamentals-threads locks memory process management.

luck
-House

Sorting-1

CodoPhilic

AlgoMania - Sorting

Disclaimer: This document may just be a reproduction of the content on the internet with slight modifications, we lay no claim to it whatsoever.

Introduction

Any number of practical applications in computing require things to be in order. Even before we start computing, the importance of sorting is drilled into us. From group pictures that require the tallest people to stand in the back, to the highest grossing salesman getting the largest Christmas bonus, the need to put things smallest to largest or first to last cannot be underestimated.

When we query a database, and append an ORDER BY clause, we are sorting. When we look for an entry in the phone book, we are dealing with a list that has already been sorted. (And imagine if it weren't!) If you need to search through an array efficiently using a binary search, it is necessary to first sort the array. When a problem statement dictates that in the case of a tie we should return the lexicographically first result, well… you get the idea.

General Considerations
Imagine taking a group of people, giving them each a deck of cards that has been shuffled, and requesting that they sort the cards in ascending rank order. Some people might start making piles, others might spread the cards all over a table, and still others might juggle the cards around in their hands. For some, the exercise might take a matter of seconds, for others several minutes or longer. Some might end up with a deck of cards where spades always appear before hearts, in other cases it might be less organized. Fundamentally, these are all the big bullet points that lead algorithmists to debate the pros and cons of various sorting algorithms.

When comparing various sorting algorithms, there are several things to consider. The first is usually runtime. When dealing with increasingly large sets of data, inefficient sorting algorithms can become too slow for practical use within an application.

A second consideration is memory space. Faster algorithms that require recursive calls typically involve creating copies of the data to be sorted. In some environments where memory space may be at a premium (such as an embedded system) certain algorithms may be impractical. In other cases, it may be possible to modify the algorithm to work "in place", without creating copies of the data. However, this modification may also come at the cost of some of the performance advantage.

A third consideration is stability. Stability, simply defined, is what happens to elements that are comparatively the same. In a stable sort, those elements whose comparison key is the same will remain in the same relative order after sorting as they were before sorting. In an unstable sort, no guarantee is made as to the relative output order of those elements whose sort key is the same.

Bubble Sort
One of the first sorting algorithms that is taught to students is bubble sort. While it is not fast enough in practice for all but the smallest data sets, it does serve the purpose of showing how a sorting algorithm works. Typically, it looks something like this:

for (int i = 0; i < data.Length; i++)

   for (int j = 0; j < data.Length - 1; j++)

      if (data[j] > data[j + 1])

      {

        tmp = data[j];

         data[j] = data[j + 1];

         data[j + 1] = tmp;

      }

The idea is to pass through the data from one end to the other, and swap two adjacent elements whenever the first is greater than the last. Thus, the smallest elements will "bubble" to the surface. This is O(n²) runtime, and hence is very slow for large data sets. The single best advantage of a bubble sort, however, is that it is very simple to understand and code from memory. Additionally, it is a stable sort that requires no additional memory, since all swaps are made in place.

Insertion Sort
Insertion sort is an algorithm that seeks to sort a list one element at a time. With each iteration, it takes the next element waiting to be sorted, and adds it, in proper location, to those elements that have already been sorted.

for (int i = 0; i <= data.Length; i++) {

   int j = i;

   while (j > 0 && data[i] < data[j - 1])

      j--;

   int tmp = data[i];

   for (int k = i; k > j; k--)

      data[k] = data[k - 1];

   data[j] = tmp;

}

The data, as it is processed on each run of the outer loop, might look like this:

{18, 6, 9, 1, 4, 15, 12, 5, 6, 7, 11}

{ 6, 18, 9, 1, 4, 15, 12, 5, 6, 7, 11}

{ 6, 9, 18, 1, 4, 15, 12, 5, 6, 7, 11}

{ 1, 6, 9, 18, 4, 15, 12, 5, 6, 7, 11}

{ 1, 4, 6, 9, 18, 15, 12, 5, 6, 7, 11}

{ 1, 4, 6, 9, 15, 18, 12, 5, 6, 7, 11}

{ 1, 4, 6, 9, 12, 15, 18, 5, 6, 7, 11}

{ 1, 4, 5, 6, 9, 12, 15, 18, 6, 7, 11}

{ 1, 4, 5, 6, 6, 9, 12, 15, 18, 7, 11}

{ 1, 4, 5, 6, 6, 7, 9, 12, 15, 18, 11}

{ 1, 4, 5, 6, 6, 7, 9, 11, 12, 15, 18}

One of the principal advantages of the insertion sort is that it works very efficiently for lists that are nearly sorted initially. Furthermore, it can also work on data sets that are constantly being added to. For instance, if one wanted to maintain a sorted list of the highest scores achieved in a game, an insertion sort would work well, since new elements would be added to the data as the game was played.

Merge Sort
A merge sort works recursively. First it divides a data set in half, and sorts each half separately. Next, the first elements from each of the two lists are compared. The lesser element is then removed from its list and added to the final result list.

int[] mergeSort (int[] data) {

   if (data.Length == 1)

      return data;

   int middle = data.Length / 2;

   int[] left = mergeSort(subArray(data, 0, middle - 1));

   int[] right = mergeSort(subArray(data, middle, data.Length - 1));

   int[] result = new int[data.Length];

   int dPtr = 0;

   int lPtr = 0;

   int rPtr = 0;

   while (dPtr < data.Length) {

      if (lPtr == left.Length) {

         result[dPtr] = right[rPtr];

         rPtr++;

      } else if (rPtr == right.Length) {

         result[dPtr] = left[lPtr];

         lPtr++;

      } else if (left[lPtr] < right[rPtr]) {

         result[dPtr] = left[lPtr];

         lPtr++;

      } else {

         result[dPtr] = right[rPtr];

         rPtr++;

      }

      dPtr++;

   }

   return result;

}

Each recursive call has O(n) runtime, and a total of O(log n) recursions are required, thus the runtime of this algorithm is O(n * log n). A merge sort can also be modified for performance on lists that are nearly sorted to begin with. After sorting each half of the data, if the highest element in one list is less than the lowest element in the other half, then the merge step is unnecessary. (The Java API implements this particular optimization, for instance.) The data, as the process is called recursively, might look like this:

{18, 6, 9, 1, 4, 15, 12, 5, 6, 7, 11}

{18, 6, 9, 1, 4} {15, 12, 5, 6, 7, 11}

{18, 6} {9, 1, 4} {15, 12, 5} {6, 7, 11}

{18} {6} {9} {1, 4} {15} {12, 5} {6} {7, 11}

{18} {6} {9} {1} {4} {15} {12} {5} {6} {7} {11}

{18} {6} {9} {1, 4} {15} {5, 12} {6} {7, 11}

{6, 18} {1, 4, 9} {5, 12, 15} {6, 7, 11}

{1, 4, 6, 9, 18} {5, 6, 7, 11, 12, 15}

{1, 4, 5, 6, 6, 7, 9, 11, 12, 15, 18}

Apart from being fairly efficient, a merge sort has the advantage that it can be used to solve other problems, such as determining how "unsorted" a given list is.

Heap Sort
In a heap sort, we create a heap data structure. A heap is a data structure that stores data in a tree such that the smallest (or largest) element is always the root node. (Heaps, also known as priority queues, were discussed in more detail in Data Structures.) To perform a heap sort, all data from a list is inserted into a heap, and then the root element is repeatedly removed and stored back into the list. Since the root element is always the smallest element, the result is a sorted list. If you already have a Heap implementation available or you utilize the Java PriorityQueue (newly available in version 1.5), performing a heap sort is fairly short to code:

Heap h = new Heap();

for (int i = 0; i < data.Length; i++)

   h.Add(data[i]);

int[] result = new int[data.Length];

for (int i = 0; i < data.Length; i++)

   data[i] = h.RemoveLowest();

The runtime of a heap sort has an upper bound of O(n * log n). Additionally, its requirement for storage space is only that of storing the heap; this size is linear in proportion to the size of the list. Heap sort has the disadvantage of not being stable, and is somewhat more complicated to understand beyond just the basic implementation.

Quick Sort
A quick sort, as the name implies, is intended to be an efficient sorting algorithm. The theory behind it is to sort a list in a way very similar to how a human might do it. First, divide the data into two groups of "high" values and "low" values. Then, recursively process the two halves. Finally, reassemble the now sorted list.

Array quickSort(Array data) {

   if (Array.Length <= 1)

      return;

   middle = Array[Array.Length / 2];

   Array left = new Array();

   Array right = new Array();

   for (int i = 0; i < Array.Length; i++)

      if (i != Array.Length / 2) {

         if (Array[i] <= middle)

            left.Add(Array[i]);

         else

            right.Add(Array[i]);

      }

   return concatenate(quickSort(left), middle, quickSort(right));

}

The challenge of a quick sort is to determine a reasonable "midpoint" value for dividing the data into two groups. The efficiency of the algorithm is entirely dependent upon how successfully an accurate midpoint value is selected. In a best case, the runtime is O(n * log n). In the worst case-where one of the two groups always has only a single element-the runtime drops to O(n²). The actual sorting of the elements might work out to look something like this:

{18, 6, 9, 1, 4, 15, 12, 5, 6, 7, 11}

{6, 9, 1, 4, 12, 5, 6, 7, 11} {15} {18}

{6, 9, 1, 4, 5, 6, 7, 11} {12} {15} {18}

{1, 4} {5} {6, 9, 6, 7, 11} {12} {15} {18}

{1} {4} {5} {6} {6} {9, 7, 11} {12} {15} {18}

{1} {4} {5} {6} {6} {7} {9, 11} {12} {15} {18}

{1} {4} {5} {6} {6} {7} {9} {11} {12} {15} {18}

If it is known that the data to be sorted all fit within a given range, or fit a certain distribution model, this knowledge can be used to improve the efficiency of the algorithm by choosing midpoint values that are likely to divide the data in half as close to evenly as possible. A generic algorithm that is designed to work without respect to data types or value ranges may simply select a value from the unsorted list, or use some random method to determine a midpoint.

Radix Sort
The radix sort was designed originally to sort data without having to directly compare elements to each other. A radix sort first takes the least-significant digit (or several digits, or bits), and places the values into buckets. If we took 4 bits at a time, we would need 16 buckets. We then put the list back together, and have a resulting list that is sorted by the least significant radix. We then do the same process, this time using the second-least significant radix. We lather, rinse, and repeat, until we get to the most significant radix, at which point the final result is a properly sorted list.

For example, let's look at a list of numbers and do a radix sort using a 1-bit radix. Notice that it takes us 4 steps to get our final result, and that on each step we setup exactly two buckets:

{6, 9, 1, 4, 15, 12, 5, 6, 7, 11}

{6, 4, 12, 6} {9, 1, 15, 5, 7, 11}

{4, 12, 9, 1, 5} {6, 6, 15, 7, 11}

{9, 1, 11} {4, 12, 5, 6, 6, 15, 7}

{1, 4, 5, 6, 6, 7} {9, 11, 12, 15}

Let's do the same thing, but now using a 2-bit radix. Notice that it will only take us two steps to get our result, but each step requires setting up 4 buckets:

{6, 9, 1, 4, 15, 12, 5, 6, 7, 11}

{4, 12} {9, 1, 5} {6, 6} {15, 7, 11}

{1} {4, 5, 6, 6, 7} {9, 11} {12, 15}

Given the relatively small scope of our example, we could use a 4-bit radix and sort our list in a single step with 16 buckets:

{6, 9, 1, 4, 15, 12, 5, 6, 7, 11}

{1} {} {} {4} {5} {6, 6} {7} {} {9} {} {11} {12} {} {} {15}

Notice, however, in the last example, that we have several empty buckets. This illustrates the point that, on a much larger scale, there is an obvious ceiling to how much we can increase the size of our radix before we start to push the limits of available memory. The processing time to reassemble a large number of buckets back into a single list would also become an important consideration at some point.

Because radix sort is qualitatively different than comparison sorting, it is able to perform at greater efficiency in many cases. The runtime is O(n * k), where k is the size of the key. (32-bit integers, taken 4 bits at a time, would have k = 8.) The primary disadvantage is that some types of data may use very long keys (strings, for instance), or may not easily lend itself to a representation that can be processed from least significant to most-significant. (Negative floating-point values are the most commonly cited example.)

Sorting Libraries
Nowadays, most programming platforms include runtime libraries that provide a number of useful and reusable functions for us. The .NET framework, Java API, and C++ STL all provide some built-in sorting capabilities. Even better, the basic premise behind how they work is similar from one language to the next.

For standard data types such as scalars, floats, and strings, everything needed to sort an array is already present in the standard libraries. But what if we have custom data types that require more complicated comparison logic? Fortunately, object-oriented programming provides the ability for the standard libraries to solve this as well.

In both Java and C# (and VB for that matter), there is an interface called Comparable (IComparable in .NET). By implementing the IComparable interface on a user-defined class, you add a method int CompareTo (object other), which returns a negative value if less than, 0 if equal to, or a positive value if greater than the parameter. The library sort functions will then work on arrays of your new data type.

Additionally, there is another interface called Comparator (IComparer in .NET), which defines a single method int Compare (object obj1, object obj2), which returns a value indicating the results of comparing the two parameters.

The greatest joy of using the sorting functions provided by the libraries is that it saves a lot of coding time, and requires a lot less thought and effort. However, even with the heavy lifting already completed, it is still nice to know how things work under the hood.