CodoPhilic
                                                    AlgoMania - Sorting 
Disclaimer: This document may just be a reproduction of the content on the internet with slight modifications, we lay no claim to it whatsoever.
Introduction
When we query a database and append an ORDER BY clause, we are sorting. When we look for an entry in the phone book, we are dealing with a list that has already been sorted. (And imagine if it weren't!) If you need to search through an array efficiently using a binary search, it is necessary to first sort the array. When a problem statement dictates that in the case of a tie we should return the lexicographically first result, well… you get the idea.
General Considerations
Imagine taking a group of people, giving them each a deck of cards that has been shuffled, and requesting that they sort the cards in ascending rank order. Some people might start making piles, others might spread the cards all over a table, and still others might juggle the cards around in their hands. For some, the exercise might take a matter of seconds, for others several minutes or longer. Some might end up with a deck of cards where spades always appear before hearts; in other cases it might be less organized. These differences in speed, in working space, and in the final arrangement of equal-ranked cards are exactly the trade-offs that lead algorithmists to debate the pros and cons of various sorting algorithms.
When comparing various sorting algorithms, there are several things to consider. The first is usually runtime. When dealing with increasingly large sets of data, inefficient sorting algorithms can become too slow for practical use within an application.
A second consideration is memory space. Faster algorithms that require recursive calls typically involve creating copies of the data to be sorted. In some environments where memory space may be at a premium (such as an embedded system), certain algorithms may be impractical. In other cases, it may be possible to modify the algorithm to work "in place", without creating copies of the data. However, this modification may also come at the cost of some of the performance advantage.
A third consideration is stability. Stability, simply defined, describes what happens to elements that compare as equal. In a stable sort, those elements whose comparison key is the same will remain in the same relative order after sorting as they were before sorting. In an unstable sort, no guarantee is made as to the relative output order of those elements whose sort key is the same.
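To make the definition concrete, here is a small Java sketch. (The card strings are invented for the example.) Arrays.sort on object arrays is documented to be stable, so two cards with the same rank keep their original relative order even when the comparator ignores the suit:

```java
import java.util.Arrays;
import java.util.Comparator;

public class StableDemo {
    // Sort by the rank character only (index 0), ignoring the suit.
    // Because Arrays.sort on objects is stable, cards of equal rank
    // keep their original relative order.
    public static String[] sortByRank(String[] cards) {
        String[] copy = cards.clone();
        Arrays.sort(copy, Comparator.comparing((String c) -> c.charAt(0)));
        return copy;
    }

    public static void main(String[] args) {
        // Two cards share rank 7; a stable sort keeps "7H" before "7S".
        System.out.println(Arrays.toString(sortByRank(
                new String[] { "9C", "7H", "2D", "7S" })));
        // prints [2D, 7H, 7S, 9C]
    }
}
```

An unstable sort would be free to emit 7S before 7H, since the comparator considers them equal.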
Bubble Sort
One of the first sorting algorithms that is taught to students is bubble sort. While it is not fast enough in practice for all but the smallest data sets, it does serve the purpose of showing how a sorting algorithm works. Typically, it looks something like this:
for (int i = 0; i < data.Length; i++)
   for (int j = 0; j < data.Length - 1; j++)
      if (data[j] > data[j + 1])
      {
         int tmp = data[j];
         data[j] = data[j + 1];
         data[j + 1] = tmp;
      }
Insertion Sort
Insertion sort is an algorithm that seeks to sort a list one element at a time. With each iteration, it takes the next element waiting to be sorted, and adds it, in proper location, to those elements that have already been sorted.
for (int i = 0; i < data.Length; i++) {
   int j = i;
   while (j > 0 && data[i] < data[j - 1])
      j--;
   int tmp = data[i];
   for (int k = i; k > j; k--)
      data[k] = data[k - 1];
   data[j] = tmp;
}
{18,  6,  9,  1,  4, 15, 12,  5,  6,  7, 11}
{ 6, 18,  9,  1,  4, 15, 12,  5,  6,  7, 11}
{ 6,  9, 18,  1,  4, 15, 12,  5,  6,  7, 11}
{ 1,  6,  9, 18,  4, 15, 12,  5,  6,  7, 11}
{ 1,  4,  6,  9, 18, 15, 12,  5,  6,  7, 11}
{ 1,  4,  6,  9, 15, 18, 12,  5,  6,  7, 11}
{ 1,  4,  6,  9, 12, 15, 18,  5,  6,  7, 11}
{ 1,  4,  5,  6,  9, 12, 15, 18,  6,  7, 11}
{ 1,  4,  5,  6,  6,  9, 12, 15, 18,  7, 11}
{ 1,  4,  5,  6,  6,  7,  9, 12, 15, 18, 11}
{ 1,  4,  5,  6,  6,  7,  9, 11, 12, 15, 18}
Merge Sort
A merge sort works recursively. First it divides a data set in half, and sorts each half separately. Next, the first elements from each of the two lists are compared. The lesser element is then removed from its list and added to the final result list.
int[] mergeSort (int[] data) {
   if (data.Length == 1)
      return data;
   int middle = data.Length / 2;
   // subArray(data, lo, hi) is assumed to copy the inclusive range [lo, hi]
   int[] left = mergeSort(subArray(data, 0, middle - 1));
   int[] right = mergeSort(subArray(data, middle, data.Length - 1));
   int[] result = new int[data.Length];
   int dPtr = 0;
   int lPtr = 0;
   int rPtr = 0;
   while (dPtr < data.Length) {
      if (lPtr == left.Length) {
         result[dPtr] = right[rPtr];
         rPtr++;
      } else if (rPtr == right.Length) {
         result[dPtr] = left[lPtr];
         lPtr++;
      } else if (left[lPtr] < right[rPtr]) {
         result[dPtr] = left[lPtr];
         lPtr++;
      } else {
         result[dPtr] = right[rPtr];
         rPtr++;
      }
      dPtr++;
   }
   return result;
}
{18, 6, 9, 1, 4, 15, 12, 5, 6, 7, 11}
{18, 6, 9, 1, 4} {15, 12, 5, 6, 7, 11}
{18, 6} {9, 1, 4} {15, 12, 5} {6, 7, 11}
{18} {6} {9} {1, 4} {15} {12, 5} {6} {7, 11}
{18} {6} {9} {1} {4} {15} {12} {5} {6} {7} {11}
{18} {6} {9} {1, 4} {15} {5, 12} {6} {7, 11}
{6, 18} {1, 4, 9} {5, 12, 15} {6, 7, 11}
{1, 4, 6, 9, 18} {5, 6, 7, 11, 12, 15}
{1, 4, 5, 6, 6, 7, 9, 11, 12, 15, 18}
Heap Sort
In a heap sort, we create a heap data structure. A heap is a data structure that stores data in a tree such that the smallest (or largest) element is always the root node. (Heaps, also known as priority queues, were discussed in more detail in Data Structures.) To perform a heap sort, all data from a list is inserted into a heap, and then the root element is repeatedly removed and stored back into the list. Since the root element is always the smallest element, the result is a sorted list. If you already have a Heap implementation available or you utilize the Java PriorityQueue (newly available in version 1.5), performing a heap sort is fairly short to code:
Heap h = new Heap();
for (int i = 0; i < data.Length; i++)
   h.Add(data[i]);
for (int i = 0; i < data.Length; i++)
   data[i] = h.RemoveLowest();
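Using the Java PriorityQueue mentioned above, the same idea can be sketched as follows. (PriorityQueue orders its elements smallest-first by default, so remove() plays the role of RemoveLowest.)

```java
import java.util.PriorityQueue;

public class HeapSortDemo {
    // Heap sort via java.util.PriorityQueue: add every element,
    // then repeatedly remove the smallest back into the array.
    public static void heapSort(int[] data) {
        PriorityQueue<Integer> h = new PriorityQueue<>();
        for (int x : data)
            h.add(x);
        for (int i = 0; i < data.length; i++)
            data[i] = h.remove();   // remove() returns the smallest element
    }

    public static void main(String[] args) {
        int[] data = { 18, 6, 9, 1, 4, 15, 12, 5, 6, 7, 11 };
        heapSort(data);
        System.out.println(java.util.Arrays.toString(data));
        // prints [1, 4, 5, 6, 6, 7, 9, 11, 12, 15, 18]
    }
}
```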
Quick Sort
A quick sort, as the name implies, is intended to be an efficient sorting algorithm. The theory behind it is to sort a list in a way very similar to how a human might do it. First, divide the data into two groups of "high" values and "low" values. Then, recursively process the two halves. Finally, reassemble the now sorted list.
Array quickSort(Array data) {
   if (data.Length <= 1)
      return data;
   middle = data[data.Length / 2];
   Array left = new Array();
   Array right = new Array();
   for (int i = 0; i < data.Length; i++)
      if (i != data.Length / 2) {
         if (data[i] <= middle)
            left.Add(data[i]);
         else
            right.Add(data[i]);
      }
   return concatenate(quickSort(left), middle, quickSort(right));
}
{18, 6, 9, 1, 4, 15, 12, 5, 6, 7, 11}
{6, 9, 1, 4, 12, 5, 6, 7, 11} {15} {18}
{6, 9, 1, 4, 5, 6, 7, 11} {12} {15} {18}
{1, 4} {5} {6, 9, 6, 7, 11} {12} {15} {18}
{1} {4} {5} {6} {6} {9, 7, 11} {12} {15} {18}
{1} {4} {5} {6} {6} {7} {9, 11} {12} {15} {18}
{1} {4} {5} {6} {6} {7} {9} {11} {12} {15} {18}
Radix Sort
The radix sort was designed originally to sort data without having to directly compare elements to each other. A radix sort first takes the least-significant digit (or several digits, or bits), and places the values into buckets. If we took 4 bits at a time, we would need 16 buckets. We then put the list back together, and have a resulting list that is sorted by the least significant radix. We then do the same process, this time using the second-least significant radix. We lather, rinse, and repeat, until we get to the most significant radix, at which point the final result is a properly sorted list.
For example, let's look at a list of numbers and do a radix sort using a 1-bit radix. Notice that it takes us 4 steps to get our final result, and that on each step we set up exactly two buckets. The same list is then shown sorted with a 2-bit radix (4 buckets, 2 steps) and a 4-bit radix (16 buckets, 1 step):
1-bit radix (2 buckets, 4 steps):
{6, 9, 1, 4, 15, 12, 5, 6, 7, 11}
{6, 4, 12, 6} {9, 1, 15, 5, 7, 11}
{4, 12, 9, 1, 5} {6, 6, 15, 7, 11}
{9, 1, 11} {4, 12, 5, 6, 6, 15, 7}
{1, 4, 5, 6, 6, 7} {9, 11, 12, 15}
2-bit radix (4 buckets, 2 steps):
{6, 9, 1, 4, 15, 12, 5, 6, 7, 11}
{4, 12} {9, 1, 5} {6, 6} {15, 7, 11}
{1} {4, 5, 6, 6, 7} {9, 11} {12, 15}
4-bit radix (16 buckets, 1 step):
{6, 9, 1, 4, 15, 12, 5, 6, 7, 11}
{} {1} {} {} {4} {5} {6, 6} {7} {} {9} {} {11} {12} {} {} {15}
Because radix sort is qualitatively different from comparison sorting, it is able to perform at greater efficiency in many cases. The runtime is O(n * k), where k is the number of passes. (32-bit integers, taken 4 bits at a time, would have k = 8.) The primary disadvantage is that some types of data may use very long keys (strings, for instance), or may not easily lend themselves to a representation that can be processed from least significant to most significant. (Negative floating-point values are the most commonly cited example.)
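The 4-bit-at-a-time scheme described above can be sketched in Java as follows. This is a minimal illustration, assuming non-negative 32-bit keys, so it makes k = 8 passes over 16 buckets:

```java
import java.util.ArrayList;
import java.util.List;

public class RadixSortDemo {
    // LSD radix sort on non-negative ints, 4 bits (one hex digit) per pass.
    public static void radixSort(int[] data) {
        for (int shift = 0; shift < 32; shift += 4) {
            // 16 buckets, one per possible 4-bit value.
            List<List<Integer>> buckets = new ArrayList<>();
            for (int b = 0; b < 16; b++)
                buckets.add(new ArrayList<>());
            // Distribute each value by its current 4-bit digit...
            for (int x : data)
                buckets.get((x >> shift) & 0xF).add(x);
            // ...then collect the buckets back in order. Appending in
            // order keeps the pass stable, which radix sort requires.
            int i = 0;
            for (List<Integer> bucket : buckets)
                for (int x : bucket)
                    data[i++] = x;
        }
    }

    public static void main(String[] args) {
        int[] data = { 6, 9, 1, 4, 15, 12, 5, 6, 7, 11 };
        radixSort(data);
        System.out.println(java.util.Arrays.toString(data));
        // prints [1, 4, 5, 6, 6, 7, 9, 11, 12, 15]
    }
}
```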
Sorting Libraries
Nowadays, most programming platforms include runtime libraries that provide a number of useful and reusable functions for us. The .NET framework, Java API, and C++ STL all provide some built-in sorting capabilities. Even better, the basic premise behind how they work is similar from one language to the next.
For standard data types such as scalars, floats, and strings, everything needed to sort an array is already present in the standard libraries. But what if we have custom data types that require more complicated comparison logic? Fortunately, object-oriented programming provides the ability for the standard libraries to solve this as well.
In both Java and C# (and VB for that matter), there is an interface called Comparable (IComparable in .NET). By implementing this interface on a user-defined class, you add a method int CompareTo (object other) (compareTo in Java), which returns a negative value if less than, 0 if equal to, or a positive value if greater than the parameter. The library sort functions will then work on arrays of your new data type.
Additionally, there is another interface called Comparator (IComparer in .NET), which defines a single method int Compare (object obj1, object obj2), which returns a value indicating the results of comparing the two parameters.
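A brief Java sketch of both approaches (the Card class here is invented for illustration): implementing Comparable gives the class a natural order used by Arrays.sort, while a separate Comparator supplies an alternative order at the call site:

```java
import java.util.Arrays;
import java.util.Comparator;

public class ComparableDemo {
    // A user-defined type whose natural order is by rank.
    public static class Card implements Comparable<Card> {
        public final int rank;
        public final String suit;
        public Card(int rank, String suit) { this.rank = rank; this.suit = suit; }

        // Negative if this < other, 0 if equal, positive if this > other.
        public int compareTo(Card other) {
            return Integer.compare(this.rank, other.rank);
        }
        public String toString() { return rank + suit; }
    }

    public static void main(String[] args) {
        Card[] hand = { new Card(9, "C"), new Card(2, "H"), new Card(5, "S") };

        // Uses Card.compareTo via the Comparable interface.
        Arrays.sort(hand);
        System.out.println(Arrays.toString(hand)); // [2H, 5S, 9C]

        // A Comparator supplies a different order without touching Card.
        Arrays.sort(hand, Comparator.comparing((Card c) -> c.suit));
        System.out.println(Arrays.toString(hand)); // [9C, 2H, 5S]
    }
}
```

The Comparator form is handy when one class must be sorted different ways in different places, or when you cannot modify the class itself.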
The greatest joy of using the sorting functions provided by the libraries is that it saves a lot of coding time, and requires a lot less thought and effort. However, even with the heavy lifting already completed, it is still nice to know how things work under the hood.
 