Friday, February 11, 2011

Shell Sort

     Shell sort is a sorting algorithm, devised by Donald Shell in 1959, that is a generalization of insertion sort, which exploits the fact that insertion sort works efficiently on input that is already almost sorted. It improves on insertion sort by allowing the comparison and exchange of elements that are far apart. The last step of Shell sort is a plain insertion sort, but by then, the array of data is guaranteed to be almost sorted.


Analysis

     Although Shell sort is easy to code, analyzing its performance is very difficult and depends on the choice of increment sequence. The algorithm was one of the first to break the quadratic time barrier, but this fact was not proven until some time after its discovery.
The initial increment sequence suggested by Donald Shell was [1,2,4,8,16,...,2k], but this is a very poor choice in practice because it means that elements in odd positions are not compared with elements in even positions until the very last step. The original implementation performs O(n2) comparisons and exchanges in the worst case. A simple change, replacing 2k with 2k-1, improves the worst-case running time to O(N3/2), a bound that cannot be improved.
A minor change given in V. Pratt's book improved the bound to O(n log2 n). This is worse than the optimal comparison sorts, which are O(n log n), but lends itself to sorting networks and has the same asymptotic gate complexity as Batcher's bitonic sorter.
     Consider a small value that is initially stored in the wrong end of the array. Using an O(n2) sort such as bubble sort or insertion sort, it will take roughly n comparisons and exchanges to move this value all the way to the other end of the array. Shell sort first moves values using giant step sizes, so a small value will move a long way towards its final position, with just a few comparisons and exchanges.
     One can visualize Shell sort in the following way: arrange the list into a table and sort the columns (using an insertion sort). Repeat this process, each time with smaller number of longer columns. At the end, the table has only one column. While transforming the list into a table makes it easier to visualize, the algorithm itself does its sorting in-place (by incrementing the index by the step size, i.e. using i += step_size instead of i++).
For example, consider a list of numbers like [ 13 14 94 33 82 25 59 94 65 23 45 27 73 25 39 10 ]. If we started with a step-size of 5, we could visualize this as breaking the list of numbers into a table with 5 columns. This would look like this:


13 14 94 33 82
25 59 94 65 23
45 27 73 25 39
10


We then sort each column, which gives us
10 14 73 25 23
13 27 94 33 39
25 59 94 65 82
45
When read back as a single list of numbers, we get [ 10 14 73 25 23 13 27 94 33 39 25 59 94 65 82 45 ]. Here, the 10 which was all the way at the end, has moved all the way to the beginning. This list is then again sorted using a 3-gap sort as shown below.


10 14 73
25 23 13
27 94 33
39 25 59
94 65 82
45


Which gives us


10 14 13
25 23 33
27 25 59
39 65 73
45 94 82
94
Now all that remains to be done is a 1-gap sort (simple insertion sort).

Reference:
http://en.wikipedia.org/wiki/Shell_sort 

No comments:

Post a Comment