WHAT IS A BUBBLE SORT?
|
|
An Explanation With Javascript Graphical Support
|
In a previous file it has been dealt with the selection sort algorithms.
A Franz Ritter Von Stuck painting, Sin
|
Here I deal with further instances of quadratic sorting algorithms, though having at least a quick look at that file too rather than limiting yourself to reading this could be benefical.
The bubble sort is a somewhat counter intuitive algorithm, whose internal logics escape also seasoned programmers at times, if such logics aren't revisited routinely.
So, even before I start dealing with a bubble sort features description, I introduce the following simple javascript Dhtml interface which can produce an animation showing how a bubble sort works.
Since every sorting algorithm presumes an array as its elective data structure to manipulate, the following histogram is to be considered as a possible graphical representation of an array whereas each bar is a value of a different magnitude (graphically imagined as stretching capacities) held within such array (because, obviously, sorting algorithms must arrange an order within an array of disordered magnitudes).
To produce the animation just click repeatedly the button titled « STEP ON».
I suggest to you to do it now, so to have firstly an intuitive understanding of how a bubble sort walks and then you can come back to this very same representation for a closer look after you have read the textual "treatise" of the algorithm.
You should go on clicking the button repeatedly, each click furthers the animation one step forward so that you can also pause if you want to ponder about the ongoing process.
Eventually, an alert informs you when the sorting process is to be considered completed, and will prompt you on whether you want to reset the starting setting.
The similarity between selection sort (and its derived products like bingo sort) and a bubble sort is that both are quadratic algorithms, namely their codes include a loop within a loop, or anyway a double scanning of the array to be sorted.
The difference is that a selection sort searches either for a maximum or a minimum and then puts it on top or bottom of the array swapping with the elements that are there, whereas a bubble sort makes comparisons between adjacent elements in the array and swaps these latter if the one before is bigger then its next.
It is true that also a selection sort in order to detect a peak (maximum or minimum) shall have to perform a comparison (the newly supposed peak against the scanned elements), but traditionally this is not judged as enough to consider a selection sort as a really comparison oriented algorithm: apparently, the fact the comparisons are performed in order to find out an overall peak makes the search for the peak as more characteristic than the comparisons. I specify this because I agree it can be a bit confusing.
Under a coding perspective the similarity is found in the presence of the double loops.
Yet whereas a selection sort implements loops that flush in the same direction, a bubble sort's loops go counterwise each other.
As both algorithms proceed, a sorted section and a section still to be sorted get produced. A selection sort by swappings leaves its sorted selection (which need no longer to be scanned) behind itself (say towards the lower array indexes), whereas a bubble sort by swappings leaves its sorted section before itself (say towards the higher array indexes).
Keeping this distinction in mind may help you remember how the two algorithms (at least in their more canonical versions) proceed and differ under a coding perspective, also at a time when you might have lost your full grasp of their inner logics - as it may somewhat commonly happen when/if you don't have to practice or code them any more for some time.
A SELECTION SORT LOOPING ENGINE:
A BUBBLE SORT LOOPING ENGINE:
Besides this, keep also in mind that whenever in these algorithms you see a loop starting with a variable, say $i, not with index zero but with index one
for ($i=1; /*etc...*/)
that's an old programming "trick" that normally implies the loop is going to be used to perform comparisons between an item and its previous one
[$i] versus [$i-1]
Since all comparisons between two contiguous elements are valid, both a forward one ( $i versus $i+1) and a backward one ( $i versus $i-1), a loop so devised performs them backward simply because it is more convenient doing so rather than performing the same comparisons forward ( $i versus $i+1) which would imply one has to check also if the end of the array has been attained (whereas obviously $i+1 would point to nothing any longer): in the absence of which further check an out of array boundaries exception would be triggered when $i equals the last element of the array and thus $i+1 would fatally be out of its scope.
This is why when comparing in algorithms, comparing backward is preferred.
Still more, you may note with a little study that, since all sorting algorithms that work in place (namely do not use auxiliary arrays to store the ongoing sorted array as a new one, but work directly upon the input arrays alone) produce during their workings a sorted part of the array while they deal with the still unsorted one, in the case of the two algorithms above the loop that isolates the sorted array from the unsorted one is always the outermost loop.
The PHP implementation of a bubble sort I here provide you with has been taken, without additions by me in this case, by Algorithms In Perl by Orwant, Hietaniemi, Macdonald.
It is a faster and puzzling version.
It is not described particularly well in the book (actually, it is not explained at all, just provided the code and a benchmarking that proves it considerably faster), and I too cannot describe it very well. I think the main idea of the authors was how to speed the bubble sort up. The idea they came up with is: detecting whether, besides the uppermost shrinking edge that the outer loop of every bubble sort takes care of, one could also shrink the lowermost edge so to reduce the amount of comparisons.
It led to an engine different, in its appearances, from the standard one I outlined above.
What is puzzling is the way it detects whether the lower limit can be shrunk. Such limit shrinks and also recovers its original size depending on the items that are bubbling downward meanwhile the biggest bubble upward (this latter being the reason a bubble sort got its name).
A good way to see why it works could be considering an array like this:
array(4,47,4,4,4,56,5,1);
Then inspect below the following syntax highlighted report of its inner proceedings. It is a set of prints derived from within a successful comparison that finds:
$array[$j-1] > $array[$j]
as a true condition.
$startIndex(=$j)= <= $lastIndex=7
Array
(
[0] => 4
[1] => 47
[2] => 4
[3] => 4
[4] => 4
[5] => 56
[6] => 5
[7] => 1
)
swapping: 47 with 4 namely [1] with [2] so:
Array
(
[0] => 4
[1] => 4
[2] => 47
[3] => 4
[4] => 4
[5] => 56
[6] => 5
[7] => 1
)
$newStartIndex=1, $newLastIndex=1$startIndex(=$j)= <= $lastIndex=7
Array
(
[0] => 4
[1] => 4
[2] => 47
[3] => 4
[4] => 4
[5] => 56
[6] => 5
[7] => 1
)
swapping: 47 with 4 namely [2] with [3] so:
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 47
[4] => 4
[5] => 56
[6] => 5
[7] => 1
)
$newStartIndex=1, $newLastIndex=2$startIndex(=$j)= <= $lastIndex=7
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 47
[4] => 4
[5] => 56
[6] => 5
[7] => 1
)
swapping: 47 with 4 namely [3] with [4] so:
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 4
[4] => 47
[5] => 56
[6] => 5
[7] => 1
)
$newStartIndex=1, $newLastIndex=3$startIndex(=$j)= <= $lastIndex=7
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 4
[4] => 47
[5] => 56
[6] => 5
[7] => 1
)
swapping: 56 with 5 namely [5] with [6] so:
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 4
[4] => 47
[5] => 5
[6] => 56
[7] => 1
)
$newStartIndex=1, $newLastIndex=5$startIndex(=$j)= <= $lastIndex=7
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 4
[4] => 47
[5] => 5
[6] => 56
[7] => 1
)
swapping: 56 with 1 namely [6] with [7] so:
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 4
[4] => 47
[5] => 5
[6] => 1
[7] => 56
)
$newStartIndex=1, $newLastIndex=6$startIndex(=$j)=1 <= $lastIndex=6
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 4
[4] => 47
[5] => 5
[6] => 1
[7] => 56
)
swapping: 47 with 5 namely [4] with [5] so:
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 4
[4] => 5
[5] => 47
[6] => 1
[7] => 56
)
$newStartIndex=4, $newLastIndex=4$startIndex(=$j)=1 <= $lastIndex=6
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 4
[4] => 5
[5] => 47
[6] => 1
[7] => 56
)
swapping: 47 with 1 namely [5] with [6] so:
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 4
[4] => 5
[5] => 1
[6] => 47
[7] => 56
)
$newStartIndex=4, $newLastIndex=5$startIndex(=$j)=4 <= $lastIndex=5
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 4
[4] => 5
[5] => 1
[6] => 47
[7] => 56
)
swapping: 5 with 1 namely [4] with [5] so:
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 4
[4] => 1
[5] => 5
[6] => 47
[7] => 56
)
$newStartIndex=4, $newLastIndex=4
here starts an interesting backward bubbling!
$startIndex(=$j)=4 <= $lastIndex=4
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 4
[4] => 1
[5] => 5
[6] => 47
[7] => 56
)
swapping: 4 with 1 namely [3] with [4] so:
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 1
[4] => 4
[5] => 5
[6] => 47
[7] => 56
)
$newStartIndex=3, $newLastIndex=3$startIndex(=$j)=3 <= $lastIndex=3
Array
(
[0] => 4
[1] => 4
[2] => 4
[3] => 1
[4] => 4
[5] => 5
[6] => 47
[7] => 56
)
swapping: 4 with 1 namely [2] with [3] so:
Array
(
[0] => 4
[1] => 4
[2] => 1
[3] => 4
[4] => 4
[5] => 5
[6] => 47
[7] => 56
)
$newStartIndex=2, $newLastIndex=2$startIndex(=$j)=2 <= $lastIndex=2
Array
(
[0] => 4
[1] => 4
[2] => 1
[3] => 4
[4] => 4
[5] => 5
[6] => 47
[7] => 56
)
swapping: 4 with 1 namely [1] with [2] so:
Array
(
[0] => 4
[1] => 1
[2] => 4
[3] => 4
[4] => 4
[5] => 5
[6] => 47
[7] => 56
)
$newStartIndex=1, $newLastIndex=1$startIndex(=$j)=1 <= $lastIndex=1
Array
(
[0] => 4
[1] => 1
[2] => 4
[3] => 4
[4] => 4
[5] => 5
[6] => 47
[7] => 56
)
swapping: 4 with 1 namely [0] with [1] so:
Array
(
[0] => 1
[1] => 4
[2] => 4
[3] => 4
[4] => 4
[5] => 5
[6] => 47
[7] => 56
)
$newStartIndex=0, $newLastIndex=0Array
(
[0] => 1
[1] => 4
[2] => 4
[3] => 4
[4] => 4
[5] => 5
[6] => 47
[7] => 56
)
WHAT IS AN INSERTION SORT?
|
|
An Explanation
|
A Keith Haring painting
|
The insertion sort has similarities with a selection sort.
Please note: an insertion sort is not to be mistaken with an insorter by any means!
An insertion sort adds nothing ( nothing) to an array, it only reorders (sorts) the items already present.
Conversely, an insorter is meant to add elements into an already sorted array so that the newly included element does not alter the ordered nature of the array.
Like a selection sort it searches for a peak (minimum or maximum) using a double loop, and once found the peak it slices it ( unlike a selection sort, which swaps) and repositions it nearby a major element - whereas the traditional example is that of a «card player» slicing out a card and inserting it in the right place.
The difference from a selection sort is thus all, basically, in this slicing process.
I am aware Robert Sedgewick (and with him many others) provides quite different a code. But I am following here once again the ideas of Orwant, Hietaniemi & Macdonald in Algorithms In Perl.
Now, as for the statement:
array_splice($array, $i, 0, array_splice($array, $indexOfMinimum, 1));
to grasp how and why it works, keep in mind the specification of the Php manual about array_splice:
«array_splice ( array &input, int offset [, int length [, array replacement ]] )
(...) If replacement array is specified, then the removed elements are replaced with elements from this array. If offset and length are such that nothing is removed, then the elements from the replacement array are inserted in the place specified by the offset.»
So, that behaviour of the PHP array_splice (like in Perl) amounts to inserting a removed offset into another specific offset of the array: a "card player" extracting and inserting: insert & sort.
The reason I didn't choose Sedgewick's approach, then, is that Php lends itself better to this procedure as Perl does, and that an algorithm derives its participation to a family by its behaviour, not by how it implements it.
But then, why then the authors of Algorithms In Perl differ from Robert Sedgewick so much in this algorithm's respect, I don't really know for sure. But I can make my own guesswork: it might depend on how the Php and Perl array_splice function work.
I developed a possible variation, which maybe can have some utility at least for getting familiar with this family of algorithms.
The difference of this implementation of mine from a selection sort and from an insertion sort is that:
- It does not swap array entries directly (like selection sort does), but firstly it cuts off a slice, and then pushes (which the insertion sort doesn't) this slice on an edge.
- The fact an element is pushed onto an edge is, thus, a second difference; in fact a swap, like in the selection sort, doesn't push onto any edge, but interchanges two existing elements, and the default insertion sort produces two slices rather than pushing.
- Unlike the selection sort code and the insertion sort code, my pseudo insertion sort code hasn't an ending conditional check within the outermost loop, thus maybe it could be slightly faster - though both the selection sort and the insertion sort are considered very inefficient algorithms (insomuch as they are quadratic: double looping) for long arrays with high disorder.
The reason why in a selection sort, which as we saw loops in the same way as a insertion sort does, such conditional check works whereas if the same conditional check gets introduced into my insertion sort that would make the latter fail in its ordering purpose (I tried that), is not entirely clear to my eyes yet.
It has arguably to do with the fact that, since an insertion sort moves rather than swap an entry, that moving implies a reindexing of at least part of the array (the one following the sliced item) as a consequence of the slicing process.
If so, the reason it is not clear has probably to do with the fact one would have to find an appropriate instance of incoming array to make this reason (more) apparent for educational purposes at least.
At any rate, the implementation I provide here is nearly identical to the canonical bingo sort one, but it includes a difference: both the loops do not loop the incoming array upward but downward. Why this?
I simply thought that by doing so it would have come more natural using push rather than unshift (another way to add to an array edge), and that push, adding to the growing end of the array, could be inherently faster than unshifting which, adding to the starting edge of an array, would implicitly require a whole reindexing in the background of all the entries following that start off edge.
That is, I thought that moving backward, leaving behind would seem more natural than throwing forward - whereas leaving behind in a forward loop means unshifting, not pushing - with the reindexing consequence just mentioned.
Of course, one could throw forward. But my being revolts against a loop that, going forward, puts more stuff forward. It would still stop, but I just don't like such ideas: I find them "inconvenient" in the same fashion as in a Jane Austen book a character could find "inconvenient" and reprehensible a social behaviour.
I called this snippet of mine straightFlushSort: keeping in mind the traditional image of the card player that extracts a card and inserts it elsewhere, I am that type of card player (a poor one, maybe) who extracts a card and firstly puts it to an edge, and makes the others follow suit this edge while he extracts them, thus producing the order: like a player arranging a straight.
It is clearly a mixture of the insertion sort of Orwant, Hietaniemi & Macdonald, and of the bingo sort implementations I provided.
WHAT IS A SHELL SORT?
|
|
An Explanation
|
A Henri Matisse painting
|
Named after its creator Donald Shell, a shellsort is similar to a bubble sort, only rather than performing comparisons between contiguous elements, it performs comparisons at a fixed distance between elements: starting with the array's breadth itself less 1, it then races downward shrinking the distance between the compared elements.
It seems being faster an algorithm than a bubble sort is.
How should these distances that are spiralling downward from the array's breadth (less 1) towards the unity (and not zero) be? Should they follow some pattern?
The fact is, these distances start with the breadth less 1 and end with the unity, but then whichever path they follow is just fine as long as the same distance is never repeated twice or, more specifically, as long as each new step is inferior to the previous one till the unity is attained. No paths with bumps, but just smoothly degrading following a pattern whatever as long as it degrades to the unity.
So, basically, you need kind of a mini algorithm that tells you how to produce these shrinking distances (you could envision them like steps of a ladder, but each at a different distance from the other), and then follow such path riddled with distances, applying them to the sorting process in order to perform the comparisons between array elements placed at those distances per round.
We can say the set of such downgrading distances performs as a track, or a map, or a ruler, along whose directives the algorithm runs.
Such map or set of rulers does not appear as another internally built array storing the values of the steps.
Rather, it has to be a mathematical formula (of your own liking, as long as it works) that satisfies the condition (attains a peak, can decrease towards number 1) which first produces the starting offset and then repeating backward the chosen mathematical formula arrives to the end.
Of course
«No series is always the best: the optimal series must be customized for each input. Of course, figuring that out might take as long as the sort, so it's better to use a reasonably well performing default.»
[Algorithms With Perl, Orwant, Hietaniemi, Macdonald]
Yeah, you are right. Me too: I do not like it at all. It seems rather convoluted, moreover it's not just quadratic (loop within loop) but even cubic ( three nested loops), and yet it is reported as somewhat faster than the faster bubble sort can be.
Taking the example from Robert Sedgewick's Algorithms In C (yet mine is more complete, in Sedgewick's example a set of swaps is entirely skipped), we can conveniently imagine, for illustration purposes, as the argument array a set of chars aligned in a sentence:
a s o r t i n g e x a m p l e
whose corresponding array numerical indexes are:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
We also imagine that our distances are:
13 4 1
When at map 13, we compare positions 0 and 13 namely 13 elements excluding the leading including the ending:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
which means we compare a with L.
Swaps occur only if the first compared element is bigger. This is not the case so they are left untouched.
The second pass shifts onward the length 13: it still fits and thus now intercepts s and e. Being s bigger, they get swapped. We now have:
a e o r t i n g e x a m p l s
The next pass could not use 13 as a ruler any longer. We switch to using 4.
With a rule such as 4 we can compare staying within the array boundaries:
a e o r t i n g e x a m p l s
A chain set of swappings is performed between all the elements that appear bigger than the next intercepted by the ruler, because the ruler is fit to be included within the array's length more than just once:
a e o r t i n g e x a m p l s
a e o r e i n g t x a m p l s
a e o r e i n g t x a m p l s
a e o r e i n g p x a m t l s
Shifting onward the ruler from the starting offset, it still fits:
a e o r e i n g p x a m t l s
This triggers further comparisons and swaps among the elements that are bigger than the next ones intercepted by the ruler:
a e o r e i n g p x a m t l s
a e o r e i n g p l a m t x s
The ruler 4 can shift onward again, and can now compare:
a e o r e i n g p l a m t x s
Which causes the following swaps:
a e o r e i n g p l a m t x s
a e n r e i o g p l a m t x s
a e n r e i o g p l a m t x s
a e n r e i a g p l o m t x s
a e n r e i a g p l o m t x s
a e a r e i n g p l o m t x s
We now enter the ruler of length 1: which compares each letter with each letter, swapping recursively each time a letter bigger than the one on its right edge of the ruler is found, and whenever such condition applies also after a swapping. I leave to you to imagine the rest of this nightmare. The eventual result is:
a a e e g i l m n o p r s t x
Here is your Php implemented shellsort, whose unique difference from Sedgewick is that it runs from the end of the array towards the beginning rather than the other way round.
A Henri Matisse painting
|
|