SCRIPTING:

Adriana Lima
SELECTION SORT ALGORITHM AND GOLDBERG VARIATIONS ON IT. SCHWARTZIAN TRANSFORM. BINGO SORT.
Given a relatively simple algorithm as the Selection Sort, it is explained how it works and its main features are outlined.
Once we have its feature, we try to apply combinatorial variations on it and verify what type of different data gleaning may derive.
The role of alphabetical orders is dealt with in relation with the Schwartzian transform. An example of a Bingo Sort, which is a non trivial variation on the selection sort, is provided.
October 2005
{ @ }
The model above is Adriana Lima
LOADS OF ADRIANA LIMA ON THE NET


WHAT IS A SELECTION SORT?
A description of the traditional algorithm and of the approach we are after here

A David J. Nightingale photo
A beautiful David J. Nightingale photo
A selection sort is an algorithm (the simplest of its kind, not the more efficient) whose purpose, as its name suggests, is that of scanning and sorting incoming data after an order - traditionally from the smaller to the bigger (whence we could already hint a variation: from the bigger to the smaller).
A selection sort, though considered inefficient, is still the better choice when these conditions apply:
  • As Robert Sedgewick points out, when the records that have to been ordered are very big in filesize: performing the algorithm only one swapping per each record, if the records are of many kilobytes this algorithm assures that the RAM memory is occupied by one big file at a time only.
  • When you have small arrays, whose records are arguably not ordered in the least, and they are unlikely to be duplicates.
Please note that all sorts introduce an order within the (arguably) disordered incoming data, but such order does not exist as dictated by natural or cosmic laws: all orders are orders imposed by implicitly presumed look up tables that determine what comes before what, and what comes after what accordingly to a mere convention.
In fact even an alphabetical order isn't but an arbitrary order, neither good nor worse than any other alternative arrangement that might have been possible, and whose unique characteristic that makes it special and adopted is that it is an order (better: a pattern sequence) agreed upon.

Max Jammer in his book "Concepts Of Force" mentioned that:
«In ancient Egypt (...) force as personified by 'nht' is not only force in the sense of violence and ferocity, but includes already at this early stage an element of order and morality»
Force is not to be seen necessarily as destruction that causes disorder, a disarrangement of the affected elements, but as a factor that can impose a defined pattern shape too to objects otherwise unrelated by binding them, making thus an order possible.
Such order is therefore betrayed as not implicit, but enforced.
Van Hayek (an economist, by the way: because algorithm logics do not insist upon programming fields only) reminds us of the distinction between the greek term nomos (from which the term: norm) which is an exogenous order imposed by a rule, and the greek term cosmos where the endogenous order blossoms spontaneously from within (think of universal gravitation).
So, there are natural orders, and arbitrary orders.
A sorting algorithm sorts after an arbitrary order that became a standard by convention, no matter how much we are used to consider it customary and as a given.

Still Max Jammer elsewhere ("Concepts Of Space") declares:
«replacement of the content of a vessel by another content reveals that place as something different from its changing contents and so proves the reality of space. (...) not only that locality or place is a reality but also it exerts an active influence.»
An Andres Serrano painting
An Andres Serrano artwork
The spatial preeminence (whence a hierarchy) of a location upon another was originally derived by the events that took place in it and by the emotional charge related with such events: it is the latter that consecrated that place, singling it out as meaningful from a spatial continuum of intervals otherwise all identical, meaningless and indistinguishable.

Sorting is a process that currently dons a mathematical helm, and yet its origin is still this magical one not a mathematical one (Isaac Newton has been dubbed by some authors not as the first scientist, but as the last magician), namely that of singling out a hierarchy via a significance that was originally determined by the emotional charge assigned to a specific station in space (think of milestones, aleph, sacred precincts).
This fact is still visible in the hebrew alphabet where the supernatural or propitiatory nature of each letter constitutes a legacy that has not vanished entirely yet.
Nowadays we are used to consider as a given fact that the letter b comes after the letter a: this goes as obvious; and yet the origin even of such alphabetical hierarchies was magic, therefore neither mathematical nor obvious.
Death and nativities made places assume an order, and out of a barren or modest landscape you can have the detour of a pilgrimage.

So, being this order arbitrary, a sorting process isn't but a pattern making process, and all algorithms, including the sorting algorithms as a selection sort, aren't but the application of a pattern after a look up table that directs the ordering rank. They are still just a pattern making process to whose eventual result we bestow a dedicated name: sorted. This does not change the fact that we just applied a pattern: a pattern with a given name.

The characteristics of a selection sort are:
  • It is implicitly presumed the existence of a look up table of the alphabetical order. This is never mentioned, but it is real.

    This becomes particularly evident with the so called Schwartzian transforms (Algorithms In Perl, by Orwant, Hietaniemi & Macdonald), a «caching technique» which sorts and at the same time yields a look up map for future use (long duration caching as opposed to temporary caching).

    Consider a term like: McDonald and Macdonald, or terms like opal-shaped, with the dash, as opposed to opalshaped: they are identical and yet a sorting algorithm might consider them as different. None the less, we might want them to appear as contiguous, namely see in an eventual list the term MacDonald followed by McDonald and not interposed with something like, say, MacEric.
    As Donald Knuth says:
    «Newly coined nonce words of English are often spelled with a hyphen, but the hyphen disappears when the words become widely used. For example, people used to write non-zero and soft-ware instead of nonzero and software; the same trend has occurred for hundreds of other words. Thus it's high time for everybody to stop using the archaic spelling e-mail
    However, we still have, and we'll probably always have, this trend in action whence two identical words are spelled as two different lexical units and yet they are still identical and only a minor portion of them changes (a morpheme, in our example represented by the dash or, as in the case of MacDonald versus McDonald, represented by an interposing letter a).
    A smart sorting algorithm should be enabled to consider them as identical despite the minimal morphologic variation.
    A Schwartzian transform is a digital procedure designed to deal with this issue. It produces look up tables where terms like e-mail are associated to email and thus both are dealt with as having the same level of sorting priority. As hinted, such look up tables of equivalences, are built on runtime, and then can be either stored or discarded as soon as they got used.

    An order is therefore imposed, not derived.
    The Schwartzian transform works as:
    map - sort - map (incoming array)
    We analyze it as everybody analyzing an algorithm should do: from within towards without rather than the other way round.

    The innermost (say rightmost) mapping takes in an input array of words, whereas arguably morphologic variations are in place. It then evaluates each incoming element of the array (each word, that is) applying to it a user defined mapping procedure. This mapping, in our example, should strip off the dashes. Such evaluation produces a growing list of associated pairs, the original word with the dash, the same word without the dash, example:
    ['opal-shaped' , 'opalshaped']
    This map is the look up table that can be cached once completed, or thrown away once used by the next algorithmic steps.

    The above mentioned and newly generated list is then passed (we're "climbing" leftward with our analysis) to the sort declaration.
    This sorting procedure should be instructed to sort the incoming words looking after the above generated mapped equivalences
    ['opal-shaped' , 'opalshaped']
    sort $a->[1] compareWith $b->[1]
    where the index [1] means: compare the word a with the word b not looking at them but looking at their mapped equivalence in the array pair:
    [0]='opal-shaped', [1]='opalshaped'
    thus taking into consideration the incoming words after their non dashed versions, so to ignore morphemes as they got stripped (or at any rate dealt with) by the innermost mapping.
    In this fashion, we can have an order no longer implemented after an alphabet, but after a sort of a custom (smarter?) alphabet where
    opal-shaped is to be regarded as identical to opalshaped
    in the very same fashion two other incoming and "fully" identical words such as say water and water would be, and sorted and allocated as adjacent regardless of their morphemic variation(s).

    The outermost map takes care of this: the middle sort process returns as sorted not the words but the whole pairs:
    ['opal-shaped' , 'opalshaped']
    therefore the outermost map simply returns as a sorted list not the pairs but the first element of them, which from the innermost mapping we know it was the original word.

    This description is an elaboration of mine rather faithful, though maybe more long winded for clarity purposes, to the concepts that we can find in the mentioned book Algorithms With Perl.
    Yet that book doesn't mention a fact that to my eyes is important; that is, I don't know whether the book doesn't mention what I am about to mention because it is a wrong deduction or just because it forgot it.
    But to my eyes it seems apparent that beside mapped pairs like
    ['opal-shaped' , 'opalshaped']
    we'd also have mapped pairs like:
    ['opalshaped' , 'opalshaped']


    Here is an important case (with a PHP implementation) where a Schwartzian transform makes the difference.
    take for instance an array like:
    array('McEric', 'MacDonald', 'McDonald', 'MacEric', 'McIntosh', 'Macintosh')
    If you sort it via a standard php call to sort you would get:
    MacDonald MacEric Macintosh McDonald McEric McIntosh
    Note that for instance MacDonald and McDonald are separated by MacEric and Macintosh. On its own turn, McIntosh is separated from Macintosh for the same reasons namely being the latter not written omitting the first a (Mac).

    With a Schwartzian transform especially geared for detecting the differences between "Mac" and "Mc", additionally ignoring the uppercase and lowercase variations, you may have an order that, though no longer strictly alphabetical, none the less it may be the user friendly order that an application may desire.

    Here is a set of PHP function for this type of Schwartzian transform (the code includes an interesting comment within a function named cmp):


    On the same example array, you could invoke it as follows:
    $anarray = array('McEric', 'MacDonald', 'McDonald', 'MacEric', 'McIntosh', 'Macintosh');
    $foo = map1( isort( map2( $anarray ) ) ); /*a Schwartzian Transform*/
    If you now print $foo:
    print_r($foo);
    that would print the following order:
    MacDonald, McDonald, MacEric, McEric, Macintosh, McIntosh
    Compare it with the order without the Schwartzian transform which as we saw was:
    MacDonald MacEric Macintosh McDonald McEric McIntosh
  • «In place»: which means the algorithm is not allowed to store whole arrays of partial results or indexes of the ongoing process with the intention to use them at a further iteration.
    All the incoming data rearrangements must be brought about by manipulating the incoming data itself without allowing any extra space for any extra storage such as, say, temporary slices of the rearranged input awaiting the eventual outcome. When this happens, only one element of the incoming array should be used, and never multiple elements.
  • There must be a loop and another loop within it (so called "quadratic" algorithm).

    The outermost loop scans the array.
    At each element it scans, the innermost loop stars from that offset and scans the array as well
    for($length=sizeof($array), $i=0; $i<$length; $i++){
    $indexOfMinimum=$i;
    for($i2=$i+1; $i2<$length; $i2++){//etc.
    So if the outermost loop is currently on the, say, 4th element, the innermost loop scans starting from the 5th element included and onward: whenever this innermost loop finds an element that is smaller than the one currently engaged by the outermost loop, it stores its index.
    Exiting the innermost loop, if a smaller element has been detected, it gets swapped with the offset upon which the outermost loop was suspended to give way to the innermost loop.
    The outermost loop now passes to the next element of the incoming array, and repeats the above till all elements have been scanned.
    The result is an ordered array.
  • During the iterations, a minimum or a maximum (we could speak of peaks) within the scanned data must be identified.
  • The dispatching (allocation) of the identified maximum or minimum (peaks) in the input data so to produce the rearranged (sorted) output is performed via swappings of the identified peaks with slots of the incoming data located elsewhere.


SELECTION SORT CODES
The Php implementations of a selectionSort and of a few variations on the theme.

A David J. Nightingale photo
A beautiful David J. Nightingale photo

As first I produce the code for a standard selection sort:


A variation on it detects the maximum rather than the minimum, thus producing an inversed order:


Another variation on it loops from the top of the incoming array rather than from its beginning, thus producing an inversed order:


Another variation again cumulates the last two variations above, yielding a result identical to the default selectionSort:


Eventually, we have one of the most interesting cases: a selection sort where the alphabetical order is no longer the default one but where you could custom it.
Though the code still assumes as default a rather conventional alphabet, still the point with the following implementation is that you can impose an order of your own for the sorting process, somewhat like we did in the Schwartzian Transform.

The code is a class.
It is such because it needs an auxiliary function that is invoked within this custom selection sort, whereas such auxiliary function (method) performs comparisons to detect the minimum looking up a custom table that determines, in such custom version, what type of new priority has been assigned to each char.

The name of the class is: selectionCustomSort.
You run it via an initialization such as:
$foo = new selectionCustomSort(/*arguments here*/);

The arguments that must be passed are as follows:
  • $array:
    the input array to be ordered. It can be an array of numbers, of chars, of words, of all the above mixed.
  • $order:
    either an array of chars or a string. If a string it will be spit.
    The order in which the chars are listed in the array is the one that will be assumed as the look up table to determine the new hierarchy in the chars. So, for instance, in such argument the letter x could be listed before letter say k and the sorting process will be delivered keeping in mind this new set of priorities.

    This argument is case sensitive, therefore both capital and lowercase chars should be taken into account.
  • $treatNumbersAsNumbers:
    either passed as zero or as 1 (which is the default).
    If it is kept as the default namely as 1, numbers will be listed in their traditional order (but not letters or words!).
    If passed as zero, numbers too will be dealt with as if they were chars (String data type namely), and therefore if numbers are treated as Strings, 100 would (somewhat paradoxically but not erroneously for this type of task) precede 11 because in strings 'baa' would precede 'bb' and if numbers are treated as Strings, they will be treated as such indeed.
  • $orderSplitter:
    defaults to an empty string. It splits by such value the $order argument if detected it was passed as a string rather than as an array, thus making an array out of the string.

The class stores the results in a class variable named result.
For this custom selection sort, we can make an example using the default alphabet first:
$output = new selectionCustomSort(
array('water', 'air', 'fire', 'dolph', 'bb', 'earths', 'grass', 'earth', 15, 'water', 'aqua', 'zbz', 'acua', 11, 'za', 'baa', 100, '101', 'zd', 100, 'bab', 207, 'zz', 110, 'acue', 204, 'zzz', 'zz', 112, 10, 'zb')
);

print( implode(' ', $output->result) );
which would print:
10 11 15 100 100 101 110 112 204 207 acua acue air aqua baa bab bb dolph earth earths fire grass water water za zb zbz zd zz zz zzz
But if you custom the $order alphabet to something where, say, z comes before b
$output = new selectionCustomSort(
array('water', 'air', 'fire', 'dolph', 'bb', 'earths', 'grass', 'earth', 15, 'water', 'aqua', 'zbz', 'acua', 11, 'za', 'baa', 100, '101', 'zd', 100, 'bab', 207, 'zz', 110, 'acue', 204, 'zzz', 'zz', 112, 10, 'zb')
,/*next argument now passed:*/
'_-.,:;!?()[]{}*+/\=|^°%$~#0123456789 @aAzZbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyY'
);

print( implode(' ', $output->result) );
that would now print:
10 11 15 100 100 101 110 112 204 207 acua acue air aqua za zz zz zzz zb zbz zd baa bab bb dolph earth earths fire grass water water


OTHER SELECTION SORT RELATED ALGORITHMS
A few known algorithms derived from the selection sort

A David J. Nightingale photo
A beautiful David J. Nightingale photo

An algorithm that derives from the selection sort is the so called bingo sort. I agree bingo sort is an ugly name, but it wasn't me who chose it.
A relatively canonical definition of bingo sort may be this:
«Definition: A variant of selection sort that orders items by repeatedly looking through remaining items to find the greatest value and moving all items with that value to their final location. This is more efficient if there are many duplicate values.

Note: To see why it is more efficient, consider one value. Selection sort does one pass through remaining items for each item moved. Bingo sort does two passes for each value (not item): one pass to find the next biggest value, and one pass to move every item with that value to its final location. Thus if on average there are more than two items with each value, bingo sort may be faster.» [source - NIST]
For this reason, bingo sort is regarded nothing less than the most efficient algorithm at tasks like, for instance, sorting out mails by zip codes (because arguably several mails share identical zip codes) in small counties or towns (that is: not exceedingly long array, with probably several repetitions of each value in it).

Anyway you won't find online codes for a bingo sort.
For some reason, it seems that anytime, at least till today october 2005, you look for a bingo sort all you get are academic assignments of tasks so to produce one yourself and happily graduate (or get a tidbit closer to it).

These assignments, in one of its versions, may sound like:
«In this lab, you'll write a method to sort a list of integers in-place using a bingoSort. A bingoSort is just a variation of SelectionSort that works well when you have a lot of duplicate entries. After finding the next biggest number to put in place, instead of swapping just that one entry, go through the entire list looking for all occurrences of that entry. Every time you find an occurrence, swap it into place.(...)

As always, just because it works doesn't mean it's good. Part of your grade will also come from things like modularity and understandability. You will also be graded on the neatness, presentation, and style of your program code. It's important to comment your code where appropriate and to do little things like space things properly, use readable indentation, and also to make sure the overall design and logic of the program are coherent.» [source]
A description of bingo sort may sound like:
«The coding of this algorithm is a little tricky, but not too bad compared to some of the more complicated searching algorithms we have studied. It uses some way of finding the minimum [or maximum, my note] value in the array; generally a function like max/min. It will also use a function to interchange the two, but this does not necessarily have to be the case. It will execute a while loop between these two functions that looks through the array for all elements equal to the least element. A nested for loop then takes control and works through the array for the next non-least element value to interchange with.» [source]
With so sad a lack of code to look upon (not a new thing for me, though), in this specific case I cannot assure you that what I produce here calling it bingoSort is precisely a bingoSort: it is rather what, arguably or probably, I would have devised if I would have had to undergo the above mentioned examination, and its complexity does not look much worse than the provided description: a while loop with two nested for loops one of which searches for all the equivalences of a given maximum.
Is it it?
The only code I ever found, and which I did not base my own code upon (although it includes, like mine, a while loop and two nested for loops. Yet it also includes one more loop, external to the while loop, that my code doesn't: good sign, or bad omen?), was located here.

If I got a good mark or a bad one, you'll have to ask your own professor when you present my code as yours because you found you're late with your homework this week. I take no responsibilities! If you got a vaguely decent vote, email me: it is gonna be our secret.
Anyway, if this can be of any consolation, rest assured that it took me many hours (and a good sleep between them, which is not immaterial), for it implied a few tricky things that one could easily overlook.



As far as I can see, there is another way to implement a bingo sort, this latter exploiting some specific PHP built in functions like array_search.

This new procedure actually builds a new array, but since each new item added to it is sliced off of the original input array, the overall effect should be even.

None the less, in order to use array_search, the next implementation I feature violates the rule of acting in place (that is, of producing no duplicated copies) in at least one point: it creates one temporary variable named $minor that holds the value of one single array entry at each iteration. Thus your program should allow for the space necessary to accommodate once the biggest element of the incoming array.

Though it may be considered a (minor) violation of the conditions, yet note that also in the book Algorithms With Perl the selection sort algorithm stores in a temporary variable the value of an entry at each round.
In any case, the following PHP version is undoubtedly worth the mention despite this minor infraction, because it could be faster (relying on a built in function), because of its brevity, and because it still does not produce auxiliary arrays but stores just one duplicated element:



A way to work around that aforementioned violation would be to use a pointer.
The following version uses such pointer upon the variable named $minor; thus the algorithm should now be a fully valid PHP bingo sort, vastly relying upon PHP characteristics:


In regard with bingo sort, you may want to have a look at some variations reported at the file about bubblesort.