SCRIPTING:

Cori Lane
Cori Lane
MULTI BINARY AND BINARY SEARCH. CASE INSENSITIVE BINARY SEARCH. COMPLEX MATRIX/OBJECTS BINARY SEARCH. SUBSTRINGS BINARY SEARCH
You may know a binary search is a procedure to scan an Array of even million entries in less than 10 moves, to locate a specific entry. None the less, such search speeds up an Array scanning so dramatically only if the Array has been previously sorted, namely has undergone a previous ordering alphabetical process.
My Multi Binary can speed up a loop an average 25%, without the Array having been sorted at all!

The code for a traditional and yet more powerful Binary Search is provided too.
May 2002
{ @ }
The model above is Cori Lane
LOADS OF CORI LANE ON THE NET


WHAT A MULTI BINARY IS AS COMPARED TO A BINARY SEARCH
Multi Binary can emulate what a binary search does, but without any need for the Array to have been sorted first!

Let me start stressing one important thing: I see at times programmers besieged by requests by their boss to produce the code asap: as soon as possible.
I'd take my chance here to gently remind you that programming is not a matter of productivity, but mostly a matter of fantasy and patience: wait for an insight, which is sent by fantasy, and which just can't be drawn out by an order: as Jon Bentley points out in his well affirmed Programming Pearls:
«Good programmers are somewhat lazy: they sit back and wait for an insight rather than rushing forward with their first idea»
And for more on this, see also the quote by Bentley at this section of the current file.

Now, there are a few ways to loop up the speed of a loop, but all of them should be undertaken only in one case: whenever the Array to be scanned has a length which goes beyond some common limits.
In JavaScript Arrays are normally just snippets, but it all widely depends on how much power you put in your scripts; whatever the case, it may not be uncommon to find Arrays of 1,000 entries.
We can say that each process whose aim is to speed up a loop cannot do too many complex things on the Array it scans, for the more the conditional statements and operations you nest in your loop, the slower it can get. Therefore the main thing power loops do is to locate specific entries.

Well, my multiBinary is an original approach, meant to speed up loops (as well as another script of mine called hasher is, and which can be found clicking here): in fact the classical binary search (and I provide you here with a javaScript implementation of a binary search as well, although that is not my invention, whereas multiBinary is) divides each Array in slices and if it finds at the middle of that slice a range within which the searched for item is located, it shrinks there, otherwise divides by an half again: a process that in less than 10 moves can locate an entry on a one million long object (a similar process is called insorting, whereas with an insorter instead of locating an entry, you search for the point where you can insert an entry: an insorter at United Scripters can be found clicking here).
All such processes named after either binary search or insorting heavily rely on the fact the given Array must (absolutely must) have undergone first a sorting process which listed it in alphabetical order: that's the condition upon which a process based on dividing by halves an Array can safely locate an entry.

Conversely, my multiBinary can speed up by an order of 100% or at times even 300% the search for an item on an unsorted Array. It therefore can return either the position index of the first item found matching the given searched value, or an Array which collects all the position indexes where such value has been found in the Array.

My concept was: given an Array whose length is remarkable, say 50,000 entries, you can divide it in a set of smaller segments; the default length of such segments is 50, but you can make them longer or shorter by passing the arguments; in our case, 50 would divide the 50,000 entries long Array into 1000 smaller Objects.
On each of these objects my multiBinary triggers what I call (another relatively cool feature) a double edge zip scan, namely one counter scans its lot of 1000 entries from the top down while another scans it from the bottom up and at most they stop midway (at most means: unless they find a match first!); at the same time all the other subsets of 1000 entries would be double edge scanned as well.
No wonder this may make you locate an entry position with a speed dramatically higher than a simple linear loop. Of course, if perchance your searched entry is in the very first dozens leading (head) positions in the 50,000 entries Array, probably a linear loop may appear faster still: but it doesn't appear such any longer if your given entry is at, say, index 37,508.
None the less, the meaningful speed up can be experienced only if you search for one single entry; in fact, if you search for all the entries that match the given searched value, then the speed up advantage as compared to a traditional loop gets lost: in fact in order to collect all the instances of a given value instead than stopping at the first met, the script cannot return the result as soon as it finds one but has to loop the whole array as well as a traditional loop: therefore, its advantage gets lost, for in within each scanned entry, the traditional loop has to do nothing whereas the multiBinary has to update all its counters for all the fragments the input array has been divided into. None the less, it is still faster if it gets run to search only one item (that is exactly, moreover, what mere binary searches search for), but I let the feature which allows you to collect multiple entries as well: better one feature more than one less, after all.
I think it is an interesting approach, which might really live up to its promises only with quantic computers though, which are capable of handling simultaneous processes.

You pass to the multiBinary script its arguments as follows:

/ / / / / ARGUMENTS
/ / / / / array
Just the input array object. if none or it has no length, the function returns null
/ / / / / find
The value to search for in the Array, maybe a Number, an Object, a String (in such case in between quotes obviously); if no find argument gets passed, the function returns null
/ / / / / continuous
If it is not passed or passed as zero, the function returns either:
  1. Number if successfully found an item in the Array which matches the find argument, and such number obviously represents the index position of the matching item in within the Array
  2. null if no match is found.
Conversely if continuous is passed (for instance as number 1), the function returns either:
  • Array whose each entry is the numerical index value of the input Array where an instance of the given find argument was located. In other words, continuous allows you not to let the script stop at the first found instance!
  • null if no match is found.
/ / / / / subset
This default value is 50. But if you pass it, the script will divide the input array in segments whose length is equivalent to the given subset value.
If such value is lower than zero or higher than the amount of available entry, the script would force it to be equal to the array length (which would boil down the issue to a traditional linear loop).

Obviously, it is possible the amount of entries cannot be evenly divided by such subsets. In such case the script produces the last subset as a bit lengthier or a bit shorter as much as needed to accommodate the possible remains, and moreover such last snippet would be looped autonomously: I chose so in order to avoid nesting in the main loop (the one parsing all the other perfectly evenly divided subsets, which almost necessarily are to be the majority) further additional conditional checks and statements that would have slowed down the whole when parsing long Arrays.


CODES & TEST FORM
MultiBinary codes and the TestForm for it.
The simple Binary search has only the code but is not featured in the test form since it is a procedure that has not been devised by me

I prefer focusing on what's new, in the Test Forms, instead than on mere implementations of already widely publicized procedures which can be found documented on a variety of books, therefore the Test Form tests only for the multi binary script.

To assess whether the speed is higher, do not ran in continuous mode, then consider changing the subset amount if the timing seems higher than the traditional loop (the Test Form, in fact, can compare these two timings!).
On the whole you're to discover that about 50% of the times the multiBinary is much faster (on a 200,000 entry long Array and with a subset of 500, it can score 20 milliseconds against 680 of a traditional loop), whereas for the other 50% it can be slower, but not to the same high percentage it can be when it is faster. On the whole, whenever you have to deal with very long Arrays and you're searching for one specific item, you may want to consider the multiBinary like a well pondered bet: when you gain, either you gain a lot (at times much over 300% as you noticed in the example above) or a little (at least some 10%), and when you lose you lose little (at most 150%. average loss is 30 or 40%). A whole balance seems providing you, when you run it many times on a variety of array lengths and different subsets, with something between a 20% and 35% of overall gain. So worth it, if your task is repetitive and on long Arrays.
Test it:
  1. Without continuous mode
  2. Changing the Array length
  3. Changing the subsets amount
  4. Shuffling the positions where the two entry carrying the text "Hallo World" (for which we search for in our simplified Test Form shell) are positioned at.

/ / / / / THE TEST FORM
/ / / / / THE MULTIBINARY CODEZ
Cori Lane

Internet Explorer: with comments? »
/ / / / / TEST IT
Step 1: produce a LONG array first.
Randomly introduce in it some text to find.
Array length:
"Hallo World" randomly inserted at pos.:

[you can also custom them by hand]
Step 2: set Parameters and run multiBinary
continuous? » | subset »
If you want to see this multiBinary run up to 3 times faster than a normal loop, please make sure the continuous checkbox above is not checked.

This Test Form can also give to you the times the multiBinary script may take to parse the Array and fetch the matches as compared to the time a normal linear loop would take. If you also want these tests being run (would take a few more seconds) as soon as you click the TRIGGER TEST button, please
check this »
Automatic Shuffling on Trigger? »


RESULTS:
Matches of ["Hallo World"] at:
TIMINGS IN MILLISECONDS:
MULTI BINARY score:

is
overall:
SIMPLE LOOP score:

is
overall:
Overall of overall: on Runs:


TRADITIONAL BINARY SEARCH
A javaScript implementation of a Binary Search

As I already stated, this code below is not an original idea but mostly, though not uniquely, a javaScript implementation of a traditional Binary Search: the reason I'm providing you with this is that I believe a website dealing mostly with javaScript could also provide a few implementations of scripts originally meant for other languages (moreover, as far as I know no website to date has a binary search javaScript implementation. So one more reason); none the less, the meaning of this website is to some degree the reverse of this: providing you with original scripts and algorithms written in javaScript but that as such can be easily translated/implemented in other languages by simple syntax swappings (at least those scripts which are not Dhtml geared).
The following implementation has been crafted, in its basic draft (though I added on it much more options as you'll see), after the one featured on Mastering Algorithms in Perl by Orwant, Hietaniemi, Macdonald, O'reilly editions: a complex book that I never succeeded in finishing but that surely features a deep approach to algorithms although Perl oriented.

Let me stress, for all those using copy and paste snippets and that, having committed no time to face a scripting riddle, and who therefore think each script they find is "such simple a task", that in order to develop "such simple a thing" like a binary search seems, it has taken 20 years. I find a variety of guys, at times relatively affirmed professionals, that on news groups spend their valuable time seriously mocking at beginners (at times even not so much beginners, actually: but the former invariably assume the latter must be, ya see) "oh such simple a thing, and you don't know how to do it!"; I believe they have no clue: as Jon Bentley says, again in Programming Pearls:
«[this codes have been used] precisely in their 1981 Software tools in Pascal to move lines within a text editor. Kernighan reports that it ran correctly the first time it was executed, while their previous code for a similar task based on linked lists contained several bugs. This code is used in several text processing systems (...) Ken Thompson wrote the editor and this reversal code in 1971, and claims it was part of the folklore even then.» [chapter 2]
and:
«while the first binary search was published in 1946, the first binary search that works correctly did not appear until 1962.» [chapter 5]
In order to use this script you must be aware of the following:
  1. A binary search can produce the right results only and exclusively if the inspected Array has been sorted, that is: is in alphabetical or numerical order.
    I still find at times some scripters that although experienced wonder what I mean when I say that simply sorting an Array doesn't wort it right: well, it means that if you just say: Array.sort(), the contents would NOT be arranged AT ALL in the order you may expect: in fact:
    • Capital letters (if they are letter) would all go either before or after the lowercase ones, with the consequence that a capital B may appear listed... before a lowercase a!
    • Numbers starting with the same number would be associated: therefore 1, 2, 14 would not be listed as such but as: 1, 14, 2 !
    To overcame this you must pass to your sort() javascript method an argument, which must be a function, to sort your Array in the expected way. All such snippets are available, with further documentation on how to use them, at United Scripters on this file (they are the four snippets at top of the page named after the shared string "highest").
  2. You pass to the function as first argument the already sorted array.
  3. You pass to the function as second argument the item you want to find a match for. if a string, in between quotes as usual.
  4. A third parameter, that normally binary searches allow for not, flags whether the search must be case insensitive: to make it case insensitive pass this parameter as 1, otherwise as zero.
  5. A fourth optional parameter, that normally binary searches allow for not: if you pass this as zero, the binary search will seek for a full match (default, "typical" binary search). If you pass this as number 1, the binary search would look for substrings starting with the searched string and spanning throughout the length of such searched string; example:
    var items=["Cambodia","Cameroon","Canada","Cape Verde","Cayman Islands","Central African Republic","Chad"];

    alert(binarySearch(items, 'ca', 1, 1));

    This would return number 3, for that is the first array element that a binary search, halving the input array, can find starting with "ca".
    You should note that such a match is a partial match, that is: a substring of the input array entries is checked, whereas the same check but passing the fourth parameter as zero rather than as 1 would have yielded null because the default check would not be for matching substrings but for matching whole strings.

    So, if an Array contains more than one instance of the given match, the binary searchers have a tendency to return only one somewhere midway found instance: if you have even an oversimplified example array like [5,5,5,5,5,5], the returned index number would be 2, though of course an instance is also at index zero, 1, 3 and so on.
    If you want to inspect the surroundings, you may first find such an instance index, and then make micro loops around such index unless you find a higher or a lower number in order to assess the boundaries of the item.

    I tried to implement a recursive binary search to find out the boundaries too.
    It can be done and it seemed a cool idea to me and I implemented it but I ended up with an obvious fact that showed that this idea was actually pointless, and that none the less escaped me at first.
    Since to recursively find the boundaries you have to invoke the recursion making sure that, once found an index, the next searches must start the one from one index lower than that of the last found match (to find the floor of the range) and the other from one index higher than that of the last found match (to find the ceiling of the range), this is nearly the same thing as looping the array item by item, as soon as the range narrows enough: and this fact jeopardizes, or renders moot at least to my eyes, the efficency gain sought for.
  6. There is, a feature, again typical of this script of mine, namely a fifth optional argument.
    If your input array you want to be searched upon is not made up of simple strings but is for some reason a matrix, namely an array of arrays so that each of its entries is an array on its own turn, it is arguably ordered after a specific entry.
    A banal example:
    var items=[
    ['Albania', 'ALL BOOKED', 'Nice country'],
    ['Algeria', '3 VACANCIES', 'Dry wheather!'],
    ['Germany', 'ALL BOOKED', 'Lots of beer'],
    ['Ghana', '4 VACANCIES', 'Colourful clothes!'],
    ['Venezuela', '1 VACANCY', 'Nice beaches']
    ];

    You may notice we have a global array named items whose each entry is an array on its own turn: the purpose of such objects is that they can thus host more connected and related data (in our silly example, hypothetical rooms availability status for a travel agency and a small country "descriptions"), and the whole is clearly sorted (in this example at least) after the first entry of each nested array.
    If you are performing a binary search onto such an object, then pass to my binary search the third parameter as the numerical index (though it would work as well with associative arrays whose indexes are strings!) in which is hosted the element that the whole structure has been sorted after, in this case index 0 (for in all records the country name is in the first entry, and javascript starts counting indexes by number zero not by number 1), say:
    alert(
    binarySearch(items, 'Germany', 1, 0, 0)
    );

    That would return 2, meaning the second entry in the passed searched array hosts at its slot #0 the searched for string 'Germany'. You can then pass that entry to, arguably, another script for further data manipulations - booking a room, perhaps?...

    Note that all these optional parameters that I intorduced in the binary search do not degrade the performance, for a binary search by default makes a few dozens checks on even millions of entries to find the match, thus inserting three conditional checks in a binary search loop raises no real performance issue.
  7. Actually I have included another original variation in the binarySearch: since normally you'd have had even a further requirement (namely that the array is not just sorted and sorted right, but even that the lowest values must be at the beginning and the highest at bottom of it), I slightly modified it so that it can guess on its own whether the highest values are at top or at bottom (say: A-Z or Z-A?) and perform accordingly. otherwise only in the A-Z order a binary search would have worked.
    Interesting, don't you think so?
  8. The function returns either null if no instance found, or a number which is the numerical index of the array where the instance has been pinpointed.
/ / / / / THE BINARY SEARCH CODEZ
Bridget Riley's painting