|
|
|
|
|
|
|
|
|
INTEGRAL ANAGRAM OF A STRING
|
|
Yielding a thorough anagram of a string may be a daunting task. This script does it perfectly and fully, and can also sort or sift the outcome. Beware: a 8 chars string has... 40,320 possible anagrams! Anyway, you can set an output limit!
|
|
January 2002 |
|
{ @ }
|
|
|
|
|
WHY AN ANAGRAM
|
|
Anagrams and their limits
|
I know at least three reasons (although I bet there are many more) an anagram can be used for: - For (maybe not so trivial) fun
- To build up a dictionary of similar words
- To crypt texts (we have a whole section on crypt, see our index page)
What you must be aware of is that:
- a given string can produce an amount of anagrams which follow a mathematical increasing ratio which is amazingly and dramatically steep: you can find how many anagrams a given string produces using my small table below:
You will probably be surprised.
- Therefore you must be aware that you can rapidly meet physical limits which are not depending upon the script conception but upon the strict mathematical logics that belong to anagrams.
- The good news is that anyway I provided the script with a way to limit the output to the number you prefer: in such way you can get a consistent set of anagrams even in case you have to renounce to the one trillion thing for practical reasons. Moreover, such scope limiting (if set: if not set, the script would yield all the possible anagrams: but to get one trillion anagrams, well you can have an overseas vacation while your pc works...) would randomize the anagrams: this means that if a input string would generate 20,000,000 anagrams and you set a limit of 500, every time you run it, it will give to you a different 500 anagrams set! I call it cool.
If you really want to fully anagram a long string, I suggest to you to break it into smaller parts and then pass these to this script: you can try combine the outcomes, in case you really need some anagrams for a long string but you don't have a powerful enough computer (my isn't) to relinquish... a trillion anagrams in less than one month, you see! - If your string includes two or more identical chars (like it may happen with the greatest frequency, actually), you may get some anagrams that, apparently, are identical. I understand this may be puzzling: they are not.
A script which can produce all the given anagrams belonging to a string, like this script is, would never yield a double result: each given char is handled like an independent slot with its own index, and is manipulated by that mathematical index granting two identical combinations cannot occur: none the less, since some slots may hold the same letter, there we go: some combinations may appear identical, and none the less according to the logics of the computer they are not at all: when you find again, say, a letter "x" showing up again in position y, that's not the same x as before, but the second x letter which was present in the given input string. - Of course, we have also set a procedure to sift out these false and apparent copies: but what I want you to understand is that when we exclude them, we're not making an operation which has a mathematical sense: we are making an operation which has purely a human linguistic sense (not that I have anything against it!).
To sift out these false double we took avail of a script we already produced, namely I used the function I named hasher() and whose explanations and great utility is fully explained in this file (this means such hasher function has to be included in your script!). You will need as well the two small snippet of code called highestFirst() and highestLast() which can be found clicking here. Eventually, you need also our dealEntries() function that can be found here
So please, let me repeat: before using this function be sure you have included the following functions or be warned this algorithm would not work:
SCRIPTS TO BE INCLUDED All this bountiful (©) stuff for free don't remove website script comments as my only "reward": isn't fair a deal? |
|
|
- highestLast()
- highestFirst()
GRAB THEM HERE
|
dealEntries() GRAB IT HERE
|
YOUR CODE
|
Your code, excluding the hasher() script which you can view clicking here and the sorting small snippets which you can view clicking here and the dealEntries() script which you can get clicking here
|
A WORKING EXAMPLE
|
|
A working example with a few safety measures (alerts if you risk too long a recursion) Best viewed listening to Schumann's Piano Concerto
|
HOW THE SCRIPT WORKS
|
|
The way I have deviced
|
I try to explain how this works: I do not believe you can grab its logics soon, but if you think it over eventually you will:
- The validation: you can pass to the function also an array of chars: in fact, if you pass to it a string, it will convert it anyway into an array whose each entry is a char of the input string.
- A first loop (we avoided recursive approaches for they can easily exhaust a browser's memory stack in javaScript) iterates through each entry of the input (by then already converted into an array)
- We initialize an local array named pickLatest whose length is equal to an array named output,initialized outside all loops; you will see soon why.
- This output array is initialized as a matrix: that is: an array of arrays: an array whose each entry will be... another array: the intention is to store on each row of output, an array which collects all the combinations, so that we would ultimately yield an array whose each entry is an array consisting in all the chars arranged in one single combination.
- When we start the loop (2) we push (this means we add to the tail of this output entry) one char collected by the input string loop (2).
The first run the output array contains no data, so if you passed as a string, say, word "hallo", the first char pushed in the output array would be an h. The idea is that every time we iterate we build something like this, which is the backbone of the loop:- h
- ha
- hal
- hall
- hallo
Each of such stages can produce its own anagrams, correct? I mean, entry "h" at first run has no anagrams (the anagram of one single char is the char itself), but as soon as we have, for instance, "Ha" (two chars or more, that is), well we have possible anagrams; simpler instance:- Ha
- aH
and so on. Therefore, the idea is that each time we add a char, we make a new set of anagrams for the current substring, and when we restart the loop, to each entry produced by this anagram substring we add the new input char, and we go on anagramming this latest result until exhaustion of all input chars. This means that the anagram will grow with an exponential pace! Of course, when we have produced the latest anagrams, we can drop the old ones: I mean that once we have HA and AH we add HA+L and AH+L, therefore we have HAL and AHL like our new subset which must produce its own subset of anagrams, and therefore we can remove from output the HA and AH entries: this is what the snippet of code:
var e=1;
for(var i=1;i<=Inp+1;i++){//Inp+1 = current length
e*=i}
output= (!limit)?
output.slice( (output.length-e)):
(output.length>=limit)?
dealEntries(output,limit)[0]:
output.slice( (output.length-e))
is meant for! Variable e keeps track of the current input level we are at, and deduces the amount of relevant anagrams we have produced at each stage by making the exponential multiplication according to the current input level we're parsing: once we know this exponential number, we can slice (cut off) from the output backbone array all the lower entries that (from top to bottom) doesn't belong anymore to the current amount of relevant anagrams: we cut off for instance, like outlined above, Ha and aH for from this couple we can build two bigger couples which will yield 6 more subsets, which when we add another input char to each of them would then yield another even higher amount of snippets, and we cut off the previous ones for the latest ones include all the previous combinations already!! In a few words, we anagram lot by lot, and we store the results adding the new input char only to such results cutting off all the rest by deduct the exponential (e) level we're currently at.If you provided a , then we slice off accordingly to its value, verifying first that the output length is already at least long as the limit amount specifies.
- What happens in between this initialization (2) and the last step (5) is simpler: we push in each entry of the fore said subset the new char available from the input (we grab the input chars one by one, that is). pickLatest variable will store a copy of each current entry in the output array once we have pushed in it the new char: such copy is necessary. We want to anagram each of these entries, but if we manipulate the output array directly, we would find we have to add each new anagram to the output itself, with the consequence we'd enter an infinite loop: looping an array whose length is increasing while we loop it, would produce an infinite loop very easily.
- The anagram section is easy and I did invent this swapping procedure:
var j=pickLatest[A][X-1]
pickLatest[A][X-1]= pickLatest[A][X]
pickLatest[A][X]=j
for each entry of pickLatest (which holds copies of the current output) we swap entries, doing it backward (I found this is safer, but here my reasons fails: I confess I did it like rain dance, without understanding clearly why this happens: backward produces the correct anagrams, a forward loop... would not! I dunno why, not yet at least). You see the swap logics is easy: the entry which is to be swapped is assigned to a temporary var (j), therefore now can be swapped: imagine: ABC if I want to swap C with B I cannot say B=C and C=B for once B=C saying C=B So you have to store B somewhere temporarily , like j=B, now you can say B=C, C=j ! As you may notice, it is a self fulfilling syllogism.As soon as one of these swappings has been performed, that's a new anagram itself! So we store it immediately like a new entry in the output backbone array.
- The anagram described in point above implies:
- We browse the whole output array as it has been curtailed so far (5), and for each entry we make a full cross of a letter through it and store the string each time one single swap has been performed while performing the full crossing.
This double nested loop (output rows, then each entry) cannot be avoided: we're stuck to this nesting of loops, no way out in my humble opinion. - We have to avoid pointers: beware: whenever you assign an array to another variable, such as
var foofoo=new Array("hallo") var foofoo2=foofoo you are in a mined ground: in fact the new foofoo2 object is not at all independent of the foofoo original object, but whatever change in one of the two objects, would reflect in the other object! Therefore when you need to make a copy of an array in javaScript you have to make it in the hard and boring way: looping one array and populating entry by entry the other one: this grants independence of the newly generated object. keep this in mind. Without this setback, the script would have certainly been shorter. - Last: we give the options of the outcome:
- sorted argument: if you set it we sort the output so it will be alphabetically ordered:
- sorted=0 means no order
- sorted=1 means sorted in increasing order
- sorted=2 means sorted in decreasing order
-
noSift argument: if not set, we eliminate the copies form the output, as we said in the first section there could be false copies in an anagram (read above).
To sift we use the hasher() function that you can see in this file: for the reasons explained there (which I encourage you to read if you still here) the hasher function pre pend a "#" sign, therefore when we want to grab the output entries using hasher() we use the for-in loop as described in the said file, and therefore we have to exclude the # char. This is quickly made getting only a substring which starts from entry 1, for entry 0 (the first) would be #. If you don't want to sift, set noSift to 1
- limit argument: sets a limit to the output: if you set 500 and also sifting, be aware that the outcome might be inferior to 500, for siftings removes the so called "doubles" if any.
If you set a limit and don't sift, remember that if this script is run without sifting AND without sorting, relinquishes the output in the fashion of an array whose each entry might appear, on a flat screen, separated by a comma. - By the way if you want the output on different lines, do:
myvar=allAnagrams(myWord); myvar.join("") whereas the \n means that all entries are arranged separating them with a ewline char.
- Possible invocations:
- allAnagrams("hallo") //(default) sifts, but doesn't sort
- allAnagrams("hallo", 1) //sifts and sorts
- allAnagrams("hallo", 1, 1) //doesn't sift, but sorts
- allAnagrams("hallo", 0, 1) //doesn't sift, doesn't sort: moreover, returns object as ! By the way: !
- allAnagrams("hallo", 1, 1, )//same as above (#3, could have been #1 or #2 or #4), but this time a limit of 500 is set!
|
|