A quick briefing on how this file was born is probably the best way to make you understand what it is for.
Not too long ago, a friend of mine asked me whether I could help him with a task that apparently no other programmer had been able to carry out for him.
Let me stress anyway that this file does not deal with the so-called Bible Code as described in a somewhat famous book by Michael Drosnin, although I do not exclude that I may sooner or later devise a function that processes text inputs with a scan analogous to the one described in his book; this file is currently concerned with a different type of process.
The task was the following: having chapters of the Bible (case in point: Genesis) in Word file format (.doc extension) in the original Hebrew font, he needed to convert all those fancy chars into a set of predetermined numbers. Such numbers had to be assigned beforehand according to a conversion table, and the aim of the procedure was to enable my friend to analyze the Bible according to an esoteric approach which derives analogies among biblical words by verifying whether the sums of their letters, once turned into numbers, match, and the like.
The main challenge I immediately faced was the following one: foreseeing a fight between the charCodeAt and fromCharCode JavaScript built-in methods, I noticed that in the Word file none of the chars were Latin: they were all Hebrew. Now, for a web file this can be a challenge indeed, because html files can show only a limited amount of chars (basically the Latin ones), and even if you have a fancy font in your set of available fonts, this still isn't the equivalent of drawing these chars from an external source to pour them into an input field.
Moreover, I suddenly realized that in my friend's Word version there were some sort of "hidden" chars which, although they did not show up in any visible form, were nonetheless there: placing the cursor at the beginning of a word whose length was apparently, say, 5 letters, and moving the cursor forward one char at a time with the arrow key on my keyboard (near the numeric keypad), would immediately prove that the chars were actually, say, 12. There were some invisible chars that a computer could nonetheless discriminate as such, even if their visible embodiment for a human was... nothing.
Also, such chars were not accents or punctuation marks.
I immediately realized that if I were to write such a script, I had to make it as flexible as possible: flexible enough to accommodate whatever alphabet, whatever conversion table and whatever grouping of letters. It also had to be able to skip, if instructed, those chars that weren't in the table; alternatively, I thought of enabling the script, if instructed, to arrange a new correspondence table on the fly in case it met a char that was not included and the user wanted to include it as well.
The outcome has been a trove of utilities that first let you play around with chars, and then let you arrange a comparison table of whatever type you want. The conversion can either stick to such a table or build it on the fly as it encounters the chars: in this latter case, namely if no comparison table is provided or if the table is not all-inclusive, the script starts assigning numbers from 1 onward to each new letter it finds, and each time it finds that letter again, it converts it into the number already assigned.
What you have to understand in order to use this file competently is that any letter, for a computer, is nothing but a number.
Thus, a letter like an upper case A is for my computer the equivalent of number 65.
Computers consider all chars as numbers, for they have a comparison table of their own (do not be fooled: I'm now talking about a built-in comparison table that all computers have, which is called UNICODE and is an international standard; I am no longer speaking about the comparison table for the Bible we were previously discussing).
This means that a computer sees numbers, not chars, and for each number it prints on the screen a corresponding symbol according to the Unicode (or better: ANSI) table.
What if the computer meets a char numerical code it has no corresponding symbol to print on the screen? Well, normally it has a default symbol for these missing cases, which may at times appear like a tiny empty (or on some platforms black filled) rectangular shape.
Thus if your computer meets two Hebrew chars whose corresponding UNICODE numbers are, say, 65,900 and 67,408 or whatever, it still discerns that they are two entirely different things, but lacking two different symbols to print them as different items, it may show us humans two identical rectangular shapes. Do not get confused: the computer still perfectly realizes they are two entirely different chars (char code numbers, actually!) and that they have to be dealt with as such; it just cannot print them as two different symbols.
Hence, some of these tools may show strange chars that all look alike, and nonetheless the computer does know they're different codes in the background!
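To make the point above a bit more tangible, here is a minimal sketch that relies only on the built-in charCodeAt and fromCharCode methods. The helper name listCharCodes is a hypothetical one chosen for this illustration, and the zero width non-joiner is just one assumed example of an invisible char; neither comes from this file's own code.

```javascript
// Hypothetical helper: returns the numerical char code of every char in a string.
function listCharCodes(text) {
  var codes = [];
  for (var i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i));
  }
  return codes;
}

var latin = listCharCodes("A");             // [65]
// Two Hebrew letters (alef, bet), written as escapes so that the example
// does not depend on the encoding of this file.
var hebrew = listCharCodes("\u05D0\u05D1"); // [1488, 1489]
// An invisible char (zero width non-joiner, assumed here only as an example)
// still has a code of its own and still counts in the length.
var hidden = listCharCodes("a\u200Cb");     // [97, 8204, 98]
// The reverse trip: from a number back to its char.
var backToChar = String.fromCharCode(65);   // "A"
```

Even when two chars are drawn on the screen as the same rectangle, a listing like this one would still show two different numbers.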
So, on the whole, all the tools on this file rely on the following procedure:
- You manually copy from another file the chars in the fancy font in which they were written there.
- You paste these copied strings into the input fields of this file (many fields here have default input values that show Latin chars, just to make things more intuitive: this does not mean they would not accept whatever other type of chars! They would!).
- You will then be facing either a set of apparently different fancy chars or a whole set of apparently identical small rectangular shapes, as previously warned. This is all right, do not believe this is a malfunction!
- You can now convert all of them into your favorite alternative chars (in my friend's case: numbers!) by using the tools of this file.
So on the whole, here are the things this file does:
- Provide (paste) a set of chars, find out their numerical char codes
- Perform a cabalistic sum (sum all the digits of a number as single digits [2309 becomes 2+3+0+9=14] and yield the sum of these digits as the final number. Fit to cope with long numbers too).
- Perform a cabalistic reduction (sum all the digits until a final one-digit number is returned [2309 becomes 2+3+0+9=14, 1+4=5], or until a limit you set on the length of the returned number is reached. Fit to cope with long numbers too).
- Perform a full set of transformations of an input text given a comparison table that can be built according to a great deal of flexible possibilities, which are accounted for in the technical section below that describes the arguments passed to the two main functions committed to this last task.
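The cabalistic sum and reduction listed above can be sketched as follows. This is only an illustration under my own assumptions, not the code used by this file: the names cabalisticSum and cabalisticReduction and the maxLength parameter are hypothetical, and the numbers are handled as strings so that long ones are not rounded.

```javascript
// Sum of the single digits of a number. The input is treated as a string,
// so that very long numbers do not lose precision.
function cabalisticSum(number) {
  var digits = String(number);
  var total = 0;
  for (var i = 0; i < digits.length; i++) {
    var c = digits.charAt(i);
    if (c >= "0" && c <= "9") total += Number(c); // non-digit chars are skipped
  }
  return total;
}

// Repeats the digit sum until the result is at most maxLength digits long
// (one digit when maxLength is not passed).
function cabalisticReduction(number, maxLength) {
  var limit = maxLength || 1;
  var result = String(number);
  while (result.length > limit) {
    result = String(cabalisticSum(result));
  }
  return result;
}

var sum = cabalisticSum("2309");                // 14
var reduced = cabalisticReduction("2309");      // "5"
var limited = cabalisticReduction("2309", 2);   // "14"
```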
Before I introduce the form for the main tasks of this universal alphabet converter, I'd like to explain to you the inner workings of the two main functions that are behind it. Of course, if you're not interested in technicalities you may skip this section, but glancing at it is going to help you use more competently the converter featured further on in this file.
The whole file relies on the creation of a comparison table which accepts as input the chars of whatever alphabet you provide, and which consequently matches and transforms each given char into one of another set of chars you want it to be turned into (and which you're to provide next, as you are about to see) any time the parsing of a text meets such a char.
So, on the whole, here is how the function that makes this comparison table gets your chars: if they're other than Latin, you copy them from a rich text format file (Word .doc documents and the like) and then you paste them into an input field, which passes such list of fancy chars to the makeTable function. Then you insert in another input field the list of chars you want to be swapped with the former ones any time they are met; of course the two strings of chars have to respect the correspondences you mean to be in place. An example (with standard Latin chars now):
ABCDE
12345
That would mean that any time a capitalized E is met, it would be converted into number 5, because 5 is in the same column as E.
Actually, as you're about to see, when you pass the letters you're not necessarily required to divide them with a white space or with whatever splitter symbol you may choose, although you can do that and instruct the makeTable function to split the input list of chars according to the splitter you prefer.
Nonetheless, it is more or less mandatory that when numbers are involved, you do divide the input string of numbers with a white space or whatever splitter you may prefer.
In fact if you don't, and your numbers are, say, 12 and 3, then writing 123 instead of 12 and 3 would be ambiguous: does 123 mean 12 and 3, or does it mean 1 and 23, or even 1, 2 and 3?
So do include a splitter to produce a safe form: a white space may do:
ABCDE
1 2 3 4 5
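The role of the splitter can be sketched with the built-in split method. The helper name splitList is hypothetical and used only for this illustration; how makeTable actually splits its inputs is not shown here.

```javascript
// Hypothetical helper: splits an input list with the given splitter,
// or char by char when no splitter is passed.
function splitList(list, splitter) {
  return list.split(splitter || "");
}

var chars = splitList("ABCDE");            // ["A", "B", "C", "D", "E"]
var nums = splitList("1 2 3 4 5", " ");    // ["1", "2", "3", "4", "5"]
// Without a splitter, 12 and 3 can no longer be told apart from 1, 2 and 3.
var unsafe = splitList("123");             // ["1", "2", "3"]
var safe = splitList("12 3", " ");         // ["12", "3"]
// Same column, same position: E sits at index 4, and so does 5.
var pair = chars[4] + " becomes " + nums[4]; // "E becomes 5"
```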
From such inputs the function makes on the fly an associative array in which the index of each entry is the UNICODE numerical char code of each char (the script finds them on its own, don't worry), converted into a string (this is to avoid generating very long arrays; for more on this issue, see this file). It matches each entry of this associative array with the corresponding number if you've also provided a string of numbers (or of chars, or of mixed items); if you've not provided such a second string, the function assigns by default to each char/element a number starting from 1 and increasing from there.
If by chance you provide two strings of chars of different lengths, so that the two strings mismatch, keep in mind that the function loops over the first list of chars and for each of them checks whether there is a match in the second list of chars (or numbers!) to convert the former into the latter. If the latter is missing, or some entries are undefined, the script by default assigns to the char which didn't find its match a number which either increases from 1 onward or, if some numbers have already been met in the second list, is one unit higher than the highest among all the numbers previously met... wow.
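The behaviour just described can be sketched as follows. This is not the makeTable function itself but a simplified illustration under my own assumptions: the name sketchTable is hypothetical, both inputs are already arrays, and each element of the first list is a single char.

```javascript
// Hypothetical sketch, not the author's makeTable: takes two arrays and returns
// an object whose indexes are the char codes (as strings) of the input chars.
function sketchTable(chars, replacements) {
  var table = {};
  var highest = 0; // highest number met or assigned so far
  for (var i = 0; i < chars.length; i++) {
    var key = String(chars[i].charCodeAt(0));
    var value = replacements ? replacements[i] : undefined;
    if (value === undefined || value === "") {
      value = String(highest + 1); // default: one unit above the highest so far
    }
    var asNumber = parseInt(value, 10);
    if (!isNaN(asNumber) && asNumber > highest) highest = asNumber;
    table[key] = value;
  }
  return table;
}

// The second list is shorter than the first: D and E get default numbers.
var mixed = sketchTable(["A", "B", "C", "D", "E"], ["1", "2", "7"]);
// mixed["65"] is "1", mixed["67"] is "7", mixed["68"] is "8", mixed["69"] is "9"

// No second list at all: numbering starts from 1.
var plain = sketchTable(["A", "B", "C"]);
// plain["65"] is "1", plain["66"] is "2", plain["67"] is "3"
```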
This function has the following arguments:
- chars: the list of input elements to be transformed. It can be a string or an array.
- nums: the list of elements into which the previous ones have to be transformed when met; despite the name of the argument, this list can accept chars as well and not only numbers. It can be a string or an array.
- charsAreCodes: pass it as number 1 in case the chars argument is a list of numerical char codes and not of letters or numbers. You may have noticed that in the section above there is a utility to derive charCodes from input chars or numbers. Obviously if you pass such argument as zero, it does nothing.
- numsAreCodes: pass it as 1 to mean the argument nums is a... list of charCodes as well! If you pass it as zero, it does nothing.
- charsSplitter: splitter for the first argument; if none, defaults to nothing and splits letter by letter. If you mean to split with a white space, you do have to pass a string containing a single white space as argument, for example " ". It cannot be composed of more than one char.
- numsSplitter: splitter for the second argument; see previous point. It cannot be composed of more than one char.
- optionalWhitespace: manages the white space of the input chars as an isolated special case: in fact at times it might be somewhat unsafe to split with a white space if... the white space too is indeed a char you want to account for!
By passing the white space here as a string such as " " (or whatever it may be) you thoroughly bypass this potential problem if it is of concern for you. Keep in mind that if argument charsAreCodes is 1 (true), then this value too must be the corresponding numerical charCode of the white space in the input alphabet, and not the char as such.
- whitespaceReplacer: it is the replacer (the converted item, say) for the white space. Keep in mind that if argument numsAreCodes is 1 (true), then this value too must be the corresponding numerical charCode of the white space in the conversion alphabet, and not the char as such.
Here is the function:
Now I describe the second and last function before we switch to the playground for this file: the second function is named alphabetConverter and performs the following: given a comparison table, it parses an input text and converts it accordingly.
It is somewhat the executive branch of the former makeTable function.
It has a few peculiarities of its own that can be described by going through the arguments it expects:
- input: it is the input text which you want to convert.
- compareTable: it is the table, arguably generated by the previously described function makeTable.
- stickToTable: if passed as 1, the script converts from the input only and exclusively those chars that have a correspondence in the comparison table, bypassing all the rest which it may encounter and which was not accounted for in the comparison table!
If such argument is not passed, or it is passed as 0 or false, the script works as follows: as soon as it meets a char which was not accounted for in the passed comparison table (even in case it is empty or not passed at all!), it builds a comparison table of its own on the fly (or adds entries to the passed one if incomplete!), assigning to each new char never previously met either a number from 1 increasing onward, or a number from the highest number found in the comparison table onward, if such a number can be located.
- originalWhitespace: the white space as it is in the input text: in fact, note that what is the corresponding char code for a white space in the latin set, is not the same char code for the white space in another type of font!
- whitespaceReplacer: what is meant to replace originalWhitespace.
- whitespacesAreCodes: passed as number 1, this argument tells the script to consider both the whitespaceReplacer and originalWhitespace arguments as charCodes! Namely, those two arguments have to be numerical codes.
- replaceDoubleWhitespaces: this feature works only if you have passed the argument originalWhitespace. If you pass replaceDoubleWhitespaces, it removes from the input all the consecutive white spaces exceeding one instance that match originalWhitespace: in other words it replaces things like, say, two white spaces in a row with a single one.
- forceWhitespaceReplacing: if passed as 1, this argument forces the script to rewrite, inside the passed comparison table, the entry corresponding to the white space with the entry passed to this function (alphabetConverter).
- forceNRReplacing: it is important to pass this argument as 1 in case you're dealing with inputs either drawn from or meant to be printed on an html file or, most significantly, a form field: in this way you can account for a few "hidden" chars typical of web forms, namely the new line (\n) char, the carriage return (\r) char, and the tabulation (\t) char. The script does it by default, but if you pass arguments up to this level, I do suggest that you always pass this argument as a number higher than zero to flag that it is passed.
- caseInsensitive: when you pass the input chars to a table, there are cases, most importantly the Latin alphabet, where you may want to convert chars into the same number regardless of whether they are uppercase or lowercase: by passing this argument you mean that the input must be parsed under a case insensitive approach. Actually, I do suggest that you always account for every type of possible char when you make your comparison table, and therefore not rely on this case insensitive feature. It is in fact good programming practice to remember that "a" and "A", although apparently the same thing to a human eye, are still for a computer two entirely different things (charCodes in the background, that is!) as much as A and Z are.
- separator: you may remember that above I've suggested that you pass numbers in the comparison table always separated by some divider (arguably, a white space) to discriminate whether, say, 12 is 12 or 1 and 2.
This issue affects, potentially, the output too in case the comparison table is instructed to transform all the input chars into numbers: printing on the screen 34 would still make us wonder: is this 34 or 3 and 4?
Therefore the script by default adds a white space between each char printed in the output (to override this, pass this argument as something, even an empty string such as "". But I guess the default behaviour is ok), while by default it separates word from word (the originalWhitespace argument mentioned above) with a forward slash ("/").
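The core of the conversion described by the arguments above can be sketched as follows. This is a simplified illustration and not the alphabetConverter function: the name sketchConvert is hypothetical, only the stickToTable and originalWhitespace behaviours are covered, and the output separators are fixed to the defaults mentioned above (a white space between chars, a forward slash between words).

```javascript
// Hypothetical sketch of the conversion step, not the author's alphabetConverter.
// table: object indexed by char code strings; stickToTable: skip unknown chars when true.
function sketchConvert(input, table, stickToTable, originalWhitespace) {
  var words = [];
  var current = [];
  var highest = 0;
  var key;
  // Find the highest number already in the table, to continue numbering from there.
  for (key in table) {
    var n = parseInt(table[key], 10);
    if (!isNaN(n) && n > highest) highest = n;
  }
  for (var i = 0; i < input.length; i++) {
    if (input.charAt(i) === originalWhitespace) { // word boundary
      words.push(current.join(" "));
      current = [];
      continue;
    }
    key = String(input.charCodeAt(i));
    if (table[key] === undefined) {
      if (stickToTable) continue;  // bypass chars missing from the table
      highest = highest + 1;       // or add them to the table on the fly
      table[key] = String(highest);
    }
    current.push(table[key]);
  }
  words.push(current.join(" "));
  return words.join("/");
}

// A is 65, B is 66; C (67) is not in the table.
var strict = sketchConvert("AB BAC", { "65": "1", "66": "2" }, true, " ");  // "1 2/2 1"
var open = sketchConvert("AB BAC", { "65": "1", "66": "2" }, false, " ");   // "1 2/2 1 3"
```

Note that in this sketch the table passed in is modified when new chars are added on the fly, which is one possible reading of "add entries to the passed one".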
WARNING: there is one case when Netscape 4 may be unable to parse correctly fancy chars pasted in a textarea: when these chars are generated by clicking the button named "Auto-Insert" that you will see below, which inserts char codes and not literal chars directly. To work around this Netscape shortcoming, copy and paste manually the alphabets to generate the comparison table.
This behaviour is due to the fact that a syntax like String.fromCharCode(61537) on Netscape 4 always returns a question mark as a char, whose code then proves to be 63 if the char is checked for its charCode on Netscape 4. This is puzzling, and certainly a Netscape 4 shortcoming, because if you copy from a rich text file a char whose unicode is 61537, and you paste it in a Netscape 4 textarea, then Netscape doesn't show a question mark but... the letter "a". Therefore it is certainly a problem with the inner workings of the Netscape 4 engine, because ya see: the representation of one same thing cannot be two different things!
Netscape 6 gives no problems, and obviously neither does Explorer.
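A quick way to check whether a given browser suffers from the problem just described might be a round trip from code to char and back. The function name survivesRoundTrip is hypothetical and this check is only a sketch of the idea, not something this file performs.

```javascript
// Returns true when a char code survives the trip code -> char -> code.
function survivesRoundTrip(code) {
  return String.fromCharCode(code).charCodeAt(0) === code;
}

// On an engine behaving as described above for Netscape 4, the char would come
// back as a question mark (code 63) and this check would therefore fail.
var roundTripOk = survivesRoundTrip(61537);
```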
Given a pretty bad Opera bug, at least until Opera 6 these scripts can't work on Opera (as said: at least until version 6). For more on this bug:
see this Opera knowledge base file
Here is your function: