The following section explains some technical ideas working inside these scripts. If you're
not interested in them just skip this section.
While writing an article, I came to rethink these functions and I realized at least three elements could be changed:
- A couple of conditional statements within the previous versions were not really necessary: I included them because, as you may have noticed, I am over-cautious: when we release into a promiscuous environment such as the whole public of the internet a few functions, you never know what type of mischief a third part could do while using them.
Thus my tendency to forestall "uncompetent" users led me to a couple of moot statements that were not really needed.
- A misconception has been fixed: within the function I included an if conditional statement that, though not detrimental in the least for the validity of the output, was indeed useless. That has been "fixed". Sleepless nights can sneak into codes these things.
- An attempt to overcame initializations of strings or numbers like:
new Object("String here")
has been dropped: the functions are thus faster, though they don't detect any longer this type of exceptions. Yet, these exceptions are not something an algorithm should really take care of, but are something a matrix should not include in the first place: strings should never be initialized as Object Data Types!
- Eventually, the arrangement of the original functions is so powerful, that it was possible also to implement a wealth of similar new functions, recently added here: functions that not only scan but also find elements within an irregular matrix.
So it is not that the original functions were bad, it is that they were good enough to lead to a dramatic set of improvements.
These functions deal with matrixes (though would work also with simple linear arrays).
A matrix is a
data structure where there can be a variety of arrays nested within each other.
Such object could sport a regular pattern in their skeleton, yet the assumption we'll hold throughout the scripts is that we don't know if such patterns are actually there, and if they are we don't know what their recurrences are, and/or most likely they don't have any regularity in their structures at all.
This problem engrossed me since ever for you're more likely to find this type of objects when scripting for web documents, rather than structured trees such as
binary trees and the alike: the
soundenss of this fixation of mine and the truthfullness of my assertion can be proved true when you all of a sudden face a tree of Html or Xml nodes: they are nearly undefined and "unstructured" by definition: who knows how many tags the guy included in that section, isn't it?
So, the problem boils down to this, basically: having a HTML or a XML document or a DOM set of deeply nested nodes, collect and report all the eventual data reached at the edges of each ramification.
As said, he structure of the tree is assumed as unknown by definition. No pattern is anywhere to be foreseen.
Obvious variations of the algorithm must be able to report not just all the eventual edges (instance: an Array typically carries relevant data only at its ends), but also the whole set of traversed nodes in between, because they could raise to significance (instance:
childNodes are all valid objects for further manipulation, regardless of their final or interlocutory position).
Additionally, for all the algorithms a map of all the trails undertaken (that is, the traversed indexes) in the process is required as well.
This would allow not only for reporting for instance the edges, but also for re-finding where they were located in the original input object/matrix; which of course is not of immaterial importance.
For associative matrixes, namely matrixes whose indexes can be of mixed Data Type and not merely of the Numerical Type, both maps are required: the one carrying the path of associative indexes, and one of numbers, emulating the sequences of the associative indexes orders; so it would be possible to convert an associative matrix into a numerically indexed array whose index-progressions would faithfully reproduce the scheme of the original associative version.
This is not trivial an accomplishment.
I didn't yield to the vanity of making the algorithm recursive, namely a function returning itself. Recursion is the main source of
Stack Overflow errors in javascript, and DOM nodes could flock in by the thousands, making overflows nearly a certainty (in the test form further on, running the scan dom function on one table yields as many as nearly 3000 nodes in this very same document).
Thus, we'll convert whatever recursive exigency that may be likely to arise, into a conventional loop.
To implement these objectives, we grab a stack.
A stack is an on-the-fly built up Array whose lifespan will be interested by the following phenomena: whenever we push a new element on its head (I consider the head its increasing index entry), upon revisiting the Array the last pushed in element is the one that is going to be popped off:
LIFO, acronym for
Last In First Out. By the way its conceptual opposition is the
FIFO queue:
First In First Out.
For the algorithm to work, each element of this stack is bound to include information on the currently pushed element, and not just the element itself. Thus each entry of your stack cannot contain one item, it must contain more because it must allow for information fields; and we're to find out that three slots suit our case.
var stack= new Array( new Array(0, 0, 0, new Array()) );
As you see, each stack instance is an array of as many as four information fields, the last of which is an array once again.
Let's feed the first two entries with a dom node or whatever object with a length; our placeholder name for this incoming generic tree is 'node', and we assume we can know its length (array.length, childNodes.length):
var stack= new Array( new Array(node, node.length, 0, new Array()) );
This is just the seed to feed our algorithm.
The runtime loops through the stack by a
while cycle.
while(stack.length){}
the stack, having being fed, has a starting length of 1 (one nested array), so the loop ignites.
Its only currently available index for its last (and for now only) entry (LIFO) is thus zero, or:
var currentStackIndex=stack.length-1;
You know that
stack[currentStackIndex][0] is an object.
Now, let's wonder whether stack[currentStackIndex][0], namely the first entry of the array nested in the currently inspected index of stack, is an Object (array, node):
if(typeof( stack[currentStackIndex][0] ) == "object")
Also, has this currently engaged object a length, and is thus another valid object (array, node) on its own right? Do not mistake the length of the
stack by the length of the stack
entry currently piled on it simply because at the
beginning they coincide.
Remember that we have passed such length as the second entry of the nested stacked element, so our question now boils down to this twofold one:
if(typeof( stack[currentStackIndex][0] ) == "object" && stack[currentStackIndex][1]){}
If the above is true, we never met this branch (object, node) before.
We're on an object with a length of branches; we never visited this current object's branch, so we now need to know the index of the first entry on this branch. Most assuredly, it is given by an operation like:
stack[currentStackIndex][0].length - stack[currentStackIndex][1];
In fact, the first is the object itself and we're drawing out its length, and the second is the length of such object itself once again, but referred as we passed it to the
stack. They are the same only
for now alone. The tautology is only apparent, as you'll soon see.
Now, would you dare contend such a difference yields zero now, and that such number is precisely the index of the first offset of this input object's first branch?
So, store this index on a temporary variable:
var nextObjectIndex=( stack[currentStackIndex][0].length - stack[currentStackIndex][1] );
Thus, since that is an Index, the actual object (branch) entry which is indexed by such number is exactly:
stack[currentStackIndex][0][ nextObjectIndex ];
Now, let's do the
critical trick, which you'll understand shortly: let's decrease by 1 the numerical value of the length of the current object and which is stored in stack[currentStackIndex][1], so:
--stack[currentStackIndex][1];
Don't forget what we have done.
Now, you have a third entry in each nested array of the stack that you have never used yet, and which you initialized as a mere number zero, remember? it is:
stack[currentStackIndex][2]
We will use it like a flag to specify that this dish on the stack has already been visited, by setting it equal to 1:
stack[currentStackIndex][2]=1;
And now finally push on the stack the new object, remembering each stack element must be an array and taking care of populating it with the three fields it requires:
stack.push( new Array( stack[currentStackIndex][0][ nextObjectIndex ], stack[currentStackIndex][0][ nextObjectIndex ].length, 0 ) );
Now, imagine this part of the algorithm in action:
In fact assume an Array is of 5 entries. I put this array5 on the stack.
array5LENGTHmemorize=array5.length
stack.push( new Array(array5, array5LENGTHmemorize, 0, new Array(0)) )
Then I start inspecting its 1st entry, namely [0]:
I firstly decrease the second entry by one:
--array5LENGTHmemorize
which thus now holds 4.
Then I add the first object found at index [0] on the stack:
behold: I put on the stack this latest object:
stack.push( new Array(array4, array4LENGTHmemorize, 0, new Array(0)) )
Now to simplify say the recursion at the next iteration whereas it inspect this newly produced items finds this object is the end of a branch, say a string or whatever: it pops it off the stack, as you'll see, and saves it to the output.
We
thus come back to the previous element in the stack array5. How can you know you must now
not inspect
again index [
0] but you'd start from index [
1]?
Well, "simple"...:
I just interrogate what
array5.length-array5LENGTHmemorize
returns, and since the length is 5 and I
previously decreased array5LENGTHmemorize by one thus making it equal 4, guess what: it tells me... 1!
There we go: grab index [1], and soon after obviously decrease
array5LENGTHmemorize by one again to repeat the trick the next time the popping out dishes bring the level of the searched stack back to this plate!
At each round you're thus sure that you don't consider twice an already exhausted branch.
Thus, the algorithm adds on the stack only objects, and upon revisiting an already visited object, thanks to the operation
--stack[currentStackIndex][1]
is smart enough to understand it has to grab the new subsequent index it has not scanned yet, and not once again the visited one(s).
We're left at this stage with two options:
1] we meet an entry which is not an Object data type and has no length both (instance: String, Number).
2] we meet an exhausted stack branch - which has such still does is an object (array) and does has a length, but we simply scanned it all already.
For these cases, we arrange an
else statement matching the main
if statement; an else statement where all items that are no longer object Data Type, or whose branch length is exhausted because all sub-branches have been visited already, are delivered.
In fact, remember that if a stacked object has reached the condition:
stack[currentStackIndex][1]=0
which represents its currently scanned length, it won't qualify to enter the 'if' conditional statement
again.
That will invariably happen because each time we re-visited a stack entry we declared:
--stack[currentStackIndex][1]
Thus we can deliver each entry which has attained such conditions to an 'else' statement, within which we perform what follows: we firstly verify whether the till now forgot stack[currentStackIndex][2] element has been set to 1. This happens when a branch offset has already passed in the jaws of the algorithm.
The reason we do this is to avoid collecting data which is not final, but which is, conversely, exhausted branches; exhausted stack branches have no longer a length stored into stack[currentStackIndex][1], though
still being objects: thus they would not enter the main 'if' statement, for they don't qualify as far as the length is concerned, though still being objects (branches); yet, we don't want to collect them, because they are not eventual items, but whole (already visited) branches. So with such 'if' statement within that 'else' we make sure we skip them and we collect only the actual edges.
Note that this would include
edges represented by
empty arrays!
Done that, we pop the stack; that is: we completely trash this stack entry.
The last thing we said, we want to map it.
We put in the code:
var map=stack[currentStackIndex][3].concat([stack[currentStackIndex][0][ nextObjectIndex ]Index]);
As you see, it just concatenates all the traversed indexes while it traverses them.
The javaScript method concat requires its argument being an array, and appends its unfurled elements, not the array as a whole. Additionally, concat yields the repleted array as a brand new object, unrelated with the original ones.
I need to account for its use, for it may seem just like a 'push' method emulation: it is.
The reason concat and not push is employed is this: javascript implements
pointers with non primitive Data Type; that is, you manipulate an Object and you assign it to a new variable as 'map' is, with the assumption it bequeaths an independent copy; but it does not.
If you would use push, always the same (originally pointed) array gets affected & thus transmitted to each map, with the consequence that you'd eventually find every stack item carrying the same map. That's bad.
You can push on the stack as a whole, because it does is meant to be always referred as the same one object; but you need fully autonomous copies for the singular objects that populate each stack instance. You must dodge the pointer: you first need to make sure you have a brand new object, and later push; concat achieves both purposes in one strike.
There is an alternative: initializing inline a new array each time, but that may eventually lead to a somewhat over-complexified output shape. Try it.
Then when you add to the output, add two items at a time since you want to report the maps too; arguably, your output will thus be an array of arrays, namely an array of scanned edges, whose each entry is an array of two elements: the reported edge, and whose second entry is again an array listing all the numerical indexes traversed to get at it:
output=array[ item, array[] ]
To make the variations collect and report with the output also all the visited nodes we just have to:
a] include a copy of the 'if' statement meant to reside inside the
else, soon after the main 'if' statement, because we have to report anything anyway, regardless of whether it is an edge or not: so we don't reap only within the
else, but also within the main
if statement!.
Such
if' statement still requires checking whether stack[currentStackIndex][2] is zero, to make sure you don't report twice a node upon revisiting it. If this check holds true, push on the output this stack entry.
c] you perform this as soon as you enter the outermost 'if' conditional statement.
By the way, by doing this the root node -first output entry- will report no associated index/es. Correctly.
For DOM usage, you introduce childNodes statements:
As already said, the "shortcoming" of the algorithm (which could be addressed adding more code, which I won't do here) is that the matrix has to be populated in a non tricky way; if you initialize an edge as:
node[X]=new Object("I'm a string. No, yes really.");
that would return "object" to the typeof query, and the algorithm will be fooled thinking it is such and will try to scan it. But, like a two headed eagle or a two edged ax, it is also a string: thus a conflict will arise among what is reported by the chosen initialization (Object) and what is yielded by the execution of such initialization (String). Moreover, releasing a string it does has a (chars) length: thus would fully qualify for the first 'if' conditional statement, and would grisly enter it.
Would you initialize strings like that? I would not, yet it is my duty to highlight to you these types of nasty habits that could show up in matrixes. But don't worry: DOM nodes don't do that, only humans do.
To tailor these functions to scan associative arrays, you must keep in mind that you cannot safely rely at all, with associative matrixes namely with objects that are indexed after
whatever Data Type and not just after
numbers, on the trick we used namely that of the predecremental of the entry:
--stack[currentStackIndex][1]
Thus, what we need to do is to
find a way to make this become possible.
To achieve that, you add a few more fields in the stack instances to carry additional information.
Then, as soon as we enter an object of the stack, we verify if we have already collected a map of all its indexes; this
sadly yet inevitably, requires looping at least
once (I managed to do this once and once alone for each branch) all the indexes of each
newly found branch and gather them in a collection.
While you gather this collection, you collect these associative indexes as values of a
numerically indexed array named
collected whose indexs are, as said,
numbers, and yet whose held values are the associative indexes themselves.
In this way you have now a numerically indexed array (the just collected array), and being an array you have also its length, which we can now pre-decrement in order to implement what I called the "critical trick". Thus, we are back, for an associative array too, to the
same type of implementations we used for standard numerically indexed arrays: we have a length to decrease, and all the associative indexes are linked to a corresponding numerical index match.