«NOT FIT FOR OUR READERSHIP»?
HOW TO MATCH OR BEAT A WINDOWS ® ALGORITHM
These are not only PHP implementations of JavaScript functions but also functions crafted specifically for PHP, with performances that at times matched those of built-in Windows ® algorithms.
a Joan Miro painting: Woman at night, (1970)
a Joan Miro painting: Peasant under moonlight, (1968)
A Francesco Hayez painting: Secret delation, (1847-1848)
A Francesco Hayez painting: portrait of Sarah Louise Strachan Ruffo di Motta e Bagnara, (1851)
A Francesco Hayez painting: portrait of the singer Matilde Juva Branca, (1840-1844)
A Francesco Hayez painting: The meditation, (1851)
A Francesco Hayez painting: Portrait of Teresa Zumali Marsili with her son Giuseppe, (1833)
Why they rejected my article about «scanning, unfolding and mapping irregular matrixes and DOM nodes» will always be a mystery to me.
They said it was "not fit for our readership", and, well, it was all text and code: no pics of Nina Moric or of the strange abstract paintings I like. And the site was good, though of course not too good for me: I mean, I didn't pitch it to Scientific American. I'm not insane.
When I politely asked for the reason, I never got an answer.
So I just don't understand. Probably Salinger was right: "Don't ever tell anybody anything. If you do, you start missing everybody."
You have to understand: like many outsiders and entirely self-taught Shakespearian braggarts, I tend to get enthusiastic when, after many sleepless weeks, I contrive an algorithm that is as fast at some task as a Windows ® algorithm performing the same task, and at times much faster.
Not that I want to "match" Windows ®: on the contrary, I cannot conceive of using anything but Windows ®, and anyway, who the heck am I to hope for this?
It is just that what I have on my machine is Windows ®, and I'm not a millionaire who can afford four different machines to test more systems. So that is what I compare against.
But a few of the algorithms published here proved as fast as, if not faster than, the file search utility of my operating system at finding a file.
This is not an urban legend: it really happens. Test it. Or see this, from the HD main root to a deeply nested folder:
You have all the code needed to test these functions, including the PHP interfaces at the bottom. Test them several times first, and then draw your own conclusions.
For a self-taught guy without a penny who has been scribbling unnoticed on this website for years, that's not bad; this is what I really mean, so I am sure you will indulge my humble bluster. And after all, nobody's listening to me, so who cares, right?
"You plan, But It is God Who answers" [Proverbs]
Because He is The Lord, and His is the Glory, and His is the Kingdom, and no one stands before It.
Anyway, here I feature many implementations from that harvest of functions, the family of scan DOM/XML/associative-array functions. Of course, you largely need to consult that file for truly comprehensive documentation on the inner workings of these functions: since the JavaScript documentation is fully valid, in its internal logic, for these PHP implementations as well, restating here the 95 KB written there would have been a waste of time and space.
That family of functions started as a group of 4 and evolved over time to about 10. Now that I have moved their logic to PHP, they not only proved fit for a PHP implementation and still deliver truly useful features (scanning and reporting irregular matrixes can be a daunting task not only client side but often server side too): they even proved well suited to strictly PHP-specific instances such as the scanDirectories version, which scans a directory through all its subfolders, reports everything, then separately reports only the directories found, and then only the files in each directory.
I call this a rather large and good amount of development from a starting core of only 4 JavaScript functions in 2002, so I was truly befuddled when they rejected my professionally written article featuring and describing three of them.
So, here are the implementations. I start with the most PHP-specific one, the scanDirectories function, which performed really well: it scanned 5,500+ files on a remote HD in 5 seconds.
It scanned a whole local HD of 68,500+ files in less than 70 seconds (remember, of course, that the first time you run PHP in a browsing session can be significantly slower, because the engine loads things it won't need to load again in later rounds. Let me insist: do not judge after the first try; allow at least three tries and you'll see I am right). This speed is darn good.
The PHP interfaces provided at the bottom to test these scripts report a time measured from the moment the function is called to the moment it has finished scanning; the times therefore exclude the time spent printing the output.
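A minimal sketch of that kind of measurement, assuming PHP 7.3 or later (the page itself targets PHP 4.2) and a hypothetical helper named timeScan that is not part of the original interfaces: a monotonic clock, hrtime(true), is read just before the call and again as soon as it returns, so any printing done afterwards is not counted.

```php
<?php
// Hypothetical helper, not part of the original interfaces: runs a scan and
// returns its result together with the elapsed time in seconds.
function timeScan(callable $scan, ...$args): array
{
    $start = hrtime(true);                     // monotonic clock, nanoseconds
    $result = $scan(...$args);                 // only the scan itself is timed
    $elapsed = (hrtime(true) - $start) / 1e9;  // convert to seconds
    return [$result, $elapsed];
}

// Usage with a trivial stand-in for a scanner; printing happens afterwards,
// so it does not affect the measured time.
[$found, $secs] = timeScan('array_reverse', [1, 2, 3]);
printf("scan took %.6f seconds\n", $secs);
echo implode(', ', $found), "\n";
```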
The function scanned a whole WINNT folder and its subfolders in less than 2 seconds. It becomes time-consuming only when, after scanning, you print the results on screen: printing 68,000 lines can take a browser a while, but that's not the point, for what matters is the scanning speed, not the reporting speed.
Moreover, the findFile version, launched from the main root of the system to search for a file purposely nested deep within a system folder, took between a tad less than half a second and 5 seconds to spot it (performance is measured up to the moment the first match is reported). The same task performed by the operating system's built-in find-file feature, also launched from the main root, took from a tad less than 4 seconds up to 30 seconds. OK, let's say they are at least equivalent as far as performance is concerned: these algorithms are at times a match, performance-wise, for the best algorithm nested within a widely distributed commercial product such as Windows ®. So now I bet you see why I wondered at my article being rejected as "probably not fit for our readership". Uh? And at that time I didn't even know it could be as fast as commercial algorithms, because I hadn't implemented it in PHP yet, so I didn't even mention this aspect; everything was first conceived in JavaScript.
Of course, you're welcome to modify minor things in the version of scanDirectories featured here: for instance, if you want to report only the files, remove all references to $directories and $output, and so on. $output reports both directories and files, $directories only the directories, and $files only the files.
The returned value is thus an array of three entries:
array($output, $directories, $files)
where each entry is itself an array.
So indexes have the shape [0-2][x].
As I said, you could fine-tune that if you want.
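Since the original code is not reproduced in this text, here is a hedged sketch of a scanner with the same three-array return shape, assuming PHP 7.1+ and the hypothetical name sketchScanDirectories. It walks folders with an explicit stack, honours an optional depth limit (0 meaning no limit, 1 meaning the starting folder only) and normalises a given separator to a forward slash. Unlike the original as described, it appends entries in visiting order instead of reversing them.

```php
<?php
// Hypothetical sketch (NOT the original scanDirectories): an iterative,
// stack-based scan returning array($output, $directories, $files).
// Assumes PHP 7.1+ for the [$a, $b] = ... destructuring syntax.
function sketchScanDirectories(string $root, int $limit = 0, string $sep = '\\'): array
{
    $output = [];       // directories and files together
    $directories = [];  // directories only
    $files = [];        // files only
    // Each stack entry holds a folder path and its depth below $root.
    $stack = [[str_replace($sep, '/', $root), 0]];
    while ($stack) {
        [$dir, $depth] = array_pop($stack);
        $entries = @scandir($dir);          // false if the folder is unreadable
        if ($entries === false) {
            continue;
        }
        foreach ($entries as $name) {
            if ($name === '.' || $name === '..') {
                continue;
            }
            $path = rtrim($dir, '/') . '/' . $name;
            $output[] = $path;              // appended in visiting order
            if (is_dir($path)) {
                $directories[] = $path;
                // $path sits at depth $depth + 1, so its children sit at
                // $depth + 2: descend only while that stays within $limit.
                if ($limit === 0 || $depth + 2 <= $limit) {
                    $stack[] = [$path, $depth + 1];
                }
            } else {
                $files[] = $path;
            }
        }
    }
    return [$output, $directories, $files];
}
```

A call such as sketchScanDirectories('C:/program files', 1) would list only the entries directly inside that folder, with the folders among them in index 1 of the result.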
Just keep in mind that developing these functions took 10 days at first, including 5 sleepless nights (maybe you don't know what a sleepless night spent script fighting rather than pillow fighting looks like); then, after 2 years, I came back to them and they took 4 more days, and now that I have ported them to PHP they took 7 more days. I paid a high price for code that is presented to you in such an apparently smooth way, so use it and modify it, but remember that a guy paid a personal price for it and is now donating it for free: either keep the comment in the code leading to http://www.unitedscripters.com or give me a link from some good site of yours, OK? Just fair.
Potential Improvements and Choices
I was faced with two options while developing these algorithms:
-
"Do it later":
Scan your matrix/directory and just take care of building the stack on the fly (whatever the currently scanned object may be).
Then, in a second round, inspect what is on the stack. This implies more rounds than truly necessary, but it avoids nesting, within the core of the tasks, the conditional statement that checks whether what you want to find (for instance a file) has been met.
This costs more cycles, but each cycle is less encumbered with tasks while building the stack.
-
"Do it now":
Inspect each item as soon as you find it, before adding it to the stack (and, if the currently scanned object is what you're looking for, don't add it at all). This is arguably faster, but it also implies nesting, within the core of the tasks, the conditional statement that checks whether what you want to find (for instance a file) has been met.
This saves cycles, but each cycle is more encumbered with tasks while building the stack.
-
And now?
What is going to be faster: more cycles, which computers are incredibly good at, or fewer cycles and more frequent conditional checks? I eventually opted for the latter option: fewer cycles.
This shows how vast the potential of these algorithms was when I envisioned them, and it is why they can match the performance of operating-system-resident search-and-report algorithms.
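A hedged sketch of the "Do it now" option, assuming PHP 7.1+ and the hypothetical name sketchFindFile (not the original findFile): each entry is compared as soon as it is read, and the function returns at the first match. It also shows why failure is the slow case: null can only be returned once the whole stack has been exhausted.

```php
<?php
// Hypothetical sketch of the "do it now" strategy (NOT the original findFile):
// every entry is compared as soon as it is read, and the search returns at
// the first match instead of finishing the stack first. PHP 7.1+.
function sketchFindFile(string $root, string $name, int $limit = 0, string $sep = '\\'): ?string
{
    $stack = [[str_replace($sep, '/', $root), 0]];
    while ($stack) {
        [$dir, $depth] = array_pop($stack);
        $entries = @scandir($dir);
        if ($entries === false) {
            continue;                       // unreadable folder: skip it
        }
        foreach ($entries as $entry) {
            if ($entry === '.' || $entry === '..') {
                continue;
            }
            $path = rtrim($dir, '/') . '/' . $entry;
            if (is_dir($path)) {
                // Depth rule of the limit argument: 0 means unlimited.
                if ($limit === 0 || $depth + 2 <= $limit) {
                    $stack[] = [$path, $depth + 1];
                }
            } elseif ($entry === $name) {   // case-sensitive comparison
                return $path;               // stop at the first match
            }
        }
    }
    return null; // every reachable folder was searched: report failure
}
```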
This is why your server should use them.
And this is why other guys would have made you pay for them whereas I give them to you for free: because they're not suitable for all readerships lol.
I added a feature to these scanners (to the file scanners, at least): you can limit the depth of the scanning process. All the file/directory scanners have an entirely optional third argument (second argument for scanDirectory) named limit: if you pass a number other than the default zero, then as soon as the scanning process reaches the depth represented by that number, it stops scanning deeper.
Of course, the depth is counted from the starting folder onward.
So, for instance, if a folder contains 5 nested levels of folders and you set limit to, say, 3 with the root of that group of nested folders as the starting node, the functions will unfold the folders down to that level included, then stop and return the result.
For example, if the starting root is C:/foo/foo2 and limit is 2, the scanner will scan and report entries such as:
C:/foo/foo2/folder1A/
C:/foo/foo2/folder1A/folder1B
Thus, if you set limit to 1, these scanners will scan only the current folder, without entering the nested ones.
So keep in mind that if you want to enter a nested directory, and not just report it, your limit argument must be one higher than the level at which that directory is located. For instance:
scanDirectories("C:/program files", 1);
would report only entries such as C:/program files/adobe, C:/program files/internet explorer, etc., namely only items 1 step below the current level.
If you want to search inside those folders:
findFiless("C:/program files", "findme.txt", 2);
A set of features I thought you may like.
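The limit rule can be checked with a couple of pure string helpers. This is a sketch with hypothetical names, assuming PHP 7+: it computes how many levels below the starting folder a path sits and reports the path only when that depth is at most limit (0 meaning no limit). It assumes the path really starts with the root.

```php
<?php
// Hypothetical helpers illustrating the limit rule (not the original code).
// Depth 1 = an entry directly inside $root, 2 = one folder further in, etc.
function sketchDepthBelow(string $root, string $path): int
{
    // Assumes $path starts with $root; separators are unified first.
    $root = rtrim(str_replace('\\', '/', $root), '/');
    $path = rtrim(str_replace('\\', '/', $path), '/');
    $rest = substr($path, strlen($root));   // e.g. "/folder1A/folder1B"
    return count(array_filter(explode('/', $rest), 'strlen'));
}

// An entry is reported when limit is 0 (no limit) or its depth <= limit.
function sketchIsReported(string $root, string $path, int $limit): bool
{
    return $limit === 0 || sketchDepthBelow($root, $path) <= $limit;
}

var_dump(sketchIsReported('C:/program files', 'C:/program files/adobe', 1));        // bool(true)
var_dump(sketchIsReported('C:/program files', 'C:/program files/adobe/reader', 1)); // bool(false)
```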
Remember that to scan the folders, these functions perform a validation that changes all folder separators that appear as a backward slash into a forward slash; also, .
If your operating system uses a folder separator other than the backslash, you can pass it as the fourth parameter of the functions (the third for scanDirectory), as a string, namely in quotes.
THE CODES
All the code: first the new PHP-specific code (for scanning directories and finding files), then the PHP implementations of previously developed JavaScript code.
ALL THE CODES
a Wynn Bullock photo: Big Sur Sunset, (1957)
scanDirectory
This is PHP specific: it scans a directory, following it into all its subfolders, and then reports an output of three entries, as follows:
- [0]: an Array listing all the directories and files in the order they have been visited. Since the algorithm adds to the growing head of the array, the order is reversed: the entries with the higher index numbers are those visited first, like an upward ladder.
- [1]: an Array, listing only the directories.
- [2]: an Array, listing only the files.
Requires only one argument: the path of the starting directory, as a string.
The optional second argument is the depth limit.
findFile
This is PHP specific: it finds one file, following a search directory into all its subfolders. The first argument is the folder you start searching from; the second is the name of the file you want to find, exclusive of the path. It returns the path to the file, inclusive of the file name.
Faster than Windows ® if it finds the file. Slower to report failure (no such file found), because it traverses all subfolders before giving up: Windows ® is faster at reporting a failure, and no, I don't know why.
It is case sensitive. A case-insensitive version (a truly minor variation) is the next one: I didn't add the feature here, because these algorithms should try to be fast and avoid additional conditional statements when possible (though I didn't strictly stick to this principle for the scanDirectory function). The optional third argument is the depth limit.
findFilei
This is PHP specific and is the conceptual equivalent of findFile above, except that this one is case insensitive.
Note the trailing i in the function name, showing that it is case insensitive (following what appears to be a PHP naming convention).
It returns the file path inclusive of the file name. The optional third argument is the depth limit.
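A minimal illustration of that minor variation, with hypothetical function names: only the name comparison changes. strcasecmp() compares bytes and, in current PHP versions, folds only ASCII letters, so accented characters are still compared case-sensitively.

```php
<?php
// Hypothetical illustration: a case-sensitive finder and a case-insensitive
// one can differ only in how an entry name is compared with the target.
function sketchSameName(string $entry, string $name): bool
{
    return $entry === $name;                 // findFile-style: exact match
}

function sketchSameNameI(string $entry, string $name): bool
{
    return strcasecmp($entry, $name) === 0;  // findFilei-style: ignores ASCII case
}
```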
findFiles
This is PHP specific: it finds all files whose name matches the passed $value argument.
It is slower than Windows ®: apparently PHP lags when asked to print on screen while it is still busy running a performance-critical algorithm. Although it still finds a file within the reported 3 seconds in a deeply nested and populated directory, when asked to print each found instance from within the loop it apparently refuses to print anything until the whole directory has been searched, even though the first file at least has certainly been spotted as quickly as findFile spots it (note the trailing s in this function's name).
It returns an array in which each entry is a file path inclusive of the file name. The optional third argument is the depth limit.
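A hedged alternative for the same job uses PHP's SPL iterators (RecursiveDirectoryIterator wrapped in RecursiveIteratorIterator) instead of the hand-made stack these functions are built on. This is a swapped-in technique, not the author's code, and the function name is hypothetical; setMaxDepth() maps the limit argument onto SPL's depth, where 0 means entries directly inside the starting folder.

```php
<?php
// Hypothetical alternative for a findFiles-style search, using PHP's SPL
// iterators instead of a hand-made stack (NOT the author's code).
// Returns every file whose name equals $value exactly (case-sensitive).
function sketchFindFiles(string $root, string $value, int $limit = 0): array
{
    $dirs = new RecursiveDirectoryIterator(
        $root,
        FilesystemIterator::SKIP_DOTS | FilesystemIterator::UNIX_PATHS
    );
    // CATCH_GET_CHILD skips folders that cannot be opened instead of throwing.
    $all = new RecursiveIteratorIterator(
        $dirs,
        RecursiveIteratorIterator::LEAVES_ONLY,
        RecursiveIteratorIterator::CATCH_GET_CHILD
    );
    if ($limit > 0) {
        $all->setMaxDepth($limit - 1);      // SPL depth 0 = entries in $root
    }
    $found = [];
    foreach ($all as $path => $info) {      // key = path, value = SplFileInfo
        if ($info->isFile() && $info->getFilename() === $value) {
            $found[] = $path;
        }
    }
    return $found;
}
```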
findFilesi
This is PHP specific and is the equivalent of the function just above, except that this one is case insensitive.
findFiless
This is PHP specific: it finds all files whose name contains the second argument, $value, as a substring. That is, the whole name does not have to match $value: a name that simply contains the $value portion is good as well.
Good for reporting, for instance, all files with a specified extension such as ".jpg".
Note that the function name has two trailing s's, meaning find files (with) specific feature.
This function is case insensitive by default: since you are happy with a substring match, this time I assumed that a substring match ignoring case would do as well. The optional third argument is the depth limit.
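A sketch of the substring test, with hypothetical names and assuming PHP 7+: stripos() does the case-insensitive search, and the !== false check matters because a match at position 0 is falsy.

```php
<?php
// Hypothetical illustration of a findFiless-style name test: a file matches
// when $value occurs anywhere in its name, ignoring (ASCII) case.
function sketchNameContains(string $fileName, string $value): bool
{
    return stripos($fileName, $value) !== false; // position 0 is a valid match
}

// Applying the test to a list of names, e.g. to keep every ".jpg" file.
function sketchFilterNames(array $names, string $value): array
{
    return array_values(array_filter(
        $names,
        function (string $n) use ($value): bool {
            return sketchNameContains($n, $value);
        }
    ));
}
```

Note that a substring test also accepts names such as x.jpg.bak; a regular-expression variant can anchor the extension to the end of the name.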
findFilesr
This is PHP specific: it finds all files matching a regular expression pattern, against which each file name is compared. Thus the second argument must be a regular expression: since the function uses PHP's preg_match, the regular expression must be a string including the leading and trailing forward-slash delimiters. Example:
"/\.jpg/"
Note the escaped dot character (if you don't know what regular expressions are, see: Mastering Regular Expressions In Javascript).
Note the trailing sr characters in the function name. The optional third argument is the depth limit.
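A sketch of the matching step, with a hypothetical function name: preg_match() returns 1 on a match, 0 on no match and false on an invalid pattern, hence the strict comparison. Adding a $ anchor and the i modifier, as in "/\.jpg$/i", restricts matches to names ending in .jpg in any case.

```php
<?php
// Hypothetical illustration of a findFilesr-style name test: keep every name
// that the regular expression matches.
function sketchNamesMatching(array $names, string $pattern): array
{
    $hits = [];
    foreach ($names as $n) {
        if (preg_match($pattern, $n) === 1) {   // 1 = match, 0 = none, false = error
            $hits[] = $n;
        }
    }
    return $hits;
}
```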
findFiler
This is PHP specific and is like the function above, namely it requires a regular expression, but it reports only one file: the first one found whose file name matches that regular expression. Thus it returns a string, the full path to the file inclusive of the file name. The optional third argument is the depth limit.
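The difference from findFilesr is where the loop stops; a hypothetical sketch over a plain list of names, returning null when nothing matches:

```php
<?php
// Hypothetical illustration of findFiler-style behaviour: stop at the first
// name matching the pattern and return it, or null when nothing matches.
function sketchFirstMatching(array $names, string $pattern): ?string
{
    foreach ($names as $n) {
        if (preg_match($pattern, $n) === 1) {
            return $n; // first hit wins; later candidates are never examined
        }
    }
    return null;
}
```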
a Wynn Bullock photo: Stark tree, (1956)
scanArray
This is documented in its JavaScript version file: see the function with the same name there.
Scans and maps an irregular array. Returns an array in which each entry is itself an array of two entries: the first reports the edge found, the second is an array of all the numerical indexes that lead there.
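A compact sketch of such a mapping, assuming PHP 7+ and the hypothetical name sketchScanArray. It is recursive for brevity, which may differ from the original implementation documented in the JavaScript file. Because it iterates with foreach over keys, it also works on associative arrays, yielding string keys in the path; empty sub-arrays produce no entry.

```php
<?php
// Hypothetical sketch (NOT the original scanArray): maps an irregular nested
// array into a list of [leafValue, [index, index, ...]] pairs, where the
// index list is the path of keys from the top level down to the leaf.
function sketchScanArray(array $input, array $path = []): array
{
    $map = [];
    foreach ($input as $key => $value) {
        $here = $path;
        $here[] = $key;                      // path of indexes to this item
        if (is_array($value)) {
            // Branch: recurse and collect the leaves below it.
            foreach (sketchScanArray($value, $here) as $pair) {
                $map[] = $pair;
            }
        } else {
            $map[] = [$value, $here];        // leaf ("edge"): value + path
        }
    }
    return $map;
}
```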
scanArrayAll
This is documented in its JavaScript version file: see the function with the same name there.
Scans and maps an irregular array. Returns an array in which each entry is itself an array of two entries: the first reports the edge or branch found (branches are reported as well), the second is an array of all the numerical indexes that lead there.
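The "All" variant differs in also reporting each branch (the sub-array itself) before its contents; again a recursive sketch with a hypothetical name, assuming PHP 7+:

```php
<?php
// Hypothetical sketch (NOT the original scanArrayAll): like a leaf-only map,
// but every item is reported, branches included, each with its index path.
function sketchScanArrayAll(array $input, array $path = []): array
{
    $map = [];
    foreach ($input as $key => $value) {
        $here = $path;
        $here[] = $key;
        $map[] = [$value, $here];            // leaves AND branches
        if (is_array($value)) {
            foreach (sketchScanArrayAll($value, $here) as $pair) {
                $map[] = $pair;              // then everything inside the branch
            }
        }
    }
    return $map;
}
```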
scanAssociative
This is documented in its JavaScript version file: see the function with the same name there.
Scans and maps an irregular associative array. Returns an array in which each entry is itself an array of two entries: the first reports the edge found, the second is an array of all the numerical indexes that lead there.
scanAssociativeAll
This is documented in its JavaScript version file: see the function with the same name there.
Scans and maps an irregular associative array. Returns an array in which each entry is itself an array of two entries: the first reports the edge or branch found (branches are reported as well), the second is an array of all the numerical indexes that lead there.
find
This is documented in its JavaScript version file: see the function with the same name there.
This finds the first instance of $value in the input array, and returns it.
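A hedged sketch: the exact return value of the original is documented in the JavaScript file, so this version (hypothetical name sketchFind, PHP 7.1+) is an assumption that returns the path of indexes to the first strictly equal match, or null when there is none.

```php
<?php
// Hypothetical sketch (NOT the original find): depth-first search through an
// irregular array; returns the key path of the first item === $value.
function sketchFind(array $input, $value, array $path = []): ?array
{
    foreach ($input as $key => $item) {
        $here = $path;
        $here[] = $key;
        if (is_array($item)) {
            $inner = sketchFind($item, $value, $here);
            if ($inner !== null) {
                return $inner;               // propagate the first match upwards
            }
        } elseif ($item === $value) {        // strict comparison: 1 !== "1"
            return $here;
        }
    }
    return null;
}
```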
findAll
This is documented in its JavaScript version file: see the function with the same name there.
This finds all the instances of $value in the input array, and returns an array.
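A matching sketch for the "all instances" case, with a hypothetical name and the same assumption that paths of keys are returned; since it iterates over keys, string keys work too, as an associative variant would need.

```php
<?php
// Hypothetical sketch (NOT the original findAll): collects the key path of
// every item strictly equal to $value, in depth-first order.
function sketchFindAll(array $input, $value, array $path = []): array
{
    $hits = [];
    foreach ($input as $key => $item) {
        $here = $path;
        $here[] = $key;
        if (is_array($item)) {
            foreach (sketchFindAll($item, $value, $here) as $p) {
                $hits[] = $p;
            }
        } elseif ($item === $value) {
            $hits[] = $here;
        }
    }
    return $hits;
}
```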
associativeFind
This is documented in its JavaScript version file: see the function with the same name there.
This finds the first instance of $value in the input associative array, and returns it.
associativeFindAll
This is documented in its JavaScript version file: see the function with the same name there.
This finds all the instances of $value in the input associative array, and returns an array.
a Wynn Bullock photo: Child on forest road, (1958)
PHP INTERFACES
Here are the interfaces for running your own United Scripters PHP file finder tools on your own computer.
Here I provide the code for a PHP interface to run all these utilities on your machine. This assumes that:
- Of course, you have a PHP interpreter installed and running on your computer, plus a server (such as Apache). Make sure it is PHP 4.2 or later.
A rather big (about 50 MB) free installer that installs PHP and an Apache server on your machine is Apache2Triad.
- I assume you're aware that, to run a PHP file on your machine once you have Apache and the PHP interpreter, you don't load the file by clicking on it: you open your browser and connect to the local server, emulating an internet connection, by typing:
http://localhost/findfilers.html
findfilers.html is a placeholder for the html file whose code will be shown below.
- Then you put that HTML page in your htdocs folder within Apache.
In that very same folder you put a file with the ".php" extension, whose name is:
findfiles.php
If you change that name, you must also change it in the HTML file mentioned above, because its form carries an action="findfiles.php" attribute.
- Lastly, since findfiles.php pulls in, through an include statement, an external PHP file named
finders.inc
you also have to create that file with the ".inc" extension (or, alternatively, change the include argument within findfiles.php; as written, it looks for that file in the same directory as findfiles.php).
This .inc file has to contain the first 8 functions featured here:
findFiless findFile findFilei findFiles findFilesi findFilesr findFiler scanDirectory
Remember that .inc files, too, must start with an opening <? tag and end with a closing ?> tag (the long opening form <?php works as well, and is required where short_open_tag is disabled).
- Last but not least, you may want to consider modifying, in your php.ini (once installed, it usually resides in the C:/windows folder; by the way, I just found its location on my machine with these utilities), the statement that concerns the
max_execution_time
variable: set its value to something like 180 seconds (it is usually set to 30 seconds; modify only that number). Scanning and listing, for instance, 70,000 files and reporting them all may take this utility 60 to 90 seconds, whereas my Windows find-file utility lists them after endless minutes.
If you change that, restart Apache for the changes to take effect (if you don't know how to do that, well, reboot your computer!).
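Where editing php.ini is not an option, PHP's set_time_limit() raises the limit for a single script instead. A minimal sketch, not part of the original findfiles.php; note that the counter restarts from zero when the function is called.

```php
<?php
// Per-script alternative to editing php.ini: set_time_limit() changes the
// max_execution_time setting for the current request only.
$before = ini_get('max_execution_time');   // typically "30" under Apache, "0" in CLI
set_time_limit(180);                       // allow up to 180 seconds from now on
$after = ini_get('max_execution_time');    // now "180"
echo "max_execution_time: $before -> $after\n";
```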
Here is the HTML code; of course, the interface has no frills. One warning: the default search path is set to C:
You may want to change it to something more limited in scope.
Here is the code for your findfiles.php.
Remember: you still need to create your include file finders.inc with all the find file/s functions and the scan directory function.
Requires PHP 4.2 or later.