Web Excavator


Version 1.30 - January 2, 2010
  • Added a minimum image width and height option to the Advanced Options dialog.
  • Fixed a bug where the save file could become corrupted if the application crashed.
  • Now handles links that end in a forward slash better.

Version 1.29 - January 2, 2008
  • Fixed a bug in the update duplicate files system and made some efficiency improvements.
  • Fixed a possible crash bug with a particular sequence of actions in the download list dialog.
  • Searching with an empty search term now displays the first set of URLs.
  • Fixed a bug with the list dialogs being unresponsive when saving.
  • No longer guesses site names with URL encoded characters (e.g. %3F, etc.)
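A check for URL encoded characters such as %3F could look roughly like the sketch below. It is only an illustration: the function name has_url_encoding is hypothetical, and the changelog does not say how Web Excavator itself performs this check.

```python
import re

# A percent sign followed by two hex digits, e.g. %3F or %20.
_PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def has_url_encoding(text):
    """Return True if text contains at least one percent-encoded character.

    Hypothetical helper: a caller could skip name guessing when this is True.
    """
    return bool(_PERCENT_ESCAPE.search(text))
```

For example, has_url_encoding("page%3Fid") is true, while has_url_encoding("100%") is false because the percent sign is not followed by two hex digits.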

Version 1.28 - October 28, 2007
  • Fixed a bug in the search algorithm (e.g. when searching for zza in zzza).
  • Fixed a bug when clearing the URL history while files are downloading.
  • Many improvements to the web page parsing algorithm.
  • Many improvements to the algorithm that converts web pages for offline viewing.
  • No longer guesses file names with URL encoded characters (e.g. %3F).
  • When using the up or down arrow to get the previous search option, the cursor now defaults to the end of the phrase.
  • Now remembers the last ten search terms.
  • Added additional default sound and movie file types.
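The "zza in zzza" case mentioned above is a classic trap for hand-written substring searches: after a partial match fails, the search must restart one character after the previous start, not at the current position. The changelog does not show the actual code, so both functions below are hypothetical. find_buggy is only a guess at the kind of mistake involved, and find is a minimal correct version.

```python
def find_buggy(needle, haystack):
    """Hypothetical faulty search: resets the match without backing up."""
    if not needle:
        return 0
    j = 0  # number of needle characters matched so far
    for i, ch in enumerate(haystack):
        if ch == needle[j]:
            j += 1
            if j == len(needle):
                return i - j + 1
        else:
            # Bug: the characters already consumed are never reconsidered.
            j = 0
    return -1


def find(needle, haystack):
    """Naive but correct search: try every possible start position."""
    n, m = len(haystack), len(needle)
    for start in range(n - m + 1):
        if haystack[start:start + m] == needle:
            return start
    return -1
```

With these definitions, find("zza", "zzza") returns 1, while find_buggy("zza", "zzza") returns -1 and misses the match.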

Version 1.27 - September 30, 2007
  • Fixed a bug where the 'Pause After Downloads' option could turn itself off.
  • Save files are now slightly smaller (this change is backwards compatible).
  • An empty 'To Download' list dialog will now automatically reload the list when files are scheduled to be downloaded.
  • On the 'To Download' list dialog you can now 'Delete All' the downloads in the queue. If a search is active, the button instead deletes all the search entries found.
  • Pausing the program will now update the graphic in any open list dialog.
  • Added an Advanced option to save downloaded files with URL encoded file names.
  • Fixed a possible deadlock.
  • Added .xml to the default site types.
  • Fixed a bug with loading option (.ini) files.

Version 1.26 - September 1, 2007
  • Fixed a bug with duplicate files where deletion would occur if the same file was added again to the download list.
  • Fixed a bug loading file types from an ini file.
  • Within the list dialogs, Ctrl-S selects all displayed entries and Ctrl-D deletes displayed selected items.
  • You can now have both 'To Excavate' and 'To Download' list dialogs open at the same time.
  • Also both 'To Excavate' and 'To Download' minimize when Web Excavator is minimized.
  • Optimized loading the duplicates data file.
  • If 'Open Last Download' fails it now tries to open a previous download.
  • Fixed a bug in the Wizard with selecting 'Guess URLs'.
  • Minor UI improvements.

Version 1.25 - August 19, 2007
  • Fixed a rare crash bug.
  • Made improvements to the HTML parsing engine.
  • When downloading a web site, offsite images are now downloaded. More and more websites are storing their images offsite.
  • Made some optimizations, particularly when checking to see if a site is to be added to the download list.
  • Found an instance where Web Excavator could create a sub-directory when configured not to.
  • Now extracts links that do not have a corresponding closing quote.
  • Now displays the size of large files correctly.

Version 1.24 - July 14, 2007
  • Pressing the up arrow in the search field within the 'To Download' or 'To Excavate' dialogs will now display the last phrase searched for.
  • Added 'Save' and 'Load As' toolbar buttons.
  • Saved files are now named with the site you started plus ".sav". Old saved files will still load.
  • The save files above also now have a corresponding ini file.
  • In the wizard, when downloading files, skipped phrases are now applied to both site URLs and file URLs.
  • Improved the handling of local 302 redirects.
  • Fixed a bug with checking for updates.
  • Now handles links that only have a directory and do not contain a file name.
  • When downloading web pages that do not have a file extension, Web Excavator now appends a file extension.
  • Fixed a bug when downloading within a domain, where it could download html files outside of the domain.
  • Fixed a bug where the stop command would not stop downloading if a particular situation occurred.
  • Fixed a bug where a URL to excavate could be added twice.
  • History data is now backed up to History.bak.
  • Changed it so Web Excavator will not guess URLs if a duplicate file is found.
  • More minor UI improvements.

Version 1.23 - June 24, 2007
  • Minor UI improvements.
  • Zero byte length files are now automatically deleted.
  • As requested, added a 'Pause After Downloads Completed' menu option that, when activated, stops processing URLs and waits until the download queue is empty before pausing Web Excavator.
  • Last image download is now updated while 'To Download', 'To Excavate' or 'In Process' dialogs are open.
  • Fixed a bug where occasionally the first file downloaded would be saved into the same directory as Web Excavator.

Version 1.22 - June 2, 2007
  • Fixed a bug with updating Web Excavator that was introduced in version 1.21.
  • Added an option to 'Download Dir' to store checksum/file size information about every file in the download directories. When a new file is downloaded, its checksum/file size is compared with the stored data, and the new file is deleted if it matches. Also, if checked, the checksums/file sizes of all new downloads are added to this list.
  • There is a new menu item to delete this checksums/file size data file.
  • Improved file compare functionality.
  • Fixed threading related crash bug when removing files from download list.
  • Fixed bug with application not receiving the focus properly after restoring from system tray.
  • Added advanced option to resize the last downloaded displayed image so as to fit within the display window.
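The checksum/file size duplicate check added in this version can be sketched as follows. This is a minimal illustration under stated assumptions: the changelog does not name the checksum algorithm or the data file format, so SHA-256 and an in-memory set are stand-ins, and the names file_signature and keep_or_delete are hypothetical.

```python
import hashlib
import os


def file_signature(path):
    """Return (size in bytes, hex checksum) for the file at path.

    SHA-256 is an assumption; the real program's checksum is not documented.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def keep_or_delete(path, known):
    """Delete path if its signature is already in known, else record it.

    Returns True if the file was kept, False if it was deleted as a duplicate.
    """
    signature = file_signature(path)
    if signature in known:
        os.remove(path)
        return False
    known.add(signature)
    return True
```

Here known plays the role of the stored checksums/file size data. A real implementation would load it from and save it to a data file between sessions.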

Version 1.21 - April 11, 2007
  • Now checks minimum file size upon receiving file header information.
  • Fixed rare bug where Web Excavator would not restore properly after being minimized.
  • If web pages are not being saved and an audio, image or other media file turns out to be an html file, that file is not saved.
  • In the list dialogs the search term now defaults to active.
  • Version check file is now always downloaded to the same directory as the exe.

Version 1.20 - January 14, 2007
  • Added ability to search within the list dialogs and to remove all files containing the search term.
  • Reload within the list dialogs now starts at the given index.
  • Within the list dialogs the shift key can now be used to select multiple files.
  • Now displays the last jpg or gif image saved. Left click on this image to open in default viewer.
  • Added Advanced option to disable displaying the last image saved.
  • 'Send cookies' (which defaults to off) now also saves cookies.
  • Fixed: Downloading from secure sites now updates the number of files downloaded correctly.
  • Fixed: 'Erase URL History' now deletes all ignore lists.
  • The Delete key now deletes selected items from list dialogs.

Version 1.19 - December 28, 2006
  • Added buttons to the application window to do common tasks.
  • Added download directory to main application window (after typing press enter to use new directory).
  • No longer guesses URLs on a redirect.
  • No longer guesses file names when the file downloaded is below the minimum file size (set in options).

Version 1.18 - August 6, 2006
  • Fixed a bug with saving Sub-domain data to separate folders.
  • 'Within This Domain Only', which downloads into sub-domain folders (see below), will now change the original code so that it points at these folders.
  • The above feature can be turned off within the advanced options by selecting 'Do Not Alter Web Pages On Download'.
  • Added a 'Log Valid URLs' advanced option to log all the valid downloads to a ValidURLs.log file located in the same directory as the executable.
  • Added an option to not save any files when 'Log Valid URLs' is checked.
  • Fixed a bug with not downloading after starting and stopping a previous download.
  • Fixed a bug with guessing download URLs and IP addresses.
  • Added option to apply any changes to the search keywords, or exclusion keywords to the appropriate lists immediately.

Version 1.17a - July 5, 2006
  • Added advanced option to define a custom user-agent.
  • Added .dll to the default list of web site types. Saves with dll extension. Do not run downloaded dlls.
  • Fixed a bug with extracting URLs from an unusual (to me) JavaScript coding style.

Version 1.17 - July 3, 2006
  • When minimizing Web Excavator to the system tray the 'File duplicates' dialog, if open, is now also minimized.
  • Properly handles web servers that do not allow disconnected downloads to continue.
  • Fixed a potential program lock up when exiting in the middle of searching for duplicate files.
  • Now handles style="background-image:url(URL)"
  • Fixed problems with setting "More Options" on the Wizard Dialog.
  • Fixed a bug with connecting to websites that do not use port 80.
  • Added jhtml to the web page types to search.
  • 'Within This Domain Only' will now also download from sub-domains (e.g. www.example.com, images.example.com, scripts.example.com, etc.). Sub-domain data will be saved in a separate folder (e.g. scripts_example_com).
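The sub-domain handling described above can be sketched like this. It is a minimal illustration, not Web Excavator's actual code: the function names are hypothetical, and the folder naming rule (dots replaced by underscores) is inferred only from the scripts_example_com example.

```python
from urllib.parse import urlsplit


def is_within_domain(url, domain):
    """Return True if url's host is domain itself or one of its sub-domains."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower()
    # The leading dot stops badexample.com from matching example.com.
    return host == domain or host.endswith("." + domain)


def subdomain_folder(url):
    """Return a folder name for url's host, e.g. scripts_example_com.

    Assumption: dots are simply replaced with underscores.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host.replace(".", "_")
```

For example, subdomain_folder("http://scripts.example.com/menu.js") gives "scripts_example_com", and is_within_domain("http://images.example.com/a.png", "example.com") is true.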

Version 1.16a - March 23, 2006
  • Now appends 'index.html' when a sub-directory ends in '/'.
  • Fixed a bug with regards to newlines when handling redirects.
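Appending 'index.html' to a link that ends in '/' can be sketched as below. The function name local_path_for is hypothetical, and treating an empty path (a bare host name) the same way as a trailing slash is an assumption not stated in the changelog.

```python
from urllib.parse import urlsplit


def local_path_for(url):
    """Return a relative save path for url, adding index.html for directories."""
    path = urlsplit(url).path
    # Assumption: a URL with no path at all is treated like one ending in '/'.
    if path == "" or path.endswith("/"):
        path += "index.html"
    return path.lstrip("/")
```

With this sketch, local_path_for("http://example.com/docs/") gives "docs/index.html", while a link that already names a file is returned unchanged apart from the leading slash.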

Version 1.16 - March 11, 2006
  • Streamlined Wizard for downloading single web pages.
  • Improved downloading single web pages.
  • Fixed a bug in the wizard where you could not turn off guessing URLs.
  • Fixed a bug where one of the previous buttons took you to the wrong place.
  • Fixed a bug where it did not automatically prepend http when downloading files.
  • In verbose mode it now displays if it failed to download a file.
  • Reduced false positive rate when parsing JavaScript.
  • Now saves the 'Min. Downloads (below which search for more)' setting.

Version 1.15 - February 12, 2006
  • Fixed a couple of potential buffer overrun crash bugs.
  • Added option to search in the sub-directories of the 'optional directories' to see if a file has already been downloaded.
  • Added 'Download Dirs' to the options menu.
  • Now automatically appends a ';.' if the search types are all exclusions (!).
  • Fixed bug with setting defaults in the Options Dialog not resetting previous download file types.
  • Fixed a bug with not ignoring first excluded site file type.
  • Fixed a bug with using non-exclusions (!) with Keywords to skip.
  • Added Advanced Option to change the minimum number of downloads before it searches for more.
  • Minor interface changes/improvements.

Version 1.14 - February 2, 2006
  • Optimized the code that parses web files.
  • Fixed a bug where Web Excavator would, under a particular circumstance, parse non-web files.
  • Added ability to type a starting index when viewing the 'Download' and 'Sites To Scan' lists.
  • Made the 'check for updates' easier to use.
  • Now closes faster if having a connection problem.
  • On failure to connect to a site, it now checks against a known site to verify that the Internet connection is good.
  • Fixed a spelling mistake on 'Find Duplicate' dialog.
  • Now remembers the 'Search Sub Directories' setting between duplicate file searches.
  • Fixed a bug where web pages did not have an extension when saved in 'Do Not Create Any Sub Directories' mode.
  • Clicking the exit button a second time will force the software to exit no matter what.

Version 1.13 - February 2, 2006
  • This one was unlucky ;)

Version 1.12a - December 11, 2005
  • Added Portable Application Descriptions XML file to the installation program.
  • Now sets all options to their original values after pressing the 'defaults' button.
  • Included additional exclusion defaults when downloading entire websites.

Version 1.12 - December 4, 2005
  • Fixed a download crash bug.

Version 1.11 - November 26, 2005
  • Made some optimizations to the code, in particular Web Excavator now shuts down faster.
  • Fixed a crash bug when parsing a web page after installing Norton Internet Security.
  • In the 'In Process' view it now shows if it is trying to connect.
  • Added an option not to guess URLs after Web Excavator finds a question mark.
  • Fixed a display bug in the download list.
  • A few small interface changes.

Version 1.10 - October 27, 2005
  • Added a wizard to simplify the use of this application.
  • Added an option to the advanced options dialog to always start in 'classic view'.
  • Added ability to remove files from the download list while paused.
  • Fixed a bug when exiting while paused.
  • Optional 'Keywords In URL To Skip' is now checked when a save file is loaded.
  • Fixed a bug where it would sometimes not check all download directories.
  • Can now 'row select' on the Download List Dialogs. Also right clicking will now open that URL.
  • Fixed a bug where the optional 'history of sites visited' list was loaded twice.
  • Also, when using the above option, the program now exits faster.

Version 1.09 - August 13, 2005
  • Added menu item to search for duplicate files in any directory / sub-directories and optionally delete them.
  • Added advanced option to set the priority of the program. Set this to low if you want it to run as a background process.
  • Fixed a bug where it would not download files starting with '../'.
  • Minor tweaks to the parsing algorithms.
  • Added ability to copy a link to the clipboard from file or site download lists by right click or Ctrl-C.
  • Added an option to send a cookie to the site being downloaded from (see options dialog).
  • Added an option to view the files currently being downloaded.
  • Added two more optional download directories to check in to see if a file has been downloaded.
  • Some minor graphical and display tweaks.

Version 1.08 - June 22, 2005
  • Added advanced option to check for updates at start-up.
  • Fixed a bug in the download code.
  • Added a menu option to open the last file downloaded.
  • Within the view site and download lists it now displays the index range of the items being shown.
  • With 'Guess URLs' it now does not try to guess IP addresses.
  • Corrected a few grammar errors.

Version 1.07 - June 17, 2005
  • Fixed a possible crash bug when guessing URLs.
  • Added a menu option to remove all files to be downloaded.
  • Made many tweaks and improvements to downloading sites where a redirect file is needed.
  • Added an Advanced Option to set default redirect scripting language when file type is not supported. Defaults to PHP. Currently only ASP, JSP and PHP are supported.
  • Added an 'Open Debug file' button to the Advanced Options dialog.
  • If loading a saved state fails, it will now automatically attempt to load the back up.

Version 1.06 - June 5, 2005
  • Fixed bug handling URLs that are not in quotes.
  • Added option to not download files if their name contains a given keyword.

Version 1.05 - May 31, 2005
  • Now handles redirecting web pages. When handling PHP, ASP and JSP redirecting web pages it writes out a redirecting page in that language; all others are written out as ASP. Defaults to using just one parameter.
  • Can now set the Maximum file size to download.
  • Can now set the number of URL download threads (see Advanced Options).
  • Many minor optimizations and improvements with the web page processing.

Version 1.04 - April 24, 2005
  • Now displays the number of files downloaded this session.
  • 'Save and Exit' now shuts down the program if the download list is empty.
  • You are now prompted if the program is about to overwrite the save file with an empty one.
  • Fixed a bug in the display of the download directories to check, and added two more.

Version 1.03 - April 18, 2005
  • Fixed bug when downloading single web page.
  • Fixed potential crash bug on exit.
  • Added additional optional directories to search in to see if a file has already been downloaded.
  • Optionally keep a list of files already downloaded so as not to download them again.
  • User deleted files are added to the 'already downloaded list' (if enabled).
  • Fixed bugs in download and site list display.
  • Fixed problem with resuming interrupted downloads.

Version 1.02 - March 13, 2005
  • If you specify a save directory that doesn't exist it will now create that directory for you.
  • Added option to specify other directories (up to 5) to search in to see if a file has already been downloaded.
  • Now saves the date when logging URL errors.
  • Added a 'Stop adding to URL History' menu option.
  • Added 'Erase URL History' option to the menu.
  • Added a 'fetch' individual file option on the main window.
  • Fixed a bug where it would not search the exclude list under certain circumstances.
  • Fixed a small memory leak with the file view dialog.
  • Some cosmetic changes, including making the file view dialog wider.
  • Added a reload button to the file view dialog.
  • Fixed a crash bug in the file view dialog if you selected or deleted from an empty list.
  • Some minor tweaks to the programming logic.

Version 1.01 - February 6, 2005
  • Made numerous changes to the interface, including:
    1. Adding a 'Back' button and 'Select All' to the file view list.
    2. Adding a Web Excavator graphic.
  • Can now optionally tell it to guess URLs. Currently, this only works for URLs that contain numbers. For example, if it finds a Neptune01.jpg it will look for a Neptune02.jpg and a Neptune10.jpg.
  • Added an option to search for keywords on the web page not just in URLs.
  • Added 'Save As' and 'Load As' current options to the File menu.
  • Added Menu item to 'Erase URL History' that clears the list of previously downloaded URLs.
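The 'guess URLs' idea above (finding Neptune01.jpg and then looking for Neptune02.jpg) can be sketched as follows. This covers only the simple "add one" case and is an assumption about how such guessing might work. The changelog does not describe the actual rules, including how a candidate such as Neptune10.jpg is chosen, and the function name guess_next_url is hypothetical.

```python
import posixpath
import re


def guess_next_url(url):
    """Return url with the last number in its file name increased by one.

    Zero padding is preserved (01 -> 02). Returns None if the file name,
    ignoring its extension, contains no digits.
    """
    head, sep, name = url.rpartition("/")
    stem, ext = posixpath.splitext(name)
    matches = list(re.finditer(r"\d+", stem))
    if not matches:
        return None
    last = matches[-1]
    digits = last.group()
    bumped = str(int(digits) + 1).zfill(len(digits))
    return head + sep + stem[:last.start()] + bumped + stem[last.end():] + ext
```

Only the file name is examined, so digits in the host name or in the extension (such as .mp3) are left alone.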

Version 1.00 - January 30, 2005
  • Initial release.
