[thelist] Archiving saved web pages to CD - need to rename - app or Perl script?

George Dillon <> Evolt! evolt at georgedillon.com
Wed Mar 27 04:41:00 CST 2002


Saving web pages from I.E. often results in files whose names cannot be
written to CD. By default I.E. uses the page title to name both the page
and its folder of linked files, and these names often exceed the 64
characters (including path) allowed on CD.  If the CD-burning software
simply truncates the folder name, then the links on the page to all the
files in that folder get broken.

Has anyone got a good solution to this?

At first I tried using the update feature of Dreamweaver to update the links
whenever I manually changed a page/folder name (having defined my tree of
saved pages as a new site), but this was a laborious manual process and not
100% reliable.

I've thought about doing a Perl script for it (and an outline is below) but,
as ever, my next thought was that this is a common enough problem - so
someone has probably done this already... have they/you?

TIA

George Dillon


Here's my outline for a Perl script...


<PSEUDOCODE type="If it can be called that!">

# Script to rename folder-full of saved web pages and adjust their links
# so they can be saved to CD and still more or less work

1    INPUT name of folder(s) containing the saved pages you want to check
1.1  INPUT Scan recursively or not (e.g. are all your saved pages in one
folder or in a tree)?
1.2  INPUT Delete/leave/backup changed files?

2    Scan folder to find all files/folders with invalid names (e.g. too long
or containing invalid characters)

3    PROMPT for new names of all pages to be changed (For speed suggest a
simple truncation)
3.1  Validate new name (inc. check that it's not already used)

4    For each page
 4.1  Check for existence of files folder
 4.2  Create a new folder for the files named by new page name
 4.3  Move the files
 4.4  Delete old directory (IF/WHEN script is working)
 4.5  Check files for invalid names & change these automatically (save list
of these changes for link adjustment later)
 4.6  Open page
 4.7  Scan for a) links to renamed folder & b) links to renamed files
 4.8  Adjust links
 4.9  Save page with new name
 4.10 Delete old page (IF/WHEN script is working)

5    Print report to screen

6    Exit

# Problems/Issues...
# 1    Need to scan page folders for subfolders (e.g. created by framesets)
# 2    JS links


# Wish list
# 1  Find duplicate file folders and merge (e.g. when 2 or more pages have
# been downloaded from the same site each will have its own files folders
# but they are likely to contain the same files)
# 2  Remove undesirable cookie scripts, pop ups etc.

</PSEUDOCODE>
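
Steps 2 to 4 of the outline could be sketched roughly as follows. This is
in Python rather than Perl, purely for illustration, and it is a minimal
sketch under several assumptions, not a finished tool: the companion folder
is assumed to be named "<page name>_files"; links to it are assumed to
appear in the page either as the raw folder name or percent-encoded; the 64
character limit is applied to each name on its own, not to the full path;
only a single flat folder is handled; and the new names are chosen by plain
truncation without prompting. The function names (shorten, rename_page,
fix_folder) are made up for the example.

```python
from pathlib import Path
from urllib.parse import quote

MAX_NAME = 64       # limit quoted in the post; applied per name here (assumption)
SUFFIX = "_files"   # assumed naming of the folder of linked files


def shorten(stem, taken, max_name=MAX_NAME):
    """Truncate stem so that stem + SUFFIX fits in max_name and is not already used."""
    room = max_name - len(SUFFIX)
    candidate = stem[:room].rstrip(" .")
    n = 1
    while candidate.lower() in taken:
        # Name clash: make room for a numeric tag and try again.
        tag = f"-{n}"
        candidate = stem[:room - len(tag)].rstrip(" .") + tag
        n += 1
    taken.add(candidate.lower())
    return candidate


def rename_page(page, new_stem):
    """Rename a page and its files folder, rewriting the page's links to match."""
    old_dir = page.with_name(page.stem + SUFFIX)
    new_dir = page.with_name(new_stem + SUFFIX)
    new_page = page.with_name(new_stem + page.suffix)

    # latin-1 maps every byte to one character, so the page round-trips unchanged
    # apart from the replaced folder names.
    text = page.read_bytes().decode("latin-1")
    for old in {old_dir.name, quote(old_dir.name)}:
        text = text.replace(old + "/", quote(new_dir.name) + "/")
    new_page.write_bytes(text.encode("latin-1"))

    if old_dir.is_dir():
        old_dir.rename(new_dir)
    page.unlink()   # only sensible once the script is trusted, as the outline says
    return new_page


def fix_folder(root, max_name=MAX_NAME):
    """Shorten every over-long saved page in root; return {old name: new name}."""
    pages = [p for p in root.iterdir()
             if p.is_file() and p.suffix.lower() in (".htm", ".html")]
    taken = {p.stem.lower() for p in pages}
    report = {}
    for page in sorted(pages):
        if len(page.stem) + len(SUFFIX) > max_name:
            new_page = rename_page(page, shorten(page.stem, taken, max_name))
            report[page.name] = new_page.name
    return report
```

Calling fix_folder(Path("saved_pages")) on a folder of saved pages (the
folder name is just an example) would return the list of renames for the
report in step 5. Files inside each folder, subfolders created by framesets
and JS links are not touched by this sketch.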






