[thelist] site auditing

David.Cantrell at Gunter.AF.mil
Mon Feb 10 09:15:00 CST 2003


>find . -name '*.html' = 591. ugh. since i'm volunteering for them now,
>doing anything manual and tedious is bad.

Since you are on UNIX, how about this: take that list and dump it to a
newline-delimited text file (one file per line). Run a shell script against
it to convert each physical path (/home/usr/site/web/foo.html) to its
virtual path (/foo.html), and dump the result into a new file. Then run
another shell script against the new list and, for each path in it, grep
all the files in the original list to see whether that virtual path appears
in any of them. Any path that never turns up isn't linked from anywhere.

I would do this for all files under the web root, not just the HTML files,
to handle images, etc.

Keep in mind I am not a UNIX guy, so I can't give you code to do this, but I
used to mess with it, and this should be very easy to do using your favorite
shell, find, and grep. Maybe awk or sed too, though perl is probably more
powerful and well suited to this. Too bad I can't stand perl syntax.
Python is much easier on the eyes. :)
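
For what it's worth, the steps above could be sketched in Python roughly as
follows. This is only an illustration, not tested against a real site. The
function name find_unreferenced and its web_root parameter are made up for
the example. It assumes links are written as root-relative paths
(/foo.html), does a plain substring match the way grep would, and ignores a
file's references to itself.

```python
import os


def find_unreferenced(web_root):
    """Return the virtual paths under web_root that no other file mentions.

    Hypothetical helper: assumes root-relative links such as /foo.html.
    """
    # Step 1: list every file under the web root (not just *.html).
    physical = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            physical.append(os.path.join(dirpath, name))

    # Step 2: map each physical path to its virtual path,
    # e.g. /home/usr/site/web/foo.html -> /foo.html
    virtual = {
        path: "/" + os.path.relpath(path, web_root).replace(os.sep, "/")
        for path in physical
    }

    # Read every file once, as bytes, so images do not cause decode errors.
    contents = {}
    for path in physical:
        with open(path, "rb") as handle:
            contents[path] = handle.read()

    # Step 3: for each virtual path, search all the other files for it.
    unreferenced = []
    for path, vpath in virtual.items():
        needle = vpath.encode("utf-8")
        mentioned = any(
            needle in body
            for other, body in contents.items()
            if other != path  # assumption: a self-reference does not count
        )
        if not mentioned:
            unreferenced.append(vpath)
    return sorted(unreferenced)
```

Calling find_unreferenced("/home/usr/site/web") would return the list of
candidate orphans. Relative links (foo.html, ../img/x.gif) would be missed,
and /foo.html also matches inside /bar/foo.html, so treat the output as a
list to check by hand rather than a list to delete.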


