[thelist] broken link checker software with restrict filters

Steve Axthelm steveax at pobox.com
Thu Feb 19 12:23:40 CST 2009


On 2009-02-18 Stuart Young wrote:

>What do you use for checking for broken links ... except for Xenu, thanks.
>
>Xenu is great but it doesn't have filters to restrict which URLs to download
>... it does have a "Do not check any URLs beginning with this:" option, but
>the URLs I want to remove do not have a common start. I want to prevent the
>checking URLs with query parameters, e.g. to remove printable versions and
>email to a friend versions and so on. Specifically I want to run it on a
>mediawiki site which has literally hundreds of thousands of duplicate URLs
>containing &action=history, &action=edit, &oldid=, &diff= etc etc.

You didn't specify a platform, but since you mention Xenu I 
presume you're looking for Windows software? Thought I'd mention 
Integrity[1] just in case OS X is a possibility. It has an 
exclude field that takes a list of strings to exclude.

Also, the W3C LinkChecker Perl module[2] has --exclude and 
--exclude-docs options, which take regular expressions.
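
For the MediaWiki case, a regular expression keyed on the query 
parameters you listed should cover it. Below is a rough sketch in 
Python, only to show the kind of pattern I mean; the pattern, the 
should_check() helper and the wiki.example.org URLs are made up for 
illustration and are not part of any of the tools above. Try any 
pattern against a handful of your real URLs before handing it to a 
checker.

```python
import re

# Hypothetical pattern: matches the MediaWiki query parameters from the
# question (action=history, action=edit, oldid=, diff=) wherever they
# appear in the query string.
EXCLUDE = re.compile(r"[?&](?:action=(?:history|edit)|oldid=|diff=)")


def should_check(url: str) -> bool:
    """Return True if the URL does not match the exclusion pattern."""
    return EXCLUDE.search(url) is None


# Made-up sample URLs for illustration only.
urls = [
    "http://wiki.example.org/index.php?title=Main_Page",
    "http://wiki.example.org/index.php?title=Main_Page&action=history",
    "http://wiki.example.org/index.php?title=Main_Page&oldid=123",
    "http://wiki.example.org/index.php?title=Main_Page&diff=124&oldid=123",
]

# Keep only the URLs a link checker would still need to visit.
to_check = [u for u in urls if should_check(u)]
```

A pattern along these lines could presumably be given to an exclude 
option that accepts regular expressions, but check the tool's own 
documentation for its regex flavour and quoting rules.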

HTH,

-S

[1] http://tinyurl.com/3pmk2t
[2] http://search.cpan.org/dist/W3C-LinkChecker/bin/checklink.pod


-Steve

-- 
Steve Axthelm
steveax at pobox.com



