[thelist] Crawling for headers

Tony Crockford tonyc at boldfish.co.uk
Tue Mar 19 08:00:01 CST 2002


>Hi,
>
>I need a tool that can crawl through websites and look for
>specific meta-data in the pages: for instance, to identify pages
>whose meta-data carries an expiry date that has already passed, or
>to locate pages that don't contain specific meta-data at all. The
>more generic the better.
>
>It's intended for a large intranet consisting of a significant
>number of sites, hosted at separate locations, so a simple find
>and grep won't do.
>
>Does anyone know of such a tool?

This might be a *sledgehammer to crack a nut*, but would an
open-source search engine do the job?

http://www.htdig.org

IIRC its indexer crawls sites over HTTP, so you could probably
configure the indexing part of it to do the job. I haven't tried it
for this, so maybe, maybe not.
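
If a full search engine turns out to be too heavy, the per-page check
itself is small enough to script. Below is a rough Python sketch, not
anything ht://Dig provides. It assumes the expiry date lives in a
<meta> tag called "expires" (either name= or http-equiv=) in the usual
RFC 2822 date format, and that the list of URLs to visit comes from
somewhere else. The names MetaCollector, check_page and check_url are
made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser
from urllib.request import urlopen


class MetaCollector(HTMLParser):
    """Collect the content of <meta name=...> and <meta http-equiv=...> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("name") or attrs.get("http-equiv")
        if key and attrs.get("content") is not None:
            # Keys are lower-cased so "Expires" and "expires" match.
            self.meta[key.lower()] = attrs["content"]


def check_page(html, required=("expires",), now=None):
    """Return (missing_keys, expired) for the HTML of one page."""
    now = now or datetime.now(timezone.utc)
    collector = MetaCollector()
    collector.feed(html)
    missing = [k for k in required if k not in collector.meta]

    expired = False
    raw = collector.meta.get("expires")
    if raw:
        try:
            when = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            when = None  # unparseable date: treated as "not expired" here
        if when is not None:
            if when.tzinfo is None:
                # Assumption: dates without a zone are taken as UTC.
                when = when.replace(tzinfo=timezone.utc)
            expired = when < now
    return missing, expired


def check_url(url):
    """Fetch one page and run the same check on it."""
    with urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return check_page(html)
```

Calling check_url() on each address in a list of pages would give you
which required meta-data is missing and whether the page has expired.
Following links across many sites is the part a real crawler does
better, which is why the search engine route might still be worth a
look.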



