[thelist] Avoiding logging of bots

Max Schwanekamp lists at neptunewebworks.com
Fri Jul 7 01:15:44 CDT 2006


> From: Ian Anderson
> I believe the best solution is to use JavaScript to deliver the thing
> you're tracking, and this is the approach taken by Shaun Inman's
> excellent Mint, which I would recommend for any site with no more than
> moderate traffic and a LAMP environment.

Thanks for the suggestion, Ian; Mint would be great if the destinations were
all unique.  But in this case we can and will have multiple URLs pointing at
the same destination.  The point is to track the traffic for each of those
URLs, not the amount of traffic landing on the page itself.  E.g.
http://www.example.com/abc/1234 and http://www.example.com/xyz/w7hyg might
both ultimately redirect to the same destination, namely
www.example.com/categories/6/courses/7.  But one link URL is for a banner ad,
while the other is given to employees of a separate organization via their
own website.  My client wants to track the traffic generated by each of
these links, as distinct from the traffic ending up at the final destination
page.  Spiders may follow the links, and some will ignore our robots.txt
file, so I'd like to try to reduce their potential impact on the traffic
stats.  Again, we're looking for more access control than log analyzers can
provide, AFAIK.
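
Roughly what I'm picturing for the redirect script is below, as a minimal
sketch in Python (the table layout, bot pattern, and function names are
placeholder assumptions, not anything we've built yet): log the hit for the
link ID unless the User-Agent looks like a spider, then send the visitor on.

    import re
    import sqlite3
    import time

    # User-Agent substrings that usually indicate a spider.
    # Deliberately incomplete; a real list would need maintenance.
    BOT_PATTERN = re.compile(r'bot|crawl|spider|slurp|archive', re.IGNORECASE)

    def record_hit(link_id, user_agent, ip, db_path='hits.db'):
        """Log one hit against link_id unless the client looks like a bot."""
        if user_agent and BOT_PATTERN.search(user_agent):
            return False  # looks like a spider; don't count it
        conn = sqlite3.connect(db_path)
        conn.execute('CREATE TABLE IF NOT EXISTS hits'
                     ' (link_id TEXT, ip TEXT, ts REAL)')
        conn.execute('INSERT INTO hits (link_id, ip, ts) VALUES (?, ?, ?)',
                     (link_id, ip, time.time()))
        conn.commit()
        conn.close()
        return True

The script would issue the 302 redirect to the real destination either way,
so spiders still end up where they should; they just don't get counted.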

My other thought is to track IPs hitting a given link and throttle the
counting, so that the same IP cannot cause a hit to be recorded more than
once every X minutes.  But surely someone else has been down this road.
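
Continuing the sketch above, the throttle could be a simple check against
the same hits table before recording (the 30-minute window is an arbitrary
assumption):

    import sqlite3
    import time

    def should_count(link_id, ip, window_minutes=30, db_path='hits.db'):
        """True if this IP has no recorded hit on link_id in the window."""
        cutoff = time.time() - window_minutes * 60
        conn = sqlite3.connect(db_path)
        row = conn.execute('SELECT 1 FROM hits WHERE link_id = ?'
                           ' AND ip = ? AND ts > ? LIMIT 1',
                           (link_id, ip, cutoff)).fetchone()
        conn.close()
        return row is None

So the redirect script would call should_count() first and only call
record_hit() when it returns True.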

-- 
Max Schwanekamp
http://www.neptunewebworks.com/




