[thelist] Avoiding logging of bots

Max Schwanekamp lists at neptunewebworks.com
Thu Jul 6 14:53:26 CDT 2006


Hi List,
 
I have a tracking-link mechanism (using PHP and MySQL) on my site which
logs a click for each unique visitor.  The user clicks a link, the script
checks for a cookie, and if it finds none, it records the click, sets a
cookie, and sends the user on to the final destination URL.  If the user
clicks the link again, the cookie is present, so the script does not record
the click but still sends them to the right place.  If the client browser
blocks cookies, every request gets recorded as a click.  That's no problem
if it's a human, but robots in their multitudes can hit a link hundreds of
times in a few minutes, and they generally don't accept cookies.
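 
For reference, the logic is roughly this (a simplified sketch; the table
and column names are made up, and it assumes a MySQL connection is already
open):

  <?php
  // Redirect script sketch.  tracked_links and clicks are
  // hypothetical table names, just for illustration.
  $link_id = (int) $_GET['id'];

  // Look up the real destination for this tracking link.
  $res = mysql_query("SELECT url FROM tracked_links WHERE id = $link_id");
  $row = mysql_fetch_assoc($res);

  $cookie = 'clicked_' . $link_id;
  if (!isset($_COOKIE[$cookie])) {
      // No cookie, so count the click and mark this visitor.
      mysql_query("INSERT INTO clicks (link_id, clicked_at)
                   VALUES ($link_id, NOW())");
      setcookie($cookie, '1', time() + 365 * 86400);
  }

  // Either way, send the visitor on to the destination.
  header('Location: ' . $row['url']);
  exit;
  ?>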
 
So, I'd like to have a database of known robot user-agents and/or IPs, so I
can exclude them from the stats.  Any such list is bound to be incomplete,
but if I can detect most of the common bots, I'd be happy.  Right now I'm
playing around with the robotstxt.org list, which is a text file that may or
may not be up to date, and loading it into a MySQL table that can be queried
on each click (actually two tables: a MyISAM one for storage, and a HEAP one
for runtime lookups).  Has anyone on the list been down this road before,
and do you have data or a prefab solution you're willing to share, or
suggestions for a better way to approach this?
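 
In case it helps frame suggestions, the two-table setup I'm trying looks
roughly like this (table names are made up, and I'm assuming the list holds
bare robot tokens rather than full user-agent strings, hence the substring
match):

  <?php
  // bot_agents (MyISAM) holds the robotstxt.org list on disk;
  // bot_agents_heap is an in-memory copy, rebuilt whenever the
  // list changes.
  mysql_query("DROP TABLE IF EXISTS bot_agents_heap");
  mysql_query("CREATE TABLE bot_agents_heap ENGINE=HEAP
               SELECT agent FROM bot_agents");

  // Per-click check: skip logging if the visitor's user-agent
  // contains any known robot token.
  $ua = mysql_real_escape_string($_SERVER['HTTP_USER_AGENT']);
  $res = mysql_query("SELECT 1 FROM bot_agents_heap
                      WHERE '$ua' LIKE CONCAT('%', agent, '%')
                      LIMIT 1");
  $is_bot = (mysql_num_rows($res) > 0);

  if (!$is_bot) {
      // ...do the cookie check and record the click as before...
  }
  ?>

(One caveat I'm aware of: a LIKE against every row on each click won't use
an index, but for a few hundred agents held in memory that's probably
tolerable.)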
 
TIA!
 
-- 
Max Schwanekamp
http://www.neptunewebworks.com/


