[thelist] State of the art in CAPTCHA decoding?

kasimir-k kasimir.k.lists at gmail.com
Wed Apr 18 14:47:16 CDT 2007

Bernardo Escalona-Espinosa wrote on 18/04/2007 at 18:05:
> These CAPTCHAs are beginning to get ridiculous, in the sense that the
> degree of garbledness needed is so extreme that I can barely read the
> "secret words" myself in some of the sites.

Indeed, and since the whole idea is to "tell Computers and Humans 
Apart", CAPTCHAs and spam bots will soon reach the point where every 
bot can read the CAPTCHAs while no human can...

> philosophical question would be: what do you imagine the next step to
> be?

One approach (as Brian noted) is to ask simple questions. On its own 
this is hardly enough, IMO, but it complements other approaches.

Inspired by Joel's post earlier today I did some browsing on this 
subject, and found this: <http://www.nedbatchelder.com/text/stopbots.html>
"Rather than stopping bots by having people identify themselves, we can 
stop the bots by making it difficult for them to make a successful post, 
or by having them inadvertently identify themselves as bots. This 
removes the burden from people, and leaves the comment form free of 
visible anti-spam measures."

And as I already commented there, a couple things I've used:

1. Every time a form is served, it gets a random token, which is also 
saved in a text file together with its creation time. When a form 
submission is received, it must carry a valid token. A token expires 
after a couple of hours, and the form must not be submitted too soon 
after the token's creation (no human fills in and submits a form in 
one second).
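The token scheme above can be sketched roughly like this (the file 
name and the exact age limits are my own choices, not anything 
prescribed):

```python
import time
import secrets

TOKEN_FILE = "tokens.txt"  # hypothetical storage file
MAX_AGE = 2 * 60 * 60      # token expires after a couple of hours
MIN_AGE = 5                # no human submits a form in under 5 seconds

def issue_token():
    """Generate a random token and record it with its creation time."""
    token = secrets.token_hex(16)
    with open(TOKEN_FILE, "a") as f:
        f.write(f"{token} {int(time.time())}\n")
    return token

def validate_token(token):
    """Accept a submission only if its token is on file, has not
    expired, and was not submitted suspiciously fast."""
    now = time.time()
    try:
        with open(TOKEN_FILE) as f:
            for line in f:
                saved, created = line.split()
                if saved == token:
                    age = now - int(created)
                    return MIN_AGE <= age <= MAX_AGE
    except FileNotFoundError:
        pass
    return False
```

A submission arriving within MIN_AGE seconds of the token being 
issued is rejected just like one with an expired or unknown token.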

2. The most difficult spammers are actually humans, not bots. Using 
the rel="nofollow" attribute does discourage some, but instead I 
simply disallow the string "http://" in messages and ask users to 
remove it from their web addresses. That way I have to copy and paste 
any URLs instead of just clicking them, but I find that a small 
enough price.
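The second check amounts to a one-line filter; the function name is 
my own:

```python
def message_is_acceptable(message):
    """Reject any message containing 'http://'; the poster is asked
    to strip the scheme from web addresses instead."""
    return "http://" not in message.lower()
```

Lower-casing first also catches variants like "HTTP://".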
