[thelist] spammers/spambots

Jon Molesa rjmolesa at consoltec.net
Mon Jul 27 12:31:08 CDT 2009


On Mon, Jul 27, 2009 at 09:04:57AM -0600 Bob Meetin <bobm at dottedi.biz> wrote:

> Date: Mon, 27 Jul 2009 09:04:57 -0600
> From: Bob Meetin <bobm at dottedi.biz>
> Subject: [thelist] spammers/spambots
> To: "thelist at lists.evolt.org" <thelist at lists.evolt.org>
> 
> Just curious: I am finishing up a little program, the preprocessor,
> which will be used to grab $_POST or $_REQUEST content and, if it meets
> certain criteria, reject any further processing.
> 
> So, the first question: do automated spambots attempt to fill in
> content in any/all fields, even if the field is bogus/contrived?
Not sure, but it'd be trivial to grab a form, parse out the fields and
the submit URL, fill in every field, and submit. Over and over. I
suspect that is what most bots do.
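
To show how little work that takes, here is a rough Python sketch of what such a bot might do. This is an assumption about typical bot behavior, not something observed: it uses the standard library's html.parser to pull out the form's action URL and every named field, then fills each one with the same junk. A hidden or decoy field gets filled too, which is why a bogus field can be a useful tell. The names FormScraper and build_bot_payload are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urlencode


class FormScraper(HTMLParser):
    """Collects the first form's action URL and the names of its fields."""

    def __init__(self) -> None:
        super().__init__()
        self.action: Optional[str] = None
        self.fields: List[str] = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action", "")
            self._in_form = True
        elif self._in_form and tag in ("input", "textarea", "select"):
            name = attrs.get("name")
            if name:  # unnamed inputs (e.g. a bare submit button) are not sent
                self.fields.append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def build_bot_payload(html: str, filler: str = "spam text") -> Tuple[Optional[str], str]:
    """Return (submit URL, urlencoded POST body) with *every* named field filled in."""
    scraper = FormScraper()
    scraper.feed(html)
    body = urlencode({name: filler for name in scraper.fields})
    return scraper.action, body
```

Fed a contact form with a hidden "website" field, the bot happily fills it along with the real ones, so a server-side check that rejects any non-empty value in that field would catch this naive kind of bot.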
> 
> And the second question: much of the spam content I see is posted in
> languages that are nowhere near English. If I knew where to start, I
> could probably add some of this "stuff" to a reject list, but I'm not
> sure how to get or convert these odd-looking characters into something
> my forms can handle. Suggestions?
> 
> -- 
> Bob
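
On the second question, one possible approach (a sketch, not something from this thread): make sure the form page and handler both use UTF-8 (for example with accept-charset="UTF-8" on the form), then, instead of keeping a reject list of specific strings, measure how many letters come from a non-Latin script. The Python below uses the standard unicodedata module; the 0.5 threshold and the function names are hypothetical, and a rule like this will also block legitimate non-English visitors.

```python
import unicodedata


def non_latin_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are not from the Latin script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # Latin letters (including accented ones like "é") have Unicode names
    # starting with "LATIN"; Cyrillic, CJK, etc. do not.
    non_latin = sum(
        1 for ch in letters if not unicodedata.name(ch, "").startswith("LATIN")
    )
    return non_latin / len(letters)


def looks_foreign(raw: bytes, threshold: float = 0.5) -> bool:
    """Decode posted bytes as UTF-8 and flag text that is mostly non-Latin letters."""
    # errors="replace" turns undecodable bytes into U+FFFD, which is not a letter.
    text = raw.decode("utf-8", errors="replace")
    return non_latin_ratio(text) > threshold
```
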

I believe some of the bots grab the form once and then submit to it
over and over. You can detect this by issuing a hidden MD5 value in the
form when it is requested and checking that it is present and matches
upon submission: save the value in the session when the form is
requested, then verify that the submitted value matches what's stored in
the session. It works well, but it will only catch submissions that are
a direct POST without a preceding GET. It should at least filter out
bogus posts.
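
A minimal Python sketch of that hidden-token check, assuming a dict-like server-side session (in PHP this would be $_SESSION). It swaps the MD5 value for secrets.token_hex, which gives an unguessable value of the same length (32 hex characters). It also consumes the token on use, which goes a step beyond the description above: a bot that replays one captured form with the same session cookie is rejected as well. The field name form_token and the function names are hypothetical.

```python
import hmac
import secrets


def issue_form_token(session: dict) -> str:
    """GET branch: create a random token, remember it, and return it for the form."""
    token = secrets.token_hex(16)  # 32 hex chars, same length as an MD5 hex digest
    session["form_token"] = token
    return token


def render_hidden_field(token: str) -> str:
    """The hidden input to embed in the form HTML."""
    return f'<input type="hidden" name="form_token" value="{token}">'


def check_form_token(session: dict, posted: dict) -> bool:
    """POST branch: accept only if the submitted token matches the stored one."""
    expected = session.pop("form_token", None)  # single use: a replayed POST fails
    submitted = posted.get("form_token")
    if expected is None or not isinstance(submitted, str):
        return False  # direct POST with no prior GET, or the field is missing
    # Constant-time comparison; encode so any non-ASCII input is handled safely.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

In a real handler, issue_form_token would run when the form is served and check_form_token at the very top of the POST handling, before any other processing; Bob's preprocessor would be a natural place for it.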

Another approach would be to add reCAPTCHA (http://recaptcha.net)
support to the form; http://recaptcha.net/resources.html has libraries
for various languages. The result would be the same, but it means more
work for real people, whereas the first approach is transparent to them.

If the bots do make a GET request before the POST, then you have some
other things to consider.

1) Ban IPs that make excessive requests to the form's URL.

2) Validate the submitted data to make sure it makes sense for what's
being requested.

3) Sanitize the data if it's headed for a database: escape the
strings.

4) Integrate Akismet (http://akismet.com/) into your application to
catch known and suspected spam automatically. There are several
libraries for integrating it with apps other than WordPress:
http://akismet.com/development/

5) Finally, have a human review false positives and false negatives to
help train Akismet.
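
For item 1, a rough Python sketch of per-IP rate limiting with a sliding window. The limit of 10 requests per 60 seconds is a hypothetical default, and the counts live in process memory, so a real site running several server processes would need a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class RateLimiter:
    """Sliding window: allow at most `limit` requests per IP per `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        # ip -> timestamps of that IP's recent accepted requests
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: treat this IP as banned for now
        q.append(now)
        return True
```
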
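
For item 2, a sketch of "does this make sense" validation for a hypothetical contact form with name, email and message fields. The specific rules (length limit, no URLs in the name, at most two links in the message, no unexpected fields) are illustrative assumptions; the right checks depend on what each form actually asks for.

```python
import re
from typing import Dict, List

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose
URL_RE = re.compile(r"https?://", re.IGNORECASE)
EXPECTED_FIELDS = {"name", "email", "message"}


def validate_contact(form: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the submission looks sane."""
    errors = []
    name = form.get("name", "").strip()
    email = form.get("email", "").strip()
    message = form.get("message", "").strip()
    if not 1 <= len(name) <= 100:
        errors.append("name: missing or too long")
    elif URL_RE.search(name):
        errors.append("name: contains a URL")
    if not EMAIL_RE.match(email):
        errors.append("email: not an address")
    if not message:
        errors.append("message: empty")
    elif len(URL_RE.findall(message)) > 2:
        errors.append("message: too many links")
    # Reject any field the form never had (e.g. bots guessing extra names).
    unexpected = set(form) - EXPECTED_FIELDS
    if unexpected:
        errors.append("unexpected fields: " + ", ".join(sorted(unexpected)))
    return errors
```
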
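
For item 3, rather than escaping strings by hand, this sketch swaps in parameter binding (placeholders), which lets the database driver handle quoting. It uses Python's built-in sqlite3 module and a hypothetical comments table; in PHP the equivalent would be prepared statements via PDO or mysqli.

```python
import sqlite3


def save_comment(conn: sqlite3.Connection, name: str, body: str) -> None:
    # The ? placeholders pass values separately from the SQL text, so quotes
    # in user input cannot change the statement. Never build SQL by string
    # concatenation or f-strings from user input.
    conn.execute("INSERT INTO comments (name, body) VALUES (?, ?)", (name, body))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (name TEXT, body TEXT)")
# A hostile-looking value is stored verbatim and the table survives.
save_comment(conn, "Robert'); DROP TABLE comments;--", "hello")
```
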

-- 
Jon Molesa
rjmolesa at consoltec.net
if you're bored or curious
http://rjmolesa.com

