[thelist] Re: Email Obfuscation may be beaten?

John.Brooking at sappi.com John.Brooking at sappi.com
Mon Feb 2 11:28:34 CST 2004

Date: Mon, 2 Feb 2004 02:23:23 +0100
From: "Damien COLA" <damiencola at wanadoo.fr>
> ...
> As was being demonstrated earlier in this thread, I also believe that
> ASCII encoding is one obfuscation technique that worked when it was new,
> it's so easy to implement in a spam harvester program that it has been
> JavaScript is still, for now, a new technique that needs a JavaScript
> interpreter to be embedded in email harvesting programs in order to be able to
> detect. So here's the one I use, given earlier on the list; I added the
> subject, so the email link will be 
> mailto:myemail at mydomain.com?subject=aSubjectLine
> <script type="text/javascript"><!--
> function print_mail_to_link(lhs,rhs,subject)
> {
> document.write("<A HREF=\"mailto");
> document.write(":" + lhs + "@");
> document.write(rhs + "?subject=" + subject + "\">" + lhs + "@" + rhs +
> "<\/A>"); } // --> </SCRIPT>
> and when you want to show an email + link you do :
> <SCRIPT LANGUAGE="JavaScript" type="text/javascript"><!-- 
> print_mail_to_link('myemail','mydomain.com','aSubjectLine')
> // --> </SCRIPT>

I spent some quality time :-) Friday writing an even more obfuscated
JavaScript function, but in the end convinced myself that the Holy Grail of
*long-term* spam harvester-proofing will take more than that. By long-term,
I mean that I don't want to be changing how I do all my mailto: links (and
possibly my email address) once a year as the harvesters catch up with me.
I'm hankerin' for a permanent solution.

The basic problem, as I see it, is no matter how convoluted I make my JS
function, as soon as the human writing the harvester catches on to what
function "print_mail_to_link" does, all he/she has to do is instruct the
harvester to read and concatenate the arguments (whether plain text or ASCII
codes) whenever it sees that function being called. It doesn't even have to
actually read or execute the function! Furthermore, the likelihood of some
spammer noticing this *increases* if more sites start using the same
function. This is undesirable if you're looking for a standard solution, as
I presume we are.
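
To make that concrete, here is a rough sketch of what such a harvester rule might look like. The name harvestFromCalls and the regular expression are my own invention for illustration, not anything taken from a real harvester, and the sketch assumes the function name contains only ordinary identifier characters and that the arguments are plain quoted strings.

```javascript
// Hypothetical harvester rule: it never runs the page's script. It only
// pattern-matches calls to a known function name and joins the first two
// quoted arguments with "@".
function harvestFromCalls(pageSource, functionName) {
  // Matches e.g. print_mail_to_link('myemail','mydomain.com', ...
  // Assumes functionName has no regex metacharacters (illustration only).
  const callPattern = new RegExp(
    functionName + "\\s*\\(\\s*['\"]([^'\"]+)['\"]\\s*,\\s*['\"]([^'\"]+)['\"]",
    "g"
  );
  const addresses = [];
  let match;
  while ((match = callPattern.exec(pageSource)) !== null) {
    addresses.push(match[1] + "@" + match[2]);
  }
  return addresses;
}

const page =
  "<script>print_mail_to_link('myemail','mydomain.com','aSubjectLine')</script>";
console.log(harvestFromCalls(page, "print_mail_to_link"));
```

Run against the call quoted above, this reassembles myemail@mydomain.com without ever looking at the body of the function.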

(Would a spammer go to this amount of trouble? Maybe, maybe not. I don't
know their psychology and where they consider the effort/payoff demarcation
line to be. So throughout this message, I'm assuming for the sake of
argument that it's possible that they would.)

I could make my function name short and meaningless, like maybe "a". If
other sites use the same function, they could call it "b", "c", "aoeu777",
etc., to decrease the likelihood of detection. But now Mr. Unfriendly
Neighborhood Harvester Author could instruct the software that there is a
function in use out there, it could be called anything, but it can be
recognized because the code contains the characters

    document.write(rhs + "?subject=" + subject + "\">" + lhs + "@" + rhs +

So if you, harvester software, see that line, back up to the word after
"function", and that's the function's name! Voila, now it can do the same
thing as before.

So, vary it by whitespace, you say? Any whitespace variation goes away if
the harvester simply removes all whitespace before comparing it to its
"signature". Vary the algorithm? Yes, but then we're back to a series of
solutions rather than a single standard one.
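
As a sketch of how little work that signature matching would be: the names SIGNATURE and findObfuscatorName below are hypothetical, and the approach (split the source at each "function" keyword, strip whitespace, look for the stored line) is just one assumed way a harvester author might do it.

```javascript
// Hypothetical stored "signature": the document.write line from the
// function, with all whitespace removed.
const SIGNATURE = String.raw`document.write(rhs + "?subject=" + subject + "\">" + lhs + "@" + rhs +`
  .replace(/\s+/g, "");

// Returns the name of the function whose text contains the signature,
// or null if no function on the page matches.
function findObfuscatorName(pageSource) {
  const declaration = /function\s+([A-Za-z_$][\w$]*)\s*\(/g;
  const found = [];
  let m;
  while ((m = declaration.exec(pageSource)) !== null) {
    found.push({ name: m[1], start: m.index });
  }
  for (let i = 0; i < found.length; i++) {
    // Text from this declaration up to the next one (or end of page).
    const end = i + 1 < found.length ? found[i + 1].start : pageSource.length;
    const text = pageSource.slice(found[i].start, end).replace(/\s+/g, "");
    if (text.includes(SIGNATURE)) return found[i].name;
  }
  return null;
}
```

Renaming the function or re-indenting it makes no difference to this check, since the name is read off the page and the whitespace is thrown away before comparing.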

The little gem that someone pointed out last week at
http://www.hiveware.com/enkoder_form.php is brilliant; it has a few
different variations, and is virtually indecipherable (as the page says) to
harvesters. Of course, "virtually" is not the same as "truly" and
"provably", and I would wager that sooner or later, some harvester author
will catch on, especially with more widespread use and publicity. Also, I
don't want to have to go to that site every time I want to drop in a mailto
link; it's just a pain. But as soon as I encapsulate it into a reusable
function with parameters, which is what would make it more convenient for
me, we're back to the function name recognition problem.

So I'm leaning towards always using a contact form, never a "mailto". You
can put links to your contact form wherever you want to. Anyone you don't
know who has a valid reason for contacting you can darn well use the form
first; then your reply, should you choose to make one, will have your actual
address for that person to use going forward.

What about referencing others' email addresses on your site? They deserve
the same consideration you're giving yourself, meaning not using a "mailto"
link. You could pass their address through your form, too, but now you're
back to putting an email address in your page source (as a form parameter)!
(Plus, you'd need to protect the form itself from use by spammers by making
sure your script allows only one destination address at a time, and probably
only to a select few predefined addresses.)
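
A minimal sketch of that kind of check, assuming the form handler receives the destination as a single string. The addresses and the names ALLOWED_RECIPIENTS and isAcceptableRecipient are placeholders of mine, and the real CGI script could be in any language; JavaScript is used here only to match the rest of the thread.

```javascript
// Hypothetical allow-list; the addresses are placeholders.
const ALLOWED_RECIPIENTS = new Set(["alice@example.com", "bob@example.com"]);

function isAcceptableRecipient(value) {
  if (typeof value !== "string") return false;
  // Reject whitespace, commas, semicolons and angle brackets, which could
  // be used to smuggle in extra recipients or extra mail headers.
  if (/[\s,;<>]/.test(value)) return false;
  // Only addresses on the predefined list are allowed.
  return ALLOWED_RECIPIENTS.has(value.toLowerCase());
}
```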

The only thing I can think of here is to create yourself a server-only
"address book" of allowable addresses for the form, and refer to entries in
it only by some non-identifiable parameter such as the person's name or even
an ID number. (The ID number is most private, but if you happen to be
already mentioning the person's name anyway, using their name as the form
parameter isn't revealing any additional information. If it's a small list,
you might get away with using just first names, and maybe first initial of
last name to resolve duplicates. This seems to be the best combination of
being easy to remember while still preserving the necessary privacy.) It
should be pretty easy to modify your contact form's CGI script to accept some
identifier that selects which person the message is sent to.
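
Roughly, the lookup could be as simple as the sketch below. Everything in it is assumed for illustration: the keys, the placeholder addresses, and the names ADDRESS_BOOK and resolveRecipient. The point is only that the page source carries the key while the address stays on the server.

```javascript
// Hypothetical server-only address book: the keys may appear in page
// source (e.g. as a form parameter), the addresses never do.
const ADDRESS_BOOK = new Map([
  ["1", "alice@example.com"],
  ["janed", "jane@example.com"],
]);

// Returns the address for a known key, or null so the caller can refuse
// to send anything for an unknown or malformed identifier.
function resolveRecipient(id) {
  if (typeof id !== "string") return null;
  return ADDRESS_BOOK.has(id) ? ADDRESS_BOOK.get(id) : null;
}
```

Because only exact keys resolve, the form can only ever deliver to one address at a time, and only to someone already in the book.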
