[thelist] Form Security

Sun Jul 18 19:46:44 CDT 2010

I started with a simple answer, then got carried away. Forgive the  
length. Hope this helps.

Form security starts before the user hits your form's submit button.  
Before you ever start processing your forms, consider the following:

First: where did the data come from? Geographically? From where on the
network? From which page? Did the user even tap the data entry form?

Second: How fast is it coming?

Third: What kind is it?

Some "meta strategies".

1) Restrict how data is able to be passed to your application, thence
to the database. As one example, consider accepting only data that
arrives in the POST scope and rejecting anything passed in the GET
scope.

2) When possible, use pre-set data rather than text input fields.
Dates, counties, regions, pre-determined answers in a select or radio
button will be inherently more secure than a blank field. And they're
usually easier to use, too.

3) When there's a security validation error, treat it as an honest
error and don't give away too much feedback. No need to tell the user
"SECURITY VIOLATION! 403! ACCESS DENIED, BAD HACKER PERSON!!!" Simply
and quietly send them back to where the data should be correctly
entered, and give them a clear indication of how to fill out the field
correctly.

4) Before doing an insert or update query, consider that it might be a
good idea to ensure that a validated session exists. No going
straight to the form action page. Perhaps insist that they have tapped
at least the home page before they can access the form page. This is
at your discretion.
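
To make strategies 1) and 4) concrete, here is a minimal sketch in
Python (the notions are language independent). The function name
accept_request and the session key "visited_form" are made up for
illustration; your framework will expose the request method and the
session in its own way.

```python
def accept_request(method: str, session: dict) -> bool:
    """Gate a form submission before any processing happens.

    Hypothetical helper: 'visited_form' is an assumed flag that the
    form page sets in the session when it is rendered.
    """
    # Strategy 1: only data arriving via POST is accepted.
    if method.upper() != "POST":
        return False
    # Strategy 4: the user must have visited the form page first.
    return bool(session.get("visited_form"))
```

A request that fails this gate would be sent quietly back to the form,
as described in strategy 3).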

The Actual Form processing

Let's consider that a user's input on a simple form field can be
addressed in many very simple ways, from most basic to more refined.
Each one adds to a framework that enhances security without
necessarily treating each move as a hacker threat. The overall product
of your validation will be a higher level of security, database
consistency and good user feedback.

The overall strategy is to validate input from largest to smallest
possibilities. Instead of taking one string and chewing the heck out
of it, I prefer to work layer by layer, from vaguest to most specific,
discarding non-matching elements: a Default Deny policy. That means
that I need to start off with a very clear idea of what I DO permit.

Let's think about our data

When you get a bad value, you can either throw an error, or simply
drop it altogether and/or replace it with a default value. My
preference is to always throw an error. It leaves less mystery in the
logs and the database, and makes debugging and maintenance easier in
the long run.

I'm not a PHP guy, but the notions are language independent:

validate input existence
validate input type
validate input range
validate input form
validate relational data (optional)
strip strings (optional)

Example fields:
    name
    birthyear
    phone
    email

Step 1: test for existence of value.

There are two ways to approach existence validation. The first is to
instantiate a default value ("Name Missing") and provide it by
default, so that if the user fails to enter their name, the system
will process the string "Name Missing". While this eases up the user's
load, it doesn't make for very good validation and wastes database
space. The second, and most likely the most common, is to test the
length of the string in form.name: if the length of form.name == 0,
fail; else, pass.
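
A minimal sketch of the second approach, in Python. The helper name
require_value is hypothetical, and treating whitespace-only input as
missing is my assumption, not something the length test above strictly
requires.

```python
def require_value(form: dict, field: str) -> str:
    """Return the submitted value, or raise if it is missing or empty."""
    # Assumption: surrounding whitespace does not count as content.
    value = (form.get(field) or "").strip()
    if len(value) == 0:
        raise ValueError(f"{field} is required")
    return value
```

Called as require_value(form, "name"), it either hands back the
trimmed string or fails, in keeping with the preference for throwing
an error over substituting a default.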

Step 2: defining and testing for value types.

The first thing we need to do is to set a type. This might seem  
obvious, because your database forces you to choose a type when you  
create the column. It's wise to orient your entire strategy around  
that type per-column. So, we also force consistency in all code long  
before it ever gets close to the database.

Types
    name = string
    birthyear = int
    phone = string
    email = string

The first thing you can do is check to see if they are of the
desired type. Something like IsString() or IsInt() should exist. If
the input submitted does not match, you have a choice:

a) Throw an error
b) Attempt to force (cast) into the type and if successful continue as  
normal: toInt(myInput)
c) Attempt to force (cast) into the type and if failed throw an error

The pro of a) is that you potentially block automated inputs. The con
is that the user does more work.
The pro of b) is that it's less work for the user. The con is that
your database might get filled with junk.
The pro of c) is that it catches the error. There is no con, other
than the user having to correct the input.
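
Option c) might look like this in Python. It is a sketch only: to_int
is a hypothetical name echoing toInt(myInput) above, and it leans on
Python's built-in int() to decide what counts as a whole number.

```python
def to_int(raw: str) -> int:
    """Cast form input to int, raising a clear error if the cast fails."""
    try:
        return int(raw.strip())
    except ValueError:
        # Fail loudly instead of storing junk or a silent default.
        raise ValueError(f"not a whole number: {raw!r}") from None
```

With this, to_int("1975") gives the integer 1975, while input such as
"19x5" is rejected before it gets anywhere near the database.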

At this point your required input exists, and is of the correct type.

Step 3: Defining our input ranges.

Whether you are validating a string, or a number or a date or some  
binary, there will always be a loose range (length and content) to the  
input.

Some examples

name:  More than 5 chars long, and less than 60 chars long (for  
example).
How many people do you know have a total of 200 chars in their name?
Or 4? How common is it to use numbers in one's name? I'd hazard it's
not common. Even Thurston Powell III would use Roman numerals. We can
with reasonable assurance filter on [^0-9]. What about asterisks? Math
symbols? [^0-9*%;+] Now we're tightening things up. I've found that
simply ensuring that there's no semi-colon goes a long way.

Don't strip errors out, fail them.

birthyear:  this one is easy: four digits [0-9]{4} && input < this
year - 21. Unless you want to permit someone not yet born, or
underage, to submit! :)

phone: allowed chars [ \(\)\-\.0-9]

The excellent part of this bit is that it forces you to think in
terms of what you will allow in and what you will process. "Bob" has
three characters. Is that enough info for your business? Even most
short Chinese names will have at least 4 chars: "Yue Xi" (that's 6 :) )
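
The three range checks above could be sketched in Python as follows.
The function names are made up for illustration, and the limits (5 to
60 chars, 21 years) are just the example values from this post, not
recommendations.

```python
import re
from datetime import date

# Characters the name examples above rule out: digits and * % ; +
FORBIDDEN_IN_NAME = re.compile(r"[0-9*%;+]")
# The phone character class above: space ( ) - . and digits
PHONE_ALLOWED = re.compile(r"[ ()\-.0-9]+")

def name_in_range(name: str) -> bool:
    """More than 5 and fewer than 60 chars, with no forbidden chars."""
    return 5 < len(name) < 60 and not FORBIDDEN_IN_NAME.search(name)

def birthyear_in_range(raw: str, this_year=None) -> bool:
    """Exactly four digits, and earlier than this year minus 21."""
    if not re.fullmatch(r"[0-9]{4}", raw):
        return False
    if this_year is None:
        this_year = date.today().year
    return int(raw) < this_year - 21

def phone_in_range(phone: str) -> bool:
    """Only the allowed phone characters, and at least one of them."""
    return PHONE_ALLOWED.fullmatch(phone) is not None
```

Each function only answers yes or no. What to do with a "no" (fail the
field and send the user back) stays with the caller.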

Step 4: Defining the form of our data

This one gets a bit trickier, but is important. Let's look at name

"Bob Jones", "Mary Edward-Smith", "Rob: Roy", "Jean Dupré", "Yrjö  
Lindegren". Do we want to ensure there's a minimum of 1 char, then a
space, then 3+ chars? That might be reasonable.

What about phone numbers? When it comes to phone numbers, I like to  
strip all but the digits, then count how many there are. Depending on  
your type of form, it might be a good idea to add an extra field for  
the telephone extension so as to not have to handle it in the phone  
field.
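
A sketch of the strip-and-count approach in Python. The name
phone_digits and the 10 to 15 digit bounds are my assumptions (10 suits
North American numbers, 15 is the E.164 maximum); pick bounds that fit
your own audience.

```python
import re

def phone_digits(raw: str, min_digits: int = 10, max_digits: int = 15) -> str:
    """Strip everything but digits, then check how many are left."""
    digits = re.sub(r"[^0-9]", "", raw)
    if not (min_digits <= len(digits) <= max_digits):
        raise ValueError("phone number has the wrong number of digits")
    return digits
```

Storing only the digits also gives you one consistent format in the
database, whatever punctuation the user typed.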

As a general rule, I tend to use ColdFusion's native function
IsValid("telephone", myInput) or IsValid("email", myInput). It's a
real time saver, probably better than what I could write in a short
amount of time. I'm sure PHP must have a similar validation function.

See here for a good postal code/zip regex:
http://geekswithblogs.net/MainaD/archive/2007/12/03/117321.aspx

Step 5: Validate relational data (optional)

This is more conceptual. If you have multiple countries, regions or
provinces, this is where you ensure that they match up. Alabama is not
in Canada. Or if you have a credit card number, you might want to see
if the number format matches the card type the user selected
(MasterCard, for example). How often have you entered a Visa number
and forgotten to change the default radio button? You might want to
check for birthdays being in the past, or delivery dates being in the
future.
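
A small Python sketch of two of these relational checks. The REGIONS
table is a tiny stand-in I made up; in practice it would come from your
own reference data, and the function names are hypothetical.

```python
from datetime import date

# Illustrative lookup only; a real table would be far larger.
REGIONS = {
    "US": {"Alabama", "Alaska"},
    "CA": {"Ontario", "Quebec"},
}

def region_matches_country(country: str, region: str) -> bool:
    """True only if the region is listed under the given country."""
    return region in REGIONS.get(country, set())

def dates_make_sense(birthday: date, delivery: date, today=None) -> bool:
    """Birthdays must be in the past, delivery dates in the future."""
    if today is None:
        today = date.today()
    return birthday < today < delivery
```

So region_matches_country("CA", "Alabama") is False, which is exactly
the "Alabama is not in Canada" case.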

Step 6: Strip strings or deny

This is where we get into the anti-xss or anti-sql injection domain.  
If you've done the above, you'll find that very little of it is  
necessary. I should emphasize that *less* of it will be necessary.

I do this at three levels:

The first is at my apache .htaccess level.

a) I deny known bad-guy domains and IPs
b) I deny known query string exploits
c) I deny certain user agents and proxies

(Email me privately if you'd like me to send you a sanitized version  
of a default security .htaccess file I use).

The second is at the request level of each page. I have a set of
filters that search for strings such as ";" or words such as "select,
insert, update, union, drop" etc. Note: If you have admin forms that
use "Update" in the button, the form won't work, so choose which pages
you allow this to run on. You'll also want to consider how this
interacts with your fields. What if it's a discussion forum? Consider
whether you want to transform the text and how.
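
Here is roughly what such a request-level filter could look like in
Python. It is a sketch, not the filter set I actually run: the names
are hypothetical, and the word list is only the handful mentioned
above.

```python
import re

# A semi-colon anywhere, or one of the listed SQL words as a whole word.
SUSPICIOUS = re.compile(
    r";|\b(?:select|insert|update|union|drop)\b", re.IGNORECASE
)

def looks_suspicious(value: str) -> bool:
    """True if a single submitted value trips the filter."""
    return SUSPICIOUS.search(value) is not None

def request_is_clean(params: dict) -> bool:
    """True only if no submitted value trips the filter."""
    return not any(looks_suspicious(v) for v in params.values())
```

Note that looks_suspicious("Update") is True, which is the admin-button
problem mentioned above, so apply it only to the pages where it makes
sense.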

Most of this will have been caught by the .htaccess file, but
maybe we're on an IIS machine (Bleh!). So I don't mind making the
server work a bit harder for the extra security.

The third is on the actual validation, per-field at run time. I never,  
never permit any data to go near my database that has not been  
processed first by code.

All of this is done server side. The simple reason is that most bots
or attackers (IMO) will not be using javascript enabled agents. Thus,
luckily for them, skipping the extra work of creating a js parser
also bypasses any "javascript security routines" (yeah, notice it's
in quotes).

Additionally, ColdFusion (my native language) also does validation at
the query level (<cfqueryparam ...>). I'm not aware whether PHP has
these types of safeguards.
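
For readers who don't know ColdFusion, the same idea of binding values
at the query level can be sketched with Python's standard sqlite3
module. This is an analogy, not a description of how cfqueryparam works
internally; the contacts table and the insert_contact helper are made
up for the example.

```python
import sqlite3

def insert_contact(conn: sqlite3.Connection, name: str, birthyear: int) -> None:
    """Insert using ? placeholders so values are bound, never spliced in."""
    conn.execute(
        "INSERT INTO contacts (name, birthyear) VALUES (?, ?)",
        (name, birthyear),
    )

# In-memory database, purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, birthyear INTEGER)")

# The hostile string is stored as plain text; it never runs as SQL.
insert_contact(conn, "Robert'); DROP TABLE contacts;--", 1980)
rows = conn.execute("SELECT name, birthyear FROM contacts").fetchall()
```

The table survives and the odd-looking name is just data. The
per-field validation above would normally have rejected that name long
before this point; the bound parameter is the last line of defence.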

If your form is international, meaning that it can accept inputs from
various character sets (think of accents, or non-Roman scripts such
as Cyrillic or kanji), I'd recommend that you have some type of
language detector system that shunts to the right set of validation
routines.

Regardless of your language, I hope the above gives you a framework
for thinking about form security. The fundamental principle is simply
this: look at the source providing the data, look for and restrict
every possible way that data can make it to your database, and assess
and process ALL that can come in.

Hope this is useful.

On 2010-07-15, at 4:19 PM, DAVOUD TOHIDY wrote:
> Hi there,
>
> I am working on my employer's site. I have a search engine and  
> Contact form. I have taken all the steps that I am aware of to  
> tighten the security such as using :
>
> $name =  
> mysql_real_escape_string 
> (strip_tags(stripslashes(htmlentities(trim($_POST['name'])))));
>
> I am planning to log the user in on the fly to the database without  
> letting the user know while providing the user with a user type with  
> "USER" priviliges.
>
> Does this make any sense at all in terms of increasing the security  
> of input by the user in search field and or in contact form fields?
>
> Unfortunately I will not be able to provide you with more source code.

--
Frank Marion
lists [_at_] frankmarion.com