Best practices for input validation with Active Server Pages

Intro

A NT based web site can be administrated very well, up to date with security hot fixes, well hardened (see the recent ora book by Stefan Norberg for an good resource on this), on a properly secured network, but your site can still be easily cracked if the asp code does not properly handle input.

As a former asp developer and now a "security guy" who does a lot of code reviews, I see many common coding errors over and over which can diminish the security of a site very easily.

Unfortunately, these problems are not widely discussed and it is very common for developers to be unaware of them, so I will attempt to cover them here. If anyone has any suggestions, feels I have omitted anything important, or spots any errors, then I would love to hear from you at the address below.

The scope of this article.

There is nothing new or revolutionary in this article. Much of it is well covered by Rain Forest Puppy at http://www.wiretrip.net/rfp/p/doc.asp?id=7&iface=2, which is well worth reading for the other issues it covers, and much of it is adapted from general input checking principles that applies to many other languages/applications. This article merely attempts to suggest best practices to deal with input to Active Server Pages. I assume your database uses T-SQL and VBScript is your scripting language.

The Problem

Active Server Pages provide a very quick, flexible method of creating a dynamic data backed web site. It integrates easily with data sources via ActiveX Data Objects. The following code snippet illustrates the ease with which you can query and use the results from a database.

Set cn = Server.CreateObject("ADODB.Connection")
cn.open "DSNname", "UserName", "Password"
'Previous two lines should be in a separate include file

StrSQL = "Select Name, Number From AddressBook"
Set rs = cn.Execute(StrSQL)

Do While not rs.EOF
     Response.Write "The telephone number of " & RS("Name") & " is " & RS("Number") & "<BR>"
     rs.MoveNext
Loop

So far so good. No real room for any problems to be introduced. However, code of the following form is very common.

Name   = Request.QueryString("Name")
StrSQL = "Select Name, Number From AddressBook where Name like '%" & Name & "%'"

The obvious intention here is to provide a method for displaying all records whose name matches the string provided. Unfortunately, it's trivial for an attacker to create an URL of the form

http://www.victim.com/getaddress.asp?Name=blah';delete%20*%20from%20sysobjects%20--

which will result in the string being

Select Name, Number From AddressBook where Name Like '%blah';delete * from sysobjects -- %'

The SQL server will happily execute this and, permissions allowing, delete your sysobjects table. Obviously, the statement is not limited to this and could access extended stored procedures to execute shell commands, query other tables, alter tables etc. This situation can be avoided by ensuring that quotes in input strings are double quoted as follows. (This specifically assumes T-SQL, which escapes quotes in this manner. Other database systems may behave differently, such as requiring a backslash '\' character before a quote, and may also need the the '"' character to be handled in a similar fashion).

Set re = Server.CreateObject("VBScript.RegExp")

Name   = Request.QueryString("Name")
re.Pattern = "'"
re.Global = true
Name   = re.Replace(Name, "''")
StrSQL = "Select Name, Number From AddressBook where Name like '%" & Name & "%'"

For certain types of queries, you may also wish to consider what effect inserting a character such as % into your query will have. Of course, this raises the question "How can I be sure I've covered all the possible bad characters?". It can be difficult to predict what uses data stored in a database will be used for, and what interpreters will act upon this data (for example, html browsers as discussed below) Best practice in this type of situation usually dictates "Know what to expect, and disallow everything else". The RegExp object is suitable for this task. The RegExp object is available with VBScript 5, which comes with Service Pack 5. The following example gets rid of all disallowed characters, but it may be more appropriate (depending on circumstances) to respond with an error or advise the user if a character is found which is not legal. Documentation on regular expressions is available from Microsoft's web site.

Set re = Server.CreateObject("VBScript.RegExp")

Username   = Request.QueryString("Username")
re.Pattern = "[^a-zA-Z0-9\.\-]"       ' Usernames on this system can only have letters, numbers, dots and hyphens
re.Global  = true
Username   = re.Replace(Username, "")
StrSQL     = "Select Name, Number From AddressBook where Username = '" & Username & "'"

When accepting strings for input, it can also be an issue if a user is allowed to enter html tags into a piece of text that will later be displayed as output on a html page (this can include data which will be sent back to the page it originated from to be used as the value parameter to an input tag, although this poses a lesser security risk). ASP provides a useful method as part of the Server object called HTMLEncode() for encoding such strings.

Set re  = Server.CreateObject("VBScript.RegExp")

BigText = Request.Form("usertext")
BigText = Server.HTMLEncode(BigText)

re.Pattern = "'"
re.Global  = true
BigText = re.Replace(BigText, "''")

StrSQL  = "Insert into UserComments (usertext) Values('" & BigText & "')"

So, the main concern with strings and databases is ensuring that the quotes in strings are properly escaped, and that the content of the data is checked so that anything else making use of the data inserted into the database will act in a defined manner.
This cannot be done with other data types however. Non-strings will not be quoted in SQL to begin with, but VBScript (and most other languages which can be used with ASP) are typeless and can not tell the difference between where it is supposed to receive a number, date etc., and where it should receive a string. A situation like the following is the most common security error I find while reviewing ASP code.

ID     = Request.QueryString("ID")
StrSQL = "Select Name, Number From AddressBook where AddressID = " & ID
Set rs = cn.Execute(StrSQL)

Much like the first example of why strings should have quotes escaped, an URL of the form

http://victim.com/address.asp?ID=3;delete%20*%20from%20sysobjects

will cause a second query to be executed. Of course, the obvious solution is to ensure that the parameter is numeric, as intended.

ID     = Request.QueryString("ID")
If not IsNumeric(ID) Then           'Someone feeding bogus info
     Response.Write("Invalid input")
     Response.End
End If
StrSQL = "Select Name, Number From AddressBook where AddressID = " & ID
Set rs = cn.Execute(StrSQL)

Upon detecting invalid input, the code prints an error and exits. Depending on the site or your requirements you could set the variable to a default and continue on, redirect to a start page etc.

The same principle holds with dates or any other data type. e.g, If dDate should hold a Date type, then check it with IsDate(dDate).

  dDate  = Request.QueryString("currentDate")
  If not IsDate(dDate) Then
     Response.Write("Invalid input")
     Response.End
  End If

It is not unusual for Dates or Currency types to be communicated to the server via a hidden field, and then used as part of the Where clause of a SQL statement. As noted above, the correct way to validate input is to know what to expect, but this is not easily done with regular expressions.
Fortunately, Most if not all types have a IsXXX() check, or can at least be cast using a CXXX() call (where the 'XXX' represents a type, e.g. CCur() ). This is good practice for catching errors anyway.

Thanks to Gaddo Bennedetti, Craig McFadyen, Jon Hanna and Graham Breed for providing worthwhile feedback on this document. Any remaining faults or shortcomings are my own fault, of course.

Jerry Connolly,

jerry@nologin.net