Many years ago I took a secure coding class. I mainly remember one thing from the course: “Assume all user input is evil.” That’s fine, because the instructors did say, “If you only remember one thing from the course, remember this!”
What can a user do with input alone? Let’s say you are a malicious user of a web forum. You could create an account on the forum and set your display name to “<script>alert('surprise!');</script>”. After registration, the forum’s database stores that script tag, JavaScript and all, where your display name should be.
With that in place, any time another forum user loads a page where your display name appears (say, on any of your comments), your custom JavaScript will execute in their browser! This is a Bad Thing, because a malicious user usually would not just pop up “surprise” but would instead use that snippet of JavaScript to, say, read the victim’s cookies for the site (any not marked HttpOnly) and send them to the attacker for nefarious purposes.
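To see why the payload fires on a server-rendered page, here is a minimal sketch of a hypothetical template helper, renderComment (the helper and the shape of the comment object are assumptions for illustration), that builds HTML by string concatenation. The stored display name is pasted into the page verbatim, so the browser parses it as a real script element.

```javascript
// Hypothetical server-side template helper (illustration only).
// Vulnerable: user-controlled values are concatenated into HTML unescaped.
function renderComment(comment) {
  return '<div class="comment">' +
    '<span class="author">' + comment.author.displayName + '</span>' +
    '<p>' + comment.body + '</p>' +
    '</div>';
}

// The malicious display name as it sits in the database.
const storedComment = {
  author: { displayName: "<script>alert('surprise!');</script>" },
  body: 'Nice post!',
};

const pageFragment = renderComment(storedComment);
console.log(pageFragment);
// <div class="comment"><span class="author"><script>alert('surprise!');</script></span><p>Nice post!</p></div>
```

Every visitor who receives this fragment gets the attacker’s script element as part of the page itself.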
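This is called Cross-Site Scripting (XSS); because the payload is stored on the server and served to other users, this variety is known as stored XSS. A <script> payload works best on pages rendered on the server, because the browser executes any script tag that arrives as part of the page’s HTML. A <script> element added after the page has loaded, for example through innerHTML with data from an AJAX call, is not executed. But an AJAX application can still be susceptible, because other payloads, such as an image with an onerror handler, do run when inserted that way. For example, if the display name is added to the DOM like this:
// bad code that is susceptible to XSS
// returnedUser is the JSON object from the AJAX call; suppose its displayName is
// <img src=x onerror="alert('surprise!')">
// (a <script> tag inserted via innerHTML would not run, but onerror does)
var newdiv = document.createElement('div');
newdiv.innerHTML = returnedUser.displayName;
document.getElementById('user_listing').appendChild(newdiv);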
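For a display name, the fix is to treat the value as text rather than markup. In the browser, assigning it to newdiv.textContent instead of newdiv.innerHTML does exactly that. When HTML is assembled as a string instead, the usual equivalent is to escape the characters that are special in HTML. A minimal sketch, where escapeHtml is a hypothetical helper rather than a built-in:

```javascript
// Escape the five characters that are special in HTML text and
// attribute values, so the browser displays them instead of parsing them.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first, or the entities below get re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical JSON object as returned by the AJAX call.
const returnedUser = { displayName: '<img src=x onerror="alert(\'surprise!\')">' };
const safeMarkup = '<div>' + escapeHtml(returnedUser.displayName) + '</div>';
console.log(safeMarkup);
// <div>&lt;img src=x onerror=&quot;alert(&#39;surprise!&#39;)&quot;&gt;</div>
```

The browser renders the escaped name as the literal characters the user typed, so neither a script tag nor an event handler attribute is ever created.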
Sometimes developers, in an overzealous commitment to security, decide to HTML-encode all input or strip all special characters such as “<” and “>”. There are worse things to be overzealously committed to, but we can do better. Instead of applying one blanket rule to every field, we can make the check depend on what the content is supposed to represent and how it will be displayed. For example, a link (an <a> tag) probably has no place in a user’s display name, but a link or some styling markup may be very helpful and relevant in a product description. For this reason, most HTML sanitizers are highly configurable, typically through an allowlist of permitted tags and attributes, so that we can easily allow different kinds of HTML in different places.
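To make the per-field idea concrete, here is a deliberately tiny, hypothetical allowlist sanitizer. The policy names (displayName, productDescription), the allowed tags, and the sanitize function are illustrative assumptions, not any real library’s API. It escapes everything first and then re-enables only the exact tag forms a policy permits, so anything it does not recognize stays inert text. For real use, a maintained sanitizer such as DOMPurify (in the browser) or the OWASP Java HTML Sanitizer is the better choice.

```javascript
// Toy allowlist sanitizer (hypothetical, for illustration only).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical per-field policies: which attribute-free tags to allow,
// and whether <a href="http(s)://..."> links are allowed.
const POLICIES = {
  displayName: { tags: [], links: false },
  productDescription: { tags: ['b', 'i', 'em', 'strong'], links: true },
};

function sanitize(text, policyName) {
  const policy = POLICIES[policyName];
  if (!policy) throw new Error(`Unknown policy: ${policyName}`);
  // Step 1: neutralize ALL markup.
  let out = escapeHtml(text);
  // Step 2: re-enable only the exact forms <tag> and </tag> for allowed tags.
  for (const tag of policy.tags) {
    out = out.split(`&lt;${tag}&gt;`).join(`<${tag}>`)
      .split(`&lt;/${tag}&gt;`).join(`</${tag}>`);
  }
  // Step 3: re-enable links whose href is a plain http(s) URL containing no
  // quotes, angle brackets, ampersands or whitespace (so no javascript: URLs).
  if (policy.links) {
    out = out
      .replace(/&lt;a href=&quot;(https?:\/\/[^\s&<>"]+)&quot;&gt;/g, '<a href="$1">')
      .split('&lt;/a&gt;').join('</a>');
  }
  return out;
}

console.log(sanitize('<b>Bold</b>', 'displayName'));
// &lt;b&gt;Bold&lt;/b&gt;
console.log(sanitize('<b>Bold</b> <a href="https://example.com">link</a>', 'productDescription'));
// <b>Bold</b> <a href="https://example.com">link</a>
```

The same input is inert in a display name but keeps its bold text and link in a product description. This toy does not repair unbalanced tags (a stray </a> left from a rejected link stays in the output); a real sanitizer parses the HTML properly and fixes nesting.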
Now that we’ve agreed we need to sanitize user-supplied text, the next question is when to do the sanitizing. There are arguments for doing it on input and for doing it on output.
A reason to sanitize on input is that, philosophically, it makes sense to catch potential security issues as early as possible, and you avoid storing malicious input on your system at all. This turns malicious input into a validation issue, just as requiring a zip code to contain only digits is a validation issue. A reason to sanitize on output is that the site then has the option to change its sanitization policy dynamically. Some HTML may be considered unsafe at one point (say, links in a product description) and later be considered safe (due to a business decision to allow it). If you sanitize only on output, the raw text stays in the database, so you are free to change the policy whenever you want and the new policy applies to existing content as well.
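The trade-off can be sketched with a hypothetical applyPolicy helper that escapes everything and then re-enables only the attribute-free tags a policy allows (the helper and both policies are assumptions for illustration). If the sanitized value is what gets stored, a later loosening of the policy cannot bring the original markup back; if the raw text is stored and sanitized on output, the new policy applies to existing content immediately.

```javascript
// Hypothetical helper: escape all markup, then re-enable only the
// attribute-free <tag> and </tag> forms the policy allows.
function applyPolicy(raw, policy) {
  let html = String(raw)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
  for (const tag of policy.allowedTags) {
    html = html.split(`&lt;${tag}&gt;`).join(`<${tag}>`)
      .split(`&lt;/${tag}&gt;`).join(`</${tag}>`);
  }
  return html;
}

const strictPolicy = { allowedTags: [] };    // no HTML allowed today
const loosePolicy = { allowedTags: ['em'] }; // <em> allowed after a business decision

const userInput = 'Now <em>50% off</em>!';

// Option A: sanitize on input with the policy in force at the time; store the result.
const storedA = applyPolicy(userInput, strictPolicy);
// Option B: store the raw text; sanitize on output with whatever policy is current.
const storedB = userInput;

// After the policy is loosened:
const pageA = storedA;                           // already sanitized, stays escaped
const pageB = applyPolicy(storedB, loosePolicy); // new policy applies immediately
console.log(pageA); // Now &lt;em&gt;50% off&lt;/em&gt;!
console.log(pageB); // Now <em>50% off</em>!
```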
Some libraries are designed to operate on input. Absent a strong need to change the sanitization policy dynamically, I would favor sanitizing on input as well.
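Treating this as input validation can be as simple as rejecting a value before it is stored, exactly as a malformed zip code would be rejected. A minimal sketch; the function names, the form shape, and the rule that display names may not contain < or > are all hypothetical policy choices:

```javascript
// Hypothetical input validators: each returns an error message, or null if valid.
function validateZipCode(zip) {
  // US ZIP or ZIP+4, e.g. "12345" or "12345-6789".
  return /^\d{5}(-\d{4})?$/.test(zip) ? null : 'Zip code must be 5 digits or ZIP+4.';
}

function validateDisplayName(name) {
  // Policy assumption: display names are plain text, so angle brackets are rejected.
  return /[<>]/.test(name) ? 'Display name may not contain < or >.' : null;
}

// Run every check before anything is written to the database.
function validateRegistration(form) {
  const errors = [
    validateDisplayName(form.displayName),
    validateZipCode(form.zip),
  ].filter((message) => message !== null);
  return { ok: errors.length === 0, errors };
}

const attempt = validateRegistration({
  displayName: "<script>alert('surprise!');</script>",
  zip: '12345',
});
console.log(attempt.ok);        // false
console.log(attempt.errors[0]); // Display name may not contain < or >.
```

The malicious name never reaches the database; the user simply gets a validation error, just as they would for a zip code containing letters.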
Stay tuned for Part II, where we’ll look at a slick way to sanitize input as an input validation problem!