Sunday, May 26, 2013
Blog Moving to a New Home
This blog is moving to WordPress.com. You can find it at http://thoughtfulsoftware.wordpress.com. All the previous posts have been imported there, so you can read everything at the new place!
Sunday, May 19, 2013
Sanitizing User Input, Part I
Many years ago I took a secure coding class. I mainly remember one thing from the course: “Assume all user input is evil.” This is fine because the instructors did say “If you only remember one thing from the course, remember this!”
What can a user do with input alone? Let’s say you are a malicious user of a web forum. You could create an account on the forum and set your display name to “<script>alert('surprise!');</script>”. After registration, a script tag containing JavaScript is stored in the database where your display name should be.
With that in place, any time another user of the forum loads a page where your display name appears (say, on any of your comments), your custom JavaScript will execute in their browser! This is a Bad Thing because a malicious user would usually not just pop up “surprise” but instead use that snippet of JavaScript to, say, grab the victim's cookies and send them to the attacker for nefarious purposes.
This is called Cross-Site Scripting (XSS). It works best on pages that are rendered on the server, because that way the browser always loads the script as part of the page. The script might not run if it is added to the page after load, for instance by an AJAX call. But an AJAX application can still be susceptible, for example if the display name is added to the DOM like this:
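// bad code that is susceptible to XSS
// returnedUser is a JSON object from the server, and returnedUser.displayName is
// "<script>alert('surprise!');</script>"
var newdiv = document.createElement('div');
// innerHTML parses the display name as markup instead of treating it as plain text
newdiv.innerHTML = returnedUser.displayName;
// jQuery selects an element by id with a leading '#'
$('#user_listing').append(newdiv);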
Sometimes developers, in an overzealous commitment to security, decide to HTML-encode all input, or to strip all special characters such as “<” and “>”. There are worse things to be overzealously committed to, but we can do better. Instead of sanitizing every field the same way every time, we can say that the check depends on what the content is supposed to represent and how it will be displayed. For example, a link tag is probably not valid as part of a user display name, but a link tag or style tag may be very helpful and relevant in a displayed product description. For this reason, most HTML sanitizers are very flexible about how they can be configured, so that we can easily allow different kinds of HTML in different places.
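To make the display name case concrete, here is a minimal sketch of the strictest policy: no markup is valid in a display name, so every character that is significant in HTML gets escaped. The class and method names (`DisplayNameEscaper`, `escapeHtml`) are made up for illustration and are not from any particular library. A field that legitimately allows some HTML, like the product description, would need a real allow-list sanitizer instead, which is not shown here.

```java
// Hypothetical helper for illustration only; in a real project, prefer a vetted library.
final class DisplayNameEscaper {

    // Replaces the characters that are significant in HTML text and attribute values,
    // so the browser shows them as text instead of parsing them as markup.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The malicious display name from the forum example above.
        String displayName = "<script>alert('surprise!');</script>";
        System.out.println(escapeHtml(displayName));
    }
}
```

With this in place, the malicious display name would show up on the page as literal text rather than run as a script.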
Now that we’ve agreed we need to sanitize user-supplied text, the next question is when to do the sanitizing. You could make an argument to sanitize user-supplied text on input or on output.
A reason for sanitizing input is that philosophically it makes sense to catch potential security issues as early as possible, and you would avoid storing malicious input on your system at all. This takes malicious input and turns it into a validation issue, just like requiring a zip code to contain only digits is a validation issue.
A reason for sanitizing output is that the site then has the option to change sanitization policy dynamically. At some point certain HTML may be considered unsafe (say, links in a product description), and later it could be considered safe (due to a business decision to allow it). If you sanitize only output, you are free to change the policy whenever you want.
Some sanitizing libraries are designed to operate on input. Lacking a strong need to change the sanitization policy dynamically, I would favor sanitizing on input as well.
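As a rough illustration of what treating malicious input as a validation issue could look like, here is a small sketch that checks a display name the same way it checks a zip code. Everything in it is an assumption made for the example: the `RegistrationValidator` class, the five-digit zip format, and the rule that a display name may not contain “<” or “>”. It is not the approach from Part II.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical validator for illustration; the rules below are assumptions, not a standard.
final class RegistrationValidator {

    // Assumption: only five-digit US zip codes are accepted.
    private static final Pattern ZIP = Pattern.compile("\\d{5}");

    // Assumption: a display name may not contain the characters that open or close an HTML tag.
    private static final Pattern MARKUP = Pattern.compile("[<>]");

    // Returns an error message, or an empty Optional when the value is acceptable.
    static Optional<String> validateZip(String zip) {
        return ZIP.matcher(zip).matches()
                ? Optional.empty()
                : Optional.of("zip code must be exactly five digits");
    }

    // Rejects the value instead of silently rewriting it, so nothing malicious is stored.
    static Optional<String> validateDisplayName(String name) {
        if (name.trim().isEmpty()) {
            return Optional.of("display name must not be blank");
        }
        if (MARKUP.matcher(name).find()) {
            return Optional.of("display name must not contain < or >");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validateZip("1234a"));
        System.out.println(validateDisplayName("<script>alert('surprise!');</script>"));
        System.out.println(validateDisplayName("Alice"));
    }
}
```

The design choice here is to reject bad input with a message the user can act on, just as a form would for a malformed zip code.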
Stay tuned for Part II where we’ll look at a slick way to sanitize input as an input validation problem!
Sunday, May 12, 2013
Containerless Web Applications Part I: Introduction
The first time I ever saw a Scala program run, it was running as a standalone program with the Play Framework. There was no application server, no servlet container, in fact no container of any kind. It was a program running all by itself, just like the programs we learned to write in college. From a JEE point of view, it was odd, but strangely beautiful.
Fast forward a few years, and we see this idea of container-less deployment becoming more widespread among the Java players. There has been a trend of developers looking for simpler ways to run their applications: think about people moving from EJB 2.0 to Spring, and from heavy Application Servers to simpler servlet containers. It only makes sense that eventually more people would eschew containers altogether and embrace the concept of putting HTTP-handling inside your application instead of outside.
This approach has a few names: containerless, war-less, or “embedded jetty” (if you want to google for more on this topic). There are some disadvantages, of course... nothing in life is free. But it’s the opinion of many developers that the advantages outweigh the disadvantages.
Let’s outline some of the advantages of containerless deployment:
- Better IDE support: Since it’s a regular application instead of something hosted in a web server, it can be easily started and stopped by any IDE without special plugins and without attaching a server to your IDE. Additionally, profiling is a breeze, and debugging is instantaneous (no need to attach your debugger to a remote server).
- Simplified development: Developers run the application exactly the same way in development as in production. This reduces the possibility of errors from differences between running in development and running in production. A pleasant side effect is that even between developers it’s impossible to have server-specific errors because someone ran the application on one server and someone else ran it on another.
- Easier to deploy: Deployments are easier to reproduce. There’s no need for the Maven Cargo plugin, there’s no copying from here to there, and there’s no extra server configuration that you need to maintain separately from the application.
- Ease of re-use: The application is already a jar and is easily usable as a standalone library itself.
- Startup is faster: Even for a small application, Tomcat can take a minute (or more). An embedded server has less to do at startup and can start in mere seconds.
- Fewer classloader issues: Fewer classloaders means fewer classloader issues. Conflicts between your application’s dependent libraries and the libraries distributed with your server have been known to cause bizarre and difficult-to-diagnose classloader problems.
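To give a feel for what putting HTTP handling inside your application means, here is a minimal sketch of a program that serves HTTP from a plain `main` method. To stay dependency-free it uses the small HTTP server that ships with the JDK (`com.sun.net.httpserver`) rather than embedded Jetty, so treat it as a stand-in for the idea and not as the architecture from the upcoming posts. The class name, the `/hello` path, and port 8080 are arbitrary choices for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical example: the application owns the HTTP server, not the other way around.
final class ContainerlessApp {

    // Starts an HTTP server inside this process. Passing port 0 asks the OS for any free port.
    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/hello", exchange -> {
                byte[] body = "Hello from a plain Java process".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException("Could not start the embedded server", e);
        }
    }

    public static void main(String[] args) {
        // Runs until the process is stopped, like any other standalone program.
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

Because it is just a `main` method, this starts, stops, and debugs from an IDE like any other program, which is the point of the first advantage above.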
Hopefully this whets your appetite for containerless deployment. In upcoming posts, we will review a simple architecture for a containerless application (complete with working code).
Sunday, May 5, 2013
Using A Variable Depth Copy to Prevent Hibernate LazyInitializationException
What is LazyInitializationException?
I’m going to go out on a limb and say: Anybody who has ever used Hibernate has at some point triggered a LazyInitializationException. Seriously, questions about what this exception is and how to fix it pop up all the time.
What is a LazyInitializationException? In a nutshell, if you use Hibernate to load an entity from the database, and then try to access a lazily-loaded portion of that entity’s object graph from outside the database session, Hibernate will throw a LazyInitializationException. The simple answer is to just not access lazily loaded properties or collections outside of a session. However, the devil, as they say, is in the details.
If you’re unfamiliar with this exception and its solutions, I recommend reading up on the many options to solve this problem. The options have been covered enough and are google-able enough that I don’t need to discuss all of them here (except maybe to ask you to avoid OpenSessionInView). But what I can discuss here is my own solution to the problem, and show how my solution provides a two-for-one bundle of goodness.
What were the issues I was looking to solve?
First, besides preventing this exception, I was looking for a solution that would convert Hibernate objects into a DTO that looks the same as my regular entity / domain object. I investigated mapping with Dozer, but it looked a bit heavy for my needs, and I didn’t want to introduce another library unless it was actually necessary. Another option is to use a HibernateUnproxifier, but that still requires code to access each specific collection every time, and I was looking for something more general.
Secondly, I was looking for a solution that would load collections to the right depth (deep enough, or shallow enough). Sometimes I wanted just the top-level properties of an object, sometimes I wanted the object’s objects or collections, and so on. As mentioned above, the simple answer is that you can just lazily load the objects and collections you want as needed. But the devil in the details is that it quickly becomes cumbersome to modify your DAO or Service with methods to load this specific collection here and that specific collection there.
Finally, I was looking for a solution that prevents circular references. Sometimes my domain classes had to have circular references to satisfy certain Hibernate mappings. When marshalling these objects to JSON, the conversion would break. It’s easy to introduce @JsonBackReference, but again I felt like there had to be a better way.
Copy It!
Enter the VariableDepthCopier. With this class you can copy an object coming from the database into a new instance of the same class, and specify how deep the resulting object graph should be in the new copy. This copy can then be safely passed around or marshalled to the client without worrying about LazyInitializationExceptions.
As for how to specify the copy depth: an object copied with level 0 has just the primitives and simple value classes (such as String, Number, Date, etc.). Non-primitive properties will be set to null, and collections will be empty. Level 1 contains non-primitive properties, and collections will be filled. These child objects and objects in collections are set to the equivalent of what we saw as level 0 for the first object. The pattern goes on: you can copy to level 2, level 3, and so on. I’m going to make the argument that you should not return a variable depth copy of level higher than 2.
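The level rules above can be sketched in a few dozen lines. To be clear, this is not the actual VariableDepthCopier: it is a simplified illustration that uses the JDK's `java.beans.Introspector` instead of Spring, handles only List, Set, and Collection properties (no arrays or maps), and leaves out cycle detection. The `DepthCopySketch`, `Author`, and `Book` names are invented for the example.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the VariableDepthCopier described in this post.
final class DepthCopySketch {

    // Types that are copied by reference at every level.
    static boolean isSimple(Class<?> type) {
        return type.isPrimitive() || type.isEnum() || type == String.class
                || type == Boolean.class || type == Character.class
                || Number.class.isAssignableFrom(type) || Date.class.isAssignableFrom(type);
    }

    @SuppressWarnings("unchecked")
    static <T> T copy(T source, int depth) {
        if (source == null) {
            return null;
        }
        try {
            // Assumes a no-argument constructor, as the Java Bean pattern requires.
            T target = (T) source.getClass().getDeclaredConstructor().newInstance();
            BeanInfo info = Introspector.getBeanInfo(source.getClass(), Object.class);
            for (PropertyDescriptor property : info.getPropertyDescriptors()) {
                Method getter = property.getReadMethod();
                Method setter = property.getWriteMethod();
                if (getter == null || setter == null) {
                    continue; // skip read-only and write-only properties
                }
                Class<?> type = property.getPropertyType();
                if (isSimple(type)) {
                    setter.invoke(target, getter.invoke(source));
                } else if (Collection.class.isAssignableFrom(type)) {
                    Collection<Object> copied = new ArrayList<>();
                    if (Set.class.isAssignableFrom(type)) {
                        copied = new HashSet<>();
                    }
                    // At depth 0 the getter is never called, so the source collection is not touched.
                    if (depth > 0) {
                        Collection<?> original = (Collection<?>) getter.invoke(source);
                        if (original != null) {
                            for (Object element : original) {
                                boolean keepAsIs = element == null || isSimple(element.getClass());
                                copied.add(keepAsIs ? element : copy(element, depth - 1));
                            }
                        }
                    }
                    setter.invoke(target, copied);
                } else if (depth > 0) {
                    setter.invoke(target, copy(getter.invoke(source), depth - 1));
                }
                // At depth 0 a nested bean is never set, so it keeps its default value (null here).
            }
            return target;
        } catch (ReflectiveOperationException | IntrospectionException e) {
            throw new IllegalStateException("Cannot copy " + source.getClass().getName(), e);
        }
    }

    public static class Author {
        private String name;
        private List<Book> books = new ArrayList<>();
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public List<Book> getBooks() { return books; }
        public void setBooks(List<Book> books) { this.books = books; }
    }

    public static class Book {
        private String title;
        private Author author; // back reference, so the object graph is circular
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public Author getAuthor() { return author; }
        public void setAuthor(Author author) { this.author = author; }
    }

    public static void main(String[] args) {
        Author author = new Author();
        author.setName("Ann");
        Book book = new Book();
        book.setTitle("Notes");
        book.setAuthor(author);
        author.getBooks().add(book);

        System.out.println(copy(author, 0).getBooks().size()); // level 0: collection left empty
        System.out.println(copy(author, 1).getBooks().size()); // level 1: collection filled
    }
}
```

In this sketch the depth limit alone keeps the circular Author/Book graph from recursing forever, since each step down reduces the remaining depth by one.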
How does this satisfy the three issues I was looking to solve?
First, this converts my domain object to a new object of the same class but without Hibernate’s persistent collections, which prevents the exception. Hibernate's persistent collections aren't copied; their contents are copied into the natural collection of the target class. This way we don’t need the HibernateUnproxifier, and everything can be copied without knowledge of Hibernate.
Secondly, the copy still needs to be performed inside a transaction, but it can be done generically without needing to specify which specific collections are loaded. The copier provides complete control over how shallow or deep the copy/mapping occurs.
Finally, this copy performs cycle detection and sets any repeated copies to null. With this mapping technique, I was able to remove @JsonBackReference from all of my domain objects.
Caveats
It’s important to note that the copy is done according to bean properties, not necessarily through field reflection. Because of this, the object to be copied should be a domain object following the Java Bean pattern. Additionally, the copier depends on Spring's BeanUtils, so it works best in a project already using Spring. If this were to be distributed as a more general purpose tool, I would probably try to rework it to use field reflection, and to not have other dependencies. As of this writing, this solution works fine in my project, so this is the state it's in right now.
Conclusion
As mentioned above, there are many ways to deal with LazyInitializationExceptions. I thought this was a neat and useful idea. If you have ever dealt with LazyInitializationExceptions, hopefully you’ll find it useful too.