Thoughts on Software Engineering: May 2013

Sunday, May 26, 2013

Blog Moving to a New Home

This blog is moving to WordPress.com. You can find it at http://thoughtfulsoftware.wordpress.com. All the previous posts have been imported there, so you can read everything at the new location.

Sunday, May 19, 2013

Many years ago I took a secure coding class. I mainly remember one thing from the course: “Assume all user input is evil.” That is fine, because the instructors themselves said, “If you only remember one thing from the course, remember this!”

What can a user do with input alone? Let’s say you are a malicious user of a web forum. You could create an account on the forum and set your display name to “<script>alert('surprise!');</script>”. If the forum stores the name as-is, the database now holds a script tag containing JavaScript where your display name should be.

With that in place, any time another user of the forum loads a page where your display name is shown (say, on any of your comments), your custom JavaScript will execute in their browser! This is a Bad Thing, because a real attacker would not just pop up “surprise” but would instead use that snippet of JavaScript to, say, read the victim’s cookies for the forum (such as a session cookie) and send them to the attacker for nefarious purposes.

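This is called Cross Site Scripting (XSS). It is easiest to exploit on pages that are rendered on the server, because a script tag that is part of the original page is always executed by the browser. A script tag might not run if it is added to the page after the page has loaded, for example by an AJAX call: browsers do not execute script tags inserted through innerHTML. But an AJAX application can still be susceptible, because other markup, such as an image tag with an onerror handler, does run when inserted that way. For example, if the username is added to the DOM like this:

// Bad code that is susceptible to XSS (assumes jQuery).
// returnedUser.displayName comes from the server's JSON response, e.g.
// "<img src=x onerror=\"alert('surprise!')\">"
var newdiv = document.createElement('div');
// innerHTML parses the name as markup, so the onerror handler runs
newdiv.innerHTML = returnedUser.displayName;
// '#user_listing' selects the element with id "user_listing"
$('#user_listing').append(newdiv);
// Safer: newdiv.textContent = returnedUser.displayName;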

Sometimes developers, in an overzealous commitment to security, decide to HTML-encode all input, or to strip all special characters such as “<” and “>”. There are worse things to be overzealously committed to, but we can do better. Instead of treating every field the same way, we can say that the check depends on what the content is supposed to represent and how it will be displayed. For example, in a user display name, a link is probably not valid as part of the name, but links or styling markup may be very helpful and relevant in a displayed product description. For this reason, most HTML sanitizers are very flexible about how they can be configured, so that we can easily allow different kinds of HTML in different places.
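To make the idea of per-field rules concrete, here is a minimal sketch in Java. The FieldPolicy enum and its two policies are hypothetical names invented for this illustration, not part of any sanitizer library. It escapes everything first and then restores only a fixed set of attribute-free tags, which is a deliberately simple technique; allowing links safely needs URL checks and is better left to a real sanitizer library.

```java
/** Hypothetical per-field policies; the names are illustrative, not from any library. */
enum FieldPolicy {
    /** Display names and similar fields: no markup at all. */
    PLAIN_TEXT(),
    /** Product descriptions and similar fields: a few attribute-free formatting tags. */
    BASIC_FORMATTING("b", "i", "em", "strong");

    private final String[] allowedTags;

    FieldPolicy(String... allowedTags) {
        this.allowedTags = allowedTags;
    }

    /** Escapes everything, then restores only the exact tags this policy allows. */
    String sanitize(String input) {
        String result = escapeHtml(input);
        for (String tag : allowedTags) {
            result = result.replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                           .replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }
        return result;
    }

    /** Replaces the five characters that are significant in HTML text and attributes. */
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, FieldPolicy.PLAIN_TEXT would suit a display name, while FieldPolicy.BASIC_FORMATTING would let a product description keep bold and italic text. Tags with attributes (such as an onclick handler) never match the allowed forms, so they stay escaped. The sketch does not check that allowed tags are balanced.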

Now that we’ve agreed we need to sanitize user-supplied text, the next question is when to do the sanitizing. You could make an argument for sanitizing on input or on output. A reason for sanitizing input is that it makes sense to catch potential security issues as early as possible, so you avoid storing malicious input on your system at all. This turns malicious input into a validation issue, just like requiring digits in a zip code is a validation issue. A reason for sanitizing output is that the site then has the option to change its sanitization policy dynamically. At some point certain HTML may be considered unsafe (say, links in a product description) and later it could be considered safe (due to a business decision to allow it). If you sanitize only on output, you can change the policy at any time, and the change applies to everything already stored.

Some sanitization libraries are designed to operate on input. Lacking a strong need to change the sanitization policy dynamically, I would favor sanitizing on input as well.
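As a rough illustration of treating malicious input as a validation problem, the sketch below checks a display name the same way it checks a zip code: against a whitelist pattern. The class name RegistrationValidator, the field names, and the exact patterns are assumptions made up for this example, and a real form would likely need looser rules for names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical registration-form validator; field names and rules are assumptions. */
final class RegistrationValidator {
    // Whitelist: letters, digits, spaces and a few punctuation marks, 1 to 40 characters.
    private static final Pattern DISPLAY_NAME = Pattern.compile("[\\p{L}\\p{N} ._-]{1,40}");
    // US zip code: five digits, optionally followed by a four-digit extension.
    private static final Pattern ZIP_CODE = Pattern.compile("\\d{5}(-\\d{4})?");

    /** Returns one message per invalid field; an empty list means the input is acceptable. */
    static List<String> validate(String displayName, String zipCode) {
        List<String> errors = new ArrayList<>();
        if (displayName == null || !DISPLAY_NAME.matcher(displayName).matches()) {
            errors.add("displayName: only letters, digits, spaces and . _ - are allowed");
        }
        if (zipCode == null || !ZIP_CODE.matcher(zipCode).matches()) {
            errors.add("zipCode: expected 12345 or 12345-6789");
        }
        return errors;
    }
}
```

A script tag in the display name is then rejected with an ordinary validation message, and nothing malicious reaches the database.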

Stay tuned for Part II where we’ll look at a slick way to sanitize input as an input validation problem!

Sunday, May 12, 2013

Containerless Web Applications Part I: Introduction

The first time I ever saw a Scala program run, it was running as a standalone program with the Play Framework. There was no application server, no servlet container, in fact no container of any kind. It was a program running all by itself, just like the programs we learned to write in college. From a JEE point of view, it was odd, but strangely beautiful.

Fast forward a few years, and we see this idea of container-less deployment becoming more widespread among the Java players. There has been a trend of developers looking for simpler ways to run their applications: think about people moving from EJB 2.0 to Spring, and from heavy Application Servers to simpler servlet containers. It only makes sense that eventually more people would eschew containers altogether and embrace the concept of putting HTTP-handling inside your application instead of outside.

This approach has a few names: containerless, war-less, or “embedded jetty” (if you want to google for more on this topic). There are some disadvantages, of course... nothing in life is free. But it’s the opinion of many developers that the advantages outweigh the disadvantages.

Let’s outline some of the advantages of containerless deployment:

Better IDE support: Since it’s a regular application instead of something hosted in a web server, it can be easily started and stopped by any IDE without special plugins and without attaching a server to your IDE. Additionally, profiling is a breeze, and debugging starts immediately (no need to attach your debugger to a remote server).

Simplified development: Developers run the application exactly the same way in development as in production. This reduces the possibility of errors caused by differences between the two environments. A pleasant side effect is that developers no longer hit server-specific errors because one person ran the application on one server and someone else ran it on another.

Easier to deploy: Deployments are easier to reproduce. There’s no need for the Maven Cargo plugin, there’s no copying from here to there, and there’s no extra server configuration that you need to maintain separately from the application.

Ease of re-use: The application is already packaged as a jar, so it can also be used as a library by other projects.

Faster startup: Even for a small application, starting under a container such as Tomcat can take a minute or more in some setups. With an embedded server there is less work to do at startup, and the application can start in mere seconds.

Fewer classloader issues: Fewer classloaders means fewer classloader issues. Conflicts between your application’s dependent libraries and the libraries distributed with your server have been known to cause bizarre and difficult-to-diagnose classloader problems.
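The advantages above all follow from the application owning its own HTTP handling. As a minimal sketch of what that looks like, the example below starts an HTTP server from a plain main method. Note that it uses the JDK’s built-in com.sun.net.httpserver.HttpServer rather than embedded Jetty, purely so that it runs without any dependencies; the class name ContainerlessApp, the /hello path, and port 8080 are assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical standalone web application: no war file, no external container. */
class ContainerlessApp {

    /** Starts the embedded server on the given port (0 picks any free port). */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "Hello from a plain main method".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

Running the class from an IDE or with the java command is the whole deployment story in this sketch: the same main method is used in development and in production.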

Hopefully this whets your appetite for containerless deployment. In upcoming posts, we will review a simple architecture for a containerless application (complete with working code).

Sunday, May 5, 2013

Using A Variable Depth Copy to Prevent Hibernate LazyInitializationException

What is LazyInitializationException?

I’m going to go out on a limb and say: Anybody who has ever used Hibernate has at some point triggered a LazyInitializationException. Seriously, questions about what this exception is and how to fix it pop up all the time.

What is a LazyInitializationException? In a nutshell, if you use Hibernate to load an entity from the database, and then try to access a lazily-loaded portion of that entity’s object graph after the Hibernate session has been closed, Hibernate will throw a LazyInitializationException. The simple answer is to just not access lazily loaded properties or collections outside of a session. However, the devil, as they say, is in the details.

If you’re unfamiliar with this exception and its solutions, I recommend reading up on the many options to solve this problem. The options have been covered enough and are google-able enough that I don’t need to discuss all of them here (except maybe to ask you to avoid OpenSessionInView). But what I can discuss here is my own solution to the problem, and show how my solution provides a two-for-one bundle of goodness.

What were the issues I was looking to solve?

Besides preventing this exception, first of all I was looking for a solution that would convert Hibernate objects into a DTO that looks the same as my regular entity / domain object. I investigated mapping with Dozer, but it looked a bit heavy for my needs, and I didn’t want to introduce another library unless it was actually necessary. Another option is to use a HibernateUnproxifier, but that still requires code to access each specific collection every time, and I was looking for something more general.

Secondly, I was looking for a solution that would load collections to the right depth (deep enough, or shallow enough). Sometimes I wanted just the top-level properties of an object, sometimes I wanted the object’s objects or collections, and so on. As mentioned above, the simple answer is that you can just lazily load the objects and collections you want as needed. But the devil in the details is that it quickly becomes cumbersome to modify your DAO or Service with methods to load this specific collection here and that specific collection there.

Finally, I was looking for a solution that prevents circular references. Sometimes my domain classes had to have circular references to satisfy certain Hibernate mappings. When marshalling these objects to JSON, the conversion would break. It’s easy to introduce @JsonBackReference, but again I felt like there had to be a better way.

Copy It!

Enter the VariableDepthCopier. With this class you can copy an object coming from the database into a new instance of the same class, and specify how deep the resulting object graph should be in the new copy. This copy can then be safely passed around or marshalled to the client without worrying about LazyInitializationExceptions.

As for how to specify the copy depth: an object copied with level 0 has just the primitives and simple value classes (such as String, Number, and Date). Non-primitive properties will be set to null, and collections will be empty. Level 1 also contains the non-primitive properties, and collections will be filled. These child objects and the objects in collections are themselves copied to the equivalent of level 0. The pattern goes on: you can copy to level 2, level 3, and so on. I would argue that you should not return a variable depth copy of a level higher than 2.

How does this satisfy the three issues I was looking to solve?

First, this converts my domain object to a new object of the same class but without Hibernate’s persistent collections. This prevents the exception. Hibernate’s persistent collections aren’t copied; their contents are copied into the natural collection of the target class. This way we don’t need the HibernateUnproxifier, because everything can be copied without knowledge of Hibernate.

Secondly, the copy still needs to be performed inside a transaction, but it can be done generically without needing to specify which specific collections are loaded. The copier provides complete control over how shallow or deep the copy/mapping occurs.

Finally, this copy performs cycle detection and sets any repeated references to null. With this mapping technique, I was able to remove @JsonBackReference from all of my domain objects.
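The depth rules and cycle handling described above can be sketched roughly as follows. This is not the author’s VariableDepthCopier: the class DepthCopySketch and the DemoAuthor and DemoBook beans are hypothetical, the sketch uses java.beans.Introspector from the JDK instead of Spring, it only handles List and Set properties, and it does nothing special for Hibernate proxy classes. It treats an object that is already being copied further up the graph as a cycle and returns null for it.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Date;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; this is not the author's VariableDepthCopier. */
final class DepthCopySketch {

    /** Copies a JavaBean; depth 0 keeps only simple values and empties collections. */
    static <T> T copy(T source, int depth) {
        Set<Object> inProgress = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        return copy(source, depth, inProgress);
    }

    @SuppressWarnings("unchecked")
    private static <T> T copy(T source, int depth, Set<Object> inProgress) {
        if (source == null || !inProgress.add(source)) {
            return null; // an object that is already being copied means a cycle
        }
        try {
            T target = (T) source.getClass().getDeclaredConstructor().newInstance();
            BeanInfo info = Introspector.getBeanInfo(source.getClass(), Object.class);
            for (PropertyDescriptor property : info.getPropertyDescriptors()) {
                Method getter = property.getReadMethod();
                Method setter = property.getWriteMethod();
                if (getter == null || setter == null) {
                    continue; // only readable and writable bean properties are copied
                }
                Object value = getter.invoke(source);
                if (value == null) {
                    continue;
                } else if (isSimple(value)) {
                    setter.invoke(target, value);
                } else if (value instanceof Collection) {
                    setter.invoke(target, copyElements((Collection<?>) value, depth, inProgress));
                } else if (depth > 0) {
                    setter.invoke(target, copy(value, depth - 1, inProgress));
                }
            }
            return target;
        } catch (ReflectiveOperationException | IntrospectionException e) {
            throw new IllegalStateException("Cannot copy " + source.getClass().getName(), e);
        } finally {
            inProgress.remove(source);
        }
    }

    /** Builds a plain collection; it stays empty at depth 0 and skips elements that form cycles. */
    private static Collection<Object> copyElements(Collection<?> source, int depth, Set<Object> inProgress) {
        Collection<Object> copied = source instanceof Set ? new HashSet<Object>() : new ArrayList<Object>();
        if (depth > 0) {
            for (Object element : source) {
                Object elementCopy = isSimple(element) ? element : copy(element, depth - 1, inProgress);
                if (elementCopy != null) {
                    copied.add(elementCopy);
                }
            }
        }
        return copied;
    }

    private static boolean isSimple(Object value) {
        return value instanceof String || value instanceof Number || value instanceof Boolean
                || value instanceof Character || value instanceof Date || value instanceof Enum;
    }
}

/** Hypothetical domain beans with a circular reference, as a Hibernate mapping might require. */
class DemoAuthor {
    private String name;
    private List<DemoBook> books = new ArrayList<>();
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public List<DemoBook> getBooks() { return books; }
    public void setBooks(List<DemoBook> books) { this.books = books; }
}

class DemoBook {
    private String title;
    private DemoAuthor author;
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public DemoAuthor getAuthor() { return author; }
    public void setAuthor(DemoAuthor author) { this.author = author; }
}
```

In this sketch, DepthCopySketch.copy(author, 0) returns an author with a name and an empty book list, while copy(author, 1) fills the list with books whose author property is left null. In a Hibernate application the call would still have to happen inside the transaction, since the getters are what trigger the lazy loading.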

Caveats

It’s important to note that the copy is done according to bean properties, not through field reflection. Because of this, the object to be copied should be a domain object following the JavaBean pattern. Additionally, the copier depends on Spring’s BeanUtils, so it works best in a project already using Spring. If this were to be distributed as a more general purpose tool, I would probably try to rework it to use field reflection and to have no other dependencies. As of this writing, this solution works fine in my project, so this is the state it’s in right now.

Conclusion

As mentioned above, there are many ways to deal with LazyInitializationExceptions. I thought this was a neat and useful idea. If you have ever dealt with LazyInitializationExceptions, hopefully you’ll find it useful too.