Sunday, May 26, 2013

Blog Moving to a New Home

This blog is moving to Wordpress.com. You can find it at http://thoughtfulsoftware.wordpress.com. All the previous posts have been imported there so you can read everything at the new place!

Sunday, May 19, 2013

Sanitizing User Input, Part I


Many years ago I took a secure coding class. I mainly remember one thing from the course: “Assume all user input is evil.” This is fine because the instructors did say “If you only remember one thing from the course, remember this!”


What can a user do with input alone? Let’s say you are a malicious user of a web forum. You could create an account on the forum and set your display name to “<script>alert('surprise!');</script>”. After registration, a script tag containing JavaScript is stored in the database where your display name should be.


With that in place, any time another user of the forum loads a page where your display name appears (say, on any of your comments), your custom JavaScript will execute in their browser! This is a Bad Thing because a real attacker would not just pop up “surprise!” but would instead use that snippet of JavaScript to, say, grab the victim’s cookies for the site and send them off to the attacker for nefarious purposes.


This is called Cross Site Scripting (XSS) and works best on pages that are rendered on the server because the script is always loaded by the browser that way. The script might not be run if it’s added to the page after the page is loaded by an AJAX call. But an AJAX application can still be susceptible, for example if the username is added to the DOM like this:


  // bad code that is susceptible to XSS
  // JSON object returnedUser.displayName is
  //  "<script>alert('surprise!');</script>"
  var newdiv = document.createElement('div');
  newdiv.innerHTML = returnedUser.displayName;
  // jQuery id selectors need a leading '#' (assuming user_listing is an element id)
  $('#user_listing').append(newdiv);


Sometimes developers, in an overzealous commitment to security, decide to HTML-encode all input or strip all special characters such as “<” and “>”. There are worse things to be overzealously committed to, but we can do better. Instead of sanitizing every field the same way every time, we can let the check depend on what the content is supposed to represent and how it will be displayed. For example, a link tag is probably not valid as part of a user display name, but a link tag or style tag may be very helpful and relevant in a displayed product description. For this reason, most HTML sanitizers are very flexible about how they can be configured, so that we can easily allow different kinds of HTML in different places.
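
To make “HTML-encode” concrete, here is a minimal sketch in Java of what a blanket encoder does to the malicious display name from earlier. The class and method names (HtmlEscapeSketch, escapeHtml) are made up for this illustration, and a real project would more likely lean on a maintained sanitizer library than on hand-rolled escaping.

```java
class HtmlEscapeSketch {

    // Replaces the five characters that matter in HTML text and attribute
    // values with their entity equivalents; everything else passes through.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &lt;script&gt;alert(&#39;surprise!&#39;);&lt;/script&gt;
        System.out.println(escapeHtml("<script>alert('surprise!');</script>"));
    }
}
```

After encoding, the browser shows the payload as plain text instead of running it. The catch is the one described above: the same treatment would also flatten a perfectly legitimate link in a product description.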


Now that we’ve agreed we need to sanitize user-supplied text, the next question is when to do the sanitizing. You could make an argument for sanitizing user-supplied text on input or on output. A reason for sanitizing on input is that philosophically it makes sense to catch potential security issues as early as possible, and you avoid storing malicious input on your system at all. This turns malicious input into a validation issue, just like requiring that a zip code contain only digits is a validation issue. A reason for sanitizing on output is that the site then has the option to change its sanitization policy dynamically. At some point certain HTML may be considered unsafe (say, links in a product description) and later it could be considered safe (due to a business decision to allow it). If you sanitize only on output, you are free to change the policy whenever you want.


Some libraries are designed to operate on input. Lacking a strong driver to be able to change the sanitization policy dynamically, I would favor sanitizing on input as well.


Stay tuned for Part II where we’ll look at a slick way to sanitize input as an input validation problem!

Sunday, May 12, 2013

Containerless Web Applications Part I: Introduction



The first time I ever saw a Scala program run, it was running as a standalone program with the Play Framework. There was no application server, no servlet container, in fact no container of any kind. It was a program running all by itself, just like the programs we learned to write in college. From a JEE point of view, it was odd, but strangely beautiful.

Fast forward a few years, and we see this idea of container-less deployment becoming more widespread among the Java players. There has been a trend of developers looking for simpler ways to run their applications: think about people moving from EJB 2.0 to Spring, and from heavy Application Servers to simpler servlet containers. It only makes sense that eventually more people would eschew containers altogether and embrace the concept of putting HTTP-handling inside your application instead of outside.
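
Here is a rough sketch of what “HTTP handling inside your application” can look like. To keep it self-contained it uses the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in for an embedded Jetty; the class name, the /hello path, and port 8080 are arbitrary choices for the example, not anything from a real project.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class ContainerlessSketch {

    // Starts an in-process HTTP server. Passing port 0 asks the OS for any free port.
    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/hello", exchange -> {
                byte[] body = "hello from a plain main()".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new IllegalStateException("could not start embedded server", e);
        }
    }

    // The whole "deployment" is running this main method.
    public static void main(String[] args) {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

There is no war file and no server to install: the application owns the HTTP listener, and starting it from an IDE or from the command line is the same operation.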

This approach has a few names: containerless, war-less, or “embedded jetty” (if you want to google for more on this topic). There are some disadvantages, of course... nothing in life is free. But it’s the opinion of many developers that the advantages outweigh the disadvantages.

Let’s outline some of the advantages of containerless deployment:

  • Better IDE support: Since it’s a regular application instead of something hosted in a web server, it can be easily started and stopped by any IDE without special plugins and without attaching a server to your IDE. Additionally, profiling is a breeze, and debugging is instantaneous (no need to attach your debugger to a remote server).

  • Simplified development: Developers run the application exactly the same way in development as in production. This reduces the possibility of errors from differences between running in development and running in production. A pleasant side effect is that even between developers it’s impossible to have server-specific errors because someone ran the application on one server and someone else ran it on another.

  • Easier to deploy: Deployments are easier to reproduce. There’s no need for the maven cargo plugin, there’s no copying from here to there, and there’s no extra server configuration that you need to maintain separately from the application.

  • Ease of re-use: The application is already a jar and is easily usable as a standalone library itself.

  • Startup is faster: Even for a small application, Tomcat can take a minute (or more). With an embedded server there is less for the container to do and it can start in mere seconds.

  • Fewer classloader issues: Fewer classloaders means fewer classloader issues. Conflicts between your application’s dependent libraries and the libraries distributed with your server have been known to cause bizarre and difficult-to-diagnose classloader problems.

Hopefully this whets your appetite for containerless deployment. In upcoming posts, we will review a simple architecture for a containerless application (complete with working code).

Sunday, May 5, 2013

Using A Variable Depth Copy to Prevent Hibernate LazyInitializationException

What is LazyInitializationException?


I’m going to go out on a limb and say: Anybody who has ever used Hibernate has at some point triggered a LazyInitializationException. Seriously, questions about what this exception is and how to fix it pop up all the time.

What is a LazyInitializationException? In a nutshell, if you use Hibernate to load an entity from the database, and then try to access a lazily-loaded portion of that entity’s object graph from outside the database session, Hibernate will throw a LazyInitializationException. The simple answer is to just not access lazily loaded properties or collections outside of a session. However, the devil, as they say, is in the details.

If you’re unfamiliar with this exception and its solutions, I recommend reading up on the many options to solve this problem. The options have been covered enough and are google-able enough that I don’t need to discuss all of them here (except maybe to ask you to avoid OpenSessionInView). But what I can discuss here is my own solution to the problem, and show how my solution provides a two-for-one bundle of goodness.

What were the issues I was looking to solve?


Besides preventing this exception, first of all I was looking for a solution that would convert Hibernate objects into a DTO that looks the same as my regular entity / domain object. I investigated mapping with Dozer, but it looked a bit heavy for my needs, and I didn’t want to introduce another library unless it was actually necessary. Another option is to use a HibernateUnproxifier, but that still requires code to access each specific collection every time, and I was looking for something more general.

Secondly, I was looking for a solution that would load collections to the right depth (deep enough, or shallow enough). Sometimes I wanted just the top-level properties of an object, sometimes I wanted the object’s objects or collections, and so on. As mentioned above, the simple answer is that you can just lazily load the objects and collections you want as needed. But the devil in the details is that it quickly becomes cumbersome to modify your DAO or Service with methods to load this specific collection here and that specific collection there.

Finally, I was looking for a solution that prevents circular references.  Sometimes my domain classes had to have circular references to satisfy certain Hibernate mappings. When marshalling these objects to JSON, the conversion would break. It’s easy to introduce @JsonBackReference, but again I felt like there had to be a better way.

Copy It!


Enter the VariableDepthCopier. With this class you can copy an object coming from the database into a new instance of the same class, and specify how deep the resulting object graph should be in the new copy. This copy can then be safely passed around or marshalled to the client without worrying about LazyInitializationExceptions.

As for how to specify the copy depth: an object copied with level 0 has just its primitives and simple value types (such as String, Number, Date, etc). Non-primitive properties will be set to null, and collections will be empty. At level 1, non-primitive properties are populated and collections are filled; these child objects and the objects in collections are themselves copied to the equivalent of level 0. The pattern goes on: you can copy to level 2, level 3, and so on. I’m going to make the argument that you should not return a variable depth copy of level higher than 2.
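
The level rules are easier to see in code. What follows is not the actual VariableDepthCopier, just a minimal sketch of the same idea under some loud assumptions: it uses the JDK's java.beans.Introspector to find bean properties, the Person bean is hypothetical, every collection is copied into an ArrayList, and an object that is already being copied further up the current path comes back as null.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Date;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class DepthCopySketch {

    /** Hypothetical domain bean, used only for this illustration. */
    public static class Person {
        private String name;
        private Person boss;
        private List<Person> reports = new ArrayList<>();
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Person getBoss() { return boss; }
        public void setBoss(Person boss) { this.boss = boss; }
        public List<Person> getReports() { return reports; }
        public void setReports(List<Person> reports) { this.reports = reports; }
    }

    static <T> T copy(T source, int depth) {
        try {
            // Identity-based set of the objects currently being copied (cycle detection).
            Set<Object> inProgress = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
            return copyBean(source, depth, inProgress);
        } catch (Exception e) {
            throw new IllegalStateException("copy failed", e);
        }
    }

    // Values that are carried over as-is at every level.
    private static boolean isSimple(Object value) {
        return value instanceof String || value instanceof Number || value instanceof Boolean
                || value instanceof Character || value instanceof Date || value instanceof Enum;
    }

    @SuppressWarnings("unchecked")
    private static <T> T copyBean(T source, int depth, Set<Object> inProgress) throws Exception {
        if (source == null || !inProgress.add(source)) {
            return null; // nothing to copy, or a cycle back to an object on the current path
        }
        T target = (T) source.getClass().getDeclaredConstructor().newInstance();
        for (PropertyDescriptor pd
                : Introspector.getBeanInfo(source.getClass(), Object.class).getPropertyDescriptors()) {
            if (pd.getReadMethod() == null || pd.getWriteMethod() == null) {
                continue; // only readable and writable bean properties are copied
            }
            Object value = pd.getReadMethod().invoke(source);
            Object copied;
            if (value == null || isSimple(value)) {
                copied = value;
            } else if (value instanceof Collection) {
                List<Object> items = new ArrayList<>(); // stays empty at level 0
                if (depth > 0) {
                    for (Object item : (Collection<?>) value) {
                        Object itemCopy = isSimple(item) ? item : copyBean(item, depth - 1, inProgress);
                        if (itemCopy != null) {
                            items.add(itemCopy);
                        }
                    }
                }
                copied = items;
            } else {
                copied = depth > 0 ? copyBean(value, depth - 1, inProgress) : null;
            }
            pd.getWriteMethod().invoke(target, copied);
        }
        inProgress.remove(source);
        return target;
    }
}
```

With a Person “Alice” whose reports list contains “Bob” (and Bob's boss pointing back at Alice), DepthCopySketch.copy(alice, 0) returns an Alice with an empty reports list, while copy(alice, 1) returns an Alice whose single report is a level 0 Bob with a null boss.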

How does this satisfy the three issues I was looking to solve?


First, this converts my domain object to a new object of the same class but without Hibernate’s persistent collections. This prevents the exception. Hibernate's persistent collections aren't copied; instead, their contents are copied into the natural collection of the target class. This way we don’t need the HibernateUnproxifier, and everything can be copied without knowledge of Hibernate.

Secondly, the copy still needs to be performed inside a transaction, but it can be done generically without needing to specify which specific collections are loaded. The copier provides complete control over how shallow or deep the copy/mapping occurs.

Finally, this copy performs cycle detection and sets any repeated copies to null. With this mapping technique, I was able to remove @JsonBackReference from all of my domain objects.

Caveats

It’s important to note that the copy is done according to bean properties, not necessarily through field reflection. Because of this, the object to be copied should be a domain object following the Java Bean pattern. Additionally, the copier depends on Spring's BeanUtils, so it works best in a project already using Spring. If this were to be distributed as a more general purpose tool, I would probably try to rework it to use field reflection, and to not have other dependencies. As of this writing, this solution works fine in my project so this is the state it's in right now.


Conclusion


As mentioned above, there are many ways to deal with LazyInitializationExceptions. I thought this was a neat and useful idea. If you have ever dealt with LazyInitializationExceptions, hopefully you’ll find it useful too.

Sunday, April 28, 2013

How to run an IDE INSIDE a Vagrant VM


Vagrant is intended as a runtime environment for use with your development, not as the development environment (IDE, tools, etc) itself. But I thought it would be nice if I could run my IDE inside the VM. This would have several advantages: I could leverage IDE shortcuts for executing the app, connect the debugger more easily, and ease ramp-up time for other developers.


So I looked into setting up a Vagrant VM with Netbeans. The necessary steps are downloading and installing Netbeans in a provisioner, and forwarding X11 from the VM to the Host.

There are many options to do silent installation of Netbeans. I had this inside my shell provisioner (note: this takes quite a while to download and install, and you should do this after installing Java):
wget http://download.netbeans.org/netbeans/7.3/final/bundles/netbeans-7.3-javaee-linux.sh
chmod 755 netbeans-7.3-javaee-linux.sh
sudo ./netbeans-7.3-javaee-linux.sh --silent -J-Dnb-base.installation.location=/opt/netbeans-7.3
rm netbeans-7.3-javaee-linux.sh

X11 forwarding requires a few steps:
  • Add to Vagrantfile: config.ssh.forward_x11 = true
  • Run Vagrant with “vagrant up --no-provision” so you don’t download Netbeans again!
  • vagrant ssh into the VM. Running netbeans there brings up the UI on your host display; it “just works!”

At this point, it works but performance is horrible. Clicking on a menu in Netbeans takes about 12 seconds to respond.

To improve the performance, we can specify a faster cipher and compression for X11. To pass in these ssh options, connect with ssh directly instead of using vagrant ssh. First use vagrant ssh-config to write out an ssh configuration for the Vagrant VM (be careful: this overwrites ~/.ssh/config if that file already exists):
vagrant ssh-config > ~/.ssh/config
Then connect with:
ssh -F ~/.ssh/config -c arcfour,blowfish-cbc -XC default
Note that the generated ssh config names the machine “Host default”, so we need to specify “default” as the host in the ssh command. Using localhost in the ssh command will not work. After this change, performance is much better but still bad (response time opening menus goes from 12 seconds to 4 seconds).

Additional things to try include manually increasing the VM video RAM from 8MB to 64MB, but that had no effect. (Indeed, another VM I use for development, which has the full gnome desktop installed and regularly runs an IDE with no issue, has only 12MB of video RAM.) Another initially promising option was to enable VirtualBox 2D acceleration, but that is for Windows guests only so is not an option for Linux guests.

More potential steps you can take to improve the performance
  • VMware’s provider is rumored to have special graphics optimizations that let you work with an IDE inside a VM.
  • You can try VRDP; it may have better performance than X11.
  • NX Server is supposed to be faster than X11.


In conclusion, using Vagrant as your complete development environment does not really work out as well as you’d hope. Perhaps not so surprisingly, it’s best to use Vagrant as it was intended.

Sunday, April 21, 2013

Clock Angle Problem Part II

Recently I posted a solution to the clock angle problem. At the end we were left with a few questions about how to spice the problem up a little bit. In this post we will explore these questions further. Namely, we will introduce a second hand along with the hour and minute hands, and we will explore caching.


First of all: what if the hour and minute hands ticked for every second and we wanted second-level accuracy? What we need to do is presume that there’s a second hand, and add the degrees changed for each second to the hour and minute components for each second tick. We can find the math for it, and write some code like this:


  // express both hand positions in degrees
  // minute hand moves 6 deg/minute, i.e. 1/10 deg/sec
  double md = m*6 + s*0.1;
  // hour hand goes through 30 degrees per hour
  // plus 30/60=1/2 deg/min,
  // plus (1/2)/60=1/120 deg/sec
  double hd = h*30 + m*0.5 + s*(1d/120d);
  double result = Math.abs(hd-md);
  return result > 180 ? 360 - result : result;


Second of all, if the calculation was really expensive, how could we implement caching? (Hint: Not WeakHashMap) There are a couple ways to approach this.


It helps to understand the nature of what we’re computing. Is it time-intensive? Memory-intensive? Does it make sense to pre-compute values? In my caching example I took two approaches: one that pre-computes all possible values, and one that caches values as they are calculated.


It might make sense to pre-compute values if the set of possible values is limited and known, and the calculations do not adversely affect the startup time of the application. In this case, both are true. If the calculation took a lot of time or the amount of data to store was extraordinary (say, if we had a clock with nanosecond precision!) then this approach would obviously not be viable.
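
As a minimal sketch of the pre-computing approach, assuming second-level precision: a 12-hour dial has only 12 × 60 × 60 = 43,200 distinct positions, so every answer fits in one array of doubles (a few hundred kilobytes). The class and method names here are invented for the example and are not the ones in my Github code.

```java
class PrecomputedClockAngles {

    private static final int POSITIONS = 12 * 60 * 60; // 43,200 seconds on a 12-hour dial
    private final double[] angles = new double[POSITIONS];

    // Pays the full calculation cost once, at construction time.
    PrecomputedClockAngles() {
        for (int t = 0; t < POSITIONS; t++) {
            angles[t] = compute(t / 3600, (t / 60) % 60, t % 60);
        }
    }

    // Same arithmetic as the second-level calculation above.
    static double compute(int h, int m, int s) {
        double md = m * 6 + s * 0.1;
        double hd = (h % 12) * 30 + m * 0.5 + s / 120.0;
        double diff = Math.abs(hd - md);
        return diff > 180 ? 360 - diff : diff;
    }

    // Lookup only; assumes 0 <= h, 0 <= m < 60 and 0 <= s < 60.
    double angle(int h, int m, int s) {
        return angles[(h % 12) * 3600 + m * 60 + s];
    }
}
```

At nanosecond precision the same table would need on the order of 10^13 entries, which is why this approach stops being viable there.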


If pre-computing is not viable, we need a way to store cached values as they are calculated. If we are already using a caching solution elsewhere in our application, the choice is easy and we can just incorporate that. One option is Spring Cache, which provides a nice abstraction over a couple of implementations (see some handy posts here, and here). Additionally, you can include EHCache and use it directly.


However, if you are not already using caching and just need a quick one-off caching solution (on the way to a more mature solution as your needs grow... no need to re-invent the wheel), we can try extending LinkedHashMap as a simple cache. Check out how we can incorporate it directly into a caching layer over the ClockAngle class. Again, this is not a full featured cache solution, just a quick and dirty solution that may work for a focused part of your application for the time being.
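
For reference, the LinkedHashMap trick looks roughly like this. It relies on two real features of the class: the constructor flag that keeps entries in access order, and the removeEldestEntry hook that is consulted after every insertion. The class name LruCache and the maxEntries parameter are my own choices for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    LruCache(int maxEntries) {
        // true = access order: each get() or put() moves the entry to the "newest" end
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

Keep in mind that LinkedHashMap is not thread-safe, so a cache shared between threads would need to be wrapped (for example with Collections.synchronizedMap) or replaced by a real caching library.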


Finally, if we have multiple classes doing the calculations and multiple caching techniques, it becomes apparent that there is duplicate code (in the core calculation, and in any input validation). It makes sense to refactor out common code and use the Decorator Pattern to compose these classes together. I think the final result is beautiful, simple, and easy to test!
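
A rough sketch of that composition, with type names invented for this post rather than copied from the Github code: one interface, one class that only does the math, and decorators that each add exactly one concern. The caching decorator uses a plain HashMap to keep the example short, and the misses() counter exists only so the behavior is easy to observe in a test.

```java
import java.util.HashMap;
import java.util.Map;

interface AngleCalculator {
    double angle(int hour, int minute, int second);
}

/** Core calculation only: no validation, no caching. */
class BasicAngleCalculator implements AngleCalculator {
    public double angle(int hour, int minute, int second) {
        double minuteDegrees = minute * 6 + second * 0.1;
        double hourDegrees = (hour % 12) * 30 + minute * 0.5 + second / 120.0;
        double diff = Math.abs(hourDegrees - minuteDegrees);
        return diff > 180 ? 360 - diff : diff;
    }
}

/** Decorator that rejects bad input before delegating. */
class ValidatingAngleCalculator implements AngleCalculator {
    private final AngleCalculator delegate;

    ValidatingAngleCalculator(AngleCalculator delegate) {
        this.delegate = delegate;
    }

    public double angle(int hour, int minute, int second) {
        if (hour < 0 || hour > 23 || minute < 0 || minute > 59 || second < 0 || second > 59) {
            throw new IllegalArgumentException("not a valid time of day");
        }
        return delegate.angle(hour, minute, second);
    }
}

/** Decorator that remembers results it has already seen. */
class CachingAngleCalculator implements AngleCalculator {
    private final AngleCalculator delegate;
    private final Map<Integer, Double> cache = new HashMap<>();
    private int misses;

    CachingAngleCalculator(AngleCalculator delegate) {
        this.delegate = delegate;
    }

    public double angle(int hour, int minute, int second) {
        int key = (hour % 12) * 3600 + minute * 60 + second; // 3:00 and 15:00 share a key
        Double cached = cache.get(key);
        if (cached == null) {
            misses++;
            cached = delegate.angle(hour, minute, second);
            cache.put(key, cached);
        }
        return cached;
    }

    int misses() {
        return misses;
    }
}
```

Wiring is then a one-liner such as new ValidatingAngleCalculator(new CachingAngleCalculator(new BasicAngleCalculator())), and each layer can be unit tested on its own with a stub delegate.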


Sunday, April 14, 2013

Accessing MySQL Instance In Vagrant VM


I’ve been playing around with Vagrant to set up development environments.

One thing that I thought would be useful would be to connect from the host machine to the database running inside your Vagrant machine. That way it would be easier to use a GUI like MySQL Workbench to inspect or tweak what the database is doing while the application is running. Additionally, you could reset the database with a SQL script from the host without dropping into vagrant ssh.

It’s not hard, but there are a couple steps you have to do to make that connection work.


Set up port forwarding



Inside the Vagrantfile, the line would look something like this
config.vm.network :forwarded_port, guest: 3306, host: 3309



Make sure the MySQL user can connect from outside localhost



We can do this at provision time when you create the database and create the user. One way to do this is for the MySQL provisioner to reference a SQL file to set it up:

shell provisioner:
mysqladmin -u root password root
mysql --user=root --password=root --host=localhost --port=3306 < /vagrant/mysql_bootstrap.sql

mysql_bootstrap.sql
create schema appdb;
create user 'dbuser'@'%' identified by 'dbuserpassword';
grant all on appdb.* to 'dbuser'@'%';



Rebind MySQL Host inside the VM



We need to rebind MySQL inside the VM so it listens on more than localhost. Edit your my.cnf file (say, sudo emacs /etc/mysql/my.cnf) and comment out the following lines:
# skip-external-locking
# bind-address


Connect



We’re ready to connect from the host! To connect from the mysql client on the command line, we need to use the TCP protocol setting. Your command line would look like this:
mysql --user=dbuser --password=dbuserpassword --host=127.0.0.1 --protocol=TCP --port=3309

If you use MySQL Query Browser or MySQL Workbench, the connection settings dialog requires 127.0.0.1, not localhost.

Voila! Now you can connect to MySQL from the client on your host to the server running inside your Vagrant VM!

Sunday, April 7, 2013

Troubleshooting A Mysterious HTTP 500 Response


The other day I was modifying a Spring REST endpoint, changing the DTO creation method. My DTO in this case is a variable-depth copy of my domain object with hibernate proxies converted or stripped out. Strangely, when I accessed the endpoint to retrieve the modified DTO, I would get a 500 response but see no exceptions in the logs. What was going on?

Troubleshooting is all about isolation. First, replicate the problem (isolate the trigger). Then, find where the expected behavior changes from the actual behavior (isolate the location in the pathway). Finally, see what you can change in that location that resolves the problem (isolate the fix).

I’d already started isolating the trigger: creating a DTO with one technique vs another. I knew that I had changed how my DTO was being created and that the returned object was supposed to be the same as before. So to further isolate the trigger, my first question was: “What is the difference between the previously returned object and the new object?”

For comparison I created both objects in the controller method so I could compare the working and broken objects side by side. Stepping through the debugger with the broken code exited the Controller with the return of a valid DTO. But I was returning more information than before, specifically, three collections each of which had shallow copies of their elements. When I emptied the collections, the formerly-broken object was successfully returned without a 500. By process of elimination I determined which collection and which elements in it would trigger the 500.

Now that I’d isolated the problem to a new specific element in my response, the next step was to isolate the location in the pathway where the problem was occurring. From the work in the debugger, I knew that the problem was happening between the return of my controller and the receipt of the response by the browser. What happens between those two points?

Unfortunately I didn’t have the Spring source code readily available to step into, but I knew that the Jackson mapper was marshalling my @ResponseBody object to JSON behind the scenes. Maybe there was a problem with the marshalling?

I quickly created a unit test that instantiated a Jackson ObjectMapper and marshalled a DTO containing the problematic element that I knew would trigger the 500. And sure enough, there was a NullPointerException inside a “get” method being called by Jackson! It turns out that Jackson was picking up a “get” method and evaluating it during the marshalling, but the “get” method was actually performing a calculation instead of a simple property return. And the calculation involved an object property that had been set to null during my new DTO creation.

Some quick googling revealed that this is a known gotcha. The solution I zeroed in on was to rename my method so that it was prefixed with “calculate” instead of “get”, which was probably a more appropriate name anyway. And of course the unit test was perfect to show that this fixed the problem (isolate the solution).
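
The gotcha can be reproduced without Jackson on the classpath. The sketch below uses the JDK's java.beans.Introspector as a stand-in for the marshaller (Jackson's own detection rules differ in the details, so treat this as an analogy), and the LineItem class with its methods is a made-up example rather than my real DTO.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Set;
import java.util.TreeSet;

class GetterGotchaSketch {

    /** Hypothetical DTO whose price may be null after a shallow copy. */
    public static class LineItem {
        private Double price;
        public Double getPrice() { return price; }
        public void setPrice(Double price) { this.price = price; }
        // Named like a property, so bean-style tools will call it. Unboxing a null price throws.
        public double getPriceWithTax() { return price * 1.08; }
        // Same kind of calculation, but the name keeps it out of the property list.
        public double calculateShipping() { return price * 0.1; }
    }

    // Lists what bean introspection considers to be the properties of a type.
    static Set<String> propertyNames(Class<?> type) {
        try {
            Set<String> names = new TreeSet<>();
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(type, Object.class).getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints: [price, priceWithTax]
        System.out.println(propertyNames(LineItem.class));
        try {
            new LineItem().getPriceWithTax();
        } catch (NullPointerException e) {
            System.out.println("calling the calculated getter with a null price fails");
        }
    }
}
```

The “get” method that performs a calculation shows up as a property and the “calculate” method does not, which is why the rename made the 500 go away.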

So in turn I’d isolated: the trigger, the location in the code path where the problem occurred, and the change that would fix the problem. It was a pain in the butt, but a good exercise in troubleshooting.

Sunday, March 31, 2013

Programming Puzzle: The Clock Angle Problem



I came across an interesting programming puzzle the other day that I hadn’t seen before. It’s called the Clock Angle Problem. In a nutshell: Given a time of day, what is the smallest angle between the hands on an analog clock showing that time? What about the angle for the current time, whatever that time is? While there is an analytic solution, the problem provides a tidy little puzzle for programmers to solve programmatically.

So I did the obvious thing: coded up a solution and posted it to Github. The trick is to convert hours and minutes to the same units so you can find the difference. You can compute the difference in degrees (in this sample the values are also precomputed). You can also normalize their values first.

Let’s take this question a step further and ask: How would we unit test all of this? (Again, test code is on Github.) If we have a method for finding the angle given the current time, any unit test requires separating the obtaining and parsing of the current time from the actual clock angle algorithm. Once that’s done, a parameterized unit test fits the bill nicely for the algorithm and we can spot check some known values.

We can write another test for parsing the hour and minute from a Date. The trickiest part to test is obtaining the current Date, since time-based unit tests require a little special handling. We need to encapsulate the production of the current Date (say, in a DateProvider class, or in an overridable factory method on the Clock) in such a way that we can override it in the test to provide a constant Date for unit testing. Then the one-liner that provides the current Date can easily be unit tested: test that it is the same as a new Date() constructed in the unit test (within a certain tolerance, say 100ms).
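
Here is one way that encapsulation might look, as a sketch only. The names ClockSketch and DateProvider are placeholders for this post and may not match the code on Github; the idea is simply that the class asks a small provider object for the current Date, so a test can hand it a constant.

```java
import java.util.Calendar;
import java.util.Date;

class ClockSketch {

    /** The only place that knows where "now" comes from. */
    interface DateProvider {
        Date now();
    }

    private final DateProvider dateProvider;

    // Production code uses the real system time...
    ClockSketch() {
        this(() -> new Date());
    }

    // ...while tests can pass in a provider that returns a fixed Date.
    ClockSketch(DateProvider dateProvider) {
        this.dateProvider = dateProvider;
    }

    double currentAngle() {
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(dateProvider.now());
        // Calendar.HOUR is the 12-hour clock value (0 to 11)
        return angle(calendar.get(Calendar.HOUR), calendar.get(Calendar.MINUTE));
    }

    // Pure function of hour and minute, testable without any Date at all.
    static double angle(int hour, int minute) {
        double minuteDegrees = minute * 6;
        double hourDegrees = (hour % 12) * 30 + minute * 0.5;
        double diff = Math.abs(hourDegrees - minuteDegrees);
        return diff > 180 ? 360 - diff : diff;
    }
}
```

A test can then build a Date for 3:00 PM, pass () -> thatDate into the constructor, and check that currentAngle() comes back as 90 degrees, with no dependence on when the test happens to run.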

We can still do more with this question. What if the hour and minute hands ticked for every millisecond and we wanted millisecond-level accuracy? Would rounding errors be a concern, and if so, how would we mitigate that? If the calculation was really expensive, how could we implement caching? (Hint: Not WeakHashMap) I’m not answering these questions here, but they are left as an exercise for the reader (or for myself for a future blog post).