Sunday, May 26, 2013
Blog Moving to a New Home
This blog is moving to Wordpress.com. You can find it at http://thoughtfulsoftware.wordpress.com. All the previous posts have been imported there so you can read everything at the new place!
Sunday, May 19, 2013
Sanitizing User Input, Part I
Many years ago I took a secure coding class. I mainly remember one thing
from the course: “Assume all user input is evil.” This is fine because
the instructors did say “If you only remember one thing from the course,
remember this!”
What can a user do with input alone? Let’s say you are a malicious user of a
web forum. You could create an account on the forum and set your
display name to “<script>alert(‘surprise!’);</script>”.
After registration, there is a script tag with javascript being stored
in the database where your username should be.
With that in place, any time another user of the forum loads a page where
your username is displayed (say on any of your comments) your custom
javascript will execute on their browser! This is a Bad Thing because
usually a malicious user would not just pop up “surprise” but instead
use that snippet of javascript to, say, grab all the cookies in your
browser and send them to said user for nefarious purposes.
This is called Cross Site Scripting (XSS)
and works best on pages that are rendered on the server because the
script is always loaded by the browser that way. The script might not be
run if it’s added to the page after the page is loaded by an AJAX call.
But an AJAX application can still be susceptible, for example if the
username is added to the DOM like this:
// bad code that is susceptible to XSS
// JSON object returnedUser.displayName is
// “<script>alert(‘surprise!’);</script>”
var newdiv = document.createElement('div');
newdiv.innerHTML = returnedUser.displayName;
$('#user_listing').append(newdiv);
Sometimes developers, in an overzealous commitment to security, decide to
HTML-encode all input, or strip all special characters such as “<” and
“>”. There are worse things to be overzealously committed to, but we
can do better. Instead of sanitizing every field every time, we can say
that the check depends on what the content is supposed to represent and
how it will be displayed. For example, in a user display name, a link
tag is probably not valid to have as part of the name, but a link tag or
style tag may be very helpful and relevant in a displayed product
description. For this reason, most HTML sanitizers are very flexible
about how they can be configured, so that we can easily allow different
kinds of HTML in different places.
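As a minimal illustration of handling a field according to what it represents, here is a sketch that HTML-encodes a display name before it is written into a page, so the script tag from the example above shows up as text instead of running. The class and method names (HtmlEscapeSketch, escapeHtml) are made up for this example; in a real project you would more likely lean on an established sanitizer library than on hand-rolled code.

```java
final class HtmlEscapeSketch {

    // Replace the characters that are significant in HTML with entities,
    // so the browser displays them as text instead of parsing them as markup.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The malicious display name from the forum example
        System.out.println(escapeHtml("<script>alert('surprise!');</script>"));
    }
}
```

A product description that is allowed to contain links would need a different, more permissive policy than this one, which is exactly the per-field flexibility described above.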
Now that we’ve agreed we need to sanitize user-supplied text, the next
question is when to do the sanitizing. You could make an argument to
sanitize user-supplied text on input or on output.
A reason for sanitizing input is that philosophically it makes sense to
catch potential security issues as early as possible, and you would
avoid storing malicious input on your system at all. This takes
malicious input and turns it into a validation issue, just like storing
numbers for a zip code is a validation issue. A reason for sanitizing
output is that the site then has the option to change sanitization
policy dynamically. At some point html may be considered unsafe (say,
allowing links in a product description) and later it could be
considered safe (due to a business decision to allow it). If you
sanitize only output, you are free to decide the policy whenever you
want.
Some libraries are designed to operate on input. Lacking a strong driver to be able to change the sanitization policy dynamically, I would favor sanitizing on input as well.
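One way to picture “malicious input as a validation issue” is a whitelist check that runs before anything is stored. The sketch below is only an assumption about what such a check could look like: the class name, the allowed character set, and the 40-character limit are all invented for illustration, and it is not necessarily the approach Part II takes.

```java
import java.util.regex.Pattern;

final class DisplayNameValidator {

    // Assumed policy: letters, digits, space, dot, underscore and hyphen, 1 to 40 characters.
    private static final Pattern ALLOWED = Pattern.compile("[\\p{L}\\p{N} ._-]{1,40}");

    // Returns true only when the whole display name matches the whitelist.
    static boolean isValid(String displayName) {
        return displayName != null && ALLOWED.matcher(displayName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Jane_Doe-42"));
        System.out.println(isValid("<script>alert('surprise!');</script>"));
    }
}
```

With a check like this, the script-tag display name is rejected at registration time in the same way a zip code containing letters would be.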
Stay tuned for Part II where we’ll look at a slick way to sanitize input as an input validation problem!
Sunday, May 12, 2013
Containerless Web Applications Part I: Introduction
The first time I ever saw a Scala program run, it was running as a standalone program with the Play Framework. There was no application server, no servlet container, in fact no container of any kind. It was a program running all by itself, just like the programs we learned to write in college. From a JEE point of view, it was odd, but strangely beautiful.
Fast forward a few years, and we see this idea of container-less deployment becoming more widespread among the Java players. There has been a trend of developers looking for simpler ways to run their applications: think about people moving from EJB 2.0 to Spring, and from heavy Application Servers to simpler servlet containers. It only makes sense that eventually more people would eschew containers altogether and embrace the concept of putting HTTP-handling inside your application instead of outside.
This approach has a few names: containerless, war-less, or “embedded jetty” (if you want to google for more on this topic). There are some disadvantages, of course... nothing in life is free. But it’s the opinion of many developers that the advantages outweigh the disadvantages.
Let’s outline some of the advantages of containerless deployment:
- Better IDE support: Since it’s a regular application instead of something hosted in a web server, it can be easily started and stopped by any IDE without special plugins and without attaching a server to your IDE. Additionally, profiling is a breeze, and debugging is instantaneous (no need to attach your debugger to a remote server).
- Simplified development: Developers run the application exactly the same way in development as in production. This reduces the possibility of errors from differences between running in development and running in production. A pleasant side effect is that even between developers it’s impossible to have server-specific errors because someone ran the application on one server and someone else ran it on another.
- Easier to deploy: Deployments are easier to reproduce. There’s no need for the maven cargo plugin, there’s no copying from here to there, and there’s no extra server configuration that you need to maintain separately from the application.
- Ease of re-use: The application is already a jar and is easily usable as a standalone library itself.
- Startup is faster: Even for a small application, Tomcat can take a minute (or more) to start. With an embedded server there is less for the container to do and it can start in mere seconds.
- Fewer classloader issues: Fewer classloaders means fewer classloader issues. Conflicts between your application’s dependent libraries and the libraries distributed with your server have been known to cause bizarre and difficult-to-diagnose classloader problems.
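To make “HTTP-handling inside your application” concrete, here is a minimal sketch. Note the assumptions: it uses the small HTTP server that ships with the JDK (com.sun.net.httpserver) as a stand-in for embedded Jetty, and the class name, the /hello path, and the response text are invented for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

final class ContainerlessSketch {

    // Start an HTTP server from inside the application itself.
    // Passing port 0 asks the operating system for any free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "Hello from a plain main()".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

A main method would simply call ContainerlessSketch.start(8080) and let the server's thread keep the process alive, which is why the IDE can run and debug it like any other program.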
Hopefully this whets your appetite for containerless deployment. In upcoming posts, we will review a simple architecture for a containerless application (complete with working code).
Sunday, May 5, 2013
Using A Variable Depth Copy to Prevent Hibernate LazyInitializationException
What is LazyInitializationException?
I’m going to go out on a limb and say: Anybody who has ever used Hibernate has at some point triggered a LazyInitializationException. Seriously, questions about what this exception is and how to fix it pop up all the time.
What is a LazyInitializationException? In a nutshell, if you use Hibernate
to load an entity from the database, and then try to access a
lazily-loaded portion of that entity’s object graph from outside
the database session, Hibernate will throw a
LazyInitializationException. The simple answer is to just not access
lazily loaded properties or collections outside of a session. However,
the devil, as they say, is in the details.
If you’re unfamiliar with this exception and its solutions, I recommend reading up on the many options to solve
this problem. The options have been covered enough and are google-able
enough that I don’t need to discuss all of them here (except maybe to
ask you to avoid OpenSessionInView). But what I can discuss here is my own solution to the problem, and
show how my solution provides a two-for-one bundle of goodness.
What were the issues I was looking to solve?
Besides preventing this exception, first of all I was looking for a solution that would convert Hibernate objects into a DTO that looks the same as my regular entity / domain object. I investigated mapping with Dozer,
but it looked a bit heavy for my needs, and I didn’t want to introduce
another library unless it was actually necessary. Another option is to
use a HibernateUnproxifier, but that still requires code to access each specific collection every time, and I was looking for something more general.
Secondly, I was looking for a solution that would load collections to the right
depth (deep enough, or shallow enough). Sometimes I wanted just the
top-level properties of an object, sometimes I wanted the object’s
objects or collections, and so on. As mentioned above, the simple answer
is that you can just lazily load the objects and collections you want
as needed. But the devil in the details is that it quickly becomes
cumbersome to modify your DAO or Service with methods to load this
specific collection here and that specific collection there.
Finally, I was looking for a solution that prevents circular references.
Sometimes my domain classes had to have circular references to satisfy
certain Hibernate mappings. When marshalling these objects to JSON, the
conversion would break. It’s easy to introduce @JsonBackReference, but again I felt like there had to be a better way.
Copy It!
Enter the VariableDepthCopier.
With this class you can copy an object coming from the database into a
new instance of the same class, and specify how deep the resulting
object graph should be in the new copy.
This copy can then be safely passed around or marshalled to the client
without worrying about LazyInitializationExceptions.
As for how to specify the copy depth: an
object copied with level 0 has just the primitives and java.lang
immutable classes (such as String, Number, Date, etc). Non-primitive
properties will be set to null, and collections will be empty. Level 1
contains non-primitive properties, and collections will be filled. These
child objects and objects in collections are set to the equivalent of
what we saw as level 0 for the first object. The pattern goes on, you
can copy to level 2, level 3, and so on. I’m going to make the argument
that you should not return a variable depth copy of level
higher than 2.
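To show what the levels mean in practice, here is a rough sketch of a depth-limited copy. It is not the actual VariableDepthCopier: for brevity it uses field reflection and no Spring, it only looks at fields declared directly on the class, it assumes collection fields are declared as List, Set, or Collection, and the Author and Book classes are hypothetical. The cycle handling (an object already being copied further up the graph becomes null) is likewise just one plausible reading of the idea.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DepthCopySketch {

    // Hypothetical domain classes, used only for this illustration.
    static class Author {
        String name;
        List<Book> books = new ArrayList<>();
    }

    static class Book {
        String title;
        Author author; // circular reference back to the owning Author
    }

    // Copy 'source' down to 'depth' levels; depth 0 keeps only simple values.
    static <T> T copy(T source, int depth) {
        try {
            return copy(source, depth, new IdentityHashMap<Object, Boolean>());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not copy " + source, e);
        }
    }

    private static boolean isSimple(Object value) {
        return value instanceof String || value instanceof Number || value instanceof Boolean
                || value instanceof Character || value instanceof Date || value instanceof Enum;
    }

    @SuppressWarnings("unchecked")
    private static <T> T copy(T source, int depth, Map<Object, Boolean> inProgress)
            throws ReflectiveOperationException {
        if (source == null || inProgress.containsKey(source)) {
            return null; // already being copied further up the graph: break the cycle
        }
        inProgress.put(source, Boolean.TRUE);
        Constructor<?> constructor = source.getClass().getDeclaredConstructor();
        constructor.setAccessible(true);
        T target = (T) constructor.newInstance();
        for (Field field : source.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            field.setAccessible(true);
            Object value = field.get(source);
            if (value == null || isSimple(value)) {
                field.set(target, value); // simple values are copied at every level
            } else if (value instanceof Collection) {
                // Always a plain java.util collection; it stays empty at depth 0
                Collection<Object> copied =
                        value instanceof Set ? new HashSet<Object>() : new ArrayList<Object>();
                if (depth > 0) {
                    for (Object element : (Collection<?>) value) {
                        Object c = isSimple(element) ? element : copy(element, depth - 1, inProgress);
                        if (c != null) {
                            copied.add(c);
                        }
                    }
                }
                field.set(target, copied);
            } else {
                // Nested object: null at depth 0, otherwise copied one level shallower
                field.set(target, depth > 0 ? copy(value, depth - 1, inProgress) : null);
            }
        }
        inProgress.remove(source);
        return target;
    }
}
```

With an Author that owns one Book, copy(author, 0) yields the name and an empty books list, while copy(author, 1) fills the list with Books whose title is set and whose author is null.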
How does this satisfy the three issues I was looking to solve?
First, this converts my domain object to a new object of the same class but without
Hibernate’s persistent collections. This prevents the exception.
Hibernate's persistent collections aren't copied; their contents are
copied into the natural collection of the target class. This way we don’t
need the HibernateUnproxifier, and everything can be copied without knowledge of Hibernate.
Secondly, the copy still needs to be performed inside a transaction, but it can
be done generically without needing to specify which specific
collections are loaded. The copier provides complete control over how
shallow or deep the copy/mapping occurs.
Finally, this copy performs cycle detection and sets any repeated copies to
null. With this mapping technique, I was able to remove @JsonBackReference from all of my domain objects.
Caveats
It’s important to note that the copy
is done according to bean properties, not necessarily through field
reflection. Because of this, the object to be copied should be a domain
object following the Java Bean pattern. Additionally, the copier depends on Spring's BeanUtils, so it works best in a project already using Spring. If this were to be distributed as a more general purpose tool, I would probably try to rework it to use field reflection, and to not have other dependencies. As of this writing, this solution works fine in my project so this is the state it's in right now.
Conclusion
As mentioned above, there are many ways to deal with
LazyInitializationExceptions. I thought this was a neat and useful idea.
If you have ever dealt with LazyInitializationExceptions, hopefully
you’ll find it useful too.
Sunday, April 28, 2013
How to run an IDE INSIDE a Vagrant VM
Vagrant is intended as a runtime environment for use with your development, not as the development environment (IDE, tools, etc) itself. But I thought it would be nice if I could run my IDE inside the VM. This would have several advantages, including: leverage IDE shortcuts for executing the app, make it easier to connect the debugger, and ease ramp up time for other developers.
So I looked into setting up a Vagrant VM with Netbeans. The necessary steps are downloading and installing Netbeans in a provisioner, and forwarding X11 from the VM to the Host.
There are many options to do silent installation of Netbeans. I had this inside my shell provisioner (note, this takes quite a while to download and install, and you should do this after installing java)
wget http://download.netbeans.org/netbeans/7.3/final/bundles/netbeans-7.3-javaee-linux.sh
chmod 755 netbeans-7.3-javaee-linux.sh
sudo ./netbeans-7.3-javaee-linux.sh --silent -J-Dnb-base.installation.location=/opt/netbeans-7.3
rm netbeans-7.3-javaee-linux.sh
X11 forwarding requires a few steps:
- Add to Vagrantfile: config.ssh.forward_x11 = true
- Run vagrant with “vagrant up --no-provision” so you don’t download netbeans again!
- vagrant ssh into the VM and run netbeans; it brings up the UI on your host display. It “just works!”
At this point, it works but performance is horrible. Clicking on a menu in Netbeans takes about 12 seconds to respond.
To improve the performance, we can specify a faster cipher and compression for X11. To specify this, connect with ssh directly instead of using vagrant ssh so we can pass in the ssh options. Use vagrant ssh-config to generate an ssh configuration for the Vagrant VM (be careful not to blow away ~/.ssh/config if that file already exists):
vagrant ssh-config > ~/.ssh/config
The command to connect then looks like:
ssh -F ~/.ssh/config -c arcfour,blowfish-cbc -XC default
Note that the ssh config is set to “Host default” and we need to specify that (“default”) in the ssh command. Using localhost in the ssh command will not work. After this change, performance is much better but still bad (response time opening menus goes from 12 seconds to 4 seconds).
Additional things to try include manually increasing the VM video RAM from 8MB to 64MB, but that had no effect. (Indeed, another VM I use for development, which has the full gnome desktop installed and regularly runs an IDE with no issue, has only 12MB of video RAM.) Another initially promising option was to enable VirtualBox 2D acceleration, but that is for Windows guests only so is not an option for Linux guests.
More potential steps you can take to improve the performance
- VMWare’s provider is rumored to have special graphics optimizations that let you work with an IDE inside a VM.
- NX Server is supposed to be faster than X11
In conclusion, using Vagrant as your complete development environment does not really work out as well as you’d hope. Perhaps not so surprisingly, it’s best to use Vagrant as it was intended.
Sunday, April 21, 2013
Clock Angle Problem Part II
Recently I posted a solution to the clock angle problem.
At the end we were left with a few questions involving how to spice the
problem up a little bit. In this post we will explore these questions
more. Namely, we will introduce a seconds hand along with the hour and
minute hands, and we will explore caching.
First of all: what if the hour and minute hands ticked for every second and
we wanted second-level accuracy? What we need to do is presume that
there’s a second hand, and add the degrees changed for each second to
the hour and minute components for each second tick. We can find the math for it, and write some code like this:
// express both hand positions in degrees
// minute hand is 6 deg/minute plus 1/10 deg/sec
double md = m*6 + s*0.1;
// hour hand goes through 30 degrees per hour
// plus 30/60=1/2 deg/min,
// plus (1/2)/60=1/120 deg/sec
double hd = h*30 + m*0.5 + s*(1d/120d);
double result = Math.abs(hd-md);
return result > 180 ? 360 - result : result;
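For anyone who wants to run the snippet above, here it is wrapped in a complete class, as a sketch. The class and method names are made up, and the h % 12 is an addition so that 24-hour input also works. As a worked value: at 3:15:30 the minute hand is at 15*6 + 30*0.1 = 93 degrees and the hour hand is at 3*30 + 15*0.5 + 30/120 = 97.75 degrees, so the angle is about 4.75 degrees.

```java
final class ClockAngleWithSeconds {

    // Smallest angle in degrees between the hour and minute hands,
    // with both hands moving a little on every second tick.
    static double angle(int h, int m, int s) {
        double md = m * 6 + s * 0.1;                      // 6 deg/min plus 1/10 deg/sec
        double hd = (h % 12) * 30 + m * 0.5 + s / 120.0;  // 30 deg/hour, 1/2 deg/min, 1/120 deg/sec
        double result = Math.abs(hd - md);
        return result > 180 ? 360 - result : result;
    }

    public static void main(String[] args) {
        System.out.println(angle(3, 15, 30)); // roughly 4.75
    }
}
```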
Second of all, if the calculation was really expensive, how could we implement caching? (Hint: Not WeakHashMap) There are a couple ways to approach this.
It helps to understand the nature of what we’re computing. Is it
time-intensive? Memory-intensive? Does it make sense to pre-compute
values? In my caching example I took two approaches: one that
pre-computes all possible values, and one that caches values as they are
calculated.
It might make sense to pre-compute
values if the set of possible values is limited and known, and the
calculations do not adversely affect the startup time of the
application. In this case, both are true. If the calculation took a lot
of time or the amount of data to store was extraordinary (say, if we had
a clock with nanosecond precision!) then this approach would obviously
not be viable.
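Here is a rough sketch of the pre-computing approach, assuming second-level precision on a 12-hour dial. That gives 12 * 60 * 60 = 43,200 possible values, which is small enough to fill once when the class loads. The names are invented for this example and it is not the code from my repository.

```java
final class PrecomputedClockAngles {

    // One slot per second of a 12-hour dial: 12 * 60 * 60 = 43,200 entries.
    private static final double[] TABLE = new double[12 * 60 * 60];

    static {
        // Fill the whole table once, at class-load time.
        for (int t = 0; t < TABLE.length; t++) {
            TABLE[t] = compute(t / 3600, (t / 60) % 60, t % 60);
        }
    }

    private static double compute(int h, int m, int s) {
        double md = m * 6 + s * 0.1;
        double hd = h * 30 + m * 0.5 + s / 120.0;
        double result = Math.abs(hd - md);
        return result > 180 ? 360 - result : result;
    }

    // A lookup is just an array index; nothing is calculated here.
    static double angle(int h, int m, int s) {
        return TABLE[(h % 12) * 3600 + m * 60 + s];
    }
}
```

At nanosecond precision the same table would need tens of trillions of entries, which is the point at which this approach stops being viable.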
If pre-computing is not viable, we need a way to store
cached values. If we are already using a caching solution elsewhere in
our application, the choice is easy and we can just incorporate
that. One option is Spring Cache, which provides a nice abstraction over a couple implementations (see some handy posts here, and here). Additionally, you can include EHCache and use it directly.
However, if you are not already using caching and just need a quick one-off
caching solution (on the way to a more mature solution as your needs
grow... no need to re-invent the wheel), we can try extending
LinkedHashMap as a simple cache. Check out how we can incorporate it directly into a caching layer
over the ClockAngle class. Again, this is not a full featured cache
solution, just a quick and dirty solution that may work for a focused
part of your application for the time being.
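The LinkedHashMap trick can be sketched in a few lines. The class name and the size limit are made up for this example; the behavior comes from LinkedHashMap's access-order constructor and its removeEldestEntry hook. Note that it is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after every insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

For example, with new LruCacheSketch<String, Double>(2), putting "a" and "b", reading "a", and then putting "c" evicts "b", because "b" is the least recently used entry at that point.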
Finally, if we have multiple classes doing the calculations and multiple caching
techniques, it becomes apparent that there is duplicate code (in the
core calculation, and in any input validation). It makes sense to
refactor out common code and use the Decorator Pattern to compose these classes together. I think the final result is beautiful, simple, and easy to test!
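As a sketch of how the pieces could compose with the Decorator Pattern (all names here are hypothetical, and a plain HashMap stands in for whichever cache you prefer): each class does one job and wraps another implementation of the same interface.

```java
import java.util.HashMap;
import java.util.Map;

final class ClockAngleDecoratorSketch {

    interface AngleCalculator {
        double angle(int h, int m, int s);
    }

    // Core calculation only: no validation, no caching.
    static final class BasicCalculator implements AngleCalculator {
        @Override
        public double angle(int h, int m, int s) {
            double md = m * 6 + s * 0.1;
            double hd = (h % 12) * 30 + m * 0.5 + s / 120.0;
            double result = Math.abs(hd - md);
            return result > 180 ? 360 - result : result;
        }
    }

    // Decorator that rejects invalid input before delegating.
    static final class ValidatingCalculator implements AngleCalculator {
        private final AngleCalculator delegate;

        ValidatingCalculator(AngleCalculator delegate) {
            this.delegate = delegate;
        }

        @Override
        public double angle(int h, int m, int s) {
            if (h < 0 || h > 23 || m < 0 || m > 59 || s < 0 || s > 59) {
                throw new IllegalArgumentException("Invalid time: " + h + ":" + m + ":" + s);
            }
            return delegate.angle(h, m, s);
        }
    }

    // Decorator that remembers results as they are calculated.
    static final class CachingCalculator implements AngleCalculator {
        private final AngleCalculator delegate;
        private final Map<Integer, Double> cache = new HashMap<>();
        int delegateCalls; // exposed only so a test can observe cache hits

        CachingCalculator(AngleCalculator delegate) {
            this.delegate = delegate;
        }

        @Override
        public double angle(int h, int m, int s) {
            int key = (h % 12) * 3600 + m * 60 + s;
            Double cached = cache.get(key);
            if (cached == null) {
                cached = delegate.angle(h, m, s);
                cache.put(key, cached);
                delegateCalls++;
            }
            return cached;
        }
    }
}
```

Composing them is then a one-liner such as new ValidatingCalculator(new CachingCalculator(new BasicCalculator())), and each layer can be unit tested on its own with a stub delegate.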
Sunday, April 14, 2013
Accessing MySQL Instance In Vagrant VM
One thing that I thought would be useful would be to connect to the database which is running inside your Vagrant machine from the host machine. That way it would be easier to inspect or tweak what the database was doing while the application is running with a GUI like MySQL Workbench. Additionally you could reset the database with a SQL script from the host without dropping into vagrant ssh.
It’s not hard, but there are a couple steps you have to do to make that connection work.
Set up port forwarding
Inside the Vagrantfile, the line would look something like this
config.vm.network :forwarded_port, guest: 3306, host: 3309
Make sure the MySQL user can connect from outside localhost
We can do this at provision time when we create the database and the user. One way to do this is for the shell provisioner to reference a SQL file to set it up:
shell provisioner:
mysqladmin -u root password root
mysql --user=root --password=root --host=localhost --port=3306 < /vagrant/mysql_bootstrap.sql
mysql_bootstrap.sql
create schema appdb;
create user 'dbuser'@'%' identified by 'dbuserpassword';
grant all on appdb.* to 'dbuser'@'%';
Rebind MySQL Host inside the VM
We need to rebind MySQL inside the VM. Edit your my.cnf file (say, sudo emacs /etc/mysql/my.cnf) and comment out the following lines
# skip-external-locking
# bind-address
Connect
We’re ready to connect from the host! To connect from the mysql client on the command line, we need to use the TCP protocol setting. Your command line would look like this:
mysql --user=dbuser --password=dbuserpassword --host=127.0.0.1 --protocol=TCP --port=3309
If you use MySQL Query Browser or MySQL Workbench, the connection settings dialog requires 127.0.0.1, not localhost.
Voila! Now you can connect to MySQL from the client on your host to the server running inside your Vagrant VM!
Sunday, April 7, 2013
Troubleshooting A Mysterious HTTP 500 Response
The other day I was modifying a Spring REST endpoint, changing the DTO creation method. My DTO in this case is a variable-depth copy of my domain object with hibernate proxies converted or stripped out. Strangely, when I accessed the endpoint to retrieve the modified DTO, I would get a 500 response but see no exceptions in the logs. What was going on?
Troubleshooting is all about isolation. First, replicate the problem (isolate the trigger). Then, find where the actual behavior diverges from the expected behavior (isolate the location in the pathway). Finally, see what you can change in that location that resolves the problem (isolate the fix).
I’d already started isolating the trigger: creating a DTO with one technique vs another. I knew that I had changed how my DTO was being created and that the returned object was supposed to be the same as before. So to further isolate the trigger, my first question was: “What is the difference between the previously returned object and the new object?”
For comparison I created both objects in the controller method so I could compare the working and broken objects side by side. Stepping through the debugger with the broken code exited the Controller with the return of a valid DTO. But I was returning more information than before, specifically, three collections each of which had shallow copies of their elements. When I emptied the collections, the formerly-broken object was successfully returned without a 500. By process of elimination I determined which collection and which elements in it would trigger the 500.
Now that I’d isolated the problem to a new specific element in my response, the next step was to isolate the location in the pathway where the problem was occurring. From the work in the debugger, I knew that the problem was happening between the return of my controller and the receipt of the response by the browser. What happens between those two points?
Unfortunately I didn’t have the Spring source code readily available to step into, but I knew that the Jackson mapper was marshalling my @ResponseBody object to JSON behind the scenes. Maybe there was a problem with the marshalling?
I quickly created a unit test that instantiated a Jackson ObjectMapper and marshalled a DTO containing the problematic element that I knew would trigger the 500. And sure enough, there was a NullPointerException inside a “get” method being called by Jackson! It turns out that Jackson was picking up a “get” method and evaluating it during the marshalling, but the “get” method was actually performing a calculation instead of a simple property return. And the calculation involved an object property that had been set to null during my new DTO creation.
Some quick googling revealed that this is a known gotcha. The solution I zeroed in on was to rename my method so that it was prefixed with “calculate” instead of “get”, which was probably a more appropriate name anyway. And of course the unit test was perfect to show that this fixed the problem (isolate the solution).
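The gotcha can be reproduced without Jackson at all. The sketch below is only a stand-in: the reflection loop imitates what a bean-style serializer does (call every public no-argument “get” method) and is not Jackson's actual code, and the OrderDto class is invented for the example.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class GetterGotchaSketch {

    // Hypothetical DTO: getTotal() looks like a property but performs a calculation.
    static class OrderDto {
        private List<Integer> prices; // left null, as a shallow copy might leave it

        public List<Integer> getPrices() {
            return prices;
        }

        public int getTotal() {
            int sum = 0;
            for (int price : prices) { // NullPointerException when prices is null
                sum += price;
            }
            return sum;
        }
    }

    // Stand-in for a bean-style serializer: invoke every public no-arg getXxx method
    // and report the ones that blow up.
    static List<String> failingGetters(Object bean) {
        List<String> failures = new ArrayList<>();
        for (Method method : bean.getClass().getMethods()) {
            boolean looksLikeGetter = method.getName().startsWith("get")
                    && method.getParameterCount() == 0
                    && method.getDeclaringClass() != Object.class;
            if (!looksLikeGetter) {
                continue;
            }
            try {
                method.invoke(bean);
            } catch (InvocationTargetException e) {
                failures.add(method.getName() + ": " + e.getCause().getClass().getSimpleName());
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return failures;
    }
}
```

Calling failingGetters(new OrderDto()) reports getTotal failing with a NullPointerException; once the method is renamed to calculateTotal, the loop no longer picks it up.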
So in turn I’d isolated: the trigger, the location in the code path where the problem occurred, and the change that would fix the problem. It was a pain in the butt, but a good exercise in troubleshooting.
Sunday, March 31, 2013
Programming Puzzle: The Clock Angle Problem
I came across an interesting programming puzzle the other day that I hadn’t seen before. It’s called the Clock Angle Problem. In a nutshell: Given a time of day, what is the smallest angle between the hands on an analog clock showing that time? What about the angle for the current time, whatever that time is? While there is an analytic solution, the problem provides a tidy little puzzle for programmers to solve programmatically.
So I did the obvious thing: coded up a solution and posted it to Github. The trick is to convert hours and minutes to the same units so you can find the difference. You can compute the difference in degrees (in this sample the values are also precomputed). You can also normalize their values first.
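A minimal sketch of the degrees approach follows (the names are made up here, and this is not the code from the repository). The hour hand moves 30 degrees per hour plus half a degree per minute, and the minute hand moves 6 degrees per minute; take the difference, then pick the smaller of the two angles around the dial.

```java
final class ClockAngleSketch {

    // Smallest angle in degrees between the hour and minute hands.
    static double angle(int hour, int minute) {
        double hourDegrees = (hour % 12) * 30 + minute * 0.5;
        double minuteDegrees = minute * 6;
        double difference = Math.abs(hourDegrees - minuteDegrees);
        return difference > 180 ? 360 - difference : difference;
    }

    public static void main(String[] args) {
        System.out.println(angle(3, 30)); // 75.0: hour hand at 105, minute hand at 180
    }
}
```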
Let’s take this question a step further and ask: How would we unit test all of this? (Again, test code is on Github.) If we have a method for finding the angle given the current time, any unit test requires separating the obtaining and parsing of the current time from the actual clock angle algorithm. Once that’s done, a parameterized unit test fits the bill nicely for the algorithm and we can spot check some known values.
We can write another test for parsing the hour and minute from a Date. The trickiest part to test is obtaining the current Date, since time-based unit tests require a little special handling. We need to encapsulate the production of the current Date (say, in a DateProvider class, or in an overridable factory method on the Clock) in such a way that we can override it in the test to provide a constant Date for unit testing. Then the one-liner that provides the current Date can easily be unit tested: test that it is the same as a new Date() constructed in the unit test (within a certain tolerance, say 100ms).
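Here is a sketch of that encapsulation, assuming a small DateProvider interface (the interface, class, and method names are hypothetical, not taken from the Github code). Production code passes in a provider that returns new Date(), while a test passes in a provider that always returns the same instant.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

final class CurrentTimeAngleSketch {

    // The single place where "now" comes from, so a test can replace it.
    interface DateProvider {
        Date now();
    }

    private final DateProvider dateProvider;

    CurrentTimeAngleSketch(DateProvider dateProvider) {
        this.dateProvider = dateProvider;
    }

    // Parse hour and minute from the provided Date, then run the pure algorithm.
    double currentAngle() {
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(dateProvider.now());
        return angle(calendar.get(Calendar.HOUR), calendar.get(Calendar.MINUTE));
    }

    static double angle(int hour, int minute) {
        double hourDegrees = (hour % 12) * 30 + minute * 0.5;
        double minuteDegrees = minute * 6;
        double difference = Math.abs(hourDegrees - minuteDegrees);
        return difference > 180 ? 360 - difference : difference;
    }

    public static void main(String[] args) {
        // Test-style usage: a constant Date (3:00 pm local time) gives a predictable answer.
        Date fixed = new GregorianCalendar(2013, Calendar.MARCH, 31, 15, 0).getTime();
        System.out.println(new CurrentTimeAngleSketch(() -> fixed).currentAngle()); // 90.0

        // Production-style usage: the one-liner that provides the real current Date.
        System.out.println(new CurrentTimeAngleSketch(Date::new).currentAngle());
    }
}
```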
We can still do more with this question. What if the hour and minute hands ticked for every millisecond and we wanted millisecond-level accuracy? Would rounding errors be a concern, and if so, how would we mitigate that? If the calculation was really expensive, how could we implement caching? (Hint: Not WeakHashMap) I’m not answering these questions here, but they are left as an exercise for the reader (or for myself for a future blog post).