Sunday, April 7, 2013

Troubleshooting A Mysterious HTTP 500 Response


The other day I was modifying a Spring REST endpoint, changing the DTO creation method. My DTO in this case is a variable-depth copy of my domain object with hibernate proxies converted or stripped out. Strangely, when I accessed the endpoint to retrieve the modified DTO, I would get a 500 response but see no exceptions in the logs. What was going on?

Troubleshooting is all about isolation. First, replicate the problem (isolate the trigger). Then, find where the expected behavior changes from the actual behavior (isolate the location in the pathway). Finally, see what you can change in that location that resolves the problem (isolate the fix).

I’d already started isolating the trigger: creating a DTO with one technique vs another. I knew that I had changed how my DTO was being created and that the returned object was supposed to be the same as before. So to further isolate the trigger, my first question was: “What is the difference between the previously returned object and the new object?”

For comparison I created both objects in the controller method so I could compare the working and broken objects side by side. Stepping through the debugger with the broken code exited the Controller with the return of a valid DTO. But I was returning more information than before, specifically, three collections each of which had shallow copies of their elements. When I emptied the collections, the formerly-broken object was successfully returned without a 500. By process of elimination I determined with collection and which elements in it would trigger the 500.

Now that I’d isolated the problem to a new specific element in my response, the next step was to isolate the location in the pathway where the problem was occurring. From the work in the debugger, I knew that the problem was happening between the return of my controller and the receipt of the response by the browser. What happens between those two points?

Unfortunately I didn’t have the Spring source code readily available to step into, but I knew that the Jackson mapper was marshalling my @ResponseBody object to JSON behind the scenes. Maybe there was a problem with the marshalling?

I quickly created a unit test that instantiated a Jackson ObjectMapper and marshalled a DTO containing the problematic element that I knew would trigger the 500. And sure enough, there was a NullPointerException inside a “get” method being called by Jackson! It turns out that Jackson was picking up a “get” method and evaluating it during the marshalling, but the “get” method was actually performing a calculation instead of a simple property return. And the calculation involved an object property that had been set to null during my new DTO creation.

Some quick googling revealed that this is a known gotcha. The solution I zeroed in on was to rename my method so that it was prefixed with “calculate” instead of “get”, which was probably a more appropriate name anyway. And of course the unit test was perfect to show that this fixed the problem (isolate the solution).

So in turn I’d isolated: the trigger, the location in the code path where the problem occurred, and the change that would fix the problem. It was a pain in the butt, but a good exercise in troubleshooting.

No comments:

Post a Comment