Error handling

I argue in this post that there is always going to be a compromise between explicitly handling error cases close to where they occur and making the main execution path easy to read. In my view, neither is always right or wrong, but rather it should be possible to choose one based on the problem being solved.

I’m a Java coder these days but I have spent time with plenty of other languages. For a while, I pursued a personal project with Go. After building quite a bit of my web-based adventure game (I know), I was sufficiently frustrated by Go’s explicit error handling to flee back to Java.

Then, recently, I was chatting with a friend about what he’s up to and he said he was planning to use Go for some AWS microservices. I related my experience with error handling and he told me that the Go architects have proposed a change to Go error handling, detailed here. So I read about it.

The first thing I thought was that it’s yet another syntax. Once again, XKCD has a somewhat suitable cartoon:

[Image: XKCD cartoon, “Standards”]

In this case, of course, the Go people looked at everyone else’s solutions to error handling and decided they could do better (within the goals of Go’s explicit error handling).

In this post, I want to take a step back and look at motivations in error handling.

Trees vs Sequences

Imperative programming languages consist of a sequence of instructions with some control flow.

We use functions and loops to avoid repeating the same statements over and over in our code, and to allow us to hold more stuff in our heads (by building a thing and then using it later without worrying how it works).

We use branching (e.g. if and case statements) to allow our code to respond to its environment, such as external parameters or the result of an earlier calculation.

Without these structures, instructions are executed one after another in the sequence they appear on the page. This is the easiest form to read and understand because it’s how lines of text on a page work: they are read one after another. Control flow adds complexity and makes the code more difficult to read. For example:

Do a thing and get its result. If the result is a 4, do a thing, or if it’s a 5, do something different.

The first sentence is easier to understand on first reading than the second, right? However, notice that, in spite of the branching in the second sentence, it’s still clear what’s going on because there are only two branches (either the result is 4 or it’s 5).

It isn’t always this simple, however:

Check to see if you have butter.
If you have butter,
  check to see if you have bread.
  If you have bread
    put some butter on the bread,
  otherwise
    check to see if you have any teacakes.
    If you have any teacakes
      put some butter on a teacake,
    otherwise
      phone mum.
Or, if you don't have any butter
  phone mum.

While the example above probably works out as a collection of coherent sentences, it’s ugly because of the nested branching. I certainly couldn’t do the entire thing in the clauses of a single sentence, and at the end I couldn’t use “otherwise” for the butter because after such a long time it would be difficult for the reader to remember what the branch was about. The solution here is to repeat the subject of the branch (“if you don’t have any butter”).

We can refactor this pseudo code, avoiding nested branching, to make it more readable:

Check to see if you have butter.
If you have no butter,
  phone mum then stop.
Check to see if you have bread.
If you have bread,
  put butter on the bread then stop.
Check to see if you have teacakes.
If you have teacakes,
  put butter on a teacake then stop.
Otherwise, phone mum.

It’s easier to read than before, and for certain conditions it’s shorter to read, because you either get a result or phone your mum at several points before the end of the code. I think this exercise captures a fairly mundane refactoring task that all developers engage in regularly.
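For what it’s worth, here is roughly how the refactored version might look in Java, using early returns in place of “then stop”. This is only a sketch: the Snack class, the decide method and the idea of modelling the kitchen as a set of strings are all made up for illustration.

```java
import java.util.Set;

final class Snack {
    // Hypothetical helper: the kitchen is modelled as a set of item names.
    static String decide(Set<String> kitchen) {
        // Each guard returns straight away, so nothing below it is nested.
        if (!kitchen.contains("butter")) {
            return "phone mum";
        }
        if (kitchen.contains("bread")) {
            return "put butter on the bread";
        }
        if (kitchen.contains("teacakes")) {
            return "put butter on a teacake";
        }
        return "phone mum";
    }
}
```

Each condition is dealt with and dismissed before the next one is considered, which is why the reader never has to remember which branch they are in.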

However, this post is about error handling, of which we currently have none*, so let’s get back to that.

*Not having butter isn’t an error because we expect it; an error would be “I don’t know what butter is” or “the kitchen has blown up”!

The problem of errors: readability vs accuracy

Let’s look at an easier example than the one above, with no branching:

Sum the numbers in a file:
  Read the file
  Parse the file
  Sum the numbers
  Print the result
End

In your head this function might look like this:

[Figure: code_flow, the steps above drawn as a straight-line sequence]

It’s a little sequence of instructions, telling you step by step how to sum up the numbers in a file. No branching or decisions to be made. It’s the easiest kind of code sequence to read.

However, it’s not really that simple. Every step of this function could result in an error, and we would have to do something about it. Really, this code is a tree, like this:

[Figure: code_tree, the same steps drawn as a tree with an error branch at each step]

Now we have a problem. We should only be able to get to the next instruction if the last one was successful, but checking at each stage means we are constantly branching. Even if we write this like we did above to avoid nested branching, there is still going to be a branch after every line in the sequence.

This is the main thrust of this post:

There is always going to be a compromise between explicitly handling error cases close to where they occur and making the main execution path easy to read.

Let’s look at some ways that have already been attempted…

Immediately handling errors

Take a look at this representation of the above code, with immediate error handling:

file, err = read(filename)
if (err)
  stop
numbers, err = parse(file)
if (err)
  stop
sum, err = sum(numbers)
if (err)
  stop
err = print(sum)
if (err)
  stop
stop

It is difficult to see the main flow of instructions because the error handling keeps getting in the way. This is essentially how Go (as currently implemented) handles errors. If we’re concerned about always handling errors then (in a sequential, line by line, representation of the code) this is about as good as it gets.
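To show the same shape in something that compiles, here is a sketch in Java. Java has no multiple return values, so I’m assuming a small hand-rolled Result class as a stand-in for Go’s (value, err) pair; it is not a JDK type, and the class and method names are invented for this example. To keep it short it starts from text that has already been read, rather than from a file.

```java
import java.util.ArrayList;
import java.util.List;

final class ImmediateHandling {
    // Hypothetical stand-in for a (value, err) pair; err == null means success.
    static final class Result<T> {
        final T value;
        final String err;
        private Result(T value, String err) { this.value = value; this.err = err; }
        static <T> Result<T> ok(T value) { return new Result<>(value, null); }
        static <T> Result<T> fail(String err) { return new Result<>(null, err); }
        boolean failed() { return err != null; }
    }

    // Parses whitespace-separated integers, reporting the first bad token.
    static Result<List<Integer>> parse(String text) {
        List<Integer> numbers = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            try {
                numbers.add(Integer.parseInt(token));
            } catch (NumberFormatException e) {
                return Result.fail("not a number: " + token);
            }
        }
        return Result.ok(numbers);
    }

    // Adds the numbers, reporting overflow as an error instead of wrapping.
    static Result<Integer> sum(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) {
            try {
                total = Math.addExact(total, n);
            } catch (ArithmeticException e) {
                return Result.fail("sum overflowed");
            }
        }
        return Result.ok(total);
    }

    // The caller checks after every step, as in the pseudocode above.
    static Result<Integer> sumText(String text) {
        Result<List<Integer>> numbers = parse(text);
        if (numbers.failed()) {
            return Result.fail(numbers.err);
        }
        Result<Integer> total = sum(numbers.value);
        if (total.failed()) {
            return Result.fail(total.err);
        }
        return total;
    }
}
```

Even with only two steps, sumText is already half error checks, which is the readability cost being described here.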

In Go this is effectively mandatory. If you assign the error returned by a function to a variable, the compiler will reject the program unless you either use that variable or explicitly discard it with an _ , and discarding errors like that breaks the coding convention. Because errors are expected to be handled explicitly at the call site, they are much less likely to be swallowed or missed, which was the aim of this paradigm in Go. However, and it looks like the Go architects have now realised this, I think it’s a mistake to make this approach mandatory in all cases.

A more concise, optional representation

There is another way to do this kind of error handling that might look cleaner, with error handling on the same line:

file = read(filename) or stop
numbers = parse(file) or stop
sum = sum(numbers)    or stop
print(sum)            or stop

This is similar to how you can handle errors in Perl and also similar to the method I discuss for handling errors in an old post of mine about Bash scripts. The examples I’ve seen also make this approach optional rather than mandatory, as it is in Go, so I like this better. However, this has its limitations.

For one thing, if you want to do more than one thing in the “or” clause you’ll need to write a function for it. Any functions will naturally be divorced from the main code sequence so might themselves become confusing.

Another limitation is that it can still be more difficult to read each line if you have to cope with what happens when the main clause in that line fails, so although you’re using fewer lines it may still be as hard to read.

Exceptions

Generally, handling exceptions works like this:

try
  file = read(filename)
  numbers = parse(file)
  sum = sum(numbers)
  print(sum)
catch Exception
  stop

That is, we simply have the main sequence of instructions and trust the language to jump to the catch clause if there’s a problem. Rather than handling exceptions near to where they are thrown, we can also throw them back up the call stack to some higher caller.
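In Java, the same idea might be sketched as below. The SumFile class and its method names are my own invention for the example, and I’m assuming the file holds one integer per line, with blank lines ignored.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class SumFile {
    // The main sequence reads top to bottom with no error checks in the way.
    // IOException and NumberFormatException simply propagate to the caller.
    static int sumNumbers(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file);
        int total = 0;
        for (String line : lines) {
            if (!line.isBlank()) {
                total += Integer.parseInt(line.trim());
            }
        }
        return total;
    }

    // A higher caller decides what to do about any failure in one place.
    static String report(Path file) {
        try {
            return "Sum: " + sumNumbers(file);
        } catch (IOException | NumberFormatException e) {
            return "Could not sum " + file + ": " + e.getMessage();
        }
    }
}
```

Note that sumNumbers doesn’t catch anything itself: it passes the problem up the call stack, and report is the “higher caller” that finally deals with it.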

The main criticism of exceptions seems to be twofold:

  1. that the things you need to do in the catch clause might be different (like cleaning things up) depending on how many instructions had already executed before one blew up inside the try clause, and
  2. that it’s difficult to know whether or not every possible exception is really being caught somewhere.

I dare say there’s a lot of complexity in the arguments about exceptions, how they’re designed and how they can go wrong. I’ve certainly seen both of the issues above in my time writing code. I don’t plan to address any of the arguments directly, but rather to point out that:

Exceptions represent a compromise between explicit (and obvious) error handling and the readability of sequences of instructions. Therefore, they are never going to be as good at showing how errors will be handled as more explicit approaches, like Go’s, but they might still be worth using because they are demonstrably more readable.

The Go 2 solution

From what I can see, the new Go solution (which you can read about here) is syntactically different from exceptions (check vs try and handle vs catch) and, like standard Go, still enforces mandatory error handling by the caller.

The mandatory handling is a little bit more explicit than Java’s checked exceptions, in that you need a minimum of a handle block at the top of the function that returns with an error rather than just a throws clause in the method signature, but I honestly don’t think that’s much of a difference.

The main difference with Java is that there are no unchecked exceptions, so you can’t silently throw upwards 7 levels in the call stack without any of the intervening levels being aware of it. Well, I think it should be up to you whether or not you want to do that, in your situation, with your code base, so I’m still pretty happy with Java Exceptions.

That said, the proposed change to Go would make me more likely to give Go another try, since it certainly makes error handling more concise…

Java Exceptions

Alright, so I’ve decided I’m happy with Java Exceptions, but everyone who has been coding for more than five minutes knows the issues that can arise. The rest of the post is about minimizing these issues.

Barry Ruzek, writing for Oracle, offers some useful guidance about Exceptions, although I don’t agree with everything he writes.

For example, I’d be very careful about regularly throwing checked exceptions that need to travel several calls back up the stack before they are handled. It seems to me to be a recipe (in some situations) for overcomplicated method signatures, throwing six different exceptions (that will often be handled in basically the same way).

Steve Loughran discusses the problems of checked exceptions and streams on his blog, which is quite a good read.

Jeff Friesen discusses exceptions and some best practices in his post on JavaWorld.

My opinion, for what it’s worth 🙂

We should have style guides for this stuff. I’m not a fan of style guides for everything, as they can be too prescriptive, but this is definitely one of the things you will want to be consistent with throughout the codebase, otherwise you’ll be in a real mess before long.

We should avoid throwing exceptions if we can. This will mean checking something is possible before doing it. It will also mean working out acceptable optional return values for some scenarios (using Java’s Optional, in fact). Having fewer exceptions flying around means that, when we do need to use them, we can see them clearly.
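As a small illustration of both ideas (checking first, and returning an Optional), here is a sketch. SafeParse and tryParseInt are names I’ve made up; the check deliberately accepts at most nine digits so that the parse can never overflow an int, which also means it rejects some ten-digit values that would technically fit.

```java
import java.util.Optional;

final class SafeParse {
    // Checks that the text looks like a small integer before parsing it,
    // so a bad input gives an empty Optional rather than an exception.
    static Optional<Integer> tryParseInt(String text) {
        if (text == null) {
            return Optional.empty();
        }
        String trimmed = text.trim();
        // An optional sign followed by 1 to 9 ASCII digits always fits in an int.
        if (!trimmed.matches("[+-]?\\d{1,9}")) {
            return Optional.empty();
        }
        return Optional.of(Integer.parseInt(trimmed));
    }
}
```

A caller can then write something like SafeParse.tryParseInt(input).orElse(0), and the “not a number” case is visible in the return type instead of hiding in a catch block somewhere.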

We should use fewer types of exception. If a parameter is bad, that’s an IllegalArgumentException. If you’re parsing a number from a String, it’s a NumberFormatException, and so on. I like nice, semantic code as much as the next coder, but it can be hard enough working out what’s thrown and caught, or working out what to throw, without three different flavours of NoSuchMethodException. It might also be an idea to enforce (in your style guide) the use of a factory for generating new exceptions so as to make sure you know what is used where.

We should always add the environment to the exception message. What were you doing when the exception happened? What were the parameters to the method? What line of the file were you processing? Sure, you intend this exception to be processed by its caller, who will know some of this stuff when you get to its catch statement, but what if there’s a new caller? What if a refactor throws the exception further? I think we always need to assume that an exception is going to leak out into a log file somewhere, and in that situation we need to know as much as we can.
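Something like the following is what I have in mind. It’s a sketch with invented names (ContextualErrors, parseLine), assuming we’re parsing one number per line of a file and that the caller passes in the file name and line number.

```java
final class ContextualErrors {
    // Parses one line, and on failure says which file, which line, and what
    // the offending text was, keeping the original exception as the cause.
    static int parseLine(String fileName, int lineNumber, String line) {
        try {
            return Integer.parseInt(line.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Cannot parse number in " + fileName
                    + " at line " + lineNumber + ": '" + line + "'", e);
        }
    }
}
```

If this ever leaks into a log file, the message alone says where to look, and the cause still carries the original stack trace.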

We should throw the most appropriate type of exception. I think there are code bases where the situation or the style of programming mean that failing to catch an exception isn’t the end of the world (in a web server, for example, it might be the difference between a 404 and a 500 error). In this case, if you want to make everything a RuntimeException for brevity, even though it leaves you at greater risk of uncaught exceptions, then so be it.

In fact, if you’re using Spring Boot to create a web app you can avoid catching anything at all and allow everything to percolate up and be mapped to an HTTP error code. This is neat as it looks a bit like Barry Ruzek’s barrier without you even needing to make sure you write the catch.

In some cases, however, an unchecked exception can lead to the early termination of your long-running process and losing your data, or something similarly horrific. In this case, maybe checked exceptions are your answer.
