I went to Go Ape last week with my kids. While I was there, I watched the staff moving around in the trees, helping children and keeping everyone safe. If you’ve never been to Go Ape, it involves traversing around one of several courses in the trees while attached to a steel cable running around the entire course.
The staff operate a two-carabiner system, which allows them to safely move around people and between courses. That is, they have two carabiners attached to two short ropes attached to their harness. When switching from one track to another, both carabiners start off clipped to the same track. The staff member unclips one carabiner from the first track and clips it onto the other, then repeats the action with the second carabiner. At no stage are they unclipped from the safety apparatus altogether.
I expect that some of the staff members consider this to be somewhat overcautious – they don’t fall off this thing, they spend all day on it and they all seem very fit and agile. And yet, this solution has an elegance to it that means it is no inconvenience to follow the rules. I saw the operation done many times during our time there and staff were fast. Nobody broke the rules.
And, of course, end users would find this system relatively easy too. I am told that the same simple two-carabiner safety process is sometimes used in this kind of place for users as well as staff. You just have to be careful that they understand that doing one carabiner at a time is crucial to the safety of the system.
Even with that training, a determined or incompetent user could unclip from the system entirely if they don’t follow the process. At the Go Ape I visited, however, a trolley system is in use, which means users never have the opportunity to clip or unclip anything. They’re fixed into the system from before their feet leave the ground until after they’re back on the ground. It is a truly remarkable safety system.
Because I’m an insufferable nerd, this whole experience made me think about safety in software. Now, the vast majority of us software developers are not in a position to kill anyone (least of all ourselves!) when we make a mistake. However, we can cost people some time and effort, or some money, and in the worst cases lose data and/or allow access to confidential data, which can certainly damage real lives. I think the term “safety” applies to the desire to avoid all of these outcomes.
Some actions for safety
I’ve looked for a resource that discusses best practices for ensuring safe activity in workplace environments, a short list of rules and guidelines, but I haven’t been able to find anything. If you know of one I’d be glad to hear of it as I’m sure there are aspects of this that I haven’t thought of. In the absence of an external source, I’ll propose my own:
- Tell people about the danger. “You could fall off and kill yourself if you’re not attached”
- Instruct people in the proper process. “Put this harness on. Don’t take it off. It’ll save your life if you fall. When switching to another track, do one carabiner at a time so that you’re always attached to something, otherwise you could fall.”
- Check that the proper process is being followed. “Let me check that harness for you before you go up there”
- Make it easier to follow the process than not to. “Let’s build a trolley system that removes the need to clip and unclip carabiners. Novices will use this trolley system, which is much harder to do wrong.”
So, applying this to, for example, relational database transactions in code:
- There should not be any developers working in this field who do not understand the dangers of transactions. Indeed, everywhere I’ve worked there have been horror stories about leaving transactions open and hanging the database, or not using transactions at all and leaving behind a mess when your code fails. These are the legends that help developers to remember that you need to pay attention when writing transactional code.
- Nonetheless, formal processes of training developers to use transactions conscientiously, and providing best practices for a given codebase, are a good idea.
- Code reviews cover the checking step in general, but a specific mention of transactions in a code-review checklist is a more thorough approach.
- Finally, using something like Spring Data makes transaction management easier to get right. If your codebase has special standards you want to enforce (like transactions being read-only unless explicitly otherwise) then it can be useful to add your own aspects and enforce their use so that it’s easier to follow that process than not. If using JPA, you generally can’t persist changes without a transaction, which at least forces developers to think about them.
Of course, I’m not suggesting it’s impossible (or even difficult) to place transactions at the wrong level. Unfortunately, I think this is an area where flexibility of application means you just need good training and to think hard about what you’re doing.
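To make the "easier to follow the process than not" idea concrete, here is a minimal sketch in Python using the standard library's sqlite3 module rather than Spring Data. The accounts table, the transfer function and the 100-unit limit are all made up for illustration. The point is only that the wrapper commits or rolls back for you, so there is no unclipped moment in between.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Commit if the block succeeds, roll back if it raises."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def transfer(conn, src, dst, amount):
    """Move an amount between two (hypothetical) accounts as one unit of work."""
    with transaction(conn):
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        # Made-up business rule, used here to simulate a failure halfway through.
        if amount > 100:
            raise ValueError("transfer limit exceeded")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def make_demo_db():
    """Create an in-memory database with two illustrative accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 500), ("bob", 0)]
    )
    conn.commit()
    return conn
```

A transfer that fails halfway leaves both balances exactly as they were, because the first update is rolled back. Python's sqlite3 connections can already be used in a with statement to much the same effect. The hand-written wrapper is only there to show the shape of the idea, and it does nothing to stop you putting the transaction boundary at the wrong level.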
HTTPS is a situation where the vast majority of the time you want no flexibility, just the same solution for every server. This should be a place where it’s simple to make it nearly impossible to get wrong.
I’ve set up HTTPS many times on Apache, Tomcat, NodeJS and nginx servers. Often it involves little more than following the instructions… then finding someone to give you a TLS certificate… then convincing them that you own the domain… then making sure it’s in the right format… and that you’ve left it in the right place. And then you’re done! Until it expires. And that solution only works if it’s an internet-facing server!
Let’s be clear – whether it’s over a firewalled intranet or the internet we always need to be using HTTPS for communications. There are two aspects to the protection offered: 1) HTTPS is the only way to be sure we’re speaking to the server we think we are, and 2) this is by far the easiest way to encrypt our communications.
But we don’t. Some devs at least use a self-signed certificate for internal stuff. Many (many, many) of us note that Apache, Tomcat, Flask, NodeJS and Spring Boot all start up without TLS by default, and we leave it like that for a while because it’s easy.
Following my list* of four actions we can take:
- We are moving towards an understanding that HTTPS should be the default for all applications; however, I think that teams should really challenge any occasions when it isn’t in place. Zero tolerance. Maybe some CPD sessions showing developers what messages look like using Wireshark or something to frighten people?
- I would like to think graduates these days are being taught how to configure HTTPS and why it’s essential, but I am not convinced this is the case.
- I guess checking in this context is just calling people out when you see they’re not using HTTPS? Browsers have gone a long way to doing this with their warnings and red icons.
- By far the greatest obstacle to using HTTPS is its non-trivial setup. I try to make it easy for team members by writing scripts that will get you there and READMEs that explain how, but ultimately making this situation better is out of the control of any single developer, so I think there’s a lot more we need to do as an industry.
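As one small example of the kind of script I mean, here is a sketch of a start-up helper that refuses to carry on without a certificate, so that "leave it on plain HTTP for a while" stops being the easy path. It uses only Python's standard ssl module. The environment variable names APP_TLS_CERT and APP_TLS_KEY and the TlsNotConfigured exception are my own inventions for illustration, not a convention of any framework.

```python
import os
import ssl


class TlsNotConfigured(RuntimeError):
    """Raised when the server would otherwise start without TLS."""


def tls_context_from_env(env=os.environ):
    """Build a server-side TLS context, or fail loudly if none is configured.

    APP_TLS_CERT and APP_TLS_KEY are hypothetical variable names holding the
    paths to a PEM certificate chain and its private key.
    """
    cert = env.get("APP_TLS_CERT")
    key = env.get("APP_TLS_KEY")
    if not cert or not key:
        raise TlsNotConfigured(
            "APP_TLS_CERT and APP_TLS_KEY must be set; refusing to serve plain HTTP"
        )
    for path in (cert, key):
        if not os.path.isfile(path):
            raise TlsNotConfigured(f"TLS file not found: {path}")

    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Assumption: nothing older than TLS 1.2 needs to connect.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.load_cert_chain(certfile=cert, keyfile=key)
    return context
```

The returned context can then wrap a listening socket, for example with context.wrap_socket(sock, server_side=True). It doesn't solve the problem of getting a certificate in the first place, but it does mean the lazy option fails at start-up instead of quietly working.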
What we should be doing as an industry
For intranet stuff, I think generating self-signed certificates by default would make all the difference. At least then the complaint would be that your server’s identity is not verifiable, rather than that you’re sending passwords around in plain text!
If you’re a sysadmin, you should absolutely have your own internal CA, deploying it to user devices as install images and/or your software update process. You should be providing an automated way to obtain TLS certificates from the CA.
As for real public internet-level certificates, Let’s Encrypt seem to be very good. Automated provisioning of certificates and regular expiry mean that it’s possible to configure an automated process to refresh your certificate. However, through no fault of theirs, they can’t just automatically generate a TLS certificate willy-nilly. First, they have to verify that you really own the domain, which makes initial setup difficult. They have to ask you to prove you own the domain by making a change to the domain that’s visible to them. It’s hard to make that automatic.
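The refresh half of that is at least easy to reason about. Here is a rough sketch of the check such an automated process might run, using the standard library's ssl.cert_time_to_seconds to parse the expiry date. The function names and the 30-day threshold are my own assumptions, and the date string is in the format Python reports under "notAfter" when you call getpeercert() on a TLS connection.

```python
import ssl

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now):
    """Days between `now` (seconds since the epoch) and the certificate's expiry.

    `not_after` is a string such as "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now) / SECONDS_PER_DAY


def needs_renewal(not_after, now, threshold_days=30):
    """True once the certificate is within `threshold_days` of expiring.

    The 30-day default is an arbitrary choice for this sketch.
    """
    return days_until_expiry(not_after, now) <= threshold_days
```

In practice you would pass time.time() as now and run this from a scheduled job. What to do when it returns True is the part that depends on how you obtained the certificate in the first place.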
It seems to me the answer here is for domain registrars to provide certificates for their customers through an API. They are the people who can verify that you’ve authenticated and that you own the domain, so having them provide an automated TLS certification process seems to me to be the way to make it automatic. Combine this with authentication software integrated into your favourite server and certificates could be issued automatically by feeding an auth token or similar into your web server.
Maybe there’s a good security reason for not doing it this way?
*I know, four actions I just made up, who do I think I am?! Send links to the writings of proper experts, please.