Every few months or so, somebody asks on social media why a particular DVLA digital service is turned off overnight. Why, in the 21st century, does a newish online service only operate for some hours of the day? Rather than answering every time, I’ve decided to write this post so I can point people at it in future.
It’s also a great case study to show why making government services digitally native can be quite complicated. Unless you’re a start-up, you’re rarely working in a greenfield environment, and you have legacy technology and old working practices to contend with. Transforming government services isn’t as easy as the tech bros and billionaires make it out to be.
A bit of history
Before I get into the detail - a little bit of history.
DVLA is around 60 years old and manages driving licences and vehicle records for England, Scotland and Cymru.
For decades, DVLA outsourced the management of its technology to organisations like IBM and Fujitsu. We brought this back in-house in 2015. When I started working with DVLA in 2013 to deliver new digital services, almost all[1] technology delivery was done by IBM and Fujitsu.
At the time, many of DVLA’s services - particularly those relating to driving licences - were still backed by an old IBM mainframe from the 1980s, fondly known as Drivers-90 (or D90 for short). D90 was your typical mainframe - code written in COBOL using the ADABAS database package. Most data processing happened ‘offline’ - through batch jobs which ran during an overnight window.
In the early 2000s, there had been an attempt by DVLA’s IT suppliers to modernise the systems. They’d designed a new set of systems using Java and WebLogic, with Oracle Databases - which they referred to as the New Systems Landscape (or NSL). To speed up the migration, they’d used tools to automatically convert the code and database structures.
As often happens in large behind-the-scenes IT modernisation projects, this upgrade effort ran out of energy and money, so it never finished. This left a complex infrastructure in place - with some services using the new architecture, some using the mainframe, and some using both at the same time.
Over time, problems emerged with the code written by the automated tools - it was overcomplicated, brittle, and difficult to maintain. Because it was an automated translation, the new system also replicated the overnight batch job design from the old mainframe.
However, these new batch jobs also had inbuilt assumptions that the underlying datasets wouldn’t change during the overnight batch window. If something did change, the batch might fail and leave the entire database in an uncertain or corrupt state. Untangling this sort of failure could take days and leave the entire system offline in the meantime.
This was the landscape which we faced in 2013.
Delivering new digital vehicle services
As part of the GDS exemplar programme in 2013, DVLA committed to delivering a set of new digital services for managing vehicles and personal registrations. To deliver these services, we had to navigate the complexity of the existing tech in place.
Building a new front-end service would be relatively straightforward. However, updating the vehicle record would be more complex - we’d have to integrate with the legacy systems and deal with IBM/Fujitsu to do it. But the even bigger issue - how would we deal with the fragile overnight batch jobs?
We faced a choice. Step back and spend the next few years redesigning and rebuilding the underlying infrastructure to remove/remediate the overnight batch jobs, or accept the service couldn’t initially operate overnight.
Organisations often fall into this trap - spending years and huge amounts of money fixing the underlying foundations before starting to do new things. It’s difficult for an organisation to keep its focus and attention on a complex upgrade - particularly without getting noticeable benefits along the way. DVLA tried this in the early 2000s when migrating away from the mainframe. They ran out of money, and ended up in an even worse half-state.
I pushed for us to press on and deliver a service that could operate normally during the day, but would be turned off overnight. This would allow us to get some value early - giving people access to a new service quickly, while we looked to fix the issues behind the scenes. Luckily the political pressure of the exemplar programme supported us to do that.
So we built the new service. We paid IBM/Fujitsu to build us an API into the database so we could update the records in real time. We worked with them to squeeze and narrow the batch window as much as possible, and shifted it a little so there was more availability in the evening (when user research showed people would be more likely to use it).
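To make the "turned off overnight" rule concrete, here is a minimal sketch of how a front end might decide whether it is inside the batch window. The window times and function names are hypothetical (this post doesn't give the real ones); the only subtle part is that the window crosses midnight.

```python
from datetime import time

# Hypothetical window for illustration only; the real batch times are not given here.
BATCH_START = time(23, 0)
BATCH_END = time(5, 0)


def in_batch_window(now: time, start: time = BATCH_START, end: time = BATCH_END) -> bool:
    """Return True if `now` falls inside the batch window [start, end)."""
    if start <= end:
        # Window sits within a single day, e.g. 01:00 to 04:00.
        return start <= now < end
    # Window crosses midnight, e.g. 23:00 to 05:00.
    return now >= start or now < end


def service_available(now: time) -> bool:
    """The online service is only offered outside the batch window."""
    return not in_batch_window(now)
```

Narrowing or shifting the window then becomes a matter of changing two values, although the hard work is in making the batch jobs themselves fit the shorter slot.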
Over the next few months we added more parts to the service - you could buy and sell a vehicle online, manage your personal registration plates, update your address etc. All without paper.
Users got the benefit of new services and DVLA didn’t have to deal with as much paper as before. Real value.
What happened next?
Not long after the first service was launched, we designed and prototyped a way to allow the services to operate during the batch window. In the daytime, the service would operate as before. During the batch window, it would temporarily ‘store’ the transactions and then apply them once the legacy system became available again. This would allow the services to run 24/7.
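The ‘store’ idea is a store-and-forward pattern. What follows is a rough sketch of the general technique, not the prototype we actually built: the class, field and function names are all made up for illustration, and a real version would need a durable queue and error handling instead of an in-memory one.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass
class Transaction:
    vehicle_id: str   # hypothetical fields, for illustration only
    change: str


@dataclass
class StoreAndForwardGateway:
    # Both callables are assumptions: one reports whether the legacy
    # system is accepting updates, the other applies a single update.
    legacy_available: Callable[[], bool]
    send_to_legacy: Callable[[Transaction], None]
    pending: Deque[Transaction] = field(default_factory=deque)

    def submit(self, tx: Transaction) -> str:
        """Apply immediately when possible, otherwise hold the transaction."""
        # Queue behind any backlog so updates keep their original order.
        if self.legacy_available() and not self.pending:
            self.send_to_legacy(tx)
            return "applied"
        self.pending.append(tx)
        return "queued"

    def flush(self) -> int:
        """Replay held transactions in order once the legacy system is back."""
        sent = 0
        while self.pending and self.legacy_available():
            self.send_to_legacy(self.pending[0])
            self.pending.popleft()  # only drop a transaction after it was sent
            sent += 1
        return sent
```

From the user's point of view the service never closes: a change made at 2am is accepted straight away and lands on the record when `flush()` runs after the batch window ends.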
Unfortunately, for various reasons, DVLA decided not to implement this. Instead, they decided to focus on rebuilding the legacy infrastructure in its entirety.
At the point I left DVLA in late 2015, this new infrastructure was still being designed.
It’s now 2024 - 10 years on from the launch of the first service. The legacy infrastructure, which really should have been replaced by now, is probably the reason why the services are still offline overnight.
Is this acceptable? Not really. Is it understandable? Absolutely.
Legacy tech is complicated. It’s one of the biggest barriers for organisations undertaking digital transformation.
-
[1] We found an absolutely brilliant team of Civil Servants that developed and owned a bunch of essential (but overlooked) tech services. But that’s a story for another post.