So on this beautifully sunny Saturday, I receive news that British Airways has suffered a “major systems outage”. Lots of flights are cancelled and passengers are venting on Twitter. Nothing unusual in people venting their frustration at British Airways on Twitter. Even I have done it in the past!
There are two main points here. Firstly, it’s British Airways. They never seem to me to be able to handle an issue correctly, and they seem to have difficulty behaving in a good and timely fashion towards customers. The chaos always seems to me to be exacerbated. I have had over 15 years’ experience of this, be it extreme snow or even swipe cards! I remember the swipe card incident at Heathrow Airport particularly well, as they managed to leave me stranded in a godforsaken hellhole for 4 days!
I do feel sorry for the passengers on this one, as it is not their fault. I hope that they all manage to get on and go about their business; they do not need this disruption. An airport, in my opinion, is one of the worst places to be stuck, especially if you are airside.
I have not seen enough information about this one to make any sort of technical judgement.
The second point is not particularly aimed at British Airways, as it is an industry-wide issue. You see, all over business and industry, not enough budget ever seems to be given to IT, and a make-do-and-mend approach has to be followed despite the compliance requirements and standards that must be met. In my 18 years in the IT trade I saw this every day. Some organisations are better than others. The other month there was an issue with Amazon Web Services that took out a lot of big services, in some cases for more than 24 hours. This is not acceptable.
Thing is, these outages are getting larger, more far-reaching and more disruptive. We had a cyber attack a few weeks ago that severely hit large parts of the NHS and some other organisations too, some of them for more than 24 hours. Cyber attack, system failure or human error, an outage is an outage.
One of my systems suffered an outage the other day. I had everything recovered in 10 minutes, but an outage is an outage. You see, I had an instant recovery plan that is tried and tested. If I didn’t have these plans in place, it could have been worse.
I have always felt that not enough attention is paid to disaster recovery, business continuity and failover. Failover essentially means that you can swap over to another system in the event of a major problem, to try to avert it or stem the damage to the business. The problem is that business continuity and disaster recovery cost money. At a senior management level there seems to be an attitude of “it will never happen to us, we do not need to spend the money”. It does happen. Systems fail all the time. It’s a fact.
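For anyone wondering what swapping systems over looks like in practice, here is a very rough Python sketch of the idea. It is only an illustration: the system names and the health check are made up for the example and are not taken from any real setup, mine or anyone else’s.

```python
from typing import Callable, Optional, Sequence


def choose_active(
    systems: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy system in priority order, or None if all are down."""
    for name in systems:
        if is_healthy(name):
            return name
    return None


# Hypothetical example: the primary has failed, the standby is still up.
status = {"primary": False, "standby": True}
active = choose_active(["primary", "standby"], lambda name: status[name])
print(active)
```

The point of keeping a priority list is that the switch is decided in advance rather than in the middle of a crisis. A real failover setup would also need the standby to be kept up to date and the switch to be rehearsed regularly.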
The answer is to give more attention and credence to business continuity and failover. An issue will happen, and beefing up these components will stem any damage to a business and indeed stem the levels of customer dissent and dissatisfaction, especially when Twitter is so readily available as a complaint channel!
I hope business and industry will look at these outages and the effect they have on customers, and take some “preventative medicine”. Outages are getting bigger and even more disruptive, such is the way everyone now relies on technology.