Top 10 IT Disasters of All Time
Remember Y2K? Though the millennium bug was not an IT disaster, it was a near disaster that affected the greatest number of people and businesses. Thus, Y2K made a list of the top 10 historic IT disasters.
The British online tech news service ZDNet.co.uk has published what it sees as the top 10 IT disasters of all time. Not surprising, the list is a bit European focused, but it serves as a guide to the major problems the IT world has faced.
1. A faulty Soviet early warning system nearly caused World War III. In 1983, a software bug in the Soviet system reported that the U.S. had launched five ballistic missiles.
2. The AT&T network collapsed in 1990, caused by an error in a single line of code in a software upgrade. Some 75 million phone calls across the U.S. went unanswered.
3. An Ariane 5 rocket exploded shortly after liftoff in 1996. According to a New York Times Magazine article, the self-destruction was triggered by software trying to stuff "a 64-bit number into a 16-bit space."
4. Two partners used different and incompatible versions of the same software to design and assemble the Airbus A380 jetliner in 2006. When Airbus tried to bring together two halves of the aircraft, the wiring on one did not match the wiring in the other. That caused at least a one-year and very costly delay to the project.
5. Navigation errors doomed two spacecraft sent to explore Mars in 1998 when one NASA contractor used imperial units and another contractor employed metric units in the spacecraft's navigation systems.
6. The British Child Support Agency's computer system operated by EDS overpaid 1.9 million people and underpaid some 700,000 in 2004, costing taxpayers over 1 billion pounds.
7. The two-digit year 2000 problem was more of a disaster avoided, except for the cost. Fixing the code, according to one estimate, topped $825 billion.
8. When a Dell laptop exploded at a Japanese trade show in 2006, word of other laptop fires began to surface. Faulty batteries were blamed. Two recall programs for Dell and Apple cost battery maker Sony an estimated $185 million.
9. Some 500,000 British citizens discovered in the summer of 1999 that their new passports couldn't be issued on time because the Passport Agency had brought in a new Siemens computer system without sufficiently testing it and training staff first. The British government had to pay millions in compensation, staff overtime and umbrellas for the poor people queuing in the rain for passports.
10. About 17,000 passengers found themselves stranded earlier this year at Los Angeles International Airport when a network card persisted in sending incorrect data out across the network, causing a network failure and forcing aviation officials to ground planes. Nobody could be authorized to leave or enter the U.S. through the airport for eight hours.
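For readers wondering what stuffing "a 64-bit number into a 16-bit space" (item 3) looks like in practice, here is a minimal Python sketch. It is an illustration only, not the Ariane flight code: the function names are hypothetical, and the unchecked version simply mimics what happens when a large value is forced into a signed 16-bit integer with no range check. It shows the general hazard rather than the exact failure mechanism on the rocket.

```python
# Range of a signed 16-bit integer.
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(value: float) -> int:
    """Mimic a conversion with no range check: keep only the low 16 bits."""
    return ((int(value) - INT16_MIN) % 65536) + INT16_MIN


def to_int16_checked(value: float) -> int:
    """Refuse to convert a value that does not fit in 16 bits."""
    n = int(value)  # drop the fractional part
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a signed 16-bit integer")
    return n


def to_int16_saturating(value: float) -> int:
    """Clamp an out-of-range value to the nearest representable limit."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


if __name__ == "__main__":
    reading = 40000.0  # hypothetical sensor value, too big for 16 bits
    print("unchecked: ", to_int16_unchecked(reading))
    print("saturating:", to_int16_saturating(reading))
    try:
        to_int16_checked(reading)
    except OverflowError as err:
        print("checked:   ", err)
```

The unchecked version silently returns a wrong (here, negative) number, the checked version fails loudly, and the saturating version pins the value at the limit. Which behavior is safest depends on what the rest of the system does with the result.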
ZDNet did not include disasters that resulted in any deaths.
Did the list miss any big IT disasters?
Here's one we entitled Prescription for Disaster, a report that tells of a disaster in the making in the $12 billion transformation of the British healthcare system.
Let us know below which disasters should have been included in the list.

Comments (30)
Your link on Prescription for Disaster caught my eye as I work in the health care sector of IT.
As I read the article, it seemed like exactly what you'd get if Al Gore (the inventor of the Internet and all things IT) joined forces with Hillary care (government healthcare). Scary. I'll give Accenture credit for having the guts to walk away, even if it cost them nearly half a billion in set-asides.
Posted by Don Babcock | November 28, 2007 10:33 AM
Great subject. Another great one would be ugliest IT implementation disasters.
You could sort by dollars wasted or spent in horrific protracted litigation. There are so many out there and it would help remind all CIOs (and their CXO customers and BODs) that it is much better to do it right and spend enough the first time to get large, complex IT integration projects done right.
Posted by Baird Lobree | November 28, 2007 10:45 AM
My vote goes to the Therac-25 fiasco. The Therac-25 was a computer-controlled radiation therapy machine produced by Atomic Energy of Canada Ltd. in the 1980s. The machine offered two modes of therapy, including "Megavolt X-ray therapy". The problem happened when the machine's software failed to move a component into place to convert an electron beam into X-rays, allowing a high-power electron beam to directly strike patients. At least 5 patients died of radiation poisoning from dosages that were sometimes in the hundreds of grays.
AECL did not have the machine's software independently reviewed. AECL did not assess what potential failure modes existed. The system documentation did not explain error codes. AECL ignored incident reports.
The problem turned out to be a software bug which intermittently caused an arithmetic overflow. When this happened, the software bypassed safety checks. So the machine killed people who were simply undergoing therapy. Nice one.
Posted by Christopher Shaw | November 28, 2007 10:45 AM
You could add to this list just about any IT project the U.S. government has undertaken in the past seven years: $170 million FBI Case Management System, $500 million Web-based travel management system for the Department of Defense (guess they didn't know about Travelocity), failed security project at Veterans Affairs ($103 million), business service management system at the IRS (three years late, $30 million-plus over budget), just about every project at the Transportation Security Administration (Terrorism Information Awareness system alone, $300 million-plus), the list goes on and on.
This is our money, being wasted, primarily due to lack of any "adult supervision" over the contractors doing the work.
JRM
Posted by Jeff McIntyre | November 28, 2007 10:46 AM
The AT&T outage in 1990 (No. 2 on your list) had a far greater impact than what you described. Air traffic control was knocked out on the East Coast of the U.S. by this outage, which was far more disruptive for those of us traveling that day than any phone calls not being answered.
Posted by Bob Emerson | November 28, 2007 10:49 AM
Don, what a stupid thing to say. First of all, nobody but you wanted to spoil the conversation by tossing in some old political grudge. Second, Al Gore never claimed to have invented the internet, but he did write the checks to fund the extension of Arpanet.
So stick to the subject and look for black helicopters on your own time.
Posted by James Stone | November 28, 2007 10:52 AM
Here's one that was caused by the CIA; they don't get enough credit for their successes.
In January 1982, President Ronald Reagan approved a CIA plan to sabotage the economy of the Soviet Union through covert transfers of technology that contained hidden malfunctions, including software that later triggered a huge explosion in a Siberian natural gas pipeline.
From The Washington Post: "The result was the most monumental non-nuclear explosion and fire ever seen from space," he recalls, adding that U.S. satellites picked up the explosion. Reed said in an interview that the blast occurred in the summer of 1982.
"While there were no physical casualties from the pipeline explosion, there was significant damage to the Soviet economy."
Posted by Lou Clark | November 28, 2007 11:00 AM
Even though I am a fan of Windows, I think Microsoft's approach to development of their ubiquitous OS has led to much of the present-day dilemma of a patch a week, security loopholes and the squandering of public trust.
Microsoft is only one example. Proponents of "extreme programming" and the businesses that support the notion of engineering without due diligence are also to blame for a coming whirlwind of waste, from poorly written applications to projects that fail because of cost overruns.
The biggest disaster is the sheer arrogance in the software industry; no other engineering discipline is as undisciplined.
Posted by Mike Carter | November 28, 2007 11:21 AM
One could add the recent TJX credit card theft disaster to this list.
Posted by Gino G Gori | November 28, 2007 11:28 AM
AT&T Wireless!!! $100M Siebel disaster that hastened the downfall of the company and eventual sale. Textbook case in collective incompetence.
Posted by Paco Malone | November 28, 2007 11:45 AM
The Therac 25 disaster was only partially a software problem.
Switching from low to high power caused the device to move an internal target shield from its low-power position to a high-power (i.e. out of the beam path) position.
By adding an inexpensive hardware interlock to the device, it would have been impossible to switch the device from low power to high power without manually releasing the interlock - no matter what the software tried to do.
In some cases, the problem is the whole design. Designers and developers do not consider the entire product, system, etc. when designing.
-S
Posted by Sunnyboy | November 28, 2007 11:58 AM
Does anyone remember when the "new" Denver airport tried to implement a high-tech baggage handling system? I believe it was supported by PCs running OS/2.
I seem to recall it was a complete failure and resulted in over a year of delays in opening the airport while a traditional baggage handling system was installed.
Posted by Paula Rosenblum | November 28, 2007 12:21 PM
What about the AMR + Budget + Hilton + Marriott attempt to overhaul their reservation systems with CONFIRM??
Posted by jh lundin | November 28, 2007 12:35 PM
Isn't it ironic that the first story about almost causing World War III occurred in the same year as the movie War Games was created!
Posted by someone | November 28, 2007 1:26 PM
One of the worst IT disasters was also, in fact, one of the worst disasters in U.S. history. I am referring to the logistical and coordination mess of people and machines before, during and immediately following the September 11 attacks on the World Trade Center in NYC and the Pentagon in DC in 2001.
There were conflicting reports in the government sectors as to what was really happening during that fateful morning, and there was the mess of trying to handle all the business-loss claims and missing-persons lists. To this day there still is not an accurate accounting of all the people who died at the WTC; the listed 2,896 victims is closer to 3,500 (just at the WTC alone). A mishmash of Windows PE devices, Palm devices, Macs and PCs, all cobbled together to try to make something that resembled a halfway decent information retrieval system, resulted in hundreds of duplicate entries, incorrect information, or just plain missing information.
About the only pieces of IT equipment that functioned correctly were the server setups in the doomed towers that were getting the last bits of transactional data out (and doing this while power to the buildings was cut due to large commercial airliners crashing into them). For those networks not on the immediate impact floors, their redundant standby power supplies performed admirably until the structural integrity of the towers themselves just gave out.
The result: We lost a lot of very good people those weeks.
Posted by Christopher Mateja | November 28, 2007 1:55 PM
While not in the same class as your examples above, the Province of Ontario, Canada, spent a lot of money designing and building a computer for educational use that hit the market at the same time as the IBM PC.
Posted by Charles E. Fox | November 28, 2007 2:08 PM
The Danish government in the mid '90s wanted a system for the unemployment offices. The idea was to register unemployed people's qualifications, register the qualifications needed in the vacant jobs, and make an easy and good match.
It was decided to base the system on OS/2 Warp, which technically was a very good and stable system, and it was also decided to use IBM PCs. Nothing wrong in that.
The budget was 100 million kroner, about $20 million, fair for the time. As usual, when everybody wants to put a fingerprint on the pie, the aim dissolved and good ideas were added to the project, which soon had its first postponement. It was finished four years late, at five times the budget, and OS/2 was announced discontinued the same year the system was released. It was awful to work in; users had to go through 20 displays to match a job-seeker to a job. When PCs were changed for new ones, peculiar errors popped up. An investigation showed that for some unknown reason the system needed a certain BIOS in the chosen IBM PC in order to work properly.
The company hired to make a new system stated that it was OS/2's fault, and everything would be fine just by switching to Windows. Of course, it wasn't; another couple of hundred million were thrown out the window (sorry!), but I can't see that the base OS can be blamed for creating a system where the very core function needs wading through 20 displays.
For IT projects in large organizations, public or private, multiply the figures by the value of pi, and it comes out right.
Peter
IT manager and project manager, who was not involved in the above
Posted by Peter Krogsten | November 28, 2007 3:20 PM
Mike,
Extreme and Agile methodologies, when performed properly, can result in software so good you break the Six Sigma model. We have done it. All while maintaining SOX compliance.
Now someone needs to teach the Microsofts of the world how to actually build it properly.
I also remember stories in ComputerWorld about two stealth bombers lost during testing due to software locking up. According to the article they fly like throwing lead bricks off the roof without the computer.
Posted by Dennis Thayer | November 28, 2007 3:35 PM
Our "ERP"* system was last upgraded in 1990 and it is DOS based! The manufacturer is out of business. It is mission critical. Besides a gross lack of functionality, I have explained to the owners that Microsoft could put out a patch or new OS that would totally disable our "ERP" system and bring business to a halt. I've had several real ERP vendors in but the big guys balked at the price. If we go down for 2 or 3 days, the lost revenue would be greater than a new ERP system. I'm just happy I'm not the product owner but I would have to get involved to clean up the mess.
* ancient system downloaded from a bulletin board (pre-Internet) that uses flat files, pointers and indexes.
Posted by Pulling Hair Out | November 28, 2007 4:26 PM
Whether we like it or not, IT is with us, and will be playing an ever-increasing role, for society as well as for the individual.
So while the stories are amusing, mere complaining isn't worth a dime.
I'm looking forward to a second (and much more important) article: "How to avoid IT disasters."
Peter
Posted by Peter | November 29, 2007 6:58 AM
How about the cascading power outage in the northeastern U.S. in 2003? It wasn't a pure IT disaster, but certainly wreaked havoc with everything from water systems to transportation to telecommunications.
Posted by Stan | November 29, 2007 12:24 PM
"...grounded 17,000 planes earlier this year at Los Angeles International Airport"
Oh, come on now. 17,000 planes on the ground would have stretched from LA to the boneyard at Tucson. Are there even 17,000 planes in the air at any one time?
Posted by Anonymous | November 29, 2007 12:25 PM
Our anonymous friend is right.
The 17,000 figure refers to the number of passengers stranded, not flights grounded. The 17,000 grounded planes figure appears in the ZDNet posting. This blog has been corrected.
Posted by Eric Chabrow | November 29, 2007 12:49 PM
Great article.
One IT disaster that is still going on is the Canadian Gun Registry system. $2 billion and still counting everything but the guns that should be counted.
Posted by Glen | November 29, 2007 1:04 PM
Some additional entries:
I recall years ago that the mainframe(s) that ran the Sabre travel reservations system went down for 13 hours making it impossible for any travel agents to do business. I recall that estimated opportunity cost was something like $1 million in lost revenue per minute.
A big IT boondoggle was the FAA redesign of the air traffic control infrastructure about 10 years ago. I recall that they spent billions and couldn't get it to work so they reverted to the legacy system. Now they're talking about "NextGen" again.
Ditto for the IRS: billions spent for a massive update project with no result.
Posted by Mark Conway | November 29, 2007 1:10 PM
Regarding Stan's comment about the NE cascading power outage in 2003: it actually was caused by a faulty computer system that allowed a power grid failover from Ohio to Canada that overloaded the circuits and brought down the entire NE power grid system. At the same time, a computer virus that spread itself upon startup was massively spread as all computers in the NE started up, causing massive downtime for weeks. Some even say it was that very virus that caused the original computer failure that caused the cascading outage itself.
So the story goes.....
Posted by Steve Dodd | November 29, 2007 5:36 PM
"Not surprising, the list is a bit European focused".
Er, the USA is NOT the world.
Get OVER yourselves, you egocentric snobs!
Posted by Richard Matthews | November 29, 2007 6:37 PM
You may want to visit this link: http://www.wagingpeace.org/articles/1998/01/00_phillips_20-mishaps.htm
There was a communications link between North American Air Defense Command early warning computers and Strategic Air Command computers. The system that managed data between these computers was called SWCCS (SAC Warning Command & Control System). The computer tape mistake in November 1979 resulted in the SAC command post officer in charge being relieved of duty and forced into retirement due to his inability to react as required.
Posted by William F. Slater, III | December 3, 2007 12:09 AM
Richard Matthews writes on Nov. 29: "'Not surprising, the list is a bit European focused.' Er, the USA is NOT the world. Get OVER yourselves, you egocentric snobs!"
Richard Matthews is correct; Britain, with its creation of ITIL and Europe's ISO IT oversight, still produced the majority of the top 10 IT Disasters [with] the majority of the rest in Europe. So, it goes to show you, if you want quality IT you go to the USA; they have been doing it the longest and obviously better.
Posted by Eduardo Mendez | December 4, 2007 1:45 PM
Eduardo,
It is best to not feed the trolls. If you do, they will get bigger and multiply.
I think that one of the biggest problems that we have with IT projects is that we wait so long to declare them failures. Because so much depends on how projects start, we should be more willing to "kill projects early and often."
Posted by David Shockey | December 12, 2007 1:13 PM