How Does Recent 'Dev Destroys Production Database' Mistake Rank with Other Programming Disasters?
By now you've surely heard about the junior software developer who destroyed a production database on his first day on the job, was immediately fired and even warned about possible legal action.
"I am panicking to high heavens," poor "cscareerthrowaway567" said in a Reddit post earlier this month that has garnered the attention of tens of thousands of readers, mostly supportive.
But how does it rank with other calamitous programming gaffes?
We earlier reported on "Job-Killing Decisions by Techies: What Not To Do," which included people getting fired for creeping on teens, keeping inappropriate photos on a computer (one of them a judge), revealing confidential salary-related information at Google and many more.
But how does the recent database-deletion fiasco (which, judging from the community response, will likely see "cscareerthrowaway567" landing on his feet with the choice of multiple job offers) rank with other dev disasters? (Just assuming the person is a male, based on statistics.)
Here are several comparisons, culled from the Web:
Therac-25 Radiation Therapy Machine
Deleting a production database surely feels horrible, but it can't compare to programming mistakes that can actually kill people. Back in the '80s, six radiation overdose accidents that resulted in serious injuries and even deaths were at least partially attributed to poorly written software.
"The Therac-25 accidents are the most serious computer-related accidents to date," says a 1993 article on the IEEE Xplore site.
Another extensive investigation on the MIT site listed several causal factors related to the Therac-25 disaster, including inadequate software engineering practices.
The report says basic software engineering principles apparently violated in the case include:
- Software specifications and documentation should not be an afterthought.
- Rigorous software quality assurance practices and standards should be established.
- Designs should be kept simple and dangerous coding practices avoided.
- Ways to detect errors and get information about them, such as software audit trails, should be designed into the software from the beginning.
- The software should be subjected to extensive testing and formal analysis at the module and software level; system testing alone is not adequate. Regression testing should be performed on all software changes.
- Computer displays and the presentation of information to the operators, such as error messages, along with user manuals and other documentation need to be carefully designed.
St. Mary's Mercy 'Death' Notifications
While accidental deaths are the worst possible outcome of bad programming, accidental death notifications can't be too far behind. That's what resulted from a "flawed patient-management software system" at St. Mary's Mercy hospital, according to a 2003 Baseline article.
Software flaws reportedly resulted in some 8,500 patients being notified that they were dead, with notifications also going out to their insurance companies and to Social Security, which probably generated a huge amount of red-tape confusion.
"It turns out St. Mary's Mercy had recently completed an upgrade of its patient-management software system," the article said. "A 'mapping error' in the conversion process resulted in the hospital assigning a disposition code of '20' -- which meant expired -- instead of '01,' which meant the patient had been discharged."
"To us, this is really not a very big story," a spokesperson was quoted as saying in the article. "We're not going to elaborate any more. It was a mapping error. That's all we have to say about it."
But that's certainly not all that the affected patients had to say about it.
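To make the "mapping error" concrete, here is a minimal Python sketch of how a bad conversion table could turn discharged patients into expired ones, and how a simple before-and-after count could catch it. Only the disposition codes '20' and '01' come from the article; the legacy status names, both tables and the function names are hypothetical, and nothing here reflects the hospital's actual software.

```python
# Disposition codes quoted in the Baseline article.
DISCHARGED = "01"
EXPIRED = "20"

# Hypothetical legacy status names; the real system's values are unknown.
CORRECT_MAP = {"HOME": DISCHARGED, "DECEASED": EXPIRED}
FAULTY_MAP = {"HOME": EXPIRED, "DECEASED": EXPIRED}  # the kind of slip described


def convert(records, mapping):
    """Translate legacy status names into new disposition codes."""
    return [mapping[status] for status in records]


def expired_count_preserved(records, mapping, legacy_expired="DECEASED"):
    """Check that the conversion does not change how many patients are marked expired."""
    before = sum(1 for status in records if status == legacy_expired)
    after = sum(1 for code in convert(records, mapping) if code == EXPIRED)
    return before == after


if __name__ == "__main__":
    records = ["HOME"] * 5 + ["DECEASED"]
    print(expired_count_preserved(records, CORRECT_MAP))
    print(expired_count_preserved(records, FAULTY_MAP))
```

A reconciliation check like this, run on a sample before notifications go out, is one cheap way to spot a conversion that suddenly multiplies a rare outcome.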
$200-$300 Million IRS Glitch
The Houston Chronicle published an Associated Press article in 2006 that detailed an IRS computer glitch that cost the U.S. hundreds of millions of dollars.
The fiasco resulted from a computer program designed to hunt for fraudulent refunds in U.S. tax returns.
"The tax agency said Friday that a contractor promised to deliver by January a new version of a program, used since 1996, that searches for signs of fraud in every tax return claiming a refund," the article said. "The contractor, Computer Sciences Corp., did not produce a working program by the deadline, and IRS officials could not put the old program back into operation in time for this spring's tax filing deadline."
The glitch caused the IRS to fall behind on its identification of fraudulent refund claims, resulting in the $200-$300 million loss estimate.
1983 Soviet Nuclear False Alarm Incident
This was detailed extensively in a Wikipedia article. Here it is in a nutshell:
On 26 September 1983, the nuclear early warning system of the Soviet Union reported the launch of multiple USAF Minuteman intercontinental ballistic missiles from bases in the United States. These missile attack warnings were correctly identified as a false alarm by Stanislav Yevgrafovich Petrov, an officer of the Soviet Air Defence Forces. This decision is seen as having prevented a retaliatory nuclear attack based on erroneous data on the United States and its NATO allies, which would have probably resulted in immediate escalation of the cold-war stalemate to a full-scale nuclear war. Investigation of the satellite warning system later confirmed that the system had malfunctioned.
The false alarm reportedly resulted from "a rare alignment of sunlight on high-altitude clouds and the satellites' Molniya orbits, an error later corrected by cross-referencing a geostationary satellite." The subsequent investigation revealed "other bugs found in the missile detection system" that embarrassed the Soviet hierarchy.
The Patriot Missile Failure
An article on the University of Minnesota site explains how a Patriot anti-missile missile failed to intercept a Scud missile launched by Iraq against U.S. forces in 1991, resulting in the deaths of 28 soldiers and about 100 injuries.
"It turns out that the cause was an inaccurate calculation of the time since boot due to computer arithmetic errors," the article stated.
A report from the U.S. General Accounting Office, titled "Software Problem Led to System Failure at Dhahran, Saudi Arabia," said: "The Patriot battery at Dhahran failed to track and intercept the Scud missile because of a software problem in the system's weapons control computer. This problem led to an inaccurate tracking calculation that became worse the longer the system operated."
The GAO further explained the nitty-gritty details of the horrific accident in terms only a math lover could understand:
The range gate's prediction of where the Scud will next appear is a function of the Scud's known velocity and the time of the last radar detection. Velocity is a real number that can be expressed as a whole number and a decimal (e.g., 3750.2563...miles per hour). Time is kept continuously by the system's internal clock in tenths of seconds but is expressed as an integer or whole number (e.g., 32, 33, 34...). The longer the system has been running, the larger the number representing time. To predict where the Scud will next appear, both time and velocity must be expressed as real numbers. Because of the way the Patriot computer performs its calculations and the fact that its registers are only 24 bits long, the conversion of time from an integer to a real number cannot be any more precise than 24 bits. This conversion results in a loss of precision causing a less accurate time calculation. The effect of this inaccuracy on the range gate's calculation is directly proportional to the target's velocity and the length of time the system has been running. Consequently, performing the conversion after the Patriot has been running continuously for extended periods causes the range gate to shift away from the center of the target, making it less likely that the target, in this case a Scud, will be successfully intercepted.
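For readers who aren't math lovers, the arithmetic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Patriot's actual code: it assumes the value 1/10 was chopped (not rounded) to 23 binary fractional digits, as in the widely cited analysis of the failure, and the 100-hour uptime and 1,676 meters-per-second Scud speed are illustrative figures that do not appear in the GAO excerpt above. The function names are made up for this example.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumption: binary fractional digits kept when 1/10 is chopped


def chopped_tenth(bits=FRACTION_BITS):
    """Return 1/10 truncated (not rounded) to `bits` binary fractional digits."""
    scale = 2 ** bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours_up, bits=FRACTION_BITS):
    """Accumulated timing error after `hours_up` hours of continuous operation."""
    ticks = int(hours_up * 3600 * 10)  # the clock counts tenths of a second
    error_per_tick = Fraction(1, 10) - chopped_tenth(bits)
    return float(ticks * error_per_tick)


def range_gate_shift_meters(hours_up, speed_m_per_s=1676.0):
    """Distance a target moves during the accumulated timing error.

    The default speed is an assumed, illustrative Scud velocity.
    """
    return clock_drift_seconds(hours_up) * speed_m_per_s


if __name__ == "__main__":
    for hours in (8, 20, 100):
        drift = clock_drift_seconds(hours)
        shift = range_gate_shift_meters(hours)
        print(f"{hours:>3} h up: drift {drift:.4f} s, shift {shift:.0f} m")
```

Under these assumptions the per-tick error is tiny (under one ten-millionth of a second), but after 100 hours it adds up to roughly a third of a second, enough for a fast-moving target to be hundreds of meters from where the range gate expects it.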
So there you have it -- just a taste of some of the more serious, well-known programming disasters. While "cscareerthrowaway567" may have been immediately fearful of having thrown away his career, his database-deletion mistake pales in comparison to other, much more serious coding calamities. A trashed database -- while crucial to a business -- is much less serious than loss of life, threats of nuclear war, hundreds of millions of dollars lost and so on.
Also, in the current climate of social media hype and automatic online fundraisers and other campaigns instantly springing up to aid anyone in distress, I'm pretty sure "cscareerthrowaway567" is sifting through job offers as I write, if he hasn't already accepted a much better position.
As Quartz reported, "The tech world is rallying around a young developer who made a huge, embarrassing mistake."
I wouldn't recommend duplicating this scenario in hopes of making a good career move, though. The trick probably has a short shelf life.
Posted by David Ramel on June 19, 2017