Atomic Energy of Canada Limited (AECL), a Crown corporation of the Canadian government, began developing the Therac-25 in the late 1970s and brought the fully computerized machine to the healthcare market in 1982. Like other medical linear accelerators, including its predecessors, the Therac-6 and the Therac-20, the Therac-25 used high-energy beams to destroy tumors while sparing nearby healthy tissue as much as possible. Depending on whether the tumor was close to the skin or in deeper tissue, the Therac-25 operated in electron-beam mode or X-ray mode. The Therac-25 had both technological and economic advantages over its predecessors. Its folded linear accelerator could reach higher energies and better avoid healthy tissue adjacent to tumors. The folded linear accelerator, also called a "double-pass accelerator," occupied less space and used a cheaper energy source than a conventional accelerator.
Unlike its predecessor the Therac-6, the Therac-25 had both X-ray and electron-beam modes, and unlike the Therac-20, it could be fully controlled through software. Combining the two modes in one machine meant that a beam spreader plate (the beam flattener), needed only for X-ray operation, had to be moved into and out of the beam path. This vulnerability played a key part in several of the accidents. Software control was a selling point for the Therac-25: rather than implement all of the expensive hardware interlocks and sensors, AECL chose to control many safety features with software. Unfortunately, the patchwork of old hardware and new software left numerous safety gaps.
Between 1985 and 1987, six recorded accidents involving the Therac-25 delivered massive radiation overdoses to patients. Every one of these incidents resulted in death or serious injury.
The first incident occurred on June 3, 1985, at the Kennestone Regional Oncology Center in Marietta, Georgia. The patient was prescribed a 10-MeV electron treatment to her clavicle area, but when the machine turned on she felt a "tremendous force of heat... [a] red-hot sensation." There was no visible sign of tissue damage immediately after treatment, but after she went home, the area began to swell and became extremely painful. She developed burns that extended all the way through to her back; eventually she needed to have a breast removed and lost the use of her shoulder and arm. Following this incident, the hospital physicist asked AECL whether an electron beam could be administered without the beam spreader plate in place. AECL incorrectly responded that it was not possible.
The second incident occurred in Hamilton, Ontario, Canada, on July 26, 1985. The patient was receiving radiation treatment to a region near the hip. The operator tried six times to administer treatment, but each time the machine shut down with an "H-tilt" error message. The operator did not know what this meant and reported it to the hospital technician. The machine's display read "no dose" after each attempt, but when the patient died of cancer a few months later, an autopsy revealed radiation burns so severe that she would have required a hip replacement. The incident was reported to AECL.
The third incident occurred in Yakima, Washington, in December 1985. As in the first case, the patient received a radiation overdose, but no cause could be found. The hospital technicians contacted AECL about the incident, and AECL responded that an overdose was not possible and that no other incidents had been reported.
The fourth and fifth incidents occurred at the East Texas Cancer Center in Tyler, Texas, in March and April 1986. The two incidents were similar: both patients were prescribed electron-beam radiation, but during setup the operator accidentally typed X for X-ray, then quickly changed it to E for electron beam before turning on the beam. Both times, the display read MALFUNCTION 54 and indicated a gross underdose. The operator's manual made no mention of MALFUNCTION 54. In the first instance, the operator quickly restarted the treatment but received the same error message. By then the patient was pounding furiously on the door of the treatment room and complaining of an electric shock. The hospital shut down the machine for a day, during which engineers and technicians from the hospital and from AECL tested it but could not replicate the error. After the second incident produced another apparent overdose, the hospital physicist ran his own tests and finally replicated the error, determining that it was triggered by how quickly the operator changed the mode from X-ray to electron. Both patients involved in these incidents died from their radiation exposure.
The sixth and final incident involving the Therac-25 occurred in January 1987, again in Yakima, Washington. As in earlier incidents, the operator tried to administer a treatment, but an ambiguous error message was displayed. Believing that little or no radiation had been delivered, the operator tried again. The patient again complained of a burning sensation, and, as in the earlier Yakima incident, the skin visibly reddened. The patient was found to have received an overdose, but it was still unclear how it had occurred. AECL investigated the incident and found additional software errors. This patient died in April of that year from complications related to the overdose.
Engineering failures and responses
The Therac-25 had several engineering failures that could have been averted through independent verification, formal software specification, or more thorough testing. AECL's initial responses to the flaws were deficient, and significant changes were not made until the FDA forced AECL to issue a corrective action plan in July 1987, two years after the first incident.
Lack of documentation
The Therac-25's error codes, such as the infamous MALFUNCTION 54, were not mentioned in the operator's manual and were only listed in the maintenance manual. Operators were also not told whether any of the errors affected patient safety. The most frequent error messages were insignificant, so errors that were actually serious were sometimes ignored.
The operator at the Tyler, TX hospital had a sheet of error codes taped to the machine. It indicated that MALFUNCTION 54 meant "dose input 2 error" (p. 17), a phrase that was not explained anywhere either. The codes were intended only for AECL's internal testing, and eventually an AECL technician explained that the message meant "a dose had been delivered that was either too high or too low" (p. 17).
When AECL responded to the other software errors, it stated that documentation was a low priority. It did not address the issue further until it agreed, as part of the corrective action plan, to replace the MALFUNCTION codes with more descriptive error messages. Documentation is a critical part of any complex software system, especially one in which errors can cost human lives.
Code reuse
Code reuse from the Therac-20 was responsible for the accidents in Tyler, TX. The error occurred when the operator switched from the X-ray beam to the electron beam and then very quickly began the treatment. If the operator was fast enough, the beam could be activated after the beam flattener had been moved out of the beam path but before the high-current X-ray setting had been replaced by the electron-beam setting. The Therac-20 and Therac-25 used the same software to control the switch between the electron and X-ray beams. If the same sequence occurred on the Therac-20, however, the machine would simply blow a fuse and shut down, because its hardware interlocks prevented the X-ray beam from firing without the beam flattener. The Therac-25 had only software interlocks, and they were faulty.
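The race described above can be illustrated with a toy model. This is not AECL's code; it is a minimal Python sketch under simplified assumptions, and the names `Machine`, `run_setup`, and `fire`, the tick-based window `SETUP_TICKS`, and the two-level "high"/"low" current are all hypothetical. It shows how a setting chosen once at the start of setup can disagree with the turntable position after a quick edit, and how an independent interlock, like the Therac-20's, turns that disagreement into a shutdown instead of an overdose.

```python
from dataclasses import dataclass

SETUP_TICKS = 8  # hypothetical length of the setup window, in abstract ticks


@dataclass
class Machine:
    mode: str = "xray"       # mode on the operator's screen: "xray" or "electron"
    turntable: str = "xray"  # "xray" = beam flattener in the path, "electron" = moved out
    current: str = "low"     # beam current chosen by the setup routine


def current_for(mode: str) -> str:
    # X-ray mode uses a far higher current, which is only safe with the flattener in place
    return "high" if mode == "xray" else "low"


def run_setup(m: Machine, edits: dict[int, str]) -> None:
    """Simulate setup; `edits` maps a tick number to the mode the operator types then."""
    m.turntable = m.mode
    m.current = current_for(m.mode)  # FLAW: current is chosen once and never re-derived
    for tick in range(SETUP_TICKS):
        if tick in edits:
            m.mode = edits[tick]
            m.turntable = m.mode     # the flattener moves, but the current is not updated


def fire(m: Machine, interlock: bool) -> str:
    dangerous = m.current == "high" and m.turntable != "xray"
    if dangerous and interlock:
        return "blocked"  # independent check trips, as the Therac-20's fuse did
    return "overdose" if dangerous else "ok"


fast = Machine()                    # operator typed X ...
run_setup(fast, {3: "electron"})    # ... and corrected it to E during setup
print(fire(fast, interlock=False))  # overdose: software-only protection
print(fire(fast, interlock=True))   # blocked: Therac-20-style interlock

slow = Machine(mode="electron")     # correction finished before setup started
run_setup(slow, {})
print(fire(slow, interlock=False))  # ok
```

The interlock in `fire` deliberately checks the physical state (turntable and current) rather than trusting the screen, which is the property the Therac-20's hardware interlocks provided and the Therac-25's software interlocks lacked.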
AECL's initial solution to the problem was to physically disable the "up" (cursor) key, which operators used to edit entries, on all Therac-25 keyboards. With that key disabled, an operator who entered the wrong beam type, or made any other data-entry error, was forced to restart the entry process. Hardware and software updates for all Therac-25s came much later, as another part of the corrective action plan: hardware interlocks similar to the Therac-20's were added to prevent the X-ray beam from firing without the beam flattener, and the software bug was fixed.
It is poor engineering practice to copy a solution from one project to another without considering the differences between them. The Ariane 5, an unmanned rocket, is another case in which code reuse led to failure. Its first flight, in 1996, ended in an explosion because its inertial reference software had been reused from the Ariane 4. The Ariane 5's early trajectory differed greatly from the Ariane 4's, and the larger value related to horizontal velocity that it produced overflowed a conversion from a 64-bit floating-point number to a 16-bit signed integer. The resulting software exception stopped the inertial reference computations, prevented correct navigation, and ultimately led to the triggering of the self-destruct mechanism.
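The Ariane failure mode can be illustrated with a short sketch. The flight software was written in Ada; this Python version only mimics the shape of the problem by packing a value into a signed 16-bit field with the standard `struct` module, which raises `struct.error` when the value does not fit. The function names and the two readings are hypothetical and are not taken from the inquiry report.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767  # range of a signed 16-bit integer


def pack_int16(value: float) -> bytes:
    """Store a float in a signed 16-bit field; struct raises struct.error if it does not fit."""
    return struct.pack("<h", int(value))


def fits_int16(value: float) -> bool:
    """Explicit range check that a caller can run before converting."""
    return INT16_MIN <= int(value) <= INT16_MAX


# Hypothetical readings: one within the range an older flight profile produced,
# one larger, as a more aggressive trajectory might produce.
for reading in (20_000.0, 40_000.0):
    try:
        pack_int16(reading)
        print(f"{reading}: fits in 16 bits")
    except struct.error:
        print(f"{reading}: does not fit in 16 bits; needs explicit handling")
```

Code that was correct for every value the old profile could produce becomes unsafe once the input range changes, which is why reused code needs its assumptions rechecked against the new system.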
Yakima Software Bug
The bug behind the second overdose at Yakima Valley Memorial Hospital was an integer overflow. During the setup phase of treatment, the program used a counter to indicate whether the entered parameters matched the prescribed treatment. If they matched, the counter was set to zero and the treatment could proceed; otherwise, the counter was incremented. The problem was that the counter was stored as an unsigned 8-bit integer, so its maximum value was 255, and one more increment wrapped it back around to zero. If the operator started the treatment at the precise moment the counter wrapped to zero, the treatment could begin with an incorrect prescription.
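The wraparound can be reproduced in a few lines. This is a minimal sketch assuming the simplified behavior described above (set to zero on a match, increment otherwise); `setup_pass` is a hypothetical name, and masking with `0xFF` stands in for one-byte storage.

```python
def setup_pass(counter: int, params_match: bool) -> int:
    """One pass of the setup check as described above (simplified, hypothetical name)."""
    if params_match:
        return 0                    # zero means "parameters verified; treatment may start"
    return (counter + 1) & 0xFF     # one-byte storage: 255 + 1 wraps around to 0


counter = 0
start_allowed_at = []
for n in range(1, 1001):            # parameters stay wrong on every pass
    counter = setup_pass(counter, params_match=False)
    if counter == 0:                # a start request here would be accepted
        start_allowed_at.append(n)

print(start_allowed_at)             # [256, 512, 768]
```

With the parameters wrong on every pass, every 256th pass leaves the counter at zero, which is exactly the window in which a start request would slip through.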
The fix was to assign an arbitrary fixed nonzero value to the counter whenever the parameters were incorrect, so that it could never be zero unless the prescription had been entered properly. While this sort of timing problem would be difficult to catch during testing, a formal software specification could have described the precise conditions under which the counter would be zero. An independent programmer analyzing the code would also likely have caught the error.
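The fix can be sketched the same way. `MISMATCH` and `setup_pass_fixed` are hypothetical names, and `0x55` is an arbitrary choice; the text says only that some nonzero value was assigned. Because a one-byte counter has just 256 states, the property a formal specification would state (with wrong parameters, the counter is never zero) can even be checked exhaustively.

```python
MISMATCH = 0x55  # hypothetical: any fixed nonzero value works


def setup_pass_fixed(counter: int, params_match: bool) -> int:
    """Fixed version: assign a constant instead of incrementing, so nothing can wrap."""
    if params_match:
        return 0
    return MISMATCH


# The counter has only 256 possible states, so every one can be checked:
# with wrong parameters, no state leads to zero.
never_zero = all(setup_pass_fixed(c, params_match=False) != 0 for c in range(256))
print(never_zero)  # True
```

Assigning a constant turns the counter into a plain flag, which removes the arithmetic that made overflow possible in the first place.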
This case of professional negligence raises a variety of concerns about safe practices, especially in medical systems. Two important topics are how AECL and the hospital staff responded after the Therac-25 incidents occurred.
Due Diligence from AECL
Much like its testing before taking the Therac-25 to market, AECL's responses to Therac-25 users' complaints were inadequate. AECL's hubris and continued dismissiveness led to repeated failures to fulfill its professional obligations. Dismissing the first contact from Kennestone Regional Oncology Center was a professional failure, but once AECL knew of repeated incidents, the evidence of a systemic problem was strong enough that the "fluke" excuse was untenable. Ignoring a pattern was a separate and more flagrant professional misstep than ignoring a single incident.
Due Diligence from Hospitals
AECL did not fulfill its professional obligations as the manufacturer of a safety-critical system, but hospital employees, particularly the physicists in charge of overseeing the machines, share culpability for these accidents. Because the potential cost of misinformation from the manufacturer was human life, the physicists should have exercised more skepticism and done more independent testing. Before using their Therac-25, physicists at the Princess Margaret Hospital in Toronto, Canada, installed a muzzle that could measure the machine's output and shut it down in case of malfunction. By taking these precautions, the Princess Margaret Hospital demonstrated supreme professionalism and prevented potential loss of life. In the hospitals where incidents occurred, physicists should have put more effort into independent testing. Fritz Hager, a physicist and definition-4 professional at the East Texas Cancer Center, was the only physicist to test the Therac-25 rigorously under the same conditions in which it had failed in practice. As a result, he was able to demonstrate that AECL could not claim the incidents were flukes.
Precautions and testing are insufficient without sharing information. Unfortunately, the Princess Margaret physicists' foresight remained confined to their own hospital. Because there were no incidents there, they did not share their muzzling idea until the first user group meeting in March 1986, by which time several of the six incidents had already occurred. Considering that only 11 hospitals were using the Therac-25 at the time, operators at Kennestone Regional Oncology Center could have contacted other users and shared their information at little cost and with the benefit of saving lives. Users could also have involved the FDA sooner to expedite regulation and the corrective action plan.
- Leveson, Nancy G. "Medical Devices: The Therac-25." University of Washington.
- Leveson, Nancy G.; Turner, Clark S. "An Investigation of the Therac-25 Accidents." IEEE Computer 26, July 1993.
- Ariane 501 Inquiry Board Report. 1996.
- Fauchart, Emmanuelle. "Moral Hazard and the Role of Users in Learning from Accidents."