How To Evaluate & Manage Safety Risks In Biopharma
By Mark F. Witcher, Ph.D.
Of the many risks intrinsic to the pharmaceutical industry, safety has the most immediate significant impact. Safety risks can be described and modelled as cause-and-effect relationships using system risk structures (SRSs).1 All risks are initiated by a cause that passes through a connecting mechanism, described as a system, to produce an effect. A specific risk must be defined and structured using all three elements — cause, system, and effect. The purpose of this article is to structure a broad range of risks beginning with a defined danger or threat so they can be effectively understood and then managed.
Safety belongs to a family of risks where the analysis of a primary risk begins with identifying an obvious primary threat (danger) that might pass through one or more primary systems to result in a primary consequence (harm). However, the probability of the danger resulting in harm may also depend on one or more secondary risk factors that impact the primary risk system’s ability to control the danger to prevent harm.
This article describes how a straightforward cause and effect model can be used to evaluate and manage safety risks. While all models are approximations and thus partially wrong, the portion that describes the risk’s structure can be useful in understanding how a danger results in harm. While safety risks are most frequently viewed in terms of preventing harm to an individual or a group of people, the same principles apply to preventing damage to equipment, business enterprises, or any subject from a well-defined threat or danger.
Risks are currently viewed and described by most risk practitioners, including those in the pharmaceutical industry, as bad events.2,3,4 But effectively understanding a risk requires a more complete definition. Risk events do not occur spontaneously. All risk events are caused by a prior risk event. As shown in Figure 1, a primary risk is an input causal event (a threat) connected to the output effect event (a consequence) by a mechanism described as a control system. However, the system’s mechanism may be impacted by additional secondary risks that can also be described as a cause–system–effect relationship. A primary or simple risk is a linear sequence of the risk’s initiating threat, a connecting system mechanism, and the risk’s consequence. The connecting mechanism can be a sequence of systems. A complex risk is a network in which the mechanism of the risk’s control system is significantly impacted by one or more secondary risk events.
Figure 1: Definition of a risk – A primary risk is a possible cause that might pass through a system to result in an effect. For a simple risk, the probability of the effect LC is the product of the probability LT of the cause occurring and the probability SYSLP that the threat will pass through the system; thus LC = SYSLP * LT. The effect will not occur if the cause does not occur. For a complex risk, a secondary risk factor event might change the primary risk’s SYSLP, significantly changing the probability of the primary consequence should the threat occur.
All risk events have two attributes: severity and probability of occurrence. Severity can be easily described and rated from 0 to 7+ using Table A1 in the appendix. Table A1 covers seven orders of magnitude with an objective and a subjective scale SX, and a logarithmic rating value SX^ calculated as log10(SX) of the objective scale. SX^ provides a concise scale for describing the significance of any risk event, ranging from 0 (no impact) to 7+ (catastrophic). The severity of the risk is defined as part of the risk’s definition. Different levels of severity are essentially different risks that have different probabilities of occurrence. The severity of risks is usually obvious or analyzed assuming a worst-case rating.
However, while risks are identified and initially prioritized by their severity, they are managed or controlled by their probability of occurrence. Unfortunately, probability of occurrence is far more challenging to estimate. The probability of any future event is always clouded by uncertainty. Some risk events, such as the roll of a die, are mathematically definable, while others may have historical frequency data based on prior experience or experimentation. Estimating event probabilities other than well-defined fair games of chance involves complex uncertainties associated with the estimator’s level of knowledge and is subject to a variety of observer biases and prejudices. Thus, the challenge of managing any risk is objectively estimating the probability of a future event using the best available information, knowledge, and experience.
This article focuses on understanding and using a risk event’s probability of occurrence. A risk event’s probability or likelihood of occurrence can be described and rated using Table A2. Like the severity scale, the table covers seven orders of magnitude of probabilities LX ranging from certain (100% or 1) to never (0% or 0). The likelihood table for a risk event X, either a causal threat or an effect consequence, includes both a subjective and objective scale, with the objective scale value (LX) used to calculate a logarithmic rating value LX^ as the logarithm of the objective scale. Values for ratings LX^ range from 0 (certain) to ≤ -7 (essentially impossible). The ratings in Table A2 can be used to describe the probability of occurrence for any risk event. Estimating the probabilities of future events depends on the knowledge, expertise, and judgement of the evaluating individual or team making the estimate. If frequency data is available, it should be used.
Tables A1 and A2 suggest that severity and probability of occurrence need only be estimated to an order of magnitude. Given the high subjectivity of both attributes, an order of magnitude is all that can be reasonably expected. The ultimate outcome of any risk analysis is to either accept the risk or make changes in the control system mechanisms to decrease the estimated probability of the risk consequence event occurring by one or more orders of magnitude to an acceptable level. Order-of-magnitude estimates for both attributes are sufficient to achieve eventual acceptance.
To quickly describe the significance of a risk event X for acceptance or remediation, the severity rating SX^ (Table A1) and likelihood of occurrence rating LX^ (Table A2) can be added together to provide an adjusted risk likelihood (ARL).5 An ARL value of zero (e.g., a 10% chance of losing $10 or a one in a million chance of losing a million dollars) appears to be a neutral point for accepting or rejecting many risks. Positive ARL values describe bad or difficult to accept risks, while negative values describe more acceptable risks. Subjective consideration of the ARL in the context of the risk’s severity SX^ provides a basis for reaching a consensus on risk acceptance decisions.
With a risk defined as one or more causal relationships and a risk event effectively described using severity SX^ and likelihood LX^ ratings, safety risks can be structured for analysis and management.
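The rating arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the severity scale is assumed to be expressed in dollars because the examples in the text use dollars.

```python
import math

def rating(value: float) -> float:
    """Logarithmic rating: log10 of an objective scale value (severity or probability)."""
    if value <= 0:
        raise ValueError("value must be positive; a zero value has no finite rating")
    return math.log10(value)

def adjusted_risk_likelihood(severity: float, probability: float) -> float:
    """ARL = SX^ + LX^, the sum of the severity and likelihood ratings."""
    return rating(severity) + rating(probability)

# The two neutral-point examples quoted in the text (severity assumed in dollars)
print(round(adjusted_risk_likelihood(10, 0.10), 6))          # 10% chance of losing $10
print(round(adjusted_risk_likelihood(1_000_000, 1e-6), 6))   # one in a million chance of losing $1M

# A hypothetical risk that is harder to accept: 1% chance of losing $1M
print(round(adjusted_risk_likelihood(1_000_000, 1e-2), 6))
```

Both examples from the text evaluate to an ARL of approximately zero, while the hypothetical third case gives a positive value.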
Simplified Model For Describing Safety Risks
Most safety risks can be modelled or described by the following three risk events separated by two control systems shown in Figure 2.
- Danger – a threat with a probability of occurrence LD to the danger control system (DCS), usually from energy sources including chemical, gravitational, thermal, biological, kinetic, nuclear, and pressure.6 In many cases the danger is assumed to be certain with LD = 100% (LD^ = 0), as shown in Table A2. In some cases, the likelihood of the danger, such as an earthquake, may be lower.
- Hazard – a consequence output of the DCS that becomes a threat input, with probability LH, to the hazard control system (HCS) and could pass through the HCS to result in harm. If the danger is not controlled by the DCS, then the hazard must be controlled by the HCS.
- Harm – the risk consequence with a probability of occurrence of LC resulting from a danger passing through both the DCS and HCS.
The primary risk is thus defined by the three events linked by the DCS and HCS. The control systems are the key to controlling the probability of a harmful event. The systems are modelled by the probability of the system failing to control the input threat to prevent the output consequence.
The probabilities or likelihoods of a threat successfully passing through a system are rated as shown in Table A3 in the appendix. The likelihood of the system shown in Figure 1 producing the output consequence LC is the mathematical product of the likelihood of the threat occurring LT and the likelihood SYSLP that the system will propagate or fail to control the input. Thus LC = SYSLP * LT. If the logarithmic ratings in Table A3 are used, then the output rating LC^ is the sum of the input LT^ and system SYSLP^ likelihood ratings, thus LC^ = SYSLP^ + LT^.
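As a minimal sketch of the relationship above, the following Python snippet computes the consequence likelihood both ways, as a product of probabilities and as a sum of logarithmic ratings. The function names and the input values are illustrative assumptions, not figures from the article.

```python
import math

def consequence_probability(l_t: float, sys_lp: float) -> float:
    """LC = SYSLP * LT, with both inputs given as probabilities in [0, 1]."""
    for p in (l_t, sys_lp):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return sys_lp * l_t

def consequence_rating(l_t_hat: float, sys_lp_hat: float) -> float:
    """LC^ = SYSLP^ + LT^, with both inputs given as log10 ratings (0 or negative)."""
    return sys_lp_hat + l_t_hat

# Illustrative values only: a threat expected 10% of the time and a system
# that fails to control it 1% of the time.
l_t, sys_lp = 0.1, 0.01
l_c = consequence_probability(l_t, sys_lp)
l_c_hat = consequence_rating(math.log10(l_t), math.log10(sys_lp))

print(f"LC  = {l_c:.4g}")
print(f"LC^ = {l_c_hat:.2f}")
```

The two routes agree: the rating of the product equals the sum of the ratings, which is why the logarithmic scales make quick mental estimates practical.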
For the safety risk modeled by Figure 2, the two control systems work together in series to prevent the danger from resulting in the harm.
Figure 2: Danger Risk Structure model of a safety risk – The model shows the path from the danger to the harm. The model has the DCS composed of safety equipment that keeps the danger from resulting in a hazard event. The HCS is human activity controlling the hazard to prevent the harm. If both control systems are effective, the likelihood of harm occurring, even for highly likely dangers, can be very low. For quick safety analysis, the DCS and HCS can be analyzed as a single system.
The DCS is protective equipment designed to control the danger event or situation and prevent a hazard; DCSLP is the probability that it fails to do so. The DCS could range from simple personal protective equipment (PPE) to complex contamination control systems. As shown in Figure 2, the likelihood of the hazard being realized is LH = LD * DCSLP. The HCS describes how people might observe potential hazards and modify their behavior or make other adjustments to decrease the likelihood of the hazard resulting in harm; HCSLP is the probability that the hazard passes through the HCS, so that LC = LH * HCSLP. With the DCS and HCS in series, the likelihood of harm is LC = LD * DCSLP * HCSLP. Thus, the danger is controlled by building control systems that make DCSLP and HCSLP as small as reasonably possible.
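The series calculation can be sketched as follows. The helper name `harm_probability` and the numeric values are hypothetical and chosen only to show the mechanics; real estimates would come from the evaluating team.

```python
from math import log10, prod

def harm_probability(l_d: float, *system_lps: float) -> float:
    """LC = LD * DCSLP * HCSLP, generalised to any number of control systems in series."""
    values = (l_d, *system_lps)
    if any(not 0.0 <= p <= 1.0 for p in values):
        raise ValueError("all likelihoods must lie between 0 and 1")
    return l_d * prod(system_lps)

# Illustrative values only
l_d = 1.0       # danger assumed certain (LD^ = 0)
dcs_lp = 1e-3   # equipment fails to control the danger 0.1% of the time
hcs_lp = 0.1    # people fail to control the resulting hazard 10% of the time

l_h = harm_probability(l_d, dcs_lp)          # hazard likelihood, LH = LD * DCSLP
l_c = harm_probability(l_d, dcs_lp, hcs_lp)  # harm likelihood, LC = LH * HCSLP

print(f"LH = {l_h:.3g} (rating {log10(l_h):.1f})")
print(f"LC = {l_c:.3g} (rating {log10(l_c):.1f})")
```

In this hypothetical case the equipment carries most of the control burden, since DCSLP is two orders of magnitude smaller than HCSLP.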
Depending on the specific safety risk, the primary risk can be described by only one of the two control systems. In some cases, DCSLP >> HCSLP places the burden of control on the HCS, requiring human observation and actions as the primary method of preventing harm. However, in other risks, DCSLP << HCSLP places the burden of controlling the danger on physical and mechanical systems, with little burden placed on human observation and actions because the likelihood of the hazard is minimal.
The key to analyzing any safety risk is to estimate the likelihood of the input danger LD and the DCS and HCS likelihoods DCSLP and HCSLP. Evaluating the two SYSLPs must be done subjectively based on an evaluation and understanding of the mechanism of how the control systems were designed, constructed, and operated to minimize the likelihood of the input threats resulting in the output consequences.
However, both the DCS and HCS may be subject to important secondary risk events that must be identified and understood. The secondary risk events, should they occur, might significantly change DCSLP or HCSLP, thus the two likelihoods become essentially ranges with a focus on examining their worst-case values for properly evaluating the primary risk. In some cases, controlling the secondary risk events is vital to controlling the primary risk. If the system has a history, then frequency data may be available, but most often, a subjective estimate must be made on how the system is expected to perform, including the secondary risk factors.
Modeling Complex Safety Risks
Figure 3 adds secondary risk factors to the simple primary risk structure shown in Figure 2. While a simple risk is a sequence, a complex risk occurs when a control system has both a primary threat and secondary threats, resulting in a network structure. While secondary risks cannot cause the primary risk consequence, they can significantly increase the likelihood of the primary risk threat, in this case the danger, passing through the primary control systems to result in the primary consequence (harm).
Figure 3: Danger Risk Structure model of a complex safety risk – A simple risk becomes a complex risk when the performance of the primary risk system (SYSLP) can be significantly impacted by secondary risk events (threats).
If the secondary risks are relatively obvious, then they can be included in the initial analysis of DCSLP and HCSLP. However, if the secondary risk factors are significant, they may need to be controlled. Thus, it may be appropriate to analyze the secondary risk factors in more detail as a complete cause-system-effect relationship to fully understand their potential impact on the primary safety systems and, when necessary and appropriate, introduce additional secondary risk control systems.
Modeling Secondary Risk Factors To A Primary Risk
Using the fundamental definition of a risk shown in Figure 1, secondary risks can be defined as prior threats that might pass through a threat control system (TCS) to result in a secondary risk event to a primary system. In the case of secondary risk factors, the threat’s consequence is a significant change by one or more orders of magnitude in a primary control system’s likelihood SYSLP of propagating the primary threat input. A complex risk structure for a safety risk is shown in Figure 4.
Figure 4: Alternative SRS for describing a complex safety risk – The landscape describes vertically a primary risk of the danger producing harm and one or more secondary risk events impacting the performance of the primary control systems. Secondary risk events that significantly increase either DCSLP or HCSLP may require either changing the DCS or HCS to reduce the impact of the secondary threat or building TCSs that significantly reduce the probabilities of the secondary threats occurring to the primary control systems.
The landscape shown in Figure 4 introduces an approach for controlling important secondary threat events. A secondary risk matters when it might increase a primary system’s probability of failure (DCSLP or HCSLP) by at least an order of magnitude, significantly increasing the chances of the primary risk occurring.
For a detailed risk analysis, the DCSLP and HCSLP might be viewed as ranges based on the probability of occurrence of the secondary risk events. In some cases, occurrence of the secondary risk events can completely compromise the ability of a control system to function, SYSLP = 100%, requiring additional control systems to be added to better control the secondary threat.
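One simple way to see how a secondary event stretches a SYSLP into a range is sketched below. The weighting step uses the law of total probability and assumes the secondary event is independent of the primary threat; that assumption, the function name, and all numeric values are illustrative and not taken from the article, which relies on subjective evaluation rather than this calculation.

```python
from math import log10

def weighted_sys_lp(baseline_lp: float, compromised_lp: float, l_secondary: float) -> float:
    """Probability-weighted SYSLP for a system exposed to one secondary risk event.

    baseline_lp    : SYSLP when the secondary event does not occur
    compromised_lp : SYSLP when the secondary event does occur (worst case)
    l_secondary    : likelihood of the secondary event reaching the system
    Assumes the secondary event is independent of the primary threat.
    """
    for p in (baseline_lp, compromised_lp, l_secondary):
        if not 0.0 <= p <= 1.0:
            raise ValueError("all likelihoods must lie between 0 and 1")
    return l_secondary * compromised_lp + (1.0 - l_secondary) * baseline_lp

# Illustrative values only: a DCS rated -4 that is fully compromised
# (SYSLP = 100%) if the secondary event occurs.
baseline_dcs_lp = 1e-4
compromised_dcs_lp = 1.0

# Compare an uncontrolled secondary threat with one reduced by a hypothetical TCS
for l_secondary in (1e-2, 1e-5):
    lp = weighted_sys_lp(baseline_dcs_lp, compromised_dcs_lp, l_secondary)
    print(f"secondary likelihood {l_secondary:g}: "
          f"DCSLP range {baseline_dcs_lp:g} to {compromised_dcs_lp:g}, "
          f"weighted rating {log10(lp):.2f}")
```

With these assumed numbers, an uncontrolled secondary threat pulls the DCS rating from -4 to about -2, while a secondary threat control system that makes the event rare restores it to close to -4.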
While methods exist for rigorously solving the risk network structure describing both the primary and secondary risks, they are difficult to apply and far beyond the scope of this article. These methods include probabilistic graphs, Bayesian networks, and directed acyclic graphs.7,8,9 Such advanced mathematical solutions may be neither appropriate, given the high subjectivity and uncertainty of the information, nor necessary to achieve order-of-magnitude estimates.
The approach used in this article is to decompose the primary and secondary risks and subjectively evaluate them separately and then collectively analyze them to understand and accept or remediate them as individual risks. While such an approach lacks mathematical rigor, the approach can be efficiently used by well-informed and experienced individuals and teams of experts to successfully manage a wide variety of risks.
As shown in Figure 4, secondary factors can be identified, modelled, and then individually evaluated as risks defined by Figure 1. If they are a significant threat to a primary control system, they can be remediated by developing a secondary threat control system that reduces the likelihood of the secondary threats occurring. In some cases, the control features for remediating the secondary threats can be placed within the primary control system’s design.
The following examples explain how simple and complex safety risks can be structured and evaluated in the context of possible secondary risk factors.
Example: Using Protective Gloves
The first example describes how a safety risk can be quickly viewed and analyzed. Figure 5 is one of several possible SRSs for analyzing the risk of using disposable gloves to prevent an operator’s hands from becoming contaminated when handling something that might be contaminated. The severity of the risk events (cause and effect) is determined by the nature of the contaminant and its possible impact on the subject.
Figure 5: PPE Safety Risk – This SRS describes the risk of handling a contaminated object using gloves. The SRS is simple enough for an intuitive understanding of DCSLP and HCSLP to assess and manage both the primary and secondary risks.
In this example, the severity of the danger (the contaminant on the object) is assumed to be high and its occurrence certain (LD = 100%). If the gloves are used appropriately, they have a low probability (low DCSLP) of passing the contaminant to the user’s hands as a hazard. The secondary risk factors shown might impact the gloves’ ability to protect the subject by increasing DCSLP and would be considered when estimating a value of DCSLP. If the danger is particularly acute, secondary threat control systems can be implemented to minimize the likelihood of compromising the DCS’s performance. Conversely, if the contamination is rare (low LD), the requirements for the DCS and HCS may be less rigorous.
The HCS also plays an important role in preventing harm by providing a low probability HCSLP of the hands becoming contaminated by managing the interaction of the gloved hands and the contaminated object. Thus, using the estimated probabilities shown in Figure 5, the subject has a very low probability of harm. Of course, the analysis must consider a variety of secondary risk factors (inappropriate type of gloves, damaged glove, improper donning, etc.) in estimating the SYSLP probabilities. Again, if the risk severity is potentially catastrophic, then additional secondary threat control systems can be implemented to reduce the likelihood of harm occurring from the primary risk.
The approach shown works for a wide variety of personal protective equipment such as face masks, personal behaviors such as social distancing to prevent transmitting airborne viruses, analyzing the risk of contaminating the surfaces of other objects, or other risks such as contaminating eyes, inhalation, etc.
Example: Handling Antibody-Drug Conjugates
A more complicated safety risk is summarized by the SRS shown in Figure 6. The handling of toxic compounds such as those used in antibody-drug conjugates (ADCs) poses a very serious risk to operators. The initial risk analysis might consider the two systems shown in Figure 6 to make an initial assessment of the primary risk, laying the foundation for a more detailed analysis that includes examining secondary risks and building the appropriate danger, hazard, and secondary threat control systems.
Figure 6: ADC safety risk – This is one possible SRS for managing safety risks for highly toxic warhead compounds required for making ADCs. Note that likelihood ratings are used instead of probabilities. The severity and probability guesses shown are for illustration purposes.
In this example, the likelihoods are described using the logarithmic scales shown in Tables A2 and A3. The danger likelihood is rated certain (LD^ = 0). The team reaches a consensus that the DCS has a rating DCSLP^ of -4 (0.01%) and the HCS a rating HCSLP^ of -0.3 (50%). Thus, in the event of the hazard occurring, the HCS has only a 50% probability of preventing the harm. The two primary control systems have a combined SYSLP^ of -4.3 (0.005%).
If the severity of the possible illness is SC^ > 6 (Table A1), then the ARL for the risk is positive (greater than 6 - 4.3 = 1.7), making the DCS and HCS as described difficult to accept. If the risk is not acceptable, then the DCS and HCS can be improved, as evaluated by a team of experts, to reduce their respective SYSLP^ ratings and thereby reduce LC by several orders of magnitude, bringing the ARL below zero. The DCS and HCS can be improved by building and evaluating SRSs for the important secondary risks that might compromise the control systems’ performance.
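The arithmetic of this example can be reproduced with a short sketch. The ratings are the illustrative ones quoted above; the severity rating is set to 6, an assumed lower bound for "greater than 6", and the variable names are hypothetical.

```python
# Ratings quoted in the ADC example (illustrative guesses, per the Figure 6 caption)
l_d_hat = 0.0      # danger rated certain
dcs_lp_hat = -4.0  # DCS rating
hcs_lp_hat = -0.3  # HCS rating (about 50%)
s_c_hat = 6.0      # assumption: lower bound of the severity rating "> 6"

sys_lp_hat = dcs_lp_hat + hcs_lp_hat  # combined rating of the two systems in series
l_c_hat = l_d_hat + sys_lp_hat        # harm likelihood rating, LC^ = LD^ + SYSLP^
l_c = 10 ** l_c_hat                   # back to a probability
arl = s_c_hat + l_c_hat               # adjusted risk likelihood

# Orders of magnitude by which LC must fall for the ARL to reach zero
reduction_needed = max(0.0, arl)

print(f"combined SYSLP^ = {sys_lp_hat:.1f}")
print(f"LC = {l_c:.1e} ({l_c * 100:.3f}%)")
print(f"ARL = {arl:.1f} at the assumed severity bound")
print(f"LC must fall by more than {reduction_needed:.1f} orders of magnitude")
```

Because the severity is stated only as greater than 6, the computed ARL of 1.7 is a floor, so roughly two or more orders of magnitude of improvement in the control systems would be needed before the risk approaches the neutral point.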
Each of the systems, especially the sequence of containment systems, might be expanded to fully understand how the danger of the toxic warhead to the operating personnel might be controlled to prevent harm. The expansion of the protection systems might include multiple threat paths (contact, inhalation) or multiple operational steps (setup, manufacturing, cleaning, changeout), requiring several SRSs be developed and evaluated. In some cases, probability estimates by experts may be confirmed or experimentally tested using challenge testing or frequency analysis of system steps and their performance mechanisms.
Most risks can be structured in several different ways depending on the experience and knowledge of the individual or team evaluating the risk. In many cases, sufficient knowledge and experience are available to reasonably estimate both the severity and likelihood of the threat events and the performance probabilities of the systems. However, additional information and experimental data should be collected to assure the probability estimates are the appropriate order of magnitude for the risk being evaluated.
Managing Safety Risks
The primary goal of this article is to present a simple approach for understanding safety risks. While the nomenclature may look complicated, the basic concept of using a causal definition for structuring the flow of threats and estimating probabilities to understand both simple and complex risks is relatively straightforward. By intuitively understanding how risks can be structured to describe the flow of risk events using simple thought experiments, individuals and teams can better assess situations to identify and accept or mitigate potential safety risks.
While risks are identified by their severity, they are managed by identifying, understanding, and, when necessary, manipulating their likelihood of occurrence. Understanding a risk is initiated by a thought experiment to identify how likely threats flow through systems to result in consequences of concern. By intuitively understanding the likelihood of the threat occurring and the likelihood of the systems controlling the threat in the context of secondary risk factors, an order of magnitude estimate of likelihood of harm can be guesstimated. Should the initial likelihood estimate be unacceptable in the context of the risk event’s severity, the primary control systems can be improved or additional systems added for controlling secondary risks, thus reducing the likelihood of the harm event occurring. Of course, an alternative approach of avoiding or removing the danger altogether, if possible, can also be explored.
References
1. Witcher, M., Principles and Concepts of System Risk Structures for Understanding and Managing Risks, Bioprocess Online, Dec. 6, 2021. https://www.bioprocessonline.com/doc/principles-and-concepts-of-system-risk-structures-for-understanding-managing-risks-0001
2. ISO 31000:2018 – Risk Management – International Organization for Standardization.
3. Hubbard, D., The Failure of Risk Management, Wiley, 2009.
4. FDA (CDER/CBER) – Guidance for industry: ICH Q9 quality risk management. June 2006. ICH.
5. Witcher, M., Rating Risk Events: Why Adjusted Risk Likelihood (ARL) Should Replace Risk Priority Number (RPN), Bioprocess Online, April 7, 2021. https://www.bioprocessonline.com/doc/rating-risk-events-why-we-should-replace-the-risk-priority-number-rpn-with-the-adjusted-risk-likelihood-arl-0001
6. Ericson, C.A., Hazard Analysis Techniques for System Safety, 2nd edition, Wiley & Sons, 2016.
7. Fenton, N. and M. Neil, Risk Assessment and Decision Analysis with Bayesian Networks, 2nd edition, CRC Press, 2019.
8. Kjaerulff, U. and A. Madsen, Bayesian Networks and Influence Diagrams: A Guide to Construction and Analysis, 2nd edition, Springer, 2013.
9. Sucar, L., Probabilistic Graphical Models – Principles and Applications, Springer, 2015.
Appendix – SRS Rating Scales
System risk structures (SRSs) require three simple scales that provide straightforward ratings for describing and managing risks. The logarithmic scales span the entire useful ranges and can provide simple rating values for describing and communicating a risk’s basic attributes. The first two tables describe a risk event’s attributes of severity of impact (Table A1) and probability (likelihood) of occurrence (Table A2). The third scale (Table A3) rates the likelihood that a system will fail to prevent an input threat such as a danger or hazard event from resulting in the output risk consequence event.
Table A1: A logarithmic scale of a risk’s impact severity from no impact (zero) to catastrophic (7+). A risk’s severity is described by the rating SX^. A subjective scale is also provided.
Table A2: Scale for describing the probability of the risk event’s occurrence. The scale provides a rating LX^ calculated as the logarithm of the probability. A rating of 0 represents certainty (100%). A rating of -7 represents a probability approaching zero (impossible).
Table A3: Scale that describes the likelihood, expressed as a probability, of system SYS not controlling the threat, thus producing the risk consequence. The scale produces a rating as the log of the probability that the input event will pass through the system to result in the output or outcome. If SYSLP^ is equal to zero (certain), the system cannot control the threat. If SYSLP^ equals -7, then the system is essentially a barrier that will prevent the input from producing the output.
About The Author:
Mark F. Witcher, Ph.D., has over 35 years of experience in biopharmaceuticals. He currently consults with a few select companies. Previously, he worked for several engineering companies on feasibility and conceptual design studies for advanced biopharmaceutical manufacturing facilities. Witcher was an independent consultant in the biopharmaceutical industry for 15 years on operational issues related to: product and process development, strategic business development, clinical and commercial manufacturing, tech transfer, and facility design. He also taught courses on process validation for ISPE. He was previously the SVP of manufacturing operations for Covance Biotechnology Services, where he was responsible for the design, construction, start-up, and operation of their $50-million contract manufacturing facility. Prior to joining Covance, Witcher was VP of manufacturing at Amgen. You can reach him at email@example.com or on LinkedIn (linkedin.com/in/mark-witcher).