Unless you are one of the rare birds who truly enjoys risk management, it is typically a dreaded process for the majority of engineers. Risk management is a challenging and long process, filled with conversations, differing opinions, and confusion about what is actually the right thing to do.
To be clear, this post is not about project risk management, or financial risks, or gambling risks, or adrenaline rush risks; this is about developing safe products for people to use: product risk management. I’m talking hazards, harms, dFMEAs, fault tree analysis, and all the other fun activities.
I’ll be completely upfront that most of my perspective when it comes to risk management is guided by the medical device industry, specifically ISO 14971. That said, this is a widely accepted standard on how to properly do risk management. We should expect different industries to have different perspectives, different tolerances for risk, and different impacts on users. Ultimately, however, we all want to make safe products, and I’m hoping to help with that.
This is a long-form conversation; there is a lot to cover, and cramming everything into a single post would be a disservice to the topic. So this conversation will be broken up into 4 parts, hopefully just 4.
Introduction to Risk and Risk Management (this post)
Risk Management Planning
Risk Management Templates with Content
Risk Management on an Example Product
Let’s dive in.
What is Risk?
Put simply, risk is a calculation: the probability of something happening multiplied by the severity of the resulting harm.
Risk = Probability of Occurrence x Severity of potential harm
This concept can be applied widely: how likely is something to happen, and how bad will it be? Together, those give you your total risk. Most are probably familiar with this concept, but I want to ask: how do we really get there? How do we do that for a complex product? How can we make this process less painful?
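To make the formula concrete, here is a minimal sketch in Python. The 1-to-5 scales and the level names are assumptions for illustration only; they do not come from ISO 14971 or from any particular Risk Management Plan, and your team would define its own.

```python
from enum import IntEnum


class Probability(IntEnum):
    # Hypothetical 5-level probability scale (assumption, not from a standard).
    IMPROBABLE = 1
    REMOTE = 2
    OCCASIONAL = 3
    PROBABLE = 4
    FREQUENT = 5


class Severity(IntEnum):
    # Hypothetical 5-level severity scale (assumption, not from a standard).
    NEGLIGIBLE = 1
    MINOR = 2
    SERIOUS = 3
    CRITICAL = 4
    CATASTROPHIC = 5


def risk_score(probability: Probability, severity: Severity) -> int:
    """Risk = Probability of Occurrence x Severity of potential harm."""
    return int(probability) * int(severity)


# A remote chance of a critical harm scores 2 x 4.
print(risk_score(Probability.REMOTE, Severity.CRITICAL))  # 8
```

Using ordinal scales like this keeps scoring conversations short, at the cost of treating the numbers as more precise than they really are.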
Definitions, Definitions, Definitions 📚
I believe definitions are essential to creating alignment on a topic, and I would highly recommend writing them down on your side of the effort as well.
For risk, the list is a little long, so I won’t pick apart each one, but it will be important to reference back to it throughout.
Harms - Physical injuries or damage to the health of people. Includes damage to property or the environment.
Hazards - Source of a potential harm.
Hazardous Situations - Circumstances in which people, property, or the environment are exposed to one or more hazard(s).
Foreseeable Sequence of Events - Reasonable sequence of events leading up to a hazardous situation.
Probability - Measure of how likely something is to occur.
Severity - Measure of the possible consequences of a hazard.
Risk - Combination of probability of a harm and severity of that harm.
Hazards vs. Harms
Examples of hazards would be Sharp Edges, Hot Surface, or Electromagnetic Radiation. Examples of harms would be laceration, burns, or environmental interference. A hazard can be a source of a harm, but just because something is a hazard doesn’t mean it will necessarily cause harm.
Hazardous Situations vs. Foreseeable Sequence of Events
What is the something that might happen, with or without user intervention, that would lead to a hazardous situation? That is what we are trying to capture with a Foreseeable Sequence of Events (FSE); it provides more context for a particular risk.
One important point here is that FSEs don’t have to be a product of the device failing in some way; normal use of the product can be the cause of a hazardous situation. That said, lots of hazardous situations are created by a fault condition of the device (breaking, opening unintentionally, becoming damaged, etc.). Typically there are also a variety of hazardous situations created just from the intended use of the device.
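One way to keep these terms straight is to see how they line up in a single row of a hazard analysis. The sketch below is only an illustration: the class name, field names, and the two example sequences are hypothetical, while the hazards and harms are taken from the examples above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HazardAnalysisRow:
    # Field names are assumptions for illustration, not a prescribed template.
    hazard: str                           # source of a potential harm
    foreseeable_sequence: Tuple[str, ...]  # events leading to the situation
    hazardous_situation: str              # exposure to the hazard
    harm: str                             # resulting injury or damage
    from_fault: bool                      # True if the sequence starts with a device fault


rows = [
    HazardAnalysisRow(
        hazard="Hot Surface",
        foreseeable_sequence=(
            "Device runs for an extended period",
            "User picks up the device by its housing",
        ),
        hazardous_situation="User's skin is in contact with a hot surface",
        harm="Burn",
        from_fault=False,  # normal use, no failure involved
    ),
    HazardAnalysisRow(
        hazard="Sharp Edges",
        foreseeable_sequence=("Device is dropped", "Housing cracks"),
        hazardous_situation="User handles the cracked housing",
        harm="Laceration",
        from_fault=True,  # the device broke first
    ),
]

# Hazardous situations that arise without any device failure.
normal_use_rows = [row for row in rows if not row.from_fault]
print([row.hazard for row in normal_use_rows])  # ['Hot Surface']
```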
Parts of Risk Management
Risk management is an iterative process that starts at the outset and continues on for the entirety of the product’s life. Risk management should live with the product and continue until it is no longer sold on the market. That said, there are several key elements, and ideally, the bulk of the work is done during development of the product.
Risk Analysis
Risk Evaluation
Risk Controls
Risk Maintenance
To alleviate concern about how hard the risk management process will be, try to make things clear and simple. People love pictures and engineers love diagrams.
Remember it is iterative. It will happen cyclically and it should continue on after the initial launch of a product. While this seems simple, as some of you know, there is a lot packed into these four boxes. We will expand on the process more in future posts.
Risk Analysis
Risk analysis is the process of identifying and describing the potential hazards, hazardous situations, and harms. Read that again and notice what it doesn’t say: we are not calculating risk here. This process in and of itself will be packed with back-and-forth conversation about what harm goes where, and what events can cause hazardous situations.
Usually this is done with a top-down approach (more on that later), where the team assesses the product as a whole, looking at interactions between the product and the user and analyzing what can go wrong. And remember, we can cause harm to people, property, and the environment, not just a user.
As you can see from the image, hazards begin to become categories for sequences of events, situations, and harms. Similarly, harms begin to fall into different buckets that are applied in different ways.
Hot Tip: It is typically easier to create a common list of potential harms, with definitions and associated severities, prior to this step. When you perform your evaluation later, deciding how harmful a certain hazard is becomes a much easier conversation.
Risk Evaluation
This is where Risk enters the room. Risk evaluation is the process of analyzing the probability of occurrence and the severity of a harm. Here we are estimating how likely it is that one of these hazardous situations will occur and how severe the resulting harm would be.
Evaluating probability is a judgement call. That said, there are typically resources like white papers, complaint databases, and public records that can, and should, be used to estimate the probability of occurrence. If you have a high severity situation with a truly unknown probability, then it may warrant some engineering testing to better characterize the product’s behavior.
While scoring risks, you should identify which risks are acceptable, which need to be mitigated, and which are unacceptable. Typically teams will agree on risk acceptability at the outset of conducting risk management. Usually this means there is a Risk Management Plan, and that plan defines what level of risk is acceptable and what level of risk is required to be mitigated.
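As a rough sketch of what an agreed acceptability definition could look like once written down, the snippet below sorts a risk score into one of three bands. The threshold values (4 and 12) and the assumption that scores come from 1-to-5 probability and severity scales are purely illustrative; the real cut-offs belong in your Risk Management Plan.

```python
def classify_risk(score: int, acceptable_max: int = 4, mitigate_max: int = 12) -> str:
    """Sort a probability x severity score into an acceptability band.

    The default thresholds are hypothetical placeholders, not values
    taken from ISO 14971 or any real Risk Management Plan.
    """
    if score <= acceptable_max:
        return "acceptable"
    if score <= mitigate_max:
        return "mitigate"
    return "unacceptable"


# Scores here are probability x severity on assumed 1-to-5 scales.
print(classify_risk(2 * 2))  # acceptable
print(classify_risk(2 * 4))  # mitigate
print(classify_risk(4 * 5))  # unacceptable
```

Keeping the thresholds as parameters makes it obvious that they are a team decision rather than something baked into the scoring itself.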
Risk Control
Now that we have an idea of just how risky something is, we can understand whether the design needs to have controls in place to reduce said risk. Mitigations, for most, take the form of requirements for the design. Requirements of the design are, well, required; hence they must be incorporated and verified to function. More on requirements below.
In short, risk deliverables inform a better design.
The key takeaways here are that the level of risk acceptability has to be defined and agreed upon, and that the team should document what is being done in the design to control or prevent the risk from occurring (requirements, in my case). For example, in the image above, Physical Tissue Damage may be considered an unacceptable risk. As an alternative to referencing requirements, teams may decide to plainly write what in the design is controlling the risk.
After identifying risk controls, the risks should be re-assessed for their new probability of occurrence. This method allows the team to clearly understand how risk controls have improved the safety of the design. To expand on our example, we would add three more columns to the right and re-assess the risk to understand how our risk score may have changed.
You might ask, “why don’t we just update the values that are already there?” which is a valid question. By adding some additional columns we can now tell a story about what the risk control has done for the design. We can observe how risky something was, and how critical the requirement may be for the safety of the product. If said requirement were removed, we can see what the resulting risk was previously assessed as. Having a benchmark (the unmitigated probability) to score against can also make scoring the mitigated probability a little easier.
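The idea of keeping both sets of columns can be sketched as a small record that stores the unmitigated and mitigated probabilities side by side. Everything here is hypothetical: the class and field names, the 1-to-5 scoring, and the requirement ID are placeholders, and severity is held constant because only the probability is re-assessed.

```python
from dataclasses import dataclass


@dataclass
class RiskLine:
    # Names and scales are assumptions for illustration only.
    harm: str
    severity: int                 # unchanged by the risk control
    unmitigated_probability: int  # scored before any risk control
    risk_control: str             # e.g. a requirement reference
    mitigated_probability: int    # re-assessed with the control in place

    @property
    def unmitigated_risk(self) -> int:
        return self.unmitigated_probability * self.severity

    @property
    def mitigated_risk(self) -> int:
        return self.mitigated_probability * self.severity

    @property
    def risk_reduction(self) -> int:
        # How much the control is "worth" to the safety of the design.
        return self.unmitigated_risk - self.mitigated_risk


line = RiskLine(
    harm="Burn",
    severity=3,
    unmitigated_probability=4,
    risk_control="REQ-012 (hypothetical requirement ID)",
    mitigated_probability=1,
)
print(line.unmitigated_risk, line.mitigated_risk, line.risk_reduction)  # 12 3 9
```

Because the original score is never overwritten, removing the requirement later immediately shows what risk the design would fall back to.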
Hot Tip: A mistake that many teams make when implementing new risk controls (i.e. design features) is failing to identify whether any new risks have been introduced by those features. We inherently think those features are for safety and forget to consider how they themselves may lead to new hazards or harms.
Risk Maintenance
An often forgotten part of risk management is the act of continually monitoring and updating risk deliverables for a particular product. When a product is launched, risk management is not over; in fact, it has truly just begun. As users come in contact with your product, faults, errors, misuses, complaints, and other problems will arise, which will inform risk deliverables and ultimately the design.
It’s not uncommon for companies to have entire departments devoted to the activity of monitoring and sustaining a product. For a smaller company this might not be the case and the team will need to develop a strategy to intake field information and push that back into the design of the product.
This is maintenance, so ideally the team is reviewing and updating existing deliverables as more information is gathered: taking new data to the hazard analysis and requirements, making updates, and planning for design changes as needed.
Where’s the FMEA though?
For those who aren’t as familiar, FMEA = Failure Mode and Effects Analysis. If you are a picky eater, no worries; it comes in many different flavors: Use FMEA (uFMEA), Mechanical FMEA (mFMEA), Software FMEA (swFMEA), and so on. It is a tool dreaded by most, but a necessary evil.
I will hit on this more later, but FMEA is what is known as a bottom-up approach to risk management. Hazard Analysis, on the other hand, is a top-down approach, sometimes known as a System FMEA (sysFMEA). FMEAs will still be used, but from a different perspective. Their intent is to identify potential fault conditions of the design and understand how those can result in a hazardous situation. In short, failure mode outputs of an FMEA will become Foreseeable Sequences of Events in your Hazard Analysis.
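The hand-off from FMEA to Hazard Analysis can be pictured as a simple conversion. This is a minimal sketch under assumptions: the `FailureMode` fields, the `to_foreseeable_sequence` helper, and the latch example are all hypothetical and do not reflect any standard FMEA template.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FailureMode:
    # A pared-down FMEA line; real worksheets carry many more columns.
    component: str
    failure_mode: str
    local_effect: str


def to_foreseeable_sequence(fm: FailureMode) -> List[str]:
    """Turn one FMEA failure mode into an ordered list of events
    that can seed a Foreseeable Sequence of Events in the Hazard Analysis."""
    return [f"{fm.component}: {fm.failure_mode}", fm.local_effect]


fm = FailureMode(
    component="Battery door latch",
    failure_mode="Latch fractures",
    local_effect="Battery door opens unintentionally",
)
fse = to_foreseeable_sequence(fm)
print(fse)
```

The team would still need to extend each sequence with the user or environment steps that lead to an actual hazardous situation.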
FMEAs are just one tool to help with the risk management process. While very common, they cannot (I believe) be the sole tool that supports all risk management for a product. At times they may not even be necessary for your product, and other tools like a Fault Tree Analysis may be a better fit.
Summary
Risk Management is a challenge, simple as that. It will take time, lots of time, generally more time than your team estimates, but it will be valuable and lead to a safer and better product in the end. I am hoping 🤞 to make this a simpler and more understandable process that teams can more easily apply on projects. Some highlights from this post:
As always, create some definitions and frameworks for the team to understand and agree upon
Understand the different high-level parts of risk management
Break up the process of assessing and evaluating risk
Create a Risk Management Plan! (more on that later)
Understand and plan for risk management activities during development and after launch
Stay tuned for more!