Dillygence

Industrial risks: the cost of your hesitation to diagnose

Industrial risk or hazard? Discover the weekly financial impact of lacking an accurate diagnosis through real-world industrial case studies.

Introduction: industrial risks often cost more than the accident… because hesitation gets billed every week

An average overrun of 10% on major industrial projects is enough to turn a profitable project into an economically questionable one. Meanwhile, the plant keeps running with its workarounds, bottlenecks and stoppages, and the bill never shows up clearly in cost accounting. The issue isn't only accident prevention: it's stopping an ongoing value leak driven by poorly mapped industrial risks and postponed decisions.

Key takeaway: without scenario-based mapping, the company pays in CAPEX, OPEX and throughput—even when “nothing” happens

An industrial site doesn't “stay” stable: it drifts. Flows tighten, WIP (work in progress) grows, utilities age, and scarce skills become even scarcer. Without scenario-based mapping, the company funds the status quo at a premium, week after week.

I. Restoring clear definitions to loosely used words

Industrial risk, industrial accident, technological risk, technological disaster: four terms, four levels of stakes

An industrial risk describes a combination of a feared event, a probability of occurrence, and consequences for people, the environment, assets and production. An industrial accident is the materialization of that risk. Technological risk covers risks associated with industrial activities, with a scenario-based logic and control of effects. A technological disaster refers to an accident of exceptional magnitude, with massive and lasting impacts, often beyond the site boundaries.

Semantic drift is expensive: treating “risk” like “accident” leads to late investment. Treating “technological risk” as a purely regulatory topic leads to producing documents—with no shopfloor steering.

Risk, hazard, exposure, severity: the distinction that avoids sterile debates in investment committees

A hazard is a potential source of harm. Exposure describes the possible level of contact between that hazard and a target. Severity qualifies the intensity of consequences. Risk combines probability of occurrence and severity of consequences, taking exposure into account when the situation requires it.

Some criticality methods add detectability to prioritize actions, but it is not a universal component of the definition of risk. To frame things without debate, three sources are enough: ISO 31000 for governance, INRS for prevention vocabulary, and INERIS for scenario reasoning and hazard studies.
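The vocabulary above can be sketched as a small scoring function. This is an illustration under stated assumptions, not a normative formula: the 1-to-5 scales, the multiplicative combination and the names `risk_score` and `exposure_factor` are hypothetical choices made for the example. The optional `detectability` term mirrors the criticality methods mentioned above.

```python
from typing import Optional


def risk_score(probability: int, severity: int,
               exposure_factor: float = 1.0,
               detectability: Optional[int] = None) -> float:
    """Combine probability and severity on assumed 1-5 scales.

    exposure_factor (assumed, between 0 and 1) lowers the score when the
    target is only partly exposed. detectability (assumed 1-5, where 5
    means hard to detect) is an optional add-on, not part of the
    definition of risk itself.
    """
    for name, value in (("probability", probability), ("severity", severity)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    if not 0.0 <= exposure_factor <= 1.0:
        raise ValueError("exposure_factor must be between 0 and 1")
    score = probability * severity * exposure_factor
    if detectability is not None:
        if not 1 <= detectability <= 5:
            raise ValueError("detectability must be between 1 and 5")
        score *= detectability
    return score


# Same hazard, two exposure levels: the hazard is identical, the risk is not.
full_exposure = risk_score(probability=2, severity=5)
half_exposure = risk_score(probability=2, severity=5, exposure_factor=0.5)
print(full_exposure, half_exposure)
```

Keeping detectability optional reflects the distinction made in the text: it helps rank actions, but a committee can agree on a risk level without it.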

II. A decision-maker's useful typology: linking scenarios, effects, barriers and indicators

The 6 types of risks on an industrial site: scenarios and associated indicators

  • Type 1 (fire): indicators on deviation frequency and availability of detection and extinguishing barriers.

  • Type 2 (explosion and overpressure): indicators on inventories, confinement conditions and testing of protective devices.

  • Type 3 (toxic releases): indicators on quantities, leak kinetics and effectiveness of retention systems.

  • Type 4 (chronic pollution): indicators on drift, controls and recurring non-compliances.

  • Type 5 (domino effects): indicators on proximity, physical separation and dependencies between units.

  • Type 6 (utility losses): indicators on availability, redundancy, switchover time and impact on OEE (Overall Equipment Effectiveness).

Prevention barriers vs protection barriers: what holds, what fails, what degrades

A prevention barrier prevents the scenario; a protection barrier limits effects when the scenario starts. Both age—and not at the same pace. The question isn't “do we have a barrier,” but “does it work when everything else goes sideways.” On barrier management, CCPS (Center for Chemical Process Safety) provides a solid base with Guidelines for Risk Based Process Safety.

Utility losses, restarts and transients: where incidents get manufactured

A large share of serious deviations are born in transitions: start-ups, shutdowns, changeovers, switches to degraded mode. A loss of compressed air, steam or electricity doesn't just stop one machine: it breaks an entire sequence. Restart then becomes a scenario in its own right, with its own human errors. This pattern is documented in public experience-feedback reports from BARPI (Bureau d'Analyse des Risques et Pollutions Industriels).

Domino effects and reference sources

The domino effect often starts with a mundane constraint: proximity, co-activity, saturated retention. A “unit” event becomes a “site” event when utilities or traffic routes flip into breaking points. INERIS documents these escalation logics in technical guides. BARPI provides experience feedback that can be translated into scenarios, and API RP 754 (API Recommended Practice 754) offers a process safety indicator framework to move beyond “accidents-only” steering.

III. The cost of the status quo: when underperformance manufactures risk

Cash bleed and weak signals on the shopfloor

The industrial status quo has a price, even if no accident appears in reporting. An untreated bottleneck limits overall throughput and turns demand into missed sales. Unplanned downtime triggers overtime, scrap and urgent shipments. Waiting lets assets and data age: uncertainty grows, design margins grow, CAPEX (capital expenditure) grows.

  • Maintenance drift: postponed preventive work, overdue calibrations, unavailable spare parts.

  • Procedure deviations: workarounds tolerated because “otherwise nothing gets out.”

  • Deferred works: reliability interventions pushed to the next shutdown—then the one after that.

  • Obsolescence: PLCs, sensors, valves, utilities—with troubleshooting skills evaporating.

When these signals accumulate, the “unlikely” scenario becomes simply “pending.” AZF (Toulouse, 2001) and Lubrizol (Rouen, 2019) showed that a site can pile up weak signals and control gaps, then tip into an event with effects beyond the fence line. Texas City (2005) illustrates a very operational point: a known but degraded barrier, tolerated practices, then a start-up sequence that derails. The useful question remains: “which barrier failed, and why wasn't it seen in time?”

IV. Mapping a site in 8 steps: a repeatable, auditable, actionable method

  1. Define the scope, including units, utilities, storages, logistics interfaces and sensitive neighbors.

  2. Build the inventory of substances and energies, with quantities, operating conditions and transient states.

  3. Segment by units and flows to link each scenario to a process portion and a dependency chain.

  4. Identify initiating events: leak, rupture, overheating, loss of cooling, operating error.

  5. Build scenarios as “process → event → propagation → effects,” with explicit assumptions.

  6. List prevention and protection barriers, with function, expected performance and failure mode.

  7. Rate scenarios with a shared rule, based on probability, severity and exposure.

  8. Build an action plan with named owners, deadlines, estimated cost and review after modification or incident.

A barrier without an owner does not exist on D-day. The goal isn't to produce pages, but to decide quickly on CAPEX and OPEX (operating expenditure). Change management (MOC, management of change) must cover works, ramp-up, recipe changes and utility modifications: each change triggers a targeted review of scenarios and degraded modes. For instrumented barriers, IEC 61511 (safety instrumented functions for the process industries) frames the lifecycle from requirements to periodic testing; HSE UK (Health and Safety Executive) complements this with pragmatic guidance on health and safety management, notably HSG65.
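Steps 6 to 8 above can be sketched as a minimal scenario register. This is one possible structure, not a prescribed tool: the class names `Barrier` and `Scenario`, the 1-to-5 scales, the product used as the shared rating rule and the sample entries are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Barrier:
    name: str
    kind: str                      # "prevention" or "protection"
    owner: Optional[str] = None    # None means nobody is accountable


@dataclass
class Scenario:
    unit: str
    initiating_event: str
    probability: int               # assumed 1-5 scale
    severity: int                  # assumed 1-5 scale
    barriers: List[Barrier] = field(default_factory=list)

    def rating(self) -> int:
        # Shared rating rule, assumed here to be a simple product.
        return self.probability * self.severity

    def unowned_barriers(self) -> List[str]:
        return [b.name for b in self.barriers if not b.owner]


def prioritize(scenarios: List[Scenario]) -> List[Scenario]:
    """Highest rating first; ties broken by number of unowned barriers."""
    return sorted(
        scenarios,
        key=lambda s: (s.rating(), len(s.unowned_barriers())),
        reverse=True,
    )


# Hypothetical entries, for illustration only.
register = [
    Scenario("Storage", "leak", 2, 4, [
        Barrier("Level alarm", "prevention", owner="Maintenance lead"),
        Barrier("Retention basin", "protection"),
    ]),
    Scenario("Compressed air", "loss of utility", 4, 3, [
        Barrier("Pressure monitoring", "prevention", owner="Utilities lead"),
    ]),
]

for s in prioritize(register):
    print(s.unit, s.rating(), s.unowned_barriers())
```

Listing unowned barriers next to the rating keeps the ownership gap visible in the same review as the priorities, rather than in a separate audit.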

V. HAZ and SMS: moving from a “compliant” document to a steering tool

HAZ (Hazard Study): expected content and frequent mistakes

A HAZ (Hazard Study) aims to demonstrate a structured understanding of scenarios and control measures. It must link each scenario to assumptions, barriers, evidence of performance and criticality. Three common mistakes make it unusable: incomplete scenarios that ignore transient states, barriers that are listed but never tested, and data that cannot be replayed because assumptions aren't versioned. Many studies are strong on the nominal case and weak where it hurts: bounding scenarios, utilities and propagation between units. INERIS documents these points in its hazard-study guides, and BARPI illustrates them through concrete experience feedback.

SMS (Safety Management System): routines, evidence and safety culture

An SMS (Safety Management System) turns barriers into routines, evidence and decisions. It imposes rituals such as periodic reviews of critical barriers and post-incident analyses. Safety culture shows up in trade-offs: rate of deferrals for critical maintenance, quality of anomaly reporting, ability to stop a line without negotiating for ten minutes. If it's healthy, an operator reports a deviation early and the shop manager handles it without punishment. OSHA (Occupational Safety and Health Administration) frames process safety management via 29 CFR 1910.119; HSE UK links management, barriers and major hazard control.

 

VI. Quantifying economic exposure: from HSE risk to EBITDA risk

CAPEX, OPEX, margin, penalties, downtime, scrap: the full chain of the bill

| Item | What you see quickly | What you see too late |
| --- | --- | --- |
| Downtime | Hours lost, teams mobilized | Throughput loss, rescheduling, flow disruption |
| Scrap and non-quality | Scrap, rework | Quality drift after restart, customer returns |
| OPEX | Energy, consumables, subcontracting | Increased maintenance, permanent degraded modes |
| CAPEX | Repair, replacement | Over-dimensioning “to be safe” after the fact |
| Penalties and reputation | Contractual penalties | Loss of trust, tougher customer audits |

When you know your downtime cost per hour and your margin per unit, you decide quickly. A barrier that reduces unplanned downtime often improves OEE, so it funds part of itself. Leading indicators—unavailable barriers, overdue tests, process drift—complement lagging indicators in the spirit of API RP 754. The goal isn't to add more dashboards, but to see drift coming before it costs a week of production.
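The link between downtime cost per hour, margin per unit and the weekly bill can be sketched as follows. The function name `weekly_downtime_cost`, its parameters and every figure in the example are illustrative assumptions, not data from a real site. The sketch also assumes every unit lost is a lost sale, which only holds on a bottleneck with demand above capacity.

```python
def weekly_downtime_cost(stops_per_week: float,
                         minutes_per_stop: float,
                         cost_per_hour: float,
                         units_per_hour: float,
                         margin_per_unit: float) -> float:
    """Weekly cost of unplanned stops on a bottleneck.

    Adds the direct cost of stopped hours (teams, energy, and so on) to
    the margin on units that were not produced during those hours.
    """
    hours_lost = stops_per_week * minutes_per_stop / 60
    direct_cost = hours_lost * cost_per_hour
    lost_margin = hours_lost * units_per_hour * margin_per_unit
    return direct_cost + lost_margin


# Illustrative figures only.
status_quo = weekly_downtime_cost(
    stops_per_week=8, minutes_per_stop=15,
    cost_per_hour=1200.0, units_per_hour=40, margin_per_unit=25.0,
)
after_fix = weekly_downtime_cost(
    stops_per_week=4, minutes_per_stop=15,
    cost_per_hour=1200.0, units_per_hour=40, margin_per_unit=25.0,
)
weekly_saving = status_quo - after_fix
print(status_quo, after_fix, weekly_saving)
```

Comparing `weekly_saving` with the annualized cost of a barrier gives a first-order view of how much of the barrier funds itself; indirect effects from the table above (quality drift, penalties) are deliberately left out of this sketch.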

 

VII. Three quantified mini-cases

| Mini-case | Problem | Method | Results |
| --- | --- | --- | --- |
| Mini-case 1: utility loss and restart | An assembly shop suffers 8 micro-stops per week due to compressed air instability, with a bottleneck highly sensitive to pressure. | Scenario-based mapping focused on utility loss, barrier overhaul (pressure monitoring, maintenance, degraded mode, switchover test). | 60% reduction in short stops over 6 weeks, +3 points of OEE on the bottleneck, removal of a CAPEX project to duplicate a workstation. |
| Mini-case 2: process modification and change management | Introduction of a new product with a narrower operating window and a longer start-up sequence. | Application of change management, review of transition scenarios, update of start-up instructions, testing of interlocks before ramp-up. | Start-up scrap cut in half in the first month, one major unplanned stop per quarter avoided. |
| Mini-case 3: maintenance drift and barrier unavailability | In a storage area, barrier tests (detection, shutoff, retention) slip from 4 to 10 weeks late, and deviations become “normal.” | Reinstating a weekly SMS (Safety Management System) ritual on critical barriers, named owner, catch-up plan and a rule to block derogations. | Back to 95% test compliance in 2 months, clear drop in emergency interventions, measured availability gain on shared utilities. |

Sources: mini-cases 1 to 3 drawn from operational feedback and consolidated data from Dillygence engagements (anonymized data).

 

VIII. Traps and countermeasures: 7 mistakes that make you pay twice

Confusing documentary compliance with real barrier control

Trap: aiming for paper compliance, then letting barriers degrade silently. Countermeasure: link each critical barrier to an owner, a periodic test and an accessible result. A test not done equals an absent barrier.
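The rule “a test not done equals an absent barrier” can be sketched as a simple check on test dates. The `BarrierTest` structure, the test intervals and the sample data are hypothetical; a real site would take them from its maintenance system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class BarrierTest:
    barrier: str
    owner: str
    interval_days: int
    last_test: Optional[date]   # None means never tested


def is_available(test: BarrierTest, today: date) -> bool:
    """A barrier counts as available only if its periodic test is up to date."""
    if test.last_test is None:
        return False
    return today - test.last_test <= timedelta(days=test.interval_days)


def overdue(tests: List[BarrierTest], today: date) -> List[str]:
    """Names of barriers to treat as absent until they are retested."""
    return [t.barrier for t in tests if not is_available(t, today)]


# Hypothetical storage-area barriers.
today = date(2024, 6, 1)
tests = [
    BarrierTest("Detection", "Shift lead", 30, date(2024, 5, 20)),
    BarrierTest("Shutoff valve", "Maintenance lead", 90, date(2024, 2, 1)),
    BarrierTest("Retention", "HSE officer", 180, None),
]
print(overdue(tests, today))
```

The share of barriers returned by `overdue` is a natural leading indicator: it moves before any incident does.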

Mapping “flat” without scenarios, transients or domino effects

Trap: making a list of hazards, then concluding too quickly. Countermeasure: enforce the thread “process → event → propagation → effects,” and treat start-ups, shutdowns and degraded modes separately.

Deferring works that protect production (not only safety)

Trap: pushing back utility and critical equipment reliability work because “it still holds.” Countermeasure: classify these works as protection of throughput and cash. The day it breaks, it's not an HSE (Health, Safety, Environment) topic; it's a customer topic.

Forgetting change management in projects, shutdowns and ramp-up

Trap: handling changes verbally, then discovering after the fact that the scenario changed. Countermeasure: a simple change management process, triggered by any change affecting process, utilities, sequence or co-activity.

Steering only with lagging indicators (accidents, frequency rate)

Trap: believing that “no accidents” means “under control.” Countermeasure: add leading indicators on barriers, tests, deviations and critical maintenance, in the spirit of API RP 754.

Underestimating utility losses and restarts

Trap: treating utilities as a service that's “always there,” then suffering cascading downtime. Countermeasure: map utility dependencies, define redundancies and test switchovers.
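Mapping utility dependencies can be sketched as a graph traversal: given “A feeds B” links, list everything downstream of a failed utility. The site layout below is hypothetical, and the sketch deliberately ignores redundancy (a backup supply would remove an item from the impacted set).

```python
from collections import deque
from typing import Dict, List, Set


def impacted_by(failed: str, feeds: Dict[str, List[str]]) -> Set[str]:
    """Return everything downstream of a failed utility (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for consumer in feeds.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


# Hypothetical site: each key feeds the items in its list.
feeds = {
    "electricity": ["air compressor", "chiller"],
    "air compressor": ["compressed air"],
    "compressed air": ["assembly line", "valve actuators"],
    "chiller": ["cooling loop"],
    "cooling loop": ["reactor"],
}

print(sorted(impacted_by("air compressor", feeds)))
print(sorted(impacted_by("electricity", feeds)))
```

Running the traversal for each utility shows which single loss stops the most units, which is a reasonable starting point for deciding where redundancy and switchover tests matter most.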

Not linking risks, investments and the EBITDA trajectory

Trap: opposing prevention and performance, then arbitrating too late. Countermeasure: quantify economic exposure by scenario, link each action to CAPEX/OPEX/downtime/scrap, and review after modification.

 

FAQ — Industrial risks

What does the notion of industrial risks cover?

The notion of risks in an industrial environment covers scenarios of feared events linked to an industrial activity, with their probabilities and human, environmental, material and economic consequences. It includes “visible” scenarios such as fire, explosion or toxic release, and “silent” scenarios such as utility losses or chronic pollution. It is managed through scenarios, barriers and indicators—not only compliance.

What are the main types of industrial risks?

The main types include fire, explosion and overpressure, toxic releases, chronic pollution, domino effects and utility losses. Utility losses are often among the most expensive because they degrade OEE and trigger restarts. A useful typology links each type to verified barriers and metrics.

How do you identify and map industrial risks on a site?

An effective mapping follows eight steps: scope, inventory of substances and energies, segmentation by units and flows, initiating events, scenarios, barriers, rating, then action plan and periodic review. The guiding thread remains “process → events → barriers → effects → priorities.” The mapping must produce actionable deliverables, with traceable assumptions and named owners.

How do you quantify the total cost of industrial risks for the company?

Quantification combines direct and indirect costs: downtime cost per hour, downtime duration, restart costs, scrap, energy, non-quality and customer penalties. It adds cash impact through WIP, WCR (working capital requirement), inventory, urgent shipments and overall throughput losses. An explicit estimate makes arbitration more robust than a debate without numbers.

How do you prioritize investments to reduce industrial risks?

Prioritization is based on exposure, robustness of measures, dependencies and domino effects. The treatment portfolio follows a simple logic: eliminate, reduce, transfer, accept, then verify through evidence and tests. A measure moves up in priority when it strongly reduces exposure and also improves availability, restart time or OEE.

Dillygence links this scenario logic to a digital twin to test options, quantify exposure and accelerate decisions that improve both risk control and industrial performance.