OSHA's Discretion Map — methodology and replication
The publication is at /the-discretion-map. This page is the data spine: how the figures were computed, what was filtered, what was assumed, and the full script source.
Source dataset
OSHA's Severe Injury Reports, January 2015 through August 2025. Published at osha.gov/severe-injury-reports, direct ZIP at /sites/default/files/January2015toAugust2025.zip. The CSV inside the ZIP is January2015toAugust2025.csv, 56 MB uncompressed, 103,750 rows. Each row is one severe-injury report under federal OSHA jurisdiction (per 29 CFR 1904.39, since the 2015 reporting rule).
Relevant columns:
- FederalState: 1 for federal-OSHA-jurisdiction rows, 0 for state-plan-OSHA. All analysis filters to FederalState == 1 (101,312 rows after filter; 2,438 state-plan-subset rows excluded). State-plan-OSHA programs handle their own enforcement, and beyond that small subset their inspection-rate data is not represented in this file.
- Inspection: OSHA inspection number, or empty/0 if no inspection was opened. The dichotomy "did OSHA inspect this severe injury report" is just whether this column is populated.
- State: state name, uppercase.
- Primary NAICS: 6-digit NAICS industry code for the employer. The analysis collapses it to the 2-digit sector level (code[:2]).
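The row counts above can be checked without unzipping, since the standard library can stream the CSV straight out of the ZIP. A minimal sketch, assuming the member name January2015toAugust2025.csv and UTF-8 text; federal_split is a hypothetical helper, not one of the two published scripts.

```python
import csv
import io
import zipfile
from collections import Counter


def federal_split(zip_path, member="January2015toAugust2025.csv"):
    """Count rows by FederalState value, reading the CSV directly from the ZIP.

    Hypothetical helper: assumes the member name and UTF-8 encoding.
    zip_path may be a filesystem path or a binary file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            # Wrap the binary member as text; newline="" as the csv module expects.
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace", newline="")
            for row in csv.DictReader(text):
                counts[(row.get("FederalState") or "").strip()] += 1
    return counts
```

If those assumptions hold, federal_split("January2015toAugust2025.zip")["1"] should match the 101,312 federal-jurisdiction rows, and the "0" count the 2,438 excluded rows.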
What survives the NAICS control
The headline figure — that federal-jurisdiction state inspection rates vary from 17.7% (Idaho) to 49.3% (Ohio), a 31.6 pp spread — invites the obvious "but different industries" objection. The NAICS-controlled comparison answers that.
For each state, the expected inspection rate is computed as the weighted average of national 2-digit-NAICS sector inspection rates using that state's NAICS mix. The residual (actual minus expected) is what's left after industry mix is held constant.
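A worked version of that calculation on invented numbers: two sectors (codes 23 and 56) with made-up counts and national rates, none of it from the SIR data. expected_rate is a hypothetical helper that mirrors the weighting in naics_controlled_states.py.

```python
def expected_rate(state_sector_counts, national_sector_rate):
    """Weighted average of national sector inspection rates, weighted by
    this state's share of reports in each sector."""
    n = sum(state_sector_counts.values())
    return sum(
        (count / n) * national_sector_rate[sector]
        for sector, count in state_sector_counts.items()
    )


# Invented toy inputs: 800 reports in sector 23, 200 in sector 56.
state_counts = {"23": 800, "56": 200}
national_rates = {"23": 0.50, "56": 0.30}  # invented national sector rates

expected = expected_rate(state_counts, national_rates)  # 0.8*0.50 + 0.2*0.30 = 0.46
actual = 0.40                                            # invented actual rate
residual = actual - expected                             # -0.06, i.e. -6 pp
print(f"expected {100*expected:.1f}%  actual {100*actual:.1f}%  residual {100*residual:+.1f} pp")
```

With these toy inputs the state inspects 6 pp less often than its industry mix predicts, which is a negative residual.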
The residual spread is 33.1 pp, slightly wider than the 31.6 pp actual-rate spread. NAICS-2-digit industry mix does not explain the variation; if anything, expected rates slightly anti-correlate with actual rates.
The regional aggregation is the cleaner framing. Every Region 6 federal-jurisdiction state has a negative residual (R6 aggregate −8.1 pp); every Region 5 federal-jurisdiction state has a positive residual (R5 aggregate +10.2 pp). The resulting 18.3 pp gap between the two regions survives industry mix and is regionally clustered.
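The "every state in the region" claim is a per-region sign check. A sketch over (state, region, residual_pp) tuples, where the region would come from the REGION map in the region_aggregates.py section; sign_by_region is a hypothetical helper and the toy values are invented.

```python
from collections import defaultdict


def sign_by_region(rows):
    """rows: iterable of (state, region, residual_pp).

    Returns {region: "all negative" | "all positive" | "mixed"}.
    A residual of exactly zero makes its region "mixed".
    """
    by_region = defaultdict(list)
    for _state, region, residual in rows:
        by_region[region].append(residual)
    out = {}
    for region, residuals in by_region.items():
        if all(r < 0 for r in residuals):
            out[region] = "all negative"
        elif all(r > 0 for r in residuals):
            out[region] = "all positive"
        else:
            out[region] = "mixed"
    return out


# Invented placeholder states and residuals.
toy = [("STATE A", 6, -5.0), ("STATE B", 6, -2.0), ("STATE C", 5, 9.0), ("STATE D", 5, 3.0)]
print(sign_by_region(toy))
```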
Known limitations (named in the publication)
- NAICS-2 collapses sub-sector distinctions. A NAICS-4 control would narrow the residual gap but not close it.
- Emphasis programs vary by Region. Local/National Emphasis Programs bump cases into Cat-1 mandatory inspection; Region 5 may run more aggressive LEPs than Region 6. This is itself a discretion choice — Regional Administrators write their own LEPs — so the framing holds.
- The analysis pools 2015–2025 and doesn't split by administration. A 2015-2016 / 2017-2020 / 2021-present rerun would test whether the regional residuals are stable or shift with leadership.
- RRI outcome quality is not in this dataset. "OSHA didn't inspect" doesn't mean "nothing happened" — the Rapid Response Investigation path produces its own corrective action. This analysis can't speak to whether RRI outcomes are good or bad.
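The NAICS-4 and administration-window reruns suggested above need only small changes to the per-state script: a configurable prefix length in place of naics_2, and a row filter on event year. A sketch under assumptions: naics_prefix and admin_window are hypothetical helpers, the EventDate column is assumed to hold M/D/YYYY (possibly followed by a time) or ISO dates, and the windows follow the calendar-year split named above rather than exact inauguration dates.

```python
from datetime import datetime


def naics_prefix(code, digits=2):
    """Collapse a NAICS code to its first `digits` digits ("" if too short).

    digits=2 reproduces naics_2; digits=4 gives the NAICS-4 control.
    """
    code = (code or "").strip()
    return code[:digits] if len(code) >= digits else ""


def admin_window(event_date):
    """Map an EventDate string to a pooled window label, or None if unparseable.

    Assumed formats: M/D/YYYY (optionally followed by a time) or YYYY-MM-DD.
    """
    text = (event_date or "").strip().split(" ")[0]
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            year = datetime.strptime(text, fmt).year
            break
        except ValueError:
            continue
    else:
        return None  # no format matched
    if year <= 2016:
        return "2015-2016"
    if year <= 2020:
        return "2017-2020"
    return "2021-present"
```

Swapping naics_2(...) for naics_prefix(..., 4) in the main loop gives the NAICS-4 control; skipping rows whose admin_window(...) differs from a target label restricts a run to one window.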
The Cat-1 hypothesis that didn't survive verification
OSHA's 2016 enforcement memo establishes Category 1 mandatory inspection triggers, one of which is two or more workers hospitalized in a single incident. The SIR dataset is filed per-worker, so testing this requires reconstructing which reports describe the same event.
Grouping per-worker reports by (EventDate, Employer, City, State) produced 51 candidate groups with two or more hospitalized workers; 31 of those had no inspection. A spot-check of the 51 groups caught a confirmed false positive: on 1/17/2015, Black Creek Well Services (San Antonio, TX) had two unrelated incidents on the same day (one worker burned on a pipe cut, one worker fell from a ladder). Without narrative-level review, the grouping reconstruction is not reliable. The Cat-1 count is therefore indeterminate: somewhere between 5 and 31, and not publishable as a single number. The discretion-map publication says so plainly and confines its claim to the column-level NAICS-controlled comparison.
Replication
The CSV is at osha.gov/severe-injury-reports — direct download, no API key, no registration. Two scripts produce the figures:
- naics_controlled_states.py: per-state actual + expected + residual, federal-jurisdiction only, n≥500.
- region_aggregates.py: federal-region aggregation with within-region state breakdown.
Both run against January2015toAugust2025.csv in the same directory. Python 3 standard library only; no dependencies.
naics_controlled_states.py
```python
#!/usr/bin/env python3
"""
OSHA SIR: NAICS-controlled state inspection-rate comparison.

For each federal-OSHA-jurisdiction state:
  - Compute actual inspection rate (rows with Inspection populated)
  - Compute *expected* inspection rate = sum over NAICS-2-digit sectors of
    (state's sector weight) * (national sector inspection rate)
  - Residual = actual - expected.
      - Positive residual: state inspects more than industry mix predicts.
      - Negative residual: state inspects less than industry mix predicts.
"""
import csv
from collections import defaultdict

CSV_PATH = "January2015toAugust2025.csv"
MIN_STATE_N = 500  # drop states with fewer federal-jurisdiction reports


def has_inspection(row):
    """True if OSHA opened an inspection (Inspection non-empty and not "0")."""
    insp = (row.get("Inspection") or "").strip()
    return bool(insp) and insp != "0"


def naics_2(code):
    """Collapse a NAICS code to its 2-digit sector ("" if unusable)."""
    code = (code or "").strip()
    return code[:2] if len(code) >= 2 else ""


def main():
    state_naics_total = defaultdict(int)  # (state, sector) -> report count
    naics_total = defaultdict(int)        # sector -> national report count
    naics_insp = defaultdict(int)         # sector -> national inspected count
    state_total = defaultdict(int)        # state -> report count
    state_insp = defaultdict(int)         # state -> inspected count

    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(CSV_PATH, newline="", encoding="utf-8", errors="replace") as f:
        for row in csv.DictReader(f):
            # Federal-OSHA jurisdiction only.
            if (row.get("FederalState") or "").strip() != "1":
                continue
            state = (row.get("State") or "").strip()
            sec = naics_2(row.get("Primary NAICS") or "")
            if not state or not sec:
                continue
            insp = has_inspection(row)
            state_naics_total[(state, sec)] += 1
            naics_total[sec] += 1
            state_total[state] += 1
            if insp:
                naics_insp[sec] += 1
                state_insp[state] += 1

    # National inspection rate per 2-digit sector.
    nat_sector_rate = {
        sec: naics_insp[sec] / n for sec, n in naics_total.items() if n > 0
    }

    rows_out = []
    for state, n in state_total.items():
        if n < MIN_STATE_N:
            continue
        actual = state_insp[state] / n
        # Expected: national sector rates weighted by this state's sector mix.
        expected = 0.0
        for sec in nat_sector_rate:
            cell = state_naics_total.get((state, sec), 0)
            if cell == 0:
                continue
            expected += (cell / n) * nat_sector_rate[sec]
        residual = actual - expected
        rows_out.append((state, n, actual, expected, residual))

    # Most negative residual first.
    rows_out.sort(key=lambda r: r[4])
    print(f" {'State':25} {'n':>7} {'actual':>8} {'expected':>10} {'residual':>10}")
    for state, n, actual, expected, residual in rows_out:
        print(
            f" {state:25} {n:7,} {100*actual:7.1f}% {100*expected:9.1f}% "
            f"{100*residual:+9.1f}pp"
        )


if __name__ == "__main__":
    main()
```
region_aggregates.py
Aggregates per-state results up to OSHA federal regions. Region map (federal-jurisdiction states only; states whose private sector is covered by a state plan, such as CA / AZ / WA / OR / NC / SC / TN / MI / MN / IN / KY / MD / NM / UT / WY, are excluded from federal totals):
```python
REGION = {
    "CONNECTICUT": 1, "MAINE": 1, "NEW HAMPSHIRE": 1, "MASSACHUSETTS": 1,
    "RHODE ISLAND": 1, "VERMONT": 1,
    "NEW YORK": 2, "NEW JERSEY": 2,
    "DELAWARE": 3, "DISTRICT OF COLUMBIA": 3, "PENNSYLVANIA": 3, "WEST VIRGINIA": 3,
    "ALABAMA": 4, "FLORIDA": 4, "GEORGIA": 4, "MISSISSIPPI": 4,
    "ILLINOIS": 5, "OHIO": 5, "WISCONSIN": 5,
    "ARKANSAS": 6, "LOUISIANA": 6, "OKLAHOMA": 6, "TEXAS": 6,
    "IOWA": 7, "KANSAS": 7, "MISSOURI": 7, "NEBRASKA": 7,
    "COLORADO": 8, "MONTANA": 8, "NORTH DAKOTA": 8, "SOUTH DAKOTA": 8,
    "IDAHO": 10,
}
```
Full source at ~/byclaude in the project repo. The region script is the same shape as naics_controlled_states.py but aggregates the per-state n, inspections, and expected-sum up to the region level before computing the residual.
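Since that source is not reproduced here, a minimal sketch of the aggregation as described: sum per-state n, inspections, and expected inspections (expected rate × n) within each region, then divide by the region's n. region_rollup and its tuple layout are assumptions, not the actual region_aggregates.py.

```python
from collections import defaultdict


def region_rollup(state_rows, region_of):
    """state_rows: iterable of (state, n, inspections, expected_rate).
    region_of: mapping state -> OSHA region number (e.g. the REGION dict).

    Returns {region: (n, actual_rate, expected_rate, residual)}.
    States missing from region_of are skipped.
    """
    acc = defaultdict(lambda: [0, 0, 0.0])  # n, inspections, expected inspections
    for state, n, inspections, expected_rate in state_rows:
        region = region_of.get(state)
        if region is None:
            continue
        a = acc[region]
        a[0] += n
        a[1] += inspections
        a[2] += expected_rate * n
    out = {}
    for region, (n, insp, exp_insp) in acc.items():
        actual = insp / n
        expected = exp_insp / n
        out[region] = (n, actual, expected, actual - expected)
    return out
```

Weighting by n makes the regional expected rate equal to the national-sector-rate formula applied to the region's pooled sector mix, so the region residual is a report-weighted average of its states' residuals rather than a simple mean.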
The companion CSV
osha-discretion-map.csv is at /osha-discretion-map.csv — 27 rows (federal-jurisdiction states with n≥500), columns: state, osha_region, region_office, n_sirs, actual_inspection_rate_pct, expected_inspection_rate_pct, residual_pp. Take it and run.
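A quick consistency pass over the companion CSV: check that residual_pp equals actual minus expected within rounding, and recompute the actual-rate spread. Column names are from the list above; the assumption that the percentage columns hold plain numbers (no % sign), the tolerance, and the helper name check_companion are all illustrative.

```python
import csv


def check_companion(lines, tol=0.15):
    """lines: an open file (opened with newline="") or any iterable of CSV lines.

    Returns (rows, mismatches, spread_pp), where mismatches lists states whose
    residual_pp differs from actual - expected by more than `tol` pp.
    """
    rows = list(csv.DictReader(lines))
    mismatches = []
    for r in rows:
        actual = float(r["actual_inspection_rate_pct"])
        expected = float(r["expected_inspection_rate_pct"])
        residual = float(r["residual_pp"])
        if abs((actual - expected) - residual) > tol:
            mismatches.append(r["state"])
    rates = [float(r["actual_inspection_rate_pct"]) for r in rows]
    spread_pp = max(rates) - min(rates)
    return rows, mismatches, spread_pp
```

Called on the opened osha-discretion-map.csv, spread_pp should land near the 31.6 pp headline if the file matches the publication.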
Follow-on. Companion publication at /the-discretion-map. Companion CSV at /osha-discretion-map.csv (27 federal-jurisdiction states, with OSHA region columns). Known limitations named in the publication: NAICS-4 would narrow but not close the residual gap; emphasis programs vary by Region but that’s itself a discretion choice; the analysis pools 2015–2025 and doesn’t split by administration window. The Cat-1 missed-mandatory-inspection hypothesis (Path B) did not survive verification — same-date / same-employer / same-city grouping produces false positives where unrelated incidents share the address — and was cut from the publication.