Confounding Variables in Risk Data

Population-average risk statistics can be dangerously misleading when confounding variables — unmeasured or ignored factors that correlate with both exposure and outcome — drive most of the variation. This vignette illustrates how conditioning on the right variable can change a risk estimate by orders of magnitude.

For data quality criteria and denominator problems that motivate this analysis, see the Data Quality section of the introduction. For the conditional risk functions used throughout this package, see Conditional Risks.

1. Simpson’s Paradox in Risk Data

Simpson’s paradox occurs when a trend that appears in aggregated data reverses or disappears when the data is split by a confounding variable. Risk data is especially vulnerable because:

  • Exposure varies by subgroup: Not everyone faces the same hazard equally
  • Susceptibility varies by subgroup: Age, genetics, and occupation change vulnerability
  • Reporting conflates subgroups: A single “micromorts per year” figure averages across vastly different populations

The result: a population-average micromort value may describe nobody accurately.

2. Flagship Example: Bed Falls (Age as Confounder)

Falling out of bed kills ~450 Americans per year (CPSC). The population average is 1.36 micromorts/year. But age is a massive confounder: the CDC age-stratified data reveal a roughly 2,500-fold difference in per-night risk:

| Age group | Sex | Fall deaths per 100,000/year | Micromorts per night |
|---|---|---|---|
| Under 65 | Both | ~0.4 | 0.004 |
| 65–74 | Male | 24.7 | 0.68 |
| 65–74 | Female | 14.2 | 0.39 |
| 85+ | Male | 373.3 | 10.2 |
| 85+ | Female | 319.7 | 8.8 |
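The per-night column can be reproduced from the annual rates with a short base R sketch. It assumes that the annual risk is spread evenly over 365 nights, which is a simplification not stated in the source. The under-65 row is left out because its per-night value (0.004) does not follow from ~0.4 per 100,000 under this conversion; it appears instead to match the 1.36 micromorts/year population average divided by 365.

```r
# Fall death rates per 100,000 per year, copied from the table above
fall_rates <- data.frame(
  group = c("65-74 male", "65-74 female", "85+ male", "85+ female"),
  deaths_per_100k_year = c(24.7, 14.2, 373.3, 319.7)
)

# 1 death per 100,000 people = 10 deaths per million = 10 micromorts
fall_rates$micromorts_per_year <- fall_rates$deaths_per_100k_year * 10

# Assumption: the annual risk is spread evenly over 365 nights
fall_rates$micromorts_per_night <- round(fall_rates$micromorts_per_year / 365, 2)

print(fall_rates)
```

The results agree with the table to within rounding.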

What 10 micromorts per night means

For an 85-year-old man, going to bed carries ~10 micromorts — comparable to:

  • Riding a motorcycle 60 miles (10 micromorts/trip)
  • A single ecstasy dose (13 micromorts/dose)
  • A day of skiing (0.7 micromorts/day) repeated 14 times

For someone under 65, the same activity carries 0.004 micromorts per night, which is essentially zero. The population average of 1.36 micromorts/year describes neither group accurately.

The confounding mechanism

Age confounds the bed fall risk through two pathways:

  1. Fragility: Older adults have lower bone density, slower reflexes, and higher complication rates from identical falls
  2. Bed type and environment: Hospital beds, care home beds, and medication-induced drowsiness increase fall frequency in older populations

Neither pathway is captured by the aggregate statistic.

3. Further Examples

3.1 Bee and wasp stings (allergy as confounder)

Bee and wasp stings kill 72 Americans per year (CDC MMWR). The population rate is 0.22 micromorts/year. But nearly all fatalities are among the ~1% with venom allergy (JACI, 2015):

| Subgroup | Prevalence | Risk per sting |
|---|---|---|
| No allergy (~99%) | 327M people | ~0 micromorts |
| Venom allergy (~1%) | 3.3M people | ~22 micromorts/sting |

The confounder (allergy status) is binary and creates an extreme bimodal distribution. The population average — 0.22 micromorts/year — is meaningless for both groups.
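The two figures can be checked with a rough base R sketch. It assumes a US population of about 330 million (implied by the subgroup sizes but not stated) and attributes every death to the allergic subgroup, which makes the result an upper bound. Note that deaths divided by the allergic population is an annual rate per person; reading the ~22 figure as "per sting" would further assume about one sting per allergic person per year, which the source does not state.

```r
us_population   <- 330e6  # assumption: approximate US population
sting_deaths    <- 72     # deaths per year, from the text
allergic_people <- 3.3e6  # ~1% with venom allergy, from the text

# Population-average annual risk in micromorts (deaths per million people)
population_rate <- sting_deaths / us_population * 1e6

# Upper bound for the allergic subgroup: attribute every death to it
allergic_rate <- sting_deaths / allergic_people * 1e6

round(c(population = population_rate, allergic = allergic_rate), 2)
```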

3.2 Cow trampling (occupation as confounder)

Cattle kill 22 Americans per year (CDC MMWR). Population rate: 0.07 micromorts/year. But exposure is concentrated among ~2.9 million cattle workers:

| Subgroup | Population | Micromorts/year |
|---|---|---|
| General public | ~328M | ~0 |
| Cattle farmers | ~2.9M | ~7.5 |

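That is a roughly 100-fold difference between cattle workers (~7.5 micromorts/year) and the population average (0.07 micromorts/year). Occupation is the confounder: it determines both exposure frequency (daily cattle handling vs never) and risk magnitude (confined spaces, agitated animals, kick zones).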
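The population average is simply the population-weighted mean of the subgroup rates, which the following sketch makes explicit. It treats the general public's "~0" as exactly 0, an assumption made only to keep the arithmetic simple.

```r
# Subgroup sizes and rates from the table above
subgroups <- data.frame(
  subgroup = c("General public", "Cattle workers"),
  population = c(328e6, 2.9e6),
  micromorts_per_year = c(0, 7.5)  # assumption: "~0" treated as exactly 0
)

# Population average = weighted mean of subgroup rates, weighted by size
population_average <- weighted.mean(subgroups$micromorts_per_year,
                                    subgroups$population)

# How many times higher the cattle-worker rate is than that average
fold_vs_average <- 7.5 / population_average

round(c(average = population_average, fold = fold_vs_average), 2)
```

Because the exposed group is under 1% of the population, its rate is diluted by two orders of magnitude in the average.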

3.3 Lightning strike (outdoor work as confounder)

Lightning kills 28 Americans per year (NOAA). Population rate: 0.08 micromorts/year. But outdoor agricultural workers face ~15x the risk:

| Subgroup | Micromorts/year |
|---|---|
| Indoor worker | ~0.02 |
| Outdoor recreational | ~0.3 |
| Outdoor agricultural worker | ~1.2 |

The confounders are occupation and behaviour: time spent outdoors, in open fields, near tall objects, and during storm season.
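Expressing each subgroup as a multiple of the population average shows how far the strata sit from the headline figure. This is a small sketch using only the values quoted above; the variable names are illustrative.

```r
population_avg <- 0.08  # micromorts/year, from the text

# Subgroup rates from the table above (micromorts/year)
subgroups <- c(
  indoor_worker = 0.02,
  outdoor_recreational = 0.3,
  outdoor_agricultural_worker = 1.2
)

# Risk of each subgroup relative to the population average
relative_risk <- round(subgroups / population_avg, 2)

# Spread between the most and least exposed subgroup
spread <- max(subgroups) / min(subgroups)

print(relative_risk)
print(spread)
```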

3.4 Drowning (age and setting as confounders)

Drowning kills ~4,000 Americans per year (CDC). The population rate is ~12 micromorts/year. But the risk distribution is bimodal:

| Subgroup | Drowning rate per 100,000/year |
|---|---|
| Children 1–4 | 7.6 |
| Adults 25–64 | 1.2 |
| Males (all ages) | 3.5 |
| Females (all ages) | 0.8 |

Age and sex are strong confounders. Setting matters too: swimming pools (children), natural water (adults), and bathtubs (elderly, alcohol-related) each have distinct risk profiles that the aggregate hides.
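The population rate above is quoted in micromorts per year while the table uses deaths per 100,000, so the two are easier to compare after conversion to a single unit. A minimal sketch, using only the table values:

```r
# Drowning rates per 100,000 per year, from the table above
drowning <- data.frame(
  subgroup = c("Children 1-4", "Adults 25-64",
               "Males (all ages)", "Females (all ages)"),
  deaths_per_100k_year = c(7.6, 1.2, 3.5, 0.8)
)

# 1 death per 100,000 = 10 micromorts
drowning$micromorts_per_year <- drowning$deaths_per_100k_year * 10

# Sort from highest to lowest risk
drowning <- drowning[order(-drowning$micromorts_per_year), ]

print(drowning)
```

On this scale children aged 1–4 sit at about 76 micromorts/year, roughly six times the ~12 micromorts/year population figure.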

4. Recognising Confounders in Risk Data

A confounding variable must satisfy two conditions:

  1. Correlated with exposure: The confounder determines who is exposed (e.g., farmers are exposed to cattle; office workers are not)
  2. Correlated with outcome: The confounder affects the probability of death given exposure (e.g., age affects fall mortality; allergy affects sting mortality)

Warning signs of confounded risk data

| Warning sign | Example |
|---|---|
| Risk applies to “general population” | Cow trampling at 0.07 micromorts/year |
| Denominator is “per year” for an activity not everyone does | Horse riding at 0.5 micromorts/ride reported as if it were a per-year figure |
| No age stratification | Bed fall deaths without age breakdown |
| No occupational stratification | Lightning deaths without indoor/outdoor split |
| Dramatic differences between sources | Different studies report 10x different values for the same activity |

What to do

When you encounter a population-average risk:

  1. Ask “conditional on what?” — identify the most likely confounders (age, sex, occupation, geography, pre-existing conditions)
  2. Seek stratified data — government agencies (CDC, CPSC, NOAA) often publish age- and sex-stratified breakdowns
  3. Calculate conditional rates — use conditional_risk() from this package to compare hedged vs unhedged scenarios
  4. Report the range, not the average — a range like “0.004–10.2 micromorts/night depending on age” is more informative than “1.36 micromorts/year”
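Step 4 can be sketched in base R without relying on any package function. The stratum values are the per-night bed fall figures from section 2; the data frame layout and variable names are illustrative and are not the package's data format.

```r
# Per-night bed fall risk by stratum (values from section 2)
bed_fall <- data.frame(
  stratum = c("Under 65", "65-74 male", "65-74 female",
              "85+ male", "85+ female"),
  micromorts_per_night = c(0.004, 0.68, 0.39, 10.2, 8.8)
)

# Report the spread across strata instead of a single average
rng <- range(bed_fall$micromorts_per_night)
summary_line <- sprintf("%s-%s micromorts/night depending on age and sex",
                        format(rng[1]), format(rng[2]))

# Ratio between the highest- and lowest-risk stratum
fold_difference <- rng[2] / rng[1]

print(summary_line)
print(fold_difference)
```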

5. Geographic Confounding: Snake Bites

Geography is arguably the most powerful confounder in risk data. The same encounter, a snake bite, has vastly different outcomes depending on location.

The 37x difference between the US and rural sub-Saharan Africa reflects differences in antivenom availability, hospital proximity, and emergency transport infrastructure — not differences in snake venom potency. A population-average snake bite risk that blends these geographies would be misleading for everyone: too high for Americans, too low for rural Africans.

The same pattern applies to dog bites (24x difference driven by the availability of rabies post-exposure prophylaxis, PEP). For more on the systematic framework behind these geographic estimates, see the Data Reliability vignette.

6. Geography of Disease: How Country Reshuffles Daily Risk

The leading causes of death vary dramatically by country — and not just for infectious diseases. Using IHME Global Burden of Disease 2023 age-standardised death rates, we can express chronic disease mortality as daily micromorts.

Disease death rates by country

Key findings:

  • Cardiovascular disease (CVD) is among the leading killers in both countries, at 3.24 micromorts (mm) per day in the UK and 7.65 mm/day in India (2.4x ratio). For context, a single skydive is 8 mm: about one day of background CVD risk in India, or about two and a half days in the UK.
  • Diarrheal diseases show the starkest gap: 0.02 mm/day in the UK vs 1.28 mm/day in India — a 64x difference driven by clean water and sanitation infrastructure.
  • Cancer is higher in the UK (3.95 mm/day) than India (2.3 mm/day), a counterintuitive finding. As the Our World in Data (OWID) data insight notes, richer countries avoid both infectious and chronic disease deaths, but cancer is the exception where age structure and screening detection inflate high-income rates.
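The ratios quoted in these findings, and the skydive comparison, follow directly from the daily figures. A small sketch using only the numbers given above (the data frame is illustrative, not the package's dataset):

```r
# Daily micromorts as quoted in the key findings
daily_mm <- data.frame(
  cause = c("Cardiovascular disease", "Diarrheal diseases", "Cancer"),
  uk    = c(3.24, 0.02, 3.95),
  india = c(7.65, 1.28, 2.30)
)

# India-to-UK ratio for each cause
daily_mm$ratio_india_uk <- round(daily_mm$india / daily_mm$uk, 1)

# Days of background CVD risk equivalent to one 8-micromort skydive
skydive_mm <- 8
cvd <- daily_mm[daily_mm$cause == "Cardiovascular disease", ]
skydive_days <- round(skydive_mm / c(uk = cvd$uk, india = cvd$india), 1)

print(daily_mm)
print(skydive_days)
```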

Top-15 ranking: UK vs Nigeria

Switching from a UK profile to a Nigeria profile reveals that chronic diseases dominate daily risk far more than any acute activity, and that the gap between countries is structural, not a matter of individual behaviour.

7. Demographic What-If: How Age Reshuffles the Top 10

Population-average rankings can shift dramatically when conditioned on age. The micromort package now supports condition_variable = "age" for bed falls, elective anaesthesia, and bathing — activities where age is the dominant confounder.

Bed falls by age group

For an 85-year-old man, a bed fall carries 10.2 micromorts per night, 2,550x the risk for someone under 65 (0.004 mm). This single activity, invisible in population-average rankings, becomes about as dangerous per night as a 60-mile motorcycle ride (10 micromorts).

Top-15 ranking: default vs 85+ male

Key shifts for an 85-year-old male:

  • Bed fall enters the top 15 (absent from the default ranking entirely)
  • General anaesthesia (elective) jumps from ~2 mm to 50 mm — a routine procedure becomes high-risk
  • Taking a bath rises from 0.07 mm to 0.5 mm
  • Activities with no age conditioning (mountaineering, COVID-19) remain unchanged

This demonstrates why common_risks(profile = list(age = "85_plus_male")) gives a more honest risk picture for an elderly user than the population average.
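The reshuffling can be illustrated without calling the package, using only the three age-conditioned values quoted above. This is a sketch under stated assumptions: the 0.004 baseline for bed falls is the under-65 figure, used here as a stand-in for the default profile, and the column names are hypothetical rather than the package's output format.

```r
# Values quoted in the text (micromorts per event or per night)
risks <- data.frame(
  activity = c("General anaesthesia (elective)", "Taking a bath", "Bed fall"),
  default = c(2, 0.07, 0.004),        # assumption: under-65 bed fall value as default
  age_85_plus_male = c(50, 0.5, 10.2)
)

# Rank 1 = most dangerous within each profile
risks$rank_default <- rank(-risks$default)
risks$rank_85_plus <- rank(-risks$age_85_plus_male)

# How much each risk grows when conditioned on age
risks$fold_change <- round(risks$age_85_plus_male / risks$default, 1)

print(risks)
```

Among these three activities, bed falls move from last place to second once age is conditioned on.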

8. Implications for the Micromort Package

This package addresses confounding in several ways:

  • Geographic conditioning via filter_by_profile(list(geography = "low_income")) compares high- and low-income variants of the same risk

  • conditional_risk() and hedged_portfolio() explicitly compare conditioned subgroups (see the package documentation)

  • cancer_risks() stratifies by sex, age group, and family history

  • vaccination_risks() stratifies by age group and vaccine type

  • regional_life_expectancy() stratifies by geography, capturing regional confounders

  • Data quality criteria in the Introduction exclude risks with unknown denominators that would mask confounding

The general principle: a micromort value is only as good as its denominator and conditioning variables.

Reproducibility

sessionInfo()
R version 4.6.1 (2026-06-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 26.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] micromort_0.2.0

loaded via a namespace (and not attached):
 [1] base64url_1.4     jsonlite_2.0.0    dplyr_1.2.1       compiler_4.6.1   
 [5] tidyselect_1.2.1  Rcpp_1.1.1-1.1    jquerylib_0.1.4   callr_3.8.0      
 [9] yaml_2.3.12       fastmap_1.2.0     R6_2.6.1          generics_0.1.4   
[13] igraph_2.3.3      knitr_1.51        htmlwidgets_1.6.4 backports_1.5.1  
[17] targets_1.12.0    tibble_3.3.1      units_1.0-1       maketools_1.3.2  
[21] rprojroot_2.1.1   bslib_0.11.0      pillar_1.11.1     rlang_1.2.0      
[25] DT_0.34.0         cachem_1.1.0      xfun_0.59         sass_0.4.10      
[29] sys_3.4.3         otel_0.2.0        cli_3.6.6         withr_3.0.3      
[33] magrittr_2.0.5    crosstalk_1.2.2   ps_1.9.3          digest_0.6.39    
[37] processx_3.9.0    secretbase_1.3.0  lifecycle_1.0.5   prettyunits_1.2.0
[41] vctrs_0.7.3       evaluate_1.0.5    glue_1.8.1        data.table_1.18.4
[45] codetools_0.2-20  buildtools_1.0.0  rmarkdown_2.31    tools_4.6.1      
[49] pkgconfig_2.0.3   htmltools_0.5.9  

micromort 0.1.0 | Git 94d93d2 | R 4.5.2 | Built 2026-04-18 12:20:56