Scoring Methodology

CalDoughnut Score evaluates California legislation against the Doughnut Economics framework developed by Kate Raworth and applied to California by the California Doughnut Economics Coalition (CalDEC).

The Framework

The doughnut has two boundaries: a social foundation (12 categories, 24 indicators) below which no one should fall, and an ecological ceiling (9 categories, 18 indicators) above which we should not go. The space between is the “safe and just space” where all life can thrive.

California falls short on 100% of social indicators and overshoots 89% of ecological boundaries.

The Core Metric: Shortfall & Overshoot Change

For each indicator, CalDEC has measured California's current value and defined a desirable target (the boundary threshold). The gap between current and target is the shortfall (for social indicators) or overshoot (for ecological indicators), expressed as a percentage.

  • Shortfall % (social): How far below the social foundation threshold California falls. A bill that reduces shortfall is moving California toward the safe space.
  • Overshoot % (ecological): How far beyond the planetary boundary California has gone. A bill that reduces overshoot is pulling California back within ecological limits.

Our primary output for each bill is the projected change in shortfall or overshoot percentage points (p.p.) for every affected indicator. A negative change (e.g., -2.8 p.p.) means improvement — California moves closer to the safe and just space.
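As a concrete sketch, the gap arithmetic looks like this. The normalization (gap divided by the target threshold) is our assumed convention, consistent with the GHG worked example later in this document; the function names are ours:

```python
def social_shortfall(current: float, target: float) -> float:
    """Percent shortfall below a social foundation threshold (0 when met)."""
    return max(0.0, (target - current) / target * 100)

def ecological_overshoot(current: float, target: float) -> float:
    """Percent overshoot beyond an ecological ceiling (0 when within limits)."""
    return max(0.0, (current - target) / target * 100)

# CO2 footprint of 19.6 t/capita against a <=0.985 t/capita boundary:
print(round(ecological_overshoot(19.6, 0.985)))  # 1890
```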

Analysis Pipeline

Phase 1: Triage & Category Detection

A fast AI model analyzes each bill's title, summary, and subject tags to determine:

  • Which of the 21 doughnut categories the bill substantively impacts (most bills impact 2-5)
  • How complex the analysis will be (simple / moderate / complex)
  • Whether the bill has zero doughnut impact (procedural, commemorative bills)

Phase 2a: Standard Analysis (Simple/Moderate Bills)

For each relevant category, an AI model analyzes the bill using the full bill text, CalDEC indicator definitions with current California values and desirable targets, and data source citations. The model estimates how the bill would change each indicator's value and computes the resulting shortfall/overshoot change.

Phase 2b: Research-Augmented Analysis (Complex Bills)

For complex bills (omnibus, cross-cutting), the system first searches for relevant academic papers and policy research, then feeds this evidence into a more capable AI model for deeper analysis grounded in empirical findings.

What We Produce for Each Bill

For each impacted category and its indicators, the system produces:

  • Direction: Does the bill help (reduce shortfall/overshoot), hurt (increase it), or have mixed effects?
  • Per-indicator shortfall/overshoot change: Projected change in percentage points, with low/mid/high estimates expressing uncertainty
  • Evidence-based reasoning: Detailed chain-of-thought analysis citing specific research
  • Research citations: Academic papers and policy reports supporting the assessment
  • Fiscal impact estimates: Projected annual costs (low/mid/high) when the bill has meaningful budget implications
  • Policy mechanism tags: Each indicator estimate is tagged with the standardized policy mechanism(s) through which the bill operates (e.g., “workforce development”, “emission standards”, “housing production”). These tags drive how bill effects are combined in aggregate analysis.

Visualization

Bills are visualized using a mini doughnut ring showing which of the 21 categories are affected and in which direction. The ring has two concentric bands — the inner ring represents the 12 social foundation categories and the outer ring represents the 9 ecological ceiling categories. Affected categories are color-coded: green for improvement (reduces shortfall/overshoot), red for worsening, and orange for mixed effects. The intensity reflects the magnitude of the shortfall/overshoot change.

On each bill's detail page, you can expand any category to see full per-indicator impact bars showing the current shortfall/overshoot, the projected value after the bill, and the change — along with reasoning, evidence, and citations.

Quantitative Impact Estimates

Like CBO fiscal scores, we provide concrete projected changes for each indicator. These estimates reference comparable policies from other jurisdictions, account for California's specific context, and honestly express uncertainty with low/mid/high ranges.

This enables “doughnut budget” analysis: summing up the projected impact of all passed legislation to see how the session moves California's doughnut metrics.

How Aggregate Impact is Calculated

When viewing session-level or legislator-level impact, individual bill scores are aggregated to show the net effect of all legislation on each indicator. Here's how that works:

Step 1 — Per-Bill Predictions

For each bill, the AI predicts raw indicator changes (e.g., “reduce CO2 emissions by 0.3 tonnes/capita”). These are converted to shortfall/overshoot percentage point changes using the indicator's current California value and target threshold.
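Under the same assumed normalization (raw change divided by the target threshold), the conversion is one line; `raw_change_to_pp` is a hypothetical helper name, not part of the system:

```python
def raw_change_to_pp(raw_change: float, target: float) -> float:
    """Convert a raw indicator change (in the indicator's own units) into a
    shortfall/overshoot change in percentage points."""
    return raw_change / target * 100

# "Reduce CO2 emissions by 0.3 tonnes/capita" against a 0.985 t/capita target:
print(round(raw_change_to_pp(-0.3, 0.985), 1))  # -30.5
```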

Step 2 — GBD Multiplicative Compounding

When multiple bills affect the same indicator, their changes are combined using the GBD (Global Burden of Disease) multiplicative model — the approach WHO/IHME have used in the Global Burden of Disease studies published in The Lancet for more than 20 years. Each bill reduces (or increases) the remaining shortfall rather than the original, so improvements naturally compound and can never exceed full closure.

This prevents the absurd results you'd get from simple addition: if 500 bills each reduce a 100% shortfall by 1 percentage point, simple addition says -500 p.p. (impossible!). GBD compounding gives ~99.3% reduction — nearly full closure, but never exceeding it.
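A minimal sketch of the multiplicative rule as described above (our implementation; negative changes are improvements):

```python
def gbd_compound(changes_pp, current_gap_pp):
    """GBD-style multiplicative combination: each bill closes a fraction of
    the *remaining* gap, so total improvement never exceeds full closure."""
    remaining = 1.0
    for change in changes_pp:
        # fraction of the gap this bill closes (negative change = improvement)
        remaining *= 1.0 + change / current_gap_pp
    return -(1.0 - remaining) * current_gap_pp

# 500 bills each cutting a 100% shortfall by 1 p.p.:
total = gbd_compound([-1.0] * 500, 100.0)
print(round(-total, 1))  # 99.3 p.p. closed, never more than 100
```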

Step 2b — Policy Mechanism Interaction

Not all bills affecting the same indicator should be combined equally. Bills are tagged with 30 standardized policy mechanism types, and these mechanisms fall into three interaction tiers:

  • Regulatory Standards (max-takes-all): Emission limits, price caps, labor standards — only the strictest binds. A $1,500/month rent cap makes a $1,800 cap redundant. Mechanisms: emission standards, chemical restrictions, price regulation, labor standards, carbon pricing.
  • Population-Constrained Programs (diminishing): Job training, housing assistance, healthcare coverage — each additional bill at 50% weight. They compete for the same people and resources. Mechanisms: workforce development, housing assistance, healthcare coverage, education programs, public health programs, and 9 others.
  • Independent Capacity (additive): Infrastructure, conservation, transparency — all count fully. Each builds something new or opens a distinct channel. Mechanisms: infrastructure investment, clean energy deployment, ecosystem restoration, conservation designation, and 6 others.

Across different mechanisms, the standard GBD compound applies — different mechanisms are independent channels that don't substitute for each other.

Worked Example: Reducing Unemployment

Three bills target U-6 unemployment (current shortfall: 50%):

  • Bill A (workforce_development): -3 p.p.
  • Bill B (workforce_development): -2 p.p. → discounted to -1 p.p. (same mechanism, 50% weight)
  • Bill C (infrastructure_investment): -4 p.p. (full weight, different mechanism)

Within workforce_development: compound [-3, -1] → -3.94 p.p.
Within infrastructure_investment: -4 p.p.
Across mechanisms: compound [-3.94, -4] → -7.62 p.p.

vs. flat GBD (no mechanism awareness): compound [-3, -2, -4] → -8.49 p.p.
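The example above can be reproduced with a short sketch. The tier rules and the 50% discount follow the description in Step 2b; `gbd_compound` is the multiplicative rule from Step 2, redefined here so the sketch is self-contained (function names are ours):

```python
def gbd_compound(changes_pp, gap_pp):
    """Multiplicative (GBD) combination from Step 2."""
    remaining = 1.0
    for change in changes_pp:
        remaining *= 1.0 + change / gap_pp
    return -(1.0 - remaining) * gap_pp

def within_mechanism(changes_pp, tier, gap_pp):
    """Combine same-mechanism changes according to their interaction tier."""
    if tier == "max_takes_all":          # regulatory standards: strictest binds
        return min(changes_pp)
    if tier == "diminishing":            # population-constrained: 50% weight after the first
        ordered = sorted(changes_pp)     # largest improvement first
        discounted = [ordered[0]] + [c * 0.5 for c in ordered[1:]]
        return gbd_compound(discounted, gap_pp)
    return gbd_compound(changes_pp, gap_pp)  # independent capacity: full weight

gap = 50.0  # current U-6 unemployment shortfall
wd = within_mechanism([-3.0, -2.0], "diminishing", gap)  # Bills A and B
infra = within_mechanism([-4.0], "additive", gap)        # Bill C
print(round(wd, 2), round(gbd_compound([wd, infra], gap), 2))  # -3.94 -7.62
```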

Why This Approach

The mechanism interaction model is grounded in policy research:

  • GBD Comparative Risk Assessment (WHO/IHME, Lancet 2024) — foundation for multiplicative compounding
  • Perino, Ritz & van Benthem, “Overlapping Climate Policies,” Economic Journal 2024 — regulatory substitution evidence
  • Hepburn et al., “Policy Complementarity and the Paradox of Carbon Pricing,” Oxford Review of Economic Policy 2023 — different-mechanism complementarity
  • Wang, “Diminishing Returns of Policy Pilots,” Review of Policy Research 2024 — diminishing returns in overlapping programs
  • CBO scoring methodology — interaction discounts (~50%) for overlapping provisions
  • Milgrom & Roberts (1990) supermodularity framework — complementarity when addressing different binding constraints

Existing Policy Context

Each bill is scored for its marginal impact — the incremental change it would produce beyond what California's existing laws and programs already achieve. When the AI estimates that a workforce development bill reduces unemployment shortfall by 0.3 percentage points, that estimate already accounts for the existence of programs like ETP, CalWORKs, and other current workforce initiatives.

The mechanism interaction tiers (max-takes-all, diminishing, additive) handle only the interaction between new bills in the current session. They do not double-count interactions with existing policy, because the per-bill estimates are already marginal over the current baseline. The shortfall/overshoot percentages themselves reflect the state of California with all existing policy already in effect.

Uncertainty Ranges

Each bill's per-indicator impact comes with low, mid, and high estimates reflecting the inherent uncertainty in policy impact prediction. We propagate these ranges through the aggregation in two steps:

  1. Triple compounding: Run the full mechanism-aware compounding three times — once with all low estimates, once with all mids, and once with all highs. This produces a “naive” range representing the all-pessimistic and all-optimistic scenarios.
  2. √N shrinkage: The naive range assumes all bills simultaneously hit their worst or best case, which is statistically implausible. By the central limit theorem, when N independent estimates are combined, the aggregate uncertainty narrows by a factor of 1/√N. We apply this shrinkage: if 100 bills affect an indicator, the displayed range is 1/10th as wide as the naive range. With 4 bills, it's half as wide. With 1 bill, the full individual range is shown.
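These two steps can be sketched as follows; the linear interpolation toward the mid estimate is our simplification of "narrows by 1/√N":

```python
import math

def shrink_range(low_pp, mid_pp, high_pp, n_bills):
    """Narrow the naive all-low/all-high band by 1/sqrt(N) around the mid
    estimate, per the central-limit argument (assumes independent errors)."""
    s = 1.0 / math.sqrt(n_bills)
    return (mid_pp + (low_pp - mid_pp) * s,
            mid_pp,
            mid_pp + (high_pp - mid_pp) * s)

print(shrink_range(-60.0, -50.0, -40.0, 100))  # (-51.0, -50.0, -49.0): 1/10 the width
print(shrink_range(-60.0, -50.0, -40.0, 1))    # (-60.0, -50.0, -40.0): unchanged
```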

The resulting range (e.g., “Close 40-60% of the gap”) reflects remaining statistical uncertainty after this narrowing. The true outcome is likely somewhere within these bounds.

Where the range is narrow (less than 5 percentage points), we display only the mid estimate to keep the interface clean.

Step 3 — Gap Closure %

The compounded change is expressed as gap closure percentage — “what percentage of the gap between where CA is and where it needs to be does this legislation close?” This metric is intuitive and has a natural ceiling (compounded improvements can never close more than 100% of the gap), unlike raw percentage points, which are hard to interpret without context.
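The conversion is a simple ratio (our reading of the definition above):

```python
def gap_closure_pct(change_pp: float, current_gap_pp: float) -> float:
    """Percent of the current shortfall/overshoot gap closed by a change
    (positive = gap closed, negative = gap widened)."""
    return -change_pp / current_gap_pp * 100

# A -34.8 p.p. change against a 1890% overshoot closes ~1.8% of the gap:
print(round(gap_closure_pct(-34.8, 1890.0), 1))  # 1.8
```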

Worked Example: GHG Emissions

Say California's CO2 footprint is 19.6 tonnes/capita with a target of ≤0.985 tonnes/capita, giving an overshoot of ~1890%.

  • Bill A predicts a -15 p.p. overshoot change (reducing overshoot from 1890% to 1875%)
  • Bill B predicts a -20 p.p. overshoot change
  • Simple sum: -35 p.p. GBD compound: -34.8 p.p. (nearly the same for small changes)

Now imagine 364 bills each averaging -3 p.p. change against the 1890% overshoot. Simple sum: -1092 p.p. GBD compound: ≈ -830 p.p. — roughly 44% gap closure. The compounded result reflects reality: each bill works on the remaining gap, so cumulative impact has diminishing marginal returns.

On the dashboard this shows as: “Climate Change — Close ~44% of the gap (364 bills)” with a projected value of ~11.4 tonnes/capita, down from 19.6.
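The numbers in this example can be checked against the Step 2 compounding rule (reimplemented here so the sketch is self-contained):

```python
def gbd_compound(changes_pp, gap_pp):
    """GBD multiplicative combination: each bill acts on the remaining gap."""
    remaining = 1.0
    for change in changes_pp:
        remaining *= 1.0 + change / gap_pp
    return -(1.0 - remaining) * gap_pp

gap = 1890.0  # current CO2 overshoot, in percentage points
print(round(gbd_compound([-15.0, -20.0], gap), 1))  # -34.8

total = gbd_compound([-3.0] * 364, gap)
print(round(total), round(-total / gap * 100))  # -830 44  (~44% gap closure)
```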

Transparency

Every analysis includes full reasoning, evidence citations with links, and shortfall/overshoot calculations you can verify. All LLM calls are logged with model, tokens, cost, and latency. Scoring versions are tracked so methodology changes are transparent and reproducible.

Limitations

  • AI analysis is inherently imprecise — treat shortfall/overshoot projections as informed estimates, not definitive predictions
  • Quantitative impact estimates are based on comparable policies from other jurisdictions, not deterministic models
  • Bills may have indirect effects not captured by the 42 indicators
  • Implementation quality matters — a bill's actual impact depends on how it's executed
  • Mechanism interaction tiers (max-takes-all, diminishing, additive) are initial assignments based on policy literature and economic reasoning. The exact substitution factor (50% discount for population-constrained programs) is approximate — real-world interactions depend on specific bill details, implementation timing, and institutional context. These parameters may be refined as we gather more data.
  • Aggregate ranges use √N shrinkage (central limit theorem) to narrow the spread as more bills contribute. This assumes bill-level errors are independent — in practice, systematic biases (e.g., the AI consistently over- or under-estimating) would not shrink with N. We show ranges as guidance on the scale of uncertainty, not as precise confidence intervals. Where low/high estimates were not provided by the AI model (~30% of estimates), we synthesize them using the observed ratio (low ≈ 33% of mid, high ≈ 167% of mid).
  • The analysis system will improve over time as we calibrate against human expert review