Colorado SB 21-169 · Productized service · $25K/yr · 25 models

The Reg 169 work the Division of Insurance will eventually request, handled before it asks.

An agentic compliance auditor for life-insurance ML. We run the BIFSG race inference and the §3 reference-vs-test bias regression, and we maintain the narrative documentation per ACLI’s §8 template. Every quarter, and on every retrain. $25,000 per year, up to 25 in-scope models.

Service runs in your environment. Methodology, code, and narrative documents are yours from day one. Cancel anytime.

When the Division asks, you have the document.
Why this matters now

Colorado is the leader. Four other states are watching.

Colorado SB 21-169 — codified at C.R.S. § 10-3-1104.9 — prohibits insurance practices that result in unfair discrimination by race, color, ethnic origin, religion, sex, sexual orientation, disability, gender identity, or gender expression. The Division of Insurance is implementing the statute category by category through rulemaking; life-insurance underwriting is the current focus, and the industry's working assumption is 2026 readiness for the merged regulation.

For any life insurer running ML models that touch External Consumer Data and Information Sources (ECDIS) — third-party scores, marketing data, social and behavioral data, credit-adjacent attributes, vendor scores that themselves use ECDIS — the regulation is not an “if.” It is a “when.” And the “when” is shorter than most operating teams have planned for. New York, Connecticut, New Jersey, and Washington are explicitly tracking the Colorado outcome and will adopt similar frameworks once the rulemaking dust settles.

What Colorado will require is not a pass/fail score. The regulation is structured as a balancing test — narrative-driven, not bright-line. What you owe the Division on request is a written document, per model, that explains what the model predicts, which features derive from ECDIS, the race-inference methodology used to test it, the §3 reference-vs-test regression results with confidence intervals, and what was done with the findings. The audit trail is the deliverable.

The cost of no answer

What it looks like when the Division asks and you do not have the document.

  • A request arrives by certified mail. Produce the narrative document for Model X. The team has 30 days to respond — fewer if the request comes mid-rulemaking comment window.
  • The model has been retrained eleven times. Nobody has the §3 test results from any of those retrains. BIFSG has never been run. Race correlation in the model is unknown.
  • The team scrambles. Senior data scientists are pulled off product work. The compliance team and outside counsel coordinate. Outside vendors get hired at spot rates. The cost of the scramble is FTE time, lost shipping momentum, and — depending on what the test reveals when it finally runs — possibly remediation, model retirement, or a consent decree.
  • The watch states notice. New York, Connecticut, and New Jersey regulators read the Colorado proceedings. A firm that was caught off-guard in Colorado is presumptively unprepared in the next state to regulate.

None of this is hypothetical. It is the operating cost of treating ML bias testing as a one-time consulting project rather than a standing operating function. The firms that get out ahead of the rulemaking own a defensible documentation posture before regulators ask. The firms that wait write that documentation under deadline pressure.

The economics, the portfolio, the timing

Three numbers a Chief Risk Officer needs to know before the next rulemaking comment window closes.

What end-to-end Reg 169 compliance costs internally vs. as a service.

Illustrative · Sovereign Action analysis, 2026

What a typical mid-market life insurer's model portfolio looks like under Reg 169.

Illustrative · Sovereign Action analysis, 2026

The regulatory timing: Colorado is the leader, four states are watching closely.

Sovereign Action analysis, 2026 · DOI rulemaking timeline + ACLI stakeholder record
Timeline: Colorado SB 21-169 / Reg 169, effective for life-insurance ML in 2026 (DOI initial draft → ACLI counter-proposal → merged rulemaking → effective + phase-in). Watch states (NY · CT · NJ · WA), adopting frameworks once Colorado settles (monitoring CO outcome → state-specific rulemaking). Horizon: Day 0 through Day 1,825, roughly five years.
Why this is non-value-add

Your data-science team should not be building this in-house.

For a life insurer’s data-science organization, Reg 169 compliance is pure overhead. It produces no business value — the model already exists, performs, makes money. The compliance work does not make the model more accurate. It requires regulatory expertise the team does not typically have. The methodology is evolving — BIFSG today, possibly a different inference layer in a year. The cadence is high: quarterly re-testing per in-scope model plus on every retrain. For a portfolio of ten to twenty-five models, that is forty to a hundred test runs per year. The deliverable is documentation, not code — the team’s strength is shipping models, not maintaining a regulatory archive.

Why we built this

A standing service the regulator can name.

The right answer is to delegate the regulatory work to a service that specializes in it — and to standardize the methodology across the portfolio so the firm has a coherent answer rather than a scattered patchwork. We track the rulemaking continuously. We extend the methodology when the DOI publishes the BIFSG bulletin or when a watch state issues its draft. The narrative documents stay current. The audit trail accumulates from day one. When the Division asks, you have the document — and you can name the service that maintains it, which is itself a defense.

What’s included

Six standing capabilities. One annual fee. No add-ons.

Model inventory + ECDIS scoping

Every supervised model touching life-insurance underwriting decisions is reviewed for ECDIS exposure. In-scope models are documented with prediction target, downstream decision, training window, and last retrain date. Out-of-scope determinations are themselves part of the narrative.

BIFSG inference pipeline

Production-grade race-probability inference standing in your environment. First name + surname + geography against the 2010 Census surname file with documented data vintage. Probability vector plus argmax classification plus match status (full / surname-only / unmatched). Waterfall fallback (BIFSG → BISG → drop) explicitly committed before any test runs.

Quarterly §3 regression testing

ACLI-aligned reference-vs-test regression on every in-scope model, run quarterly and on every retrain. Z-test plus 95% confidence-interval overlap on the model_output coefficient. Sample-size analysis. Pre-processing decisions documented (logit, percentile rank, z-score) where the score distribution warrants.

Narrative documents per ACLI §8 template

One document per in-scope model: model identification, prediction target, feature inventory, ECDIS rationale, race-inference methodology, reference vs. test model results, findings, sample-size analysis, governance, and sign-offs. Versioned per retrain. Queryable on audit.

DOI rulemaking + watch-state monitoring

Continuous tracking of the merged Colorado regulation, the BIFSG bulletin when DOI publishes it, and the watch-state developments (New York, Connecticut, New Jersey, Washington). The pipeline adapts as methodology evolves so the documentation stays current.

Audit response · 24-hour / one-week SLA

When the Division of Insurance asks for a model's narrative document, you have it within 24 hours. Bespoke audit responses (analyses the regulator requests beyond the standing documentation) are returned within one week. The audit trail is the deliverable.

Methods, at high level

How the §3 test actually works.

Race inference (BIFSG). Insurers do not collect race. To run the test, we infer it. The proposed default is Bayesian Improved First Name Surname Geocoding — first name and surname against the 2010 Census surname file, combined with geographic information at census-tract granularity, returning a probability vector across racial categories plus an argmax classification. We default to argmax for the screening test (interpretation is cleaner) and run the probability-vector version as a secondary corroboration when the screen flags. We commit to a waterfall fallback (BIFSG → BISG → drop) before we run the test — picking the better of two methods after the fact is explicitly disallowed.
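
A minimal sketch of the combination step, assuming lookup tables already derived from the Census surname file, a first-name distribution, and tract-level demographics. The racial categories, function names, and waterfall helper here are illustrative, not the production pipeline:

```python
import numpy as np

# Illustrative categories; the production vector follows the Census
# surname file's category set.
RACES = ["white", "black", "api", "aian", "multi", "hispanic"]

def bifsg_posterior(p_race_given_surname: np.ndarray,
                    p_first_given_race: np.ndarray,
                    p_geo_given_race: np.ndarray) -> np.ndarray:
    """Combine surname, first-name, and geography evidence under the
    usual conditional-independence assumption:
    posterior ∝ P(race | surname) · P(first | race) · P(geo | race)."""
    unnorm = p_race_given_surname * p_first_given_race * p_geo_given_race
    total = unnorm.sum()
    if total == 0:
        return np.full(len(RACES), np.nan)  # no usable evidence
    return unnorm / total

def infer_record(surname_probs, first_probs, geo_probs):
    """Waterfall committed before any test runs: BIFSG -> BISG -> drop.
    Returns (probability vector, match status)."""
    if surname_probs is None:
        return None, "unmatched"            # dropped from the test sample
    if first_probs is None:                 # no first-name match: fall back
        post = bifsg_posterior(surname_probs, np.ones(len(RACES)), geo_probs)
        return post, "surname-only"         # reduces to BISG
    return bifsg_posterior(surname_probs, first_probs, geo_probs), "full"
```

The argmax classification used for the screening test is then `RACES[int(np.argmax(post))]` over the returned vector.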

The reference-vs-test regression. We build two regressions on top of your model’s output. The reference model: outcome ~ model_output + standard controls (age, gender, product type, etc.) — no race. The test model: outcome ~ model_output + controls + race. We compare the model_output coefficient between the two regressions via Z-test (sample-size robust) and 95% confidence-interval overlap (sample-size sensitive). Both tests run because they have opposite sample-size behaviors — using both calibrates against false signals from very large or very small samples.
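
A minimal sketch of that comparison, assuming a pandas DataFrame `df` holding the outcome, the model score, the controls, and the inferred race classification. Column names are illustrative; the production runner adds the pre-processing and sample-size analysis described above:

```python
import numpy as np
import statsmodels.formula.api as smf

# Reference model: no race. Test model: adds the inferred race category.
ref  = smf.ols("outcome ~ model_output + age + gender + product_type",
               data=df).fit()
test = smf.ols("outcome ~ model_output + age + gender + product_type + C(race)",
               data=df).fit()

b_ref,  se_ref  = ref.params["model_output"],  ref.bse["model_output"]
b_test, se_test = test.params["model_output"], test.bse["model_output"]

# Z-test on the coefficient difference (sample-size robust); the two
# estimates are treated as independent for screening purposes.
z = (b_ref - b_test) / np.sqrt(se_ref**2 + se_test**2)

# 95% confidence-interval overlap (sample-size sensitive).
ci_ref  = ref.conf_int().loc["model_output"]
ci_test = test.conf_int().loc["model_output"]
overlap = not (ci_ref[1] < ci_test[0] or ci_test[1] < ci_ref[0])

# Flag when either test signals a meaningful coefficient shift.
flagged = abs(z) > 1.96 or not overlap
```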

What a flag means. A flag is not a regulatory fail. If the model_output coefficient collapses toward zero when race enters the test model, the model was riding on race-correlated proxies — that demands action. If the coefficient stays stable and race itself becomes significant, race correlates with the outcome independently of your model — that is an explainable population fact (see CDC mortality differentials) and is documentable rather than disqualifying. Investigation methods (SHAP, per-feature race correlation, ablation, partial dependence by group) are all acceptable; the narrative is judged on whether the reasoning is sound and the remediation, if any, is documented.
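
As one example of the investigation step, a first-pass screen ranking features by how far their per-race group means diverge. This is a hypothetical triage helper, not the full playbook, and it assumes features have been standardized so the spreads are comparable:

```python
import pandas as pd

def feature_race_screen(df: pd.DataFrame, features: list[str],
                        race_col: str = "race_argmax") -> pd.DataFrame:
    """Rank candidate proxy features by the spread of their per-race
    group means. An ordering for the deeper tools (SHAP, ablation,
    partial dependence by group), not a verdict."""
    rows = []
    for f in features:
        means = df.groupby(race_col)[f].mean()
        rows.append({"feature": f,
                     "group_mean_spread": means.max() - means.min()})
    return (pd.DataFrame(rows)
              .sort_values("group_mean_spread", ascending=False)
              .reset_index(drop=True))
```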

What we ship per model. A versioned narrative document organized to the ACLI §8 template: model identification, prediction target, feature inventory with ECDIS rationale, race-inference methodology, reference-vs-test results, sample-size analysis, findings, governance trail, sign-offs. One document per model, updated on every retrain and every test re-run. Persisted to a durable store with a queryable audit trail. When the Division asks for Model X, the document is the response.
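
A sketch of that per-model record, with fields mapped to the §8 sections named above. The class is illustrative; the production schema is the versioned document store itself:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class NarrativeDocument:
    """One record per in-scope model, re-versioned on every retrain."""
    model_id: str
    version: int                     # bumped per retrain / test re-run
    prediction_target: str
    feature_inventory: list[str]     # each entry with its ECDIS rationale
    race_inference_method: str       # e.g. "BIFSG, 2010 Census vintage"
    reference_vs_test_results: dict  # coefficients, CIs, z, flag status
    sample_size_analysis: str
    findings: str
    governance_trail: list[str]
    sign_offs: list[str]
    as_of: date = field(default_factory=date.today)
```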

The fee

$25,000

Per year · Up to 25 in-scope models · Cancel anytime

The fee covers everything in the list above: scoping, BIFSG inference, quarterly testing, on-retrain testing, narrative documentation maintenance, DOI rulemaking and watch-state monitoring, and audit response within the SLAs named.

Above 25 models? The fee scales by tier — quote on request. At or below the threshold the engagement is flat-priced, and the marginal cost of the 26th model is a conversation about extending scope.

The audit fee ($499 / $999) is the prerequisite, paid before the annual engagement begins.

Frequently asked

The questions a Chief Risk Officer asks before signing the engagement letter.

  • Is the regulation actually in effect today?

    The statute (Colorado SB 21-169, codified at C.R.S. § 10-3-1104.9) is law. The Division of Insurance is implementing it category by category through rulemaking; life-insurance underwriting is the current focus. The merged regulation has not yet been finalized. The Division has not committed to an effective date but the working assumption across the industry is 2026 readiness. The firms that wait until the rule is final will be writing their compliance documentation under deadline pressure — usually badly.

  • Which of our models are in scope?

    Any supervised model used in life-insurance underwriting that consumes External Consumer Data and Information Sources (ECDIS) — third-party scores, marketing data, social and behavioral data, credit-adjacent attributes, vendor scores that themselves use ECDIS. Scope is per model, not per final decision: when a smoking-propensity model feeds a mortality model that feeds a rate-class decision, each model in that chain is independently in scope. Models that use only applicant-provided data and medical records are out of scope. The first deliverable of the engagement is your scoped inventory.

  • We don't collect race. How does the bias test even work?

    The proposed test infers race per record using BIFSG (Bayesian Improved First Name Surname Geocoding) — first name plus surname plus geography, scored against Census-derived name distributions. The output is a probability vector across racial categories plus an argmax classification. The §3 bias test then runs two regressions on top of your model's output (a reference model without race, a test model with race) and compares the model_output coefficient between them via Z-test and 95% CI overlap. If your model is riding on race-correlated proxies, the coefficient collapses when race enters the test model.

  • What does the Division actually want as a deliverable?

    A narrative document per in-scope model, updated on every retrain and every test re-run. Not a pass/fail score — the regulation is structured as a balancing test, not a bright-line disparate-impact test. The document explains: what the model predicts, which features derive from ECDIS, why the use of ECDIS is reasonable, the race-inference methodology, the test results with confidence intervals, what investigation was done if the test flagged, what was changed (if anything), and the governance trail. The deliverable is the document. Our service maintains it.

  • What does the $25K cover, exactly?

    Up to 25 in-scope models per year, end-to-end: model inventory and ECDIS scoping, BIFSG inference pipeline standing in your environment, quarterly §3 testing per model plus on-retrain testing, narrative-document maintenance (one per model, per ACLI §8 template), DOI rulemaking and watch-state monitoring with adaptation, audit response with a 24-hour SLA on existing documents and a one-week SLA on bespoke audit responses. Cancel anytime. The methodology, code, and documents are yours.

  • What if the test flags one of our models?

    A flag is not a regulatory fail. It is, in the ACLI working draft's words, an indication that there is something to be seen here. The investigation playbook is acceptable across multiple methods — SHAP, per-feature race correlation, ablation, partial-dependence-by-group — and the narrative is judged on whether the reasoning is sound and the remediation (if any) is documented. Many flags resolve as explainable population correlations rather than model defects. Some require a feature change. Both outcomes are documentable.

  • Why $25K? Isn't that low for a compliance program?

    Because the methodology is standardized and the work is largely automated. The expensive part of compliance is not the testing itself — it is the FTE time on a data-science team that doesn't have the regulatory domain context, doing this work as overhead instead of shipping models. Productizing the service across many clients is what makes the price work. The internal alternative is roughly $200K–$300K per year in fully-loaded FTE time for the same outcome.

  • What about New York, Connecticut, New Jersey, Washington?

    All four are watching Colorado closely, and the same testing primitives will generalize. When those states issue their own frameworks, we extend the engagement scope and re-quote — but the base infrastructure (BIFSG inference, regression test runner, narrative document store, audit trail) does not need to be rebuilt per state. The scoping work changes; the methodology stays.

What this is NOT

The boundaries that make the price honest.

  • Not legal counsel.

    We do not provide regulatory opinions. We ship the operating evidence your compliance team and outside counsel rely on to provide them.

  • Not actuarial review.

    We do not opine on whether ECDIS is reasonable for a given prediction task. That is the appointed actuary’s call. We ship the evidence the actuary needs.

  • Not a SaaS product.

    The service runs in your environment with your data. Nothing customer-identifiable leaves your perimeter. The methodology, code, and narrative documents are yours from day one.

  • Not other states’ frameworks (yet).

    The same primitives generalize. When NY, CT, NJ, or WA issue their drafts, we extend the engagement scope and re-quote — the base infrastructure does not need to be rebuilt per state.

The window is open

$25,000 per year. The work the regulator will eventually ask for, handled.

Start with a free twenty-minute fit call. We’ll review the scope of your model portfolio and decide together whether the service is the right next step — or whether a different starting point fits better.