Methods, at a high level. How the §3 test actually works.
Race inference (BIFSG). Insurers do not collect race. To run the test, we infer it. The proposed default is Bayesian Improved First Name Surname Geocoding — first name and surname against the 2010 Census surname file, combined with geographic information at census-tract granularity, returning a probability vector across racial categories plus an argmax classification. We default to argmax for the screening test (interpretation is cleaner) and run the probability-vector version as a secondary corroboration when the screen flags. We commit to a waterfall fallback (BIFSG → BISG → drop) before we run the test — picking the better of two methods after the fact is explicitly disallowed.
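The Bayes combination and the pre-committed waterfall can be sketched as follows. This is a minimal illustration, not the production implementation: the probability tables, category names, and the `waterfall`/`classify` helpers are all hypothetical placeholders standing in for the Census-derived inputs.

```python
def race_posterior(p_race_given_surname, p_first_given_race, p_tract_given_race):
    """Bayes combination: posterior_r ∝ P(r | surname) * P(first name | r) * P(tract | r)."""
    raw = {r: p_race_given_surname[r]
              * p_first_given_race.get(r, 1.0)  # missing first-name term defaults to 1 (BISG)
              * p_tract_given_race[r]
           for r in p_race_given_surname}
    total = sum(raw.values())
    if total == 0:
        return None  # no usable information for this record
    return {r: v / total for r, v in raw.items()}

def waterfall(p_surname, p_first, p_tract):
    """Pre-committed fallback order: BIFSG -> BISG (drop the first-name term) -> drop record."""
    if p_surname and p_first and p_tract:
        post = race_posterior(p_surname, p_first, p_tract)
        if post:
            return "BIFSG", post
    if p_surname and p_tract:
        post = race_posterior(p_surname, {}, p_tract)
        if post:
            return "BISG", post
    return "drop", None

def classify(posterior):
    """Argmax classification used for the screening test."""
    return max(posterior, key=posterior.get)
```

The probability-vector output (`posterior`) is what the secondary corroboration consumes; the screen itself only uses `classify`.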
The reference-vs-test regression. We build two regressions on top of your model’s output. The reference model: outcome ~ model_output + standard controls (age, gender, product type, etc.) — no race. The test model: outcome ~ model_output + controls + race. We compare the model_output coefficient between the two regressions via Z-test (sample-size robust) and 95% confidence-interval overlap (sample-size sensitive). Both tests run because they have opposite sample-size behaviors — using both calibrates against false signals from very large or very small samples.
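The coefficient comparison itself is simple arithmetic once both regressions are fitted. A sketch, assuming the coefficients and standard errors come from the fitted reference and test models; the 1.96 critical value (95% two-sided normal) and the `screen` helper are illustrative:

```python
import math

def z_stat(b_ref, se_ref, b_test, se_test):
    """Z-statistic for the shift in the model_output coefficient
    between the reference and test regressions (robust to large n)."""
    return (b_ref - b_test) / math.sqrt(se_ref ** 2 + se_test ** 2)

def ci_overlap(b_ref, se_ref, b_test, se_test, crit=1.96):
    """95% confidence-interval overlap check (sensitive to small n)."""
    ref_lo, ref_hi = b_ref - crit * se_ref, b_ref + crit * se_ref
    test_lo, test_hi = b_test - crit * se_test, b_test + crit * se_test
    return ref_lo <= test_hi and test_lo <= ref_hi

def screen(b_ref, se_ref, b_test, se_test, z_crit=1.96):
    """Flag when either test signals a meaningful coefficient shift."""
    shifted = abs(z_stat(b_ref, se_ref, b_test, se_test)) > z_crit
    disjoint = not ci_overlap(b_ref, se_ref, b_test, se_test)
    return shifted or disjoint
```

Running both checks is what gives the calibration described above: the Z-test stays informative at very large n where any CI overlap test would flag trivia, and the overlap check stays informative at small n where standard errors are wide.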
What a flag means. A flag is not a regulatory fail. If the model_output coefficient collapses toward zero when race enters the test model, the model was riding on race-correlated proxies — that demands action. If the coefficient stays stable and race itself becomes significant, race correlates with the outcome independently of your model — that is an explainable population fact (see CDC mortality differentials) and is documentable rather than disqualifying. Investigation methods (SHAP, per-feature race correlation, ablation, partial dependence by group) are all acceptable; the narrative is judged on whether the reasoning is sound and the remediation, if any, is documented.
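The two flag interpretations above reduce to a small decision rule. A hedged sketch: the 0.5 collapse threshold and the function name are placeholders, not regulatory constants — the real judgment is made in the written narrative, not by a cutoff.

```python
def interpret_flag(b_ref, b_test, race_term_significant, collapse_ratio=0.5):
    """Classify a flagged result. collapse_ratio is a hypothetical
    threshold for 'coefficient collapses toward zero'."""
    if abs(b_test) < collapse_ratio * abs(b_ref):
        # Coefficient collapsed: the model was riding race-correlated proxies.
        return "proxy-risk"
    if race_term_significant:
        # Coefficient stable, race significant on its own: documentable population fact.
        return "population-effect"
    return "no-action"
```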
What we ship per model. A versioned narrative document organized to the ACLI §8 template: model identification, prediction target, feature inventory with ECDIS rationale, race-inference methodology, reference-vs-test results, sample-size analysis, findings, governance trail, sign-offs. One document per model, updated on every retrain and every test re-run. Persisted to a durable store with a queryable audit trail. When the Division asks for Model X, the document is the response.
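The per-model deliverable can be modeled as a versioned record. A minimal sketch only: the field names mirror the section list above but are illustrative, and the `bump_version` helper stands in for whatever the durable store's real versioning mechanism is.

```python
from dataclasses import dataclass

@dataclass
class NarrativeDoc:
    # Sections follow the ACLI §8 template; field names are illustrative.
    model_id: str
    prediction_target: str
    feature_inventory: list          # (feature, ECDIS rationale) pairs
    race_inference_method: str       # e.g. "BIFSG" or "BISG" after the waterfall
    reference_vs_test_results: dict  # coefficients, SEs, Z-stat, CI-overlap verdicts
    sample_size_analysis: str
    findings: str
    governance_trail: list
    sign_offs: list
    version: int = 1

    def bump_version(self):
        """New version on every retrain and every test re-run; priors persist."""
        self.version += 1
        return self.version
```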