Evidence
AI Development Integrity
We disclose how the model was built. Including what went wrong.
The Data Integrity Event We Caught
During Marqi Index development, an AI coding assistant produced a spurious AUC of 0.937 by training on a synthetic patient-level dataset generated from cohort-level aggregates.
The error was caught through independent recomputation against source-tagged data. The final, validated AUC on real temporal validation data is 0.807.
We disclose this in Paper 1, in our internal verification protocols, and on this page.
Five-Control Verification Protocol
Every model build now runs through this verification protocol. Full details coming Day 5.
Source data tagging
Every row traced to source system and extraction date.
Independent recomputation
Metrics recomputed on source data by separate team.
Calibration audit
Calibration metrics reviewed before any deployment.
Limitations documentation
Known limitations documented before publication.
Questions about our verification process?
If you have ever evaluated an AI vendor whose performance claims could not survive a careful audit, you understand why this matters.
Talk to our team