Evidence

AI Development Integrity

We disclose how the model was built. Including what went wrong.

The Data Integrity Event We Caught

During Marqi Index development, an AI coding assistant produced a spurious AUC of 0.937 by training on a synthetic patient-level dataset generated from cohort-level aggregates.

The error was caught through independent recomputation against source-tagged data. The final, validated AUC on real temporal validation data is 0.807.

We disclose this in Paper 1, in our internal verification protocols, and on this page.

Five-Control Verification Protocol

Every model build now runs through this verification protocol. Full details coming Day 5.

Source data tagging

Every row traced to source system and extraction date.

Independent recomputation

Metrics recomputed on source data by separate team.

Calibration audit

Calibration metrics reviewed before any deployment.

Limitations documentation

Known limitations documented before publication.

Questions about our verification process?

If you have ever evaluated an AI vendor whose performance claims could not survive a careful audit, you understand why this matters.

Talk to our team