Evidence
AI Integrity Brief
We believe the most important thing a clinical AI company can do is tell the truth about what went wrong. This is our integrity disclosure.
The Data Integrity Event We Caught
What happened
During Paper 1 development, we discovered that a subset of our training data contained temporal leakage: future diagnosis codes were incorrectly timestamped before the index admission. This would have artificially inflated model performance.
We caught this issue during routine validation checks. The model trained on contaminated data showed suspiciously high discrimination (AUC > 0.90) that, contrary to what we would expect from a clean model, did not degrade on temporal validation. This pattern triggered our review protocol.
After investigation, we identified the root cause: a date field parsing error in the data pipeline that affected approximately 8% of the training cohort. We discarded the affected data, rebuilt the pipeline with additional safeguards, and retrained the model from scratch.
The final Paper 1 model was trained only on verified clean data. The published validation metrics reflect this corrected dataset. We disclosed this event in the limitations section of the manuscript.
Why We Disclose This
Clinical AI companies rarely disclose their mistakes. The standard practice is to fix the issue quietly and hope no one notices. We believe this is wrong.
The healthcare system is already skeptical of AI claims. Every vendor says their model performs better than the published alternatives. Without transparency about what can go wrong, there is no basis for trust.
We disclose this event for three reasons:
1. It demonstrates our verification process works
We caught this issue before publication, not after deployment. Our controls detected an anomaly, and we investigated until we found the root cause.
2. It sets a standard for the industry
If more clinical AI companies disclosed their near-misses, the industry would have a shared body of knowledge about what can go wrong and how to prevent it.
3. It builds trust through honesty
We would rather work with customers who choose us because we told the truth than with customers who choose us because we hid our mistakes.
The Five-Control Verification Protocol
After the data integrity event, we formalized a five-control verification protocol that runs on every model build.
Temporal Ordering Audit
Automated check that all feature timestamps precede the prediction timestamp. Any violation triggers pipeline halt and manual review.
Performance Ceiling Check
Validation AUC above 0.85 triggers automatic review. Unusually high performance is a red flag, not a success signal.
Temporal Degradation Test
Performance should degrade slightly on temporal validation vs. cross-validation. If it doesn't, we investigate for leakage.
Feature Importance Review
Top features must be clinically plausible. For example, a diagnosis code that drives predictions must not itself be a consequence of the readmission being predicted.
External Review Sign-off
Before any model goes to production, an independent reviewer (not the model developer) signs off on the validation report.
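The first three controls lend themselves to automation, and a minimal sketch of how they might be expressed is shown here. It is an illustration under stated assumptions, not our production pipeline: the `FeatureRecord` schema, the `PipelineHalt` exception, the function names, and the `min_drop` tolerance are all hypothetical. Only the 0.85 AUC ceiling and the rule that feature timestamps must precede the prediction timestamp come from the protocol described above. The sketch also assumes "precede" means strictly earlier, so a feature stamped at exactly the prediction time counts as a violation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FeatureRecord:
    """One feature value for one patient (hypothetical schema)."""
    patient_id: str
    feature_name: str
    feature_time: datetime     # when the feature was recorded
    prediction_time: datetime  # index admission / prediction timestamp


class PipelineHalt(Exception):
    """Raised to stop the build and force manual review (hypothetical)."""


def temporal_ordering_audit(records):
    """Control 1: every feature timestamp must precede the prediction timestamp.

    Assumes strict ordering: equal timestamps are treated as violations.
    """
    violations = [r for r in records if r.feature_time >= r.prediction_time]
    if violations:
        raise PipelineHalt(
            f"{len(violations)} feature record(s) are not strictly before "
            "the prediction timestamp"
        )


def performance_ceiling_check(validation_auc, ceiling=0.85):
    """Control 2: return True when the validation AUC is high enough to require review."""
    return validation_auc > ceiling


def temporal_degradation_test(cv_auc, temporal_auc, min_drop=0.0):
    """Control 3: return True (investigate for leakage) when the temporal AUC
    does not fall below the cross-validation AUC by more than min_drop.

    min_drop is a hypothetical tolerance; the protocol text only says
    performance should degrade "slightly".
    """
    return (cv_auc - temporal_auc) <= min_drop
```

Raising an exception for the ordering audit mirrors the "pipeline halt" wording of that control, while the two AUC checks return a flag because the protocol describes them as triggers for review rather than hard stops. A build script could call all three in sequence and route any `True` result or `PipelineHalt` to a human reviewer.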
Security & Compliance
Beyond model integrity, we maintain enterprise-grade security and compliance standards.
HIPAA Compliance
- BAA execution before any data transfer
- Minimum necessary data access
- Full audit trail on PHI access
Data Security
- AES-256 encryption at rest
- TLS 1.3 encryption in transit
- Mutual TLS for API connections
Audit & Monitoring
- Full prediction audit trail
- Model drift monitoring
- Performance degradation alerts
Certifications
- SOC 2 Type II (in progress)
- HITRUST CSF (roadmap)
- Regular penetration testing
Our Commitment
We commit to the following principles for all Marqi Index deployments:
We will disclose limitations
Every validation report includes a full limitations section. We will not hide what the model cannot do.
We will disclose failures
If a deployed model fails to perform as expected, we will notify affected customers within 72 hours.
We will validate before deployment
No model goes to production without independent validation on the customer's data. We will not deploy on promises.
We will monitor continuously
Every deployed model is monitored for performance drift. Degradation triggers review and, if necessary, retraining.
These commitments are not marketing language. They are engineering practices embedded in our deployment process. We welcome questions from prospective customers about how we implement them.
Questions about our integrity practices?
We welcome technical questions about our verification protocol, security practices, and deployment standards.