Evidence

AI Integrity Brief

We believe the most important thing a clinical AI company can do is tell the truth about what went wrong. This is our integrity disclosure.

The Data Integrity Event We Caught

What happened

During Paper 1 development, we discovered that a subset of our training data contained temporal leakage: future diagnosis codes were incorrectly timestamped before the index admission. This would have artificially inflated model performance.

We caught this issue during routine validation checks. The model trained on contaminated data showed suspiciously high discrimination (AUC > 0.90) that, contrary to what we expect from a clean model, did not degrade on temporal validation. This pattern triggered our review protocol.

After investigation, we identified the root cause: a date field parsing error in the data pipeline that affected approximately 8% of the training cohort. We discarded the affected data, rebuilt the pipeline with additional safeguards, and retrained the model from scratch.

The final Paper 1 model was trained only on verified clean data. The published validation metrics reflect this corrected dataset. We disclosed this event in the limitations section of the manuscript.

Why We Disclose This

Clinical AI companies rarely disclose their mistakes. Standard practice is to fix the issue quietly and hope no one notices. We believe this is wrong.

The healthcare system is already skeptical of AI claims. Every vendor says their model performs better than the published alternatives. Without transparency about what can go wrong, there is no basis for trust.

We disclose this event for three reasons:

1. It demonstrates our verification process works

We caught this issue before publication, not after deployment. Our controls detected an anomaly and we investigated until we found the root cause.

2. It sets a standard for the industry

If more clinical AI companies disclosed their near-misses, the industry would have a shared body of knowledge about what can go wrong and how to prevent it.

3. It builds trust through honesty

We would rather work with customers who choose us because we told the truth than with customers who choose us because we hid our mistakes.

The Five-Control Verification Protocol

After the data integrity event, we formalized a five-control verification protocol that runs on every model build.

Temporal Ordering Audit

An automated check that every feature timestamp precedes the prediction timestamp. Any violation triggers a pipeline halt and manual review.
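A check of this kind can be sketched in a few lines of Python. This is a minimal illustration, not our production pipeline: the row layout and the names `feature_time`, `prediction_time`, `find_temporal_violations`, and `audit_temporal_ordering` are assumptions made for the example.

```python
from datetime import datetime


def find_temporal_violations(rows):
    """Return rows whose feature timestamp does not strictly precede the prediction timestamp."""
    return [row for row in rows if row["feature_time"] >= row["prediction_time"]]


def audit_temporal_ordering(rows):
    """Raise to halt the pipeline if any row violates temporal ordering."""
    violations = find_temporal_violations(rows)
    if violations:
        raise RuntimeError(
            f"{len(violations)} feature timestamp(s) do not precede the prediction "
            "timestamp; halting for manual review"
        )


# Hypothetical rows: the second feature is timestamped after the prediction time.
rows = [
    {"feature": "dx_code_a", "feature_time": datetime(2023, 1, 5),
     "prediction_time": datetime(2023, 1, 10)},
    {"feature": "dx_code_b", "feature_time": datetime(2023, 1, 12),
     "prediction_time": datetime(2023, 1, 10)},
]
violations = find_temporal_violations(rows)
```

Raising an exception rather than logging a warning reflects the "halt" requirement: a build with even one violation should not proceed without a human looking at it.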

Performance Ceiling Check

Validation AUC above 0.85 triggers automatic review. Unusually high performance is a red flag, not a success signal.
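As a rough illustration, the ceiling check reduces to computing AUC and comparing it with the 0.85 threshold stated above. The sketch below uses a simple pairwise (Mann-Whitney) AUC so that it has no dependencies; the function names and the toy labels and scores are assumptions for the example, not our tooling or data.

```python
AUC_REVIEW_THRESHOLD = 0.85  # review threshold stated in the protocol


def roc_auc(labels, scores):
    """Pairwise AUC: the fraction of (positive, negative) pairs ranked correctly, ties counting half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def needs_ceiling_review(validation_auc, threshold=AUC_REVIEW_THRESHOLD):
    """Return True when validation AUC is high enough to be treated as a red flag."""
    return validation_auc > threshold


# Toy data for illustration only.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
flagged = needs_ceiling_review(auc)
```

The pairwise computation is quadratic in cohort size, so it suits a small illustration; a real build would more likely use a rank-based implementation from an established library.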

Temporal Degradation Test

Performance should degrade slightly on temporal validation vs. cross-validation. If it doesn't, we investigate for leakage.
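One way to express this test in code, assuming the cross-validation and temporal-validation AUCs have already been computed, is shown below. The function name and the `min_expected_drop` parameter are hypothetical; the protocol says only that performance should degrade "slightly", so the default of zero is an assumption.

```python
def leakage_suspected(cv_auc, temporal_auc, min_expected_drop=0.0):
    """Return True when temporal-validation AUC fails to drop below cross-validation AUC.

    min_expected_drop is a hypothetical tuning knob: with the default of 0.0,
    any temporal AUC equal to or above the cross-validation AUC is flagged.
    """
    observed_drop = cv_auc - temporal_auc
    return observed_drop <= min_expected_drop


# Illustrative numbers only.
clean_looking = leakage_suspected(cv_auc=0.80, temporal_auc=0.77)
suspicious = leakage_suspected(cv_auc=0.92, temporal_auc=0.92)
```

A flag here is a prompt to investigate, not proof of leakage: small cohorts can produce a flat or slightly higher temporal AUC by chance.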

Feature Importance Review

Top features must be clinically plausible. A diagnosis code that drives predictions must not itself be a consequence of the readmission.
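Clinical plausibility is a human judgment, but the step of surfacing which features need that judgment can be automated. The sketch below assumes a clinician-approved feature list exists and flags any top-ranked feature that is not on it. The feature names, importance values, and the `features_needing_review` helper are all hypothetical.

```python
def features_needing_review(importances, approved, top_k=10):
    """Return top-k features by importance that are not on the clinician-approved list."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_k] if name not in approved]


# Hypothetical importances: a post-discharge code ranks first, which should never happen.
importances = {
    "age": 0.30,
    "prior_admissions": 0.25,
    "post_discharge_dx_code": 0.40,
    "sodium": 0.05,
}
approved = {"age", "prior_admissions", "sodium"}
to_review = features_needing_review(importances, approved, top_k=3)
```

Anything returned goes to a clinician for review before the build continues; an empty list does not replace that review, it only means nothing unexpected ranked highly.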

External Review Sign-off

Before any model goes to production, an independent reviewer (not the model developer) signs off on the validation report.

Security & Compliance

Beyond model integrity, we maintain enterprise-grade security and compliance standards.

HIPAA Compliance

BAA execution before any data transfer
Minimum necessary data access
Full audit trail on PHI access

Data Security

AES-256 encryption at rest
TLS 1.3 encryption in transit
Mutual TLS for API connections

Audit & Monitoring

Full prediction audit trail
Model drift monitoring
Performance degradation alerts

Certifications

SOC 2 Type II (in progress)
HITRUST CSF (roadmap)
Regular penetration testing

Our Commitment

We commit to the following principles for all Marqi Index deployments:

We will disclose limitations

Every validation report includes a full limitations section. We will not hide what the model cannot do.

We will disclose failures

If a deployed model fails to perform as expected, we will notify affected customers within 72 hours.

We will validate before deployment

No model goes to production without independent validation on the customer's data. We will not deploy on promises.

We will monitor continuously

Every deployed model is monitored for performance drift. Degradation triggers review and, if necessary, retraining.
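A drift check of this kind can be as simple as comparing recent performance with the AUC recorded at validation. The sketch below is an assumption about how such a check might look: the `drift_detected` name, the use of a plain mean over recent windows, and the 0.05 tolerance are illustrative choices, not values from our deployment process.

```python
def drift_detected(baseline_auc, recent_aucs, tolerance=0.05):
    """Return True when mean recent AUC has fallen more than `tolerance` below baseline."""
    if not recent_aucs:
        raise ValueError("need at least one recent AUC measurement")
    mean_recent = sum(recent_aucs) / len(recent_aucs)
    return (baseline_auc - mean_recent) > tolerance


# Illustrative numbers only.
stable = drift_detected(0.80, [0.79, 0.78, 0.80])
degraded = drift_detected(0.80, [0.70, 0.72, 0.71])
```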

These commitments are not marketing language. They are engineering practices embedded in our deployment process. We welcome questions from prospective customers about how we implement them.

Questions about our integrity practices?

We welcome technical questions about our verification protocol, security practices, and deployment standards.
