Vesanor

Model Reliability Reports

Each report evaluates multiple models against the same workflow, contract, and expected outcome.
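The page does not say how a run is scored against its contract. As a rough illustration only, assume that a contract is an ordered list of required steps and that a run passes when its call trace contains those steps in order, with extra calls allowed in between. Under that assumption, a tally such as "1 of 6 passed" could be computed as below. The `Contract` type, the `satisfies` and `summarize` functions, and the step and model names are all hypothetical and are not part of Vesanor's API.

```python
from dataclasses import dataclass


# Hypothetical contract: the ordered steps a model's run must perform.
@dataclass
class Contract:
    required_steps: list[str]


def satisfies(contract: Contract, calls: list[str]) -> bool:
    """True if every required step appears in `calls` in order.

    Uses a subsequence check: `step in it` advances the shared iterator,
    so each step must be found after the previous one. Extra calls are allowed.
    """
    it = iter(calls)
    return all(step in it for step in contract.required_steps)


def summarize(contract: Contract, runs: dict[str, list[str]]) -> str:
    """Count how many runs satisfy the contract, e.g. '1 of 3 passed'."""
    passed = sum(satisfies(contract, calls) for calls in runs.values())
    return f"{passed} of {len(runs)} passed"


# Hypothetical three-step workflow and call traces for three placeholder models.
contract = Contract(["step_a", "step_b", "step_c"])
runs = {
    "model_1": ["step_a", "step_b", "step_c"],  # all steps, in order: pass
    "model_2": ["step_a", "step_c"],            # skips step_b: fail
    "model_3": ["step_b", "step_a", "step_c"],  # wrong order: fail
}
print(summarize(contract, runs))  # prints "1 of 3 passed"
```

A subsequence check is one plausible reading of "required investigation steps". A stricter contract could also cap the number of calls or require specific arguments on each call.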

Cybersecurity
1 of 6 models passed

Cybersecurity OIDC Audit

Can AI tell who broke into your cloud?

Six models were tested on a workflow with three required investigation steps. The models broke the workflow in different ways, and those failures reveal how models reason about multi-step security investigations.

GPT-4.1 mini: 2 calls
GPT-5.1: 3 calls
GPT-5.2: 4 calls
GPT-5.4: 4 calls
Claude Opus 4.6: 5 calls
Claude Sonnet 4: 3 calls
© 2026 Vesanor