01 — Observe
Capture what your agent did
ReplayCI records tool calls from real runs, drafts contracts from them, and promotes the important fields into diff review.
Real traffic · Draft contracts · No code changes
captured: get_weather(city: "Montreal")
drafted: contract.get_weather.location
review: important fields promoted into diff review
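As a rough illustration of the capture-then-draft step, the sketch below infers a draft contract from recorded tool calls. It is not ReplayCI's actual API or contract format: `draft_contract`, the call dictionaries, and the second sample call are all hypothetical, chosen only to show the idea of deriving field names, types, and required-ness from real traffic.

```python
from collections import defaultdict


def draft_contract(calls):
    """Infer a draft contract per tool from captured calls.

    Hypothetical shape: each call is {"tool": str, "args": dict}.
    """
    # Group the argument dicts by tool name.
    seen = defaultdict(list)
    for call in calls:
        seen[call["tool"]].append(call["args"])

    contracts = {}
    for tool, arg_sets in seen.items():
        # Collect every observed type name for every argument.
        fields = {}
        for args in arg_sets:
            for name, value in args.items():
                fields.setdefault(name, set()).add(type(value).__name__)
        contracts[tool] = {
            name: {
                "types": sorted(types),
                # A field is drafted as required only if every call had it.
                "required": all(name in args for args in arg_sets),
            }
            for name, types in fields.items()
        }
    return contracts


# Illustrative captured traffic (the second call is made up for the example).
captured = [
    {"tool": "get_weather", "args": {"city": "Montreal"}},
    {"tool": "get_weather",
     "args": {"city": "Montreal", "temperature_unit": "celsius"}},
]
draft = draft_contract(captured)
```

A draft like this is only a starting point; a reviewer would still decide which fields matter enough to gate on.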
02 — Changes
See the changes that matter
Compare new runs to trusted references and inspect the exact tool call, argument, or verdict that changed.
Exact diff · Run context · Clear verdict
Tool call: get_weather (changed)
- temperature_unit: "celsius"
+ temperature_unit: "fahrenheit"
context: Reference: gpt-4o-mini · Candidate: claude-sonnet-4
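The argument-level diff shown above can be sketched in a few lines. This is a minimal sketch under assumptions, not how ReplayCI computes diffs: `diff_args` and the tuple format are hypothetical, and the `city` value is reused from the earlier example call.

```python
def diff_args(reference, candidate):
    """Compare two argument dicts and list what changed.

    Returns tuples of (kind, key, reference_value, candidate_value).
    """
    changes = []
    for key in sorted(set(reference) | set(candidate)):
        if key not in candidate:
            changes.append(("removed", key, reference[key], None))
        elif key not in reference:
            changes.append(("added", key, None, candidate[key]))
        elif reference[key] != candidate[key]:
            changes.append(("changed", key, reference[key], candidate[key]))
    return changes


# Arguments of the same get_weather call from the reference and candidate runs.
reference = {"city": "Montreal", "temperature_unit": "celsius"}
candidate = {"city": "Montreal", "temperature_unit": "fahrenheit"}

changes = diff_args(reference, candidate)
for kind, key, old, new in changes:
    print(f"{kind}: {key} {old!r} -> {new!r}")
```

Unchanged arguments such as `city` are left out of the result, so the review shows only the field that actually moved.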
03 — Shadow
Run new models in parallel
Replay a candidate against the same inputs and contracts, compare it to production, and inspect differences before rollout.
Same inputs · Same contracts · No production risk
Production: gpt-4o-mini, 4/4 pass
Candidate: claude-sonnet-4, 3/4 pass
diff: Candidate added an unsupported retry policy
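One way to picture a shadow run is to feed both models the same inputs and check each resulting tool call against the same contract. The sketch below is an assumption-heavy illustration, not ReplayCI's implementation: the two model functions are stand-ins that return canned tool calls, and `ALLOWED_ARGS`, `check`, `shadow_run`, the input list, and the `retry_policy` value are all hypothetical.

```python
# Hypothetical contract: the arguments each tool is allowed to receive.
ALLOWED_ARGS = {"get_weather": {"city", "temperature_unit"}}


def check(call):
    """Return the contract violations for one tool call (empty if none)."""
    allowed = ALLOWED_ARGS.get(call["tool"], set())
    return [f"unsupported argument: {name}"
            for name in sorted(call["args"]) if name not in allowed]


def shadow_run(inputs, model):
    """Run one model over every input and record its violations."""
    return [{"input": item, "violations": check(model(item))}
            for item in inputs]


def production(city):
    # Stand-in for the production model's tool call.
    return {"tool": "get_weather", "args": {"city": city}}


def candidate(city):
    # Stand-in for the candidate; it adds an extra argument on one input.
    args = {"city": city}
    if city == "Montreal":
        args["retry_policy"] = "exponential"  # illustrative value
    return {"tool": "get_weather", "args": args}


inputs = ["Montreal", "Paris", "Tokyo", "Lagos"]  # illustrative inputs
prod = shadow_run(inputs, production)
cand = shadow_run(inputs, candidate)

prod_pass = sum(not r["violations"] for r in prod)
cand_pass = sum(not r["violations"] for r in cand)
print(f"Production: {prod_pass}/{len(inputs)} pass")
print(f"Candidate: {cand_pass}/{len(inputs)} pass")
```

Because the candidate never serves real traffic here, its failing call shows up in the comparison rather than in production.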
04 — Merge
Promote with a clear gate
ReplayCI turns rollout into an explicit decision. If contracts fail or the shadow comparison falls outside its thresholds, the merge stays blocked.
Example CI gate
Ready to merge
Pull request #184
Upgrade candidate model to claude-sonnet-4
ReplayCI blocks promotion until observation, changes review, and shadow evaluation all clear.
pass: Contract compliance
pass: Deterministic reference review
pass: Shadow comparison within thresholds
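The gate itself reduces to a simple rule: every check must pass, or the merge stays blocked. The sketch below shows that rule as a CI script might apply it. It is a hypothetical illustration rather than ReplayCI's configuration; `gate` and the way results are supplied are assumptions, while the check names come from the example above.

```python
def gate(checks):
    """Return (ok, failed_names) for a mapping of check name -> passed."""
    failed = [name for name, passed in checks.items() if not passed]
    return (not failed, failed)


# Results for the three checks in the example pull request.
checks = {
    "Contract compliance": True,
    "Deterministic reference review": True,
    "Shadow comparison within thresholds": True,
}

ok, failed = gate(checks)
print("Ready to merge" if ok else f"Blocked: {', '.join(failed)}")

# A CI job would typically exit with this code so a failure blocks the merge.
exit_code = 0 if ok else 1
```

A single failing check is enough to flip the exit code, which makes the decision explicit instead of leaving it to whoever reads the report.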
05 — Ship
Ship AI you trust
From observation to enforcement, ReplayCI helps teams move faster without losing control.