The case starts with uncertainty and ends with a decision
InvoiceGuard is a story-driven accounts payable (AP) investigation demo. Follow a case from raw documents, through policy checks, to final resolution, and see how training changes the agent's behavior.
Case narrative demo
Switch between cases. Each case shows the evidence packet, policy signal, and how the baseline agent differs from the trained policy.
Baseline action sequence (untrained)
Tends to investigate repeatedly and time out at the 12-step limit.
Trained action sequence (SFT)
Investigates quickly, then reaches `submit_final_resolution` in 3 to 5 steps.
Why this demo matters
The baseline often investigates without closure. The trained policy learns to collect sufficient evidence and submit a grounded decision in fewer steps.
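The contrast above can be sketched as a capped episode loop. This is a minimal illustration, not InvoiceGuard's actual environment code: the 12-step cap and the `submit_final_resolution` action come from the descriptions above, while `run_episode`, the two toy policies, and the `investigate` action name are hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

MAX_STEPS = 12  # step cap taken from the baseline description above
SUBMIT = "submit_final_resolution"  # terminal action named in the demo

# A policy maps the actions taken so far to the next action name.
Policy = Callable[[List[str]], str]


def run_episode(policy: Policy, max_steps: int = MAX_STEPS) -> Tuple[List[str], str]:
    """Roll out one case; return the action trace and how the episode ended."""
    trace: List[str] = []
    for _ in range(max_steps):
        action = policy(trace)
        trace.append(action)
        if action == SUBMIT:
            return trace, "resolved"
    # The cap was reached without a submission.
    return trace, "timeout"


def baseline_policy(trace: List[str]) -> str:
    # Hypothetical stand-in: keeps investigating and never submits.
    return "investigate"


def trained_policy(trace: List[str]) -> str:
    # Hypothetical stand-in: two investigation steps, then submit.
    return "investigate" if len(trace) < 2 else SUBMIT


baseline_trace, baseline_outcome = run_episode(baseline_policy)
trained_trace, trained_outcome = run_episode(trained_policy)
print(len(baseline_trace), baseline_outcome)
print(len(trained_trace), trained_outcome)
```

In this sketch the only thing separating the two outcomes is whether the policy ever emits the terminal action before the cap, which is the behavior the training is described as targeting.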
Agent walkthrough simulator (easy, medium, hard)
Step through one representative case per difficulty and see exactly how the trained agent reads documents, uses tools, and lands on the final decision.
Current step
Reads:
Tool:
Action trace
Training progression dashboard
This is the full journey from the local baseline, to submit-focused supervised fine-tuning (SFT), and then to warm-started GRPO. The best GRPO checkpoint appears at iteration 2.