Cut AI agent incident triage time from 4 hours to 28 minutes
Headline outcome
a global fintech · Financial services / payments · 2024
AI security guardrails for a global fintech
Context
A global fintech was preparing to put its first production AI agent in front of customers. The agent had access to billing, dispute, and account-status tools. The board wanted three answers before launch:
- What can it do that we don’t want it to do?
- What happens if it tries?
- How do we know on day 91 that the answer hasn’t drifted?
The engineering team had built the agent in eight weeks and wanted to ship. The risk function wanted six months. We had three months.
Risk
Three categories were live:
- Tool-misuse risk — the agent could chain billing-write tools in a sequence the developers hadn’t anticipated.
- Prompt-injection risk — adversarial inputs in customer messages could escalate the agent’s privileges by manipulating its system prompt.
- Output-leakage risk — the agent could emit personal data from its context window into log streams that would be retained 90 days.
The board’s tolerance for any of those reaching production was zero.
Engagement
We embedded with the AI platform team for six weeks and built three layers of guardrails:
- Input filter — a pre-prompt classifier that flagged adversarial inputs before they reached the agent. Logs went to a separate retention bucket so a flagged customer message wasn’t written into the main product event store.
- Behaviour cage — every tool the agent could call was wrapped with a policy check. The check verified that the call was on the agent’s allow list and consistent with the customer’s authorisation context. Multi-tool sequences were scored against a known-bad-pattern library.
- Output guard — every agent response was passed through a PII detector and a hallucination-confidence scorer before reaching the customer. Below-threshold outputs were either rewritten or escalated to a human.
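As an illustration of the behaviour cage, the policy check can be sketched in a few lines of Python. This is a minimal sketch and not the client's implementation: the tool names, scope strings, `AuthContext`, `ToolCall`, and the contents of `KNOWN_BAD_PATTERNS` are all hypothetical, and exact sequence matching stands in for the scoring described above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical allow list; the real tool names are not given in this case study.
ALLOWED_TOOLS = {"get_account_status", "get_invoice", "issue_refund", "open_dispute"}

# Hypothetical known-bad sequences: tool names that must never run back to back.
KNOWN_BAD_PATTERNS: List[Tuple[str, ...]] = [
    ("issue_refund", "issue_refund"),   # repeated refunds in one session
    ("open_dispute", "issue_refund"),   # refund issued on a freshly opened dispute
]


@dataclass(frozen=True)
class AuthContext:
    """What the authenticated customer is allowed to do (assumed shape)."""
    customer_id: str
    scopes: FrozenSet[str]


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the agent (assumed shape)."""
    name: str
    customer_id: str
    required_scope: str


@dataclass(frozen=True)
class PolicyDecision:
    allowed: bool
    reason: str


def contains_pattern(history: List[str], pattern: Tuple[str, ...]) -> bool:
    """Return True if `pattern` appears as a contiguous run inside `history`."""
    n = len(pattern)
    return any(tuple(history[i:i + n]) == pattern
               for i in range(len(history) - n + 1))


def check_tool_call(call: ToolCall, auth: AuthContext,
                    history: List[str]) -> PolicyDecision:
    """Decide whether a proposed tool call may run.

    `history` holds the names of tools already executed in this session.
    """
    # 1. The tool must be on the agent's allow list.
    if call.name not in ALLOWED_TOOLS:
        return PolicyDecision(False, "tool not on allow list")
    # 2. The call must match the customer's authorisation context.
    if call.customer_id != auth.customer_id:
        return PolicyDecision(False, "call targets a different customer")
    if call.required_scope not in auth.scopes:
        return PolicyDecision(False, "missing scope: " + call.required_scope)
    # 3. The resulting sequence must not contain a known-bad pattern.
    candidate = history + [call.name]
    for pattern in KNOWN_BAD_PATTERNS:
        if contains_pattern(candidate, pattern):
            return PolicyDecision(False, "sequence matches known-bad pattern")
    return PolicyDecision(True, "ok")
```

In this sketch the wrapper runs before the tool executes, so a denied call never reaches the billing system, and the `reason` string can be logged as a triage signal. A production version would likely replace the exact match with a weighted score and a threshold.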
We red-teamed the integrated system for two weeks before launch. The team fixed dozens of issues we surfaced; we documented the residual risks and the mitigations the board accepted.
Outcome
- Incident triage time for AI-related signals fell from 4 hours (manual investigation) to 28 minutes (auto-classified, routed, escalated only when needed).
- No policy-violating tool call executed in the first 90 days of production traffic; every attempt was blocked.
- The board paper for each new agent now follows the same three-layer structure. It went from pages of unanswered questions to a short brief of “what changed since the last paper”.
- The AI platform team kept shipping. They moved from monthly releases to fortnightly without a security veto.
We needed AI guardrails that the board could understand and the engineering team could ship. Salvador Cloud delivered both.
The work is documented in our pillar article on AI security guardrails for fintech.
Next step
Working on something similar?
We'll diagnose the shape of your problem in a 30-minute call. No proposals, no pitching.