AI security guardrails for a global fintech

Context

A global fintech was preparing to put its first production AI agent in front of customers. The agent had access to billing, dispute, and account-status tools. The board wanted three answers before launch:

What can it do that we don’t want it to do?
What happens if it tries?
How do we know on day 91 that the answer hasn’t drifted?

The engineering team had built the agent in eight weeks and wanted to ship. The risk function wanted six months. We had three.

Risk

Three categories were live:

Tool-misuse risk — the agent could chain billing-write tools in a sequence the developers hadn’t anticipated.
Prompt-injection risk — adversarial inputs in customer messages could escalate the agent’s privileges by manipulating its system prompt.
Output-leakage risk — the agent could emit personal data from its context window into log streams that would be retained 90 days.

The board’s tolerance for any of those reaching production was zero.

Engagement

We embedded with the AI platform team for six weeks and built three layers of guardrails:

Input filter — a pre-prompt classifier that flagged adversarial inputs before they reached the agent. Logs went to a separate retention bucket so a flagged customer message wasn’t written into the main product event store.
Behaviour cage — every tool the agent could call was wrapped with a policy check. The check verified the call was within the agent’s allowed list AND consistent with the customer’s authorisation context. Multi-tool sequences were scored against a known-bad-pattern library.
Output guard — every agent response was passed through a PII detector and a hallucination-confidence scorer before reaching the customer. Below-threshold outputs were either rewritten or escalated to a human.
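The behaviour cage can be pictured as a check that runs before every tool call. The Python below is a minimal sketch under stated assumptions, not the client's implementation. The tool names, permission scopes, pattern weights and the 0.8 blocking threshold are hypothetical, as are `AuthContext`, `Session`, `sequence_risk` and `check_tool_call`. The known-bad-pattern library is reduced to exact matches against the most recent calls in a session.

```python
from dataclasses import dataclass, field

# Hypothetical allow-list: each tool the agent may call, mapped to the
# permission scope the customer must hold for that call to proceed.
ALLOWED_TOOLS = {
    "get_account_status": "account:read",
    "get_invoice": "billing:read",
    "open_dispute": "disputes:write",
    "issue_refund": "billing:write",
}

# Hypothetical known-bad-pattern library: ordered tool sequences with a risk
# weight. A pattern matches when it equals the most recent calls in the session.
BAD_PATTERNS = {
    ("open_dispute", "issue_refund"): 0.6,
    ("issue_refund", "issue_refund"): 0.9,
}

SEQUENCE_BLOCK_THRESHOLD = 0.8  # assumed cut-off; scores at or above it are blocked


@dataclass
class AuthContext:
    customer_id: str
    scopes: set[str]


@dataclass
class Session:
    auth: AuthContext
    history: list[str] = field(default_factory=list)  # tool calls allowed so far


def sequence_risk(history: list[str], next_tool: str) -> float:
    """Return the highest weight of any bad pattern the proposed call would complete."""
    candidate = history + [next_tool]
    score = 0.0
    for pattern, weight in BAD_PATTERNS.items():
        n = len(pattern)
        if len(candidate) >= n and tuple(candidate[-n:]) == pattern:
            score = max(score, weight)
    return score


def check_tool_call(session: Session, tool: str, target_customer_id: str) -> tuple[bool, str]:
    """Policy check that runs before a wrapped tool executes."""
    # 1. The tool must be on the agent's allow-list.
    if tool not in ALLOWED_TOOLS:
        return False, "tool not on allow-list"
    # 2. AND the call must fit the customer's authorisation context:
    #    it targets the authenticated customer, who holds the required scope.
    if target_customer_id != session.auth.customer_id:
        return False, "target outside customer's authorisation context"
    required = ALLOWED_TOOLS[tool]
    if required not in session.auth.scopes:
        return False, f"missing scope {required}"
    # 3. The multi-tool sequence must not complete a high-risk bad pattern.
    risk = sequence_risk(session.history, tool)
    if risk >= SEQUENCE_BLOCK_THRESHOLD:
        return False, f"sequence risk {risk:.2f} at or above threshold"
    session.history.append(tool)
    return True, "allowed"


session = Session(AuthContext("cust-42", {"billing:write", "disputes:write"}))
print(check_tool_call(session, "issue_refund", "cust-42"))    # (True, 'allowed')
print(check_tool_call(session, "issue_refund", "cust-42"))    # (False, 'sequence risk 0.90 at or above threshold')
print(check_tool_call(session, "delete_account", "cust-42"))  # (False, 'tool not on allow-list')
```

Returning a reason with every decision keeps each block explainable when it is reviewed later. A real pattern library would probably also score non-contiguous sequences, which this sketch does not attempt.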

We red-teamed the integrated system for two weeks before launch. The team fixed dozens of issues we surfaced; we documented the residual risks and the mitigations the board accepted.

Outcome

Incident triage time for AI-related signals fell from 4 hours (manual investigation) to 28 minutes (auto-classified, routed, and escalated only when needed).
Zero policy-violating tool calls were executed in the first 90 days of production traffic; every attempt was blocked.
The board paper for each new agent now follows the same three-layer structure. It has gone from pages of unanswered questions to a short brief of “what changed since the last paper”.
The AI platform team kept shipping, moving from monthly to fortnightly releases without a security veto.

We needed AI guardrails that the board could understand and the engineering team could ship. Salvador Cloud delivered both.

CISO, global fintech (verbatim, anonymised pending consent refresh)

The work is documented in our pillar article on AI security guardrails for fintech.

