● UK · EU — Regulated fintech & energy Certifications delivered: ISO 27001 · PCI DSS v4 · DORA
Written for: CTO · Head of Security · CISO

Headline outcome

Routed 100% of LLM calls through a single policy gateway

a mid-market UK SaaS · Software / SaaS · 2025

An AI gateway control plane for a UK SaaS

Context

The product teams at a mid-market UK SaaS had moved fast on AI features. Within twelve months they had three LLM-backed workflows in production: a customer-facing drafting assistant, an internal summarisation tool, and a connector that fed data into a third-party AI service. Each was built independently, with its own model credentials, its own logging approach, and its own set of informal rules about what prompts and responses were acceptable.

The head of engineering knew the architecture was fragile. There was no single place to see what the estate was doing, no consistent way to redact customer data before it reached a model provider, and no straightforward answer to a question the board had started asking: if one of these integrations misbehaved, how quickly could the team stop it, and how much damage could it do in the meantime?

Risk

  • Inconsistent policy enforcement. Three separate integrations meant three separate interpretations of acceptable use. A policy change on data handling had to be applied in three places, and there was no CI gate to confirm they matched.
  • No aggregate visibility. Individual application logs captured requests and responses, but no single query could tell the security team how many LLM calls the estate made in a day, which model each used, or whether any call had exposed customer data in a way the redaction logic should have caught.
  • Unchecked blast radius. Two of the three workflows had broad permissions on the underlying data store. An injected instruction or a misbehaving model call could have read and surfaced far more data than any individual workflow needed to complete its task.

Engagement

We started by mapping every LLM call in the estate against four questions: which model, which identity, what data scope, and what the maximum effect of a single bad call could be. That inventory took three days and produced more findings than the team expected. Two workflows were calling models under a shared service account; one had no rate limiting at all; none had a mechanism to halt a workflow without a code deployment.
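
To make the four questions concrete, the inventory can be pictured as one record per call site plus a few mechanical checks. The following is a minimal Python sketch, not the tooling used on the engagement; the class, field names, workflow names, and identities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmCallSite:
    """One inventory row: a place in the estate that calls a model (hypothetical schema)."""
    workflow: str
    model: str                     # which model
    identity: str                  # which identity the call runs under
    data_scope: str                # what data the call can reach
    max_effect: str                # worst outcome of a single bad call
    rate_limited: bool
    can_halt_without_deploy: bool


def findings(sites: list[LlmCallSite]) -> list[str]:
    """Flag shared identities, missing rate limits, and missing halt mechanisms."""
    out: list[str] = []
    by_identity: dict[str, list[str]] = {}
    for site in sites:
        by_identity.setdefault(site.identity, []).append(site.workflow)
        if not site.rate_limited:
            out.append(f"{site.workflow}: no rate limiting")
        if not site.can_halt_without_deploy:
            out.append(f"{site.workflow}: cannot be halted without a deployment")
    for identity, workflows in by_identity.items():
        if len(workflows) > 1:
            out.append(f"{identity}: shared by {', '.join(sorted(workflows))}")
    return out


# Illustrative rows only; these values are invented for the example.
EXAMPLE_SITES = [
    LlmCallSite("drafting-assistant", "model-a", "svc-shared",
                "customer records", "surfaces another customer's data", True, False),
    LlmCallSite("summarisation", "model-b", "svc-shared",
                "internal documents", "leaks internal text", False, False),
    LlmCallSite("connector", "model-c", "svc-connector",
                "exported records", "sends excess data to a third party", True, False),
]

for finding in findings(EXAMPLE_SITES):
    print(finding)
```

Recording the maximum effect of a single bad call alongside identity and data scope keeps the blast-radius question in the same row as the facts needed to answer it.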

From that baseline we designed and deployed an AI gateway as a single brokered chokepoint:

  • Model allow-listing and identity separation. Each application received its own scoped workload identity. The gateway enforced a model allow-list so no application could silently route to an unapproved provider. Short-lived credentials replaced the shared service account.
  • PII redaction at the boundary. The gateway applied a redaction layer to outbound prompts, detecting and tokenising customer identifiers before they reached any external model provider. The same layer logged redaction events so the security team could audit coverage.
  • Rate and spend limits. We set per-application call limits and an aggregate daily spend cap. Circuit-breaker logic halted any application that exceeded its threshold and routed the event to the on-call channel.
  • Tested kill switch. We built and tested a kill switch that could stop any single application, or the entire fleet, in under ninety seconds without a deployment. The test was a condition of sign-off, not an afterthought.
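
The four controls above can be sketched as a single admission check that every call passes before it leaves the estate. This is a simplified Python illustration under stated assumptions, not the deployed gateway: the class and method names are hypothetical, redaction covers email addresses only, limits are per-minute call counts, and the spend cap, credential issuance, and on-call alerting are left out.

```python
from __future__ import annotations

import re
import time
from dataclasses import dataclass, field

# Assumption: email addresses stand in for "customer identifiers" here.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class AppPolicy:
    allowed_models: frozenset[str]
    max_calls_per_minute: int


@dataclass
class Gateway:
    policies: dict[str, AppPolicy]
    halted_apps: set[str] = field(default_factory=set)
    fleet_halted: bool = False
    redaction_log: list[tuple[str, int]] = field(default_factory=list)
    _calls: dict[str, list[float]] = field(default_factory=dict)

    def halt(self, app: str | None = None) -> None:
        """Kill switch: a runtime flag, so stopping traffic needs no deployment."""
        if app is None:
            self.fleet_halted = True
        else:
            self.halted_apps.add(app)

    def redact(self, app: str, prompt: str) -> str:
        """Swap each distinct email for a placeholder token and log the event."""
        tokens: dict[str, str] = {}

        def swap(match: re.Match[str]) -> str:
            return tokens.setdefault(match.group(0), f"<EMAIL_{len(tokens) + 1}>")

        redacted = EMAIL.sub(swap, prompt)
        if tokens:
            self.redaction_log.append((app, len(tokens)))
        return redacted

    def admit(self, app: str, model: str, prompt: str,
              now: float | None = None) -> str:
        """Return the redacted prompt to forward, or raise PermissionError."""
        now = time.monotonic() if now is None else now
        policy = self.policies.get(app)
        if policy is None:
            raise PermissionError(f"unknown application: {app}")
        if self.fleet_halted or app in self.halted_apps:
            raise PermissionError(f"{app} is halted")
        if model not in policy.allowed_models:
            raise PermissionError(f"{model} is not on the allow-list for {app}")
        recent = [t for t in self._calls.get(app, []) if now - t < 60.0]
        if len(recent) >= policy.max_calls_per_minute:
            self.halt(app)  # circuit breaker: stays tripped until cleared
            raise PermissionError(f"{app} exceeded its call limit")
        recent.append(now)
        self._calls[app] = recent
        return self.redact(app, prompt)


gateway = Gateway({"drafting": AppPolicy(frozenset({"model-a"}), 2)})
print(gateway.admit("drafting", "model-a", "Reply to jo@example.com"))
```

Because every check runs in one place, a policy change is an edit to the policy table rather than a coordinated change across each application, and the halt flag is evaluated on every call.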

We also ran a red-team exercise against the two customer-facing workflows, using prompt injection and tool-abuse scenarios drawn from the OWASP Top 10 for LLM Applications. The gateway blocked every direct injection attempt; two indirect injection paths required prompt-handling changes in the application layer, which we addressed before go-live.

Outcome

  • Routed 100% of LLM calls through the single policy gateway within eight weeks of engagement start.
  • Reduced the number of distinct model credentials in the estate from seven to three scoped workload identities, each rotated automatically on a short-lived schedule.
  • Cut average time to apply a policy change across all three workflows from several days of coordinated deployments to under one hour via the gateway configuration.
  • Confirmed the kill switch halted all model traffic within ninety seconds in two tested scenarios, with no incomplete writes to the data store.

"We had been shipping AI features fast and assuming we could tidy up the security model later. What we actually had was three different answers to every security question. The gateway gave us one answer, and for the first time I could show the board exactly what our AI estate was doing and what would stop it if something went wrong."

Head of Engineering, mid-market UK SaaS (anonymised)

For the design principles behind the control architecture, including how identity, least privilege, and runtime containment layer into a defensible AI estate, see AI security guardrails for fintech.
