Securing AI coding agents

[Key takeaways]

The coding-agent risk surface is not one channel. Prompts, repo context, secrets in the context window, agent shell and tool calls, and connected MCP servers each leak differently.
Blocking the tools does not work. Engineers will route around a policy that makes them slower. Control the traffic, not the calendar invite.
Layer controls: identity plus an approved-model allowlist, secret redaction at the prompt boundary, egress inspection, sandboxed agent actions, MCP gating, and audit logging.
By default, free-tier coding assistants may train on your code. Enterprise plans usually do not, but you still need to verify and enforce it at the wire.
The durable answer is inline: inspect agent traffic on the endpoint before it leaves, not in a log export a quarter later.

A coding agent is a new kind of insider

A modern coding agent is not autocomplete. It reads your repository, holds your files in a context window, calls out to a model provider, runs shell commands, edits files, and connects to external tools over the Model Context Protocol. Cursor, Claude Code, GitHub Copilot, and Windsurf all sit inside the developer's trust boundary with broad local access, and they act on instructions that can come from places the developer never audited.

That combination is why researchers now describe agentic coding assistants as insider-threat-shaped. A January 2026 systematic review of prompt-injection attacks against agentic coding assistants found that every major agent, Claude Code, Copilot, and Cursor included, was susceptible, and that adaptive attacks landed a large majority of the time. The attack is semantic, not a classic exploit: it targets the model's reasoning, so traditional signature-based controls slide right past it.

The failure mode security teams fear most is exfiltration of source and secrets. Code is a company's crown jewels: leaked source enables vulnerability discovery, prompt-injection blueprinting, and a wider agentic attack surface later. The job is to keep the agent useful while making sure the repo, the credentials in it, and the actions the agent takes never cross a boundary you have not inspected.

Five ways a coding agent leaks

You cannot control what you have not enumerated. The coding-agent risk surface breaks into five distinct channels, each with a different leak mechanism and a different control.

1. Prompts and repository context leaving the org

Every completion sends surrounding code to a model. Every agent chat can attach files, diffs, or whole directories. GitHub's own definition of Copilot interaction data includes the code around your cursor, file names, and navigation patterns. For sanctioned enterprise traffic that is expected, but the same mechanism moves proprietary code to whichever endpoint the developer's config points at, and that endpoint may not be the one you approved.

2. Secrets sitting in the context window

Agents happily read .env files, config, and connection strings because they are just more repo context. Sensitive data, tokens, and credentials get pulled into agent reasoning paths and can surface in a completion, a chat reply, or an outbound request. A secret that reaches a third-party model must be treated as disclosed.

3. Model training on your code

Retention policy is a moving target. In 2026 GitHub changed its policy so that Copilot Free, Pro, and Pro+ interaction data, including code snippets and context, is used to train models unless the user opts out. Business and Enterprise plans are contractually excluded, but the point stands: what a free-tier or personal-account agent sends may become training data, and you have to enforce plan and destination at the wire, not trust a settings page.

4. Agent actions: shell and tool calls

The step past suggestion is action. Agents run tests, install packages, edit files, and execute shell commands. A prompt-injected instruction, hidden in a dependency's README, an issue comment, or a web page the agent fetched, can turn that autonomy into an attacker's hands on your machine. Auto-approve modes remove the human who would have caught it.

5. MCP servers connected to the agent

MCP is the newest and least visible layer. A single connection can inherit broad access the moment the agent attaches. Tool-poisoning attacks hide malicious instructions inside a tool's description metadata, which the agent reads but the developer never sees. A 2026 disclosure found roughly 200,000 MCP instances exposed, and command-allowlist bypasses via argument injection have been demonstrated against real servers. An untrusted MCP server is untrusted code with a seat at your agent's table.

Identity + model allowlist

Tie every agent session to an SSO identity and pin it to approved model endpoints. Unknown providers are blocked before a prompt leaves the machine.

Redact at the prompt boundary

Scan prompts and attached context for secrets and PII inline. Strip API keys, tokens, and connection strings before they reach any model.

Inspect egress

See agent traffic across IDE, CLI, and desktop on the endpoint. Match destinations to policy and flag exfil patterns as they happen, not later.

Sandbox agent actions

Constrain shell and file access, keep human approval on destructive steps, and never grant an agent full disk or unrestricted execution.

Gate MCP servers

Risk-score every connected server, allow only vetted ones, and block tool-poisoning and unknown-registry servers by default.

Log for audit

Record what each agent saw, sent, and did. Map the evidence to ISO 42001, the EU AI Act, SOC 2, and ISO 27001.

Six layers that keep engineers fast

No single control covers all five channels, so defense is layered. The design constraint that matters most: every layer has to be invisible in the happy path. If the secure route is slower than the insecure one, engineers take the insecure one. Here is the model we deploy.

Identity and an approved-model allowlist

Start by binding agent use to identity. Every session should map to an SSO user, and the set of model endpoints the agent may reach should be an explicit allowlist. Approved enterprise providers pass. An unrecognized endpoint, a personal API key, or a free-tier destination that trains on input gets blocked before the first token leaves. This alone closes the training-on-your-code channel for unsanctioned tools.
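
The allowlist decision described above can be sketched in a few lines. This is a minimal illustration, not a product implementation: the hostnames, the `AgentRequest` type, and the `decide` function are all hypothetical names chosen for the example, and it assumes the enforcement point already knows the SSO user and the destination URL for each request.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allowlist: placeholder hostnames, not real provider endpoints.
APPROVED_MODEL_HOSTS = {
    "llm.enterprise.example.com",
    "models.internal.example.com",
}


@dataclass(frozen=True)
class AgentRequest:
    sso_user: Optional[str]  # None when the session is not bound to an SSO identity
    url: str                 # destination the agent is trying to reach


def decide(request: AgentRequest) -> str:
    """Return "allow" or "block" for one outbound model request."""
    # No identity, no traffic: every session must map to an SSO user.
    if not request.sso_user:
        return "block"
    parts = urlsplit(request.url)
    host = (parts.hostname or "").lower()
    # Exact host match only, so look-alike suffixes do not slip through.
    if parts.scheme != "https" or host not in APPROVED_MODEL_HOSTS:
        return "block"
    return "allow"
```

Matching the full hostname rather than a substring is a deliberate choice here: a destination such as `llm.enterprise.example.com.evil.test` contains an approved name but is a different host, and it is blocked.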

Secret scanning and redaction at the prompt boundary

The highest-value control for the developer workflow is inline redaction. Prompts, attached files, and repo context get scanned for secrets and PII before they leave the endpoint. High-entropy strings, known token formats, connection strings, and customer data are stripped or masked. The agent still gets the code structure it needs to be useful, minus the credentials it never needed.
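
As a rough sketch of what inline redaction involves, the example below combines two of the signals named above: known token formats and high-entropy strings. The pattern set, the 24-character minimum, and the 4.0-bit entropy threshold are illustrative assumptions, not tuned values; a production scanner would carry far more patterns and would need tuning against false positives.

```python
import math
import re
from collections import Counter

# Illustrative patterns only (assumed for this sketch).
PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # scheme://user:password@host... style connection strings.
    "connection_string": re.compile(r"\b[a-z][a-z0-9+]*://[^\s:/@]+:[^\s@]+@[^\s\"']+"),
}
# Candidate tokens for the entropy check: long runs of key-like characters.
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{24,}")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def redact(text: str, entropy_threshold: float = 4.0) -> str:
    """Mask known secret formats, then any long high-entropy token."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)

    def mask(match: "re.Match[str]") -> str:
        token = match.group(0)
        if shannon_entropy(token) >= entropy_threshold:
            return "[REDACTED:high_entropy]"
        return token  # long but low-entropy, e.g. an ordinary identifier

    return TOKEN.sub(mask, text)
```

The entropy check is what keeps the code structure intact: a long identifier such as `calculate_total_order_price_value` uses few distinct characters and passes through, while a random-looking token of the same length is masked.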

Egress inspection across every agent surface

Coding agents span the IDE, the CLI, and the desktop app, and all of that traffic rides HTTPS to a handful of CDNs, which is exactly where network logs go blind: they show an encrypted connection to a shared host, not which tool or user sent what. Inspection has to happen on the endpoint, on the same path the traffic takes, so you can attribute a request to a tool and a user and catch exfil-shaped patterns as they occur.
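
One simple example of an "exfil-shaped" pattern is an unusually large volume of outbound data from one tool to one destination. The sketch below assumes the endpoint agent already emits per-request events carrying a user, a tool name, a destination host, and a byte count; the `EgressEvent` shape and the 5 MB limit are hypothetical choices for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class EgressEvent:
    user: str       # SSO identity the request is attributed to
    tool: str       # e.g. the process name observed on the endpoint
    host: str       # destination hostname
    bytes_out: int  # request payload size


def flag_bulk_uploads(
    events: Iterable[EgressEvent], limit_bytes: int = 5_000_000
) -> List[Tuple[str, str, str]]:
    """Return (user, tool, host) keys whose total upload exceeds limit_bytes."""
    totals = defaultdict(int)
    for event in events:
        totals[(event.user, event.tool, event.host)] += event.bytes_out
    return sorted(key for key, total in totals.items() if total > limit_bytes)
```

Keying on user, tool, and host together is the point of doing this on the endpoint: the same aggregation from network logs alone would lack the first two fields.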

Sandboxing agent actions

For the action channel, constrain what the agent can do. Scope file and shell access, keep a human in the loop for destructive or irreversible steps, and resist blanket auto-approve. Treat repository content, issue text, dependency docs, and fetched web pages as untrusted input that can carry injected instructions, because it can.
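
A minimal sketch of scoped shell access with a human in the loop might look like the following. The command sets and the `classify` function are assumptions made for this example. It is deliberately conservative: anything it cannot reason about is routed to a person rather than executed.

```python
import shlex

# Hypothetical policy for illustration; a real one would be per-repo and reviewed.
SAFE_COMMANDS = {"ls", "cat", "pytest", "git"}
DESTRUCTIVE_GIT_SUBCOMMANDS = {"push", "reset", "clean"}
SHELL_METACHARACTERS = ";|&`$><\n"


def classify(command: str) -> str:
    """Return "allow", "needs_approval", or "deny" for a proposed shell command."""
    # Chaining, pipes, substitution, and redirection defeat per-command checks,
    # so any command containing them goes to a human.
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return "needs_approval"
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return "needs_approval"
    if not argv:
        return "deny"
    program = argv[0]
    if program not in SAFE_COMMANDS:
        return "needs_approval"
    if program == "git" and (len(argv) < 2 or argv[1] in DESTRUCTIVE_GIT_SUBCOMMANDS):
        return "needs_approval"
    return "allow"
```

A name-based allowlist like this is only a first filter. It does not inspect arguments, so it does not address argument-injection bypasses, and it is no substitute for running the agent with restricted file and process permissions.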

Blocking untrusted MCP servers

The single highest-leverage MCP control is an enforced allowlist per agent. Vet servers before they are permitted, isolate every MCP-enabled service from the host, and never hand one full disk access or shell execution. Risk-score new servers on connect and block anything from an unknown registry or carrying poisoned tool descriptions.
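
The gate described above can be sketched as an allowlist check followed by a scan of tool descriptions. Everything named here is hypothetical: the server identifier, the `McpServer` type, and the handful of regular expressions, which are a crude stand-in for real tool-poisoning detection and would miss paraphrased or obfuscated instructions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical vetted-server list; identifiers are placeholders.
APPROVED_SERVERS = {"registry.internal.example.com/docs-search"}

# Crude indicators of instructions hidden in tool metadata (assumed for this sketch).
SUSPICIOUS_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"ignore (all |any )?(previous|prior) instructions",
        r"do not (tell|mention|show) .*user",
        r"\.ssh|\.env|id_rsa",
    )
]


@dataclass
class McpServer:
    server_id: str
    tool_descriptions: List[str] = field(default_factory=list)


def gate(server: McpServer) -> Tuple[str, str]:
    """Return (decision, reason); decision is "allow" or "block"."""
    # Default deny: anything not explicitly vetted is blocked.
    if server.server_id not in APPROVED_SERVERS:
        return ("block", "not on allowlist")
    # Even vetted servers are re-checked, since descriptions can change.
    for description in server.tool_descriptions:
        if any(p.search(description) for p in SUSPICIOUS_PATTERNS):
            return ("block", "suspicious tool description")
    return ("allow", "")
```

Checking descriptions on every connect, not just at vetting time, reflects the fact that the metadata the agent reads is supplied by the server and can differ from what was reviewed.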

Logging for audit and compliance

Finally, record it: what context each agent saw, what it sent and where, and what actions it took. That log is both your incident-response timeline and your compliance evidence, mapped to ISO 42001, the EU AI Act, SOC 2, and ISO 27001.
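
To make "what it saw, sent, and did" concrete, here is one possible shape for an audit record. The field names and the hash-chaining scheme are assumptions for this sketch, not a required format. It stores a digest of the context rather than the context itself, so the log does not become a second copy of the source code and secrets it describes.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, Iterable


def audit_record(
    user: str,
    tool: str,
    destination: str,
    context: str,
    actions: Iterable[str],
    prev_hash: str = "",
) -> Dict[str, object]:
    """Build one audit entry, chained to the previous entry by hash."""
    record: Dict[str, object] = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "tool": tool,
        "destination": destination,
        # Digest only: proves what was sent without storing it.
        "context_sha256": hashlib.sha256(context.encode("utf-8")).hexdigest(),
        "actions": list(actions),
        "prev_hash": prev_hash,
    }
    serialized = json.dumps(record, sort_keys=True)
    # Hash over the full record so later tampering breaks the chain.
    record["record_hash"] = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    return record
```

Each record carries the hash of the one before it, so an entry cannot be edited or removed without invalidating every later `prev_hash`. Whether to retain the raw context alongside the digest is a policy decision that depends on your retention and incident-response requirements.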

A rollout that engineers do not route around

The fastest way to fail is to announce a hard block on day one. Engineers will move to personal accounts, phones, and side channels, and you will lose the visibility you had. Roll out in the order that builds trust.

Start in monitor mode. Turn on egress inspection and discovery first, blocking nothing. Two weeks of observation tells you which agents, models, and MCP servers are actually in use, and who owns them. You almost always find more than the sanctioned list.

Enforce redaction before you enforce blocking. Secret and PII redaction at the prompt boundary is a control engineers accept because it does not stop them working; it just removes credentials they did not mean to send. Ship that next, and pair it with a clear approved list of models and agents so the sanctioned path is the obvious one.

Gate the sharp edges last. Once the safe path is fast and well understood, turn on blocking for unapproved model endpoints, untrusted MCP servers, and unsandboxed destructive actions. Keep an escalation route so a developer with a legitimate need for a new tool gets a yes in hours, not weeks. A control team that answers quickly is a control team engineers stop trying to bypass.
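
The three rollout stages above amount to one policy whose response to the same finding changes over time. A minimal sketch, with stage and finding names invented for illustration:

```python
from enum import Enum


class Stage(Enum):
    MONITOR = 1  # observe and discover, block nothing
    REDACT = 2   # strip secrets and PII, still block nothing
    ENFORCE = 3  # block the sharp edges


# Hypothetical finding labels for this sketch.
BLOCKABLE_FINDINGS = {"unapproved_model", "untrusted_mcp", "destructive_action"}


def action_for(stage: Stage, finding: str) -> str:
    """Return "log", "redact", or "block" for a finding at a rollout stage."""
    if stage is Stage.MONITOR:
        return "log"
    if finding == "secret_in_prompt":
        return "redact"  # applies in both REDACT and ENFORCE
    if stage is Stage.ENFORCE and finding in BLOCKABLE_FINDINGS:
        return "block"
    return "log"
```

Because every stage still logs, moving from one stage to the next changes only the response, not the visibility gained in monitor mode.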

This is where a transparent proxy earns its place. Detection runs locally on the endpoint and nothing leaves the network by default, so you get inline inspection and control across the IDE, CLI, and desktop on one path. That is the gap Cerbera was built to close for the developer workflow specifically: discover the agents in use, redact secrets at the boundary, block unapproved models, gate MCP servers, and keep the audit trail, without becoming the reason your engineers slow down.