DLP for the AI era

Key takeaways

AI leaks data through channels legacy DLP was never built to watch: pasted prompts, file uploads to chatbots, and agent tool calls.
Prompt-level DLP inspects the content of a prompt or a tool call inline, then redacts or blocks before it leaves the endpoint.
Do detection locally so raw prompts never leave the network to be scanned by a third party.
Deploy in phases: monitor-only first to see reality, tune detections to kill false positives, then enforce.
Cover all four surfaces (browser, desktop apps, coding agents and CLIs, and the API), or attackers and busy engineers will route around the gap.

The data, and the channels it escapes through

Data loss prevention has always been about one question: is sensitive content leaving through a channel it should not? In the AI era the answer is almost always yes, because the channels changed and the tooling did not keep up. Three categories of data are at risk, and each has a distinct exit path.

Secrets and credentials. API keys, tokens, database connection strings, and .env contents get pasted into a chatbot to debug an error, or handed to a coding agent that reads the whole repository. Once a key is in a third-party prompt log, treat it as compromised.

Source code. Proprietary code is pasted into web LLMs for refactoring, or streamed wholesale to a model by an IDE assistant or CLI agent. This is often the single largest volume of sensitive content a scale-up sends to external models, and it is the hardest to notice because it looks like normal developer activity.

PII and regulated data. Customer records, support transcripts, spreadsheets of user emails, and health or payment data get pasted or uploaded so a model can summarize, translate, or analyze them. A single spreadsheet dropped into a chat can move thousands of records in one action.

The channels are the part legacy tools miss. Data leaves through a paste into a browser tab, a file upload to a chatbot, a prompt typed into a desktop app, and, increasingly, a tool call an agent makes to an MCP server that reaches into your systems. None of these is a file crossing the perimeter in the way DLP was designed to catch.

Perimeter tools cannot see a prompt

Traditional DLP, whether endpoint, email, or a network CASB, was built to monitor file movement across known exfiltration channels: email attachments, USB drives, and cloud file-sharing. When someone copies a contract and pastes it into a chatbot, there is no file transfer, no attachment, and no distinct network event tied to a document. As Strac put it in their 2026 analysis, the data leaves quietly, one prompt at a time, through conversational interfaces that were never part of the threat model.

Three structural gaps make legacy DLP ineffective here. First, encryption: chatbot traffic is HTTPS to a handful of CDNs, so a network DLP box sees an opaque tunnel, not the prompt inside it. Second, form: prompts, context, and model outputs are unstructured, real-time, and continuously transformed, which defeats the regex-and-fingerprint rules tuned for static documents. Third, timing: post-event inspection is useless once content has already reached an external model. You have to act before submission, not after.

The deeper argument for why the perimeter model breaks, and why a transparent proxy is the architecture that closes the gap, is laid out in the companion paper, Prompt-level DLP for the AI era. This guide stays hands-on: what to deploy, in what order, and how to know it is working.

What prompt-level DLP actually does

Prompt-level DLP moves the inspection point from the file system and the network edge to the exact moment content is about to reach a model or a tool. Instead of asking "is a file leaving?", it asks "does the content of this prompt, upload, or tool call contain something that must not go?" and it answers before the request is sent.

Concretely, a prompt-level control does three things inline. It classifies the outbound content, detecting secrets, source code, PII, PHI, and PCI patterns in the prompt body, attachments, and tool arguments. It remediates by redacting the sensitive spans, warning the user, or blocking the request outright, depending on policy and sensitivity. And it records an audit trail: what was detected, what action was taken, on which surface, and for which user, so the same events map to ISO 42001, the EU AI Act, SOC 2, and ISO 27001 evidence.
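The three inline steps above (classify, remediate, record) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product implements detection: the two regex detectors, the POLICY mapping, and the inspect function are all hypothetical names, and a production detector set would be far broader and tuned.

```python
import re
from dataclasses import dataclass, field

# Illustrative detectors only (assumption): one secret pattern, one PII pattern.
DETECTORS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

# Hypothetical policy: the action each detector triggers.
POLICY = {"aws_access_key": "block", "email": "redact"}


@dataclass
class Verdict:
    action: str                                 # "allow", "redact", or "block"
    content: str                                # what may be forwarded ("" when blocked)
    audit: list = field(default_factory=list)   # findings, without raw values


def inspect(content: str, user: str, surface: str) -> Verdict:
    """Classify outbound content, remediate it, and build an audit record."""
    redacted = content
    audit = []
    for name, pattern in DETECTORS.items():
        # subn returns the rewritten text and the number of spans replaced.
        redacted, count = pattern.subn(f"[REDACTED:{name}]", redacted)
        if count:
            audit.append({
                "detector": name,
                "count": count,
                "action": POLICY[name],
                "user": user,
                "surface": surface,
            })
    actions = {record["action"] for record in audit}
    if "block" in actions:
        return Verdict("block", "", audit)
    if "redact" in actions:
        return Verdict("redact", redacted, audit)
    return Verdict("allow", content, audit)
```

Note that the audit records carry the detector name and a count, never the matched value, so the log itself does not become a second copy of the secret.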

Two design choices separate a real control from a checkbox. The first is where detection runs. If your DLP ships every prompt to a vendor cloud for scanning, you have not stopped the leak; you have added a second copy of it. Detection should run locally on the endpoint so that, by default, raw prompts never leave the network. The second is coverage. A browser extension alone leaves the desktop app, the CLI agent, and the API wide open, and people route to whichever path is unmonitored. Cerbera runs detection locally as a transparent proxy and inspects all four surfaces on the same path, so there is no unwatched exit.

Browser LLMs

Inspect the typed prompt and any uploaded file before the tab sends it. Redact secrets and PII inline, or block submission for regulated data. This is where most AI use, and most leakage, happens.

Desktop AI apps

Cover native clients like ChatGPT and Claude desktop that bypass browser controls entirely. The proxy sees their traffic on the same path and applies the same policy.

Coding agents and CLIs

Catch source code, .env files, and tokens streamed to models by IDE assistants and terminal agents. This is the highest-volume, lowest-visibility surface on developer machines.

API and MCP

Inspect direct model-API calls from services, and gate agent tool calls to MCP servers. Redact sensitive arguments and block calls to unapproved models or to MCP servers that have not been risk-scored.
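As a rough sketch of gating on the API and MCP surface, the snippet below blocks tool calls to servers that are not on an approved list and redacts secret-looking strings anywhere in the call's arguments. Everything here is an assumption made for illustration: the server names, the APPROVED_SERVERS set, the single key pattern, and the gate_tool_call function are hypothetical and do not reflect a specific MCP client or proxy API.

```python
import re

# Hypothetical allowlist of MCP servers that have passed a risk review.
APPROVED_SERVERS = {"internal-docs", "ticketing"}

# Illustrative secret pattern (assumption); real policies would use many detectors.
SECRET = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def redact(value):
    """Recursively redact secret-looking strings in JSON-like arguments."""
    if isinstance(value, str):
        return SECRET.sub("[REDACTED]", value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


def gate_tool_call(server: str, tool: str, arguments: dict) -> dict:
    """Block calls to unapproved servers; otherwise forward redacted arguments."""
    if server not in APPROVED_SERVERS:
        return {"allowed": False, "reason": f"server '{server}' is not approved"}
    return {"allowed": True, "tool": tool, "arguments": redact(arguments)}
```

Walking the argument structure recursively matters because tool arguments are often nested, and a secret buried in a list or sub-object is just as exposed as one in a top-level string.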

A phased deployment across every surface

Do not lead with enforcement. A DLP program that starts by blocking will generate a wave of false positives, break someone's workflow on day one, and lose the trust you need to keep it running. Roll out in three phases, and extend each phase across all four surfaces before you tighten.

Phase 1: Monitor only

Put the proxy inline in observe mode. Detections fire and log, but nothing is redacted or blocked. The goal is ground truth: which models are in use, who uses them, and what actually flows through each surface. Expect surprises here: the desktop app nobody mentioned and the CLI agent reading the whole repo usually show up in this data. Two to four weeks is enough to establish a baseline and rank surfaces by risk.

Phase 2: Tune detections

Now turn the monitor-only findings into trustworthy policy. Review what detectors flagged, kill the false positives (an example key in documentation is not a live secret), and close the false negatives you can see slipping through. Tune per data type: secrets and keys warrant zero-tolerance blocking, while broad PII rules may start as redaction so you do not halt legitimate work. Set different thresholds per surface where it makes sense: developer CLIs and coding agents need code-aware detection that a marketing team's browser does not.
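One way to picture the tuning step is an allowlist of values that review has confirmed are documentation placeholders, combined with per-data-type actions and per-surface detector sets. The sketch below assumes all of its names (KNOWN_EXAMPLES, ACTION_BY_TYPE, SURFACE_DETECTORS, find_live_keys) purely for illustration; the key pattern is a simplified stand-in for a real secret detector.

```python
import re

# Simplified key pattern used as a stand-in for a tuned secret detector.
KEY_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# Values confirmed during review to be documentation placeholders, not live secrets.
KNOWN_EXAMPLES = {"AKIAIOSFODNN7EXAMPLE"}

# Hypothetical per-data-type actions after tuning.
ACTION_BY_TYPE = {"secret": "block", "pii": "redact"}

# Hypothetical per-surface detector sets: developer surfaces get code-aware detection.
SURFACE_DETECTORS = {
    "browser": {"secret", "pii"},
    "cli": {"secret", "pii", "source_code"},
}


def find_live_keys(text: str) -> list:
    """Return key-shaped matches, skipping values on the reviewed allowlist."""
    return [match for match in KEY_PATTERN.findall(text)
            if match not in KNOWN_EXAMPLES]
```

An exact-value allowlist is deliberately narrow: it suppresses the specific false positive found in review without loosening the pattern for everything else.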

Phase 3: Enforce, one surface at a time

Flip enforcement on where confidence is highest, usually secrets in the browser and in coding agents, then expand. Redact by default, block for the highest-sensitivity categories, and always tell the user why in the moment, with a path to request an exception. Bring the API and MCP surfaces in last and deliberately: inline redaction of tool-call arguments and gating of unapproved models or MCP servers that have not been risk-scored are powerful controls, and they deserve a careful ramp rather than a flip of a switch.

Across all three phases the principle is the same: earn the right to enforce by first proving the detections are right. Enforcement that fires correctly the first time is what keeps a DLP program alive past its first quarter.
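The phased rollout can be modeled as a per-surface mode that decides whether a policy action is actually applied or only logged. This is a sketch under assumptions: the Mode enum, the SURFACE_MODE table, and the effective_action function are hypothetical, and the surface names simply mirror the four surfaces described earlier.

```python
from enum import Enum


class Mode(Enum):
    MONITOR = "monitor"   # detections fire and log, nothing is changed
    ENFORCE = "enforce"   # the policy action (redact or block) is applied


# Hypothetical mid-rollout state: enforcement is on where confidence is
# highest, while the API and MCP surface is still being observed.
SURFACE_MODE = {
    "browser": Mode.ENFORCE,
    "coding_agent": Mode.ENFORCE,
    "desktop": Mode.MONITOR,
    "api_mcp": Mode.MONITOR,
}


def effective_action(surface: str, policy_action: str) -> str:
    """Downgrade any policy action to logging while a surface is in monitor mode."""
    # Unknown surfaces default to monitor so a new surface never blocks by surprise.
    mode = SURFACE_MODE.get(surface, Mode.MONITOR)
    if mode is Mode.MONITOR:
        return "log_only"
    return policy_action
```

Keeping the mode separate from the policy means the same tuned rules run in both phases, so what was measured in monitor mode is exactly what gets enforced later.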

How to measure success

A DLP program you cannot measure is a program you cannot defend in a board review or an audit. Track a small set of metrics that show both risk reduction and that you are not breaking work.

Coverage. The share of AI traffic actually inspected, broken out by surface. If the browser is at ninety percent but the CLI is at ten, you have a hole, and it is on the surface that leaks source code. Coverage is the metric that keeps the four-surface discipline honest.

Prevented exposures. Count of secrets, code blocks, and PII instances redacted or blocked before reaching a model, trended over time. A count that falls after enforcement suggests behavior is changing: people are sending less sensitive content in the first place.

Detection quality. False-positive and false-negative rates per detector. This is the metric that determines whether people trust the control or learn to route around it. Review it every time you add a detector.

Friction. Exception requests and time-to-resolution. Low and fast means the policy fits how people work. A spike means a rule is wrong, not that users are, and it is a signal to retune, not to clamp down harder. Pair these with the evidence export that maps prevented exposures to ISO 42001, the EU AI Act, SOC 2, and ISO 27001 controls, so the same numbers that run the program also satisfy the auditor.
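Two of the metrics above, coverage by surface and detection quality, reduce to simple ratios over event logs. The sketch below assumes a hypothetical event shape (dictionaries with surface, inspected, detector, and true_positive keys) and measures false positives as the share of reviewed detections that a reviewer judged wrong, which is one reasonable definition among several.

```python
from collections import Counter


def coverage_by_surface(events: list) -> dict:
    """Share of AI requests that were actually inspected, per surface."""
    total, inspected = Counter(), Counter()
    for event in events:
        total[event["surface"]] += 1
        if event["inspected"]:
            inspected[event["surface"]] += 1
    return {surface: inspected[surface] / total[surface] for surface in total}


def false_positive_share(reviews: list) -> dict:
    """Share of reviewed detections judged wrong, per detector."""
    flagged, wrong = Counter(), Counter()
    for review in reviews:
        flagged[review["detector"]] += 1
        if not review["true_positive"]:
            wrong[review["detector"]] += 1
    return {detector: wrong[detector] / flagged[detector] for detector in flagged}


# Example: the browser is well covered, the CLI is the hole.
events = (
    [{"surface": "browser", "inspected": True}] * 9
    + [{"surface": "browser", "inspected": False}]
    + [{"surface": "cli", "inspected": True}]
    + [{"surface": "cli", "inspected": False}] * 9
)
print(coverage_by_surface(events))
```

Breaking both ratios out per surface and per detector is the point: an aggregate number would hide exactly the gap the four-surface discipline is meant to expose.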