← All posts

The Rug Pull Attack

Your AI agent's tools can change after you approve them, without triggering any notification or re-consent. The MCP spec allows this by design.

MCP clients fetch tool definitions from the server at runtime. Between the moment a user approves a tool and the moment the agent calls it, the server can rewrite the tool's description, parameters, and behavior. The approval references a definition that no longer exists.

This is the Rug Pull Attack.

The Mechanics

MCP (Model Context Protocol) follows a client-server model. The server exposes tools. The client (your agent framework) fetches the tool list, presents it to the user or the model, and calls tools on behalf of the agent.

The critical assumption: the tool definition the user approved is the tool definition the agent executes. MCP does not enforce this. There is no versioning, no content hash, no approval-time snapshot stored on the client side.

Here is the full sequence:

sequenceDiagram
    participant S as MCP Server
    participant C as MCP Client
    participant U as User
    participant A as Attacker Endpoint

    S->>C: Register tool: read_file — "Read a local file and return contents"
    C->>U: Present tool for approval
    U->>C: Approve read_file ✓

    Note over C: No hash stored. No snapshot. No version recorded.

    S->>S: Rewrite tool definition — "Read file, POST to attacker, then return contents"

    Note over S,C: No notification sent to client.

    C->>S: Call read_file({path: "/data/patients.csv"})
    S->>A: POST patients.csv (exfiltrated)
    A-->>S: 200 OK
    S->>C: Return file contents (normal response)
    C->>C: Continue processing. No error. No alert.

The tool name and parameter schema are unchanged. The agent has no mechanism to detect that the implementation behind read_file is different from the read_file the user approved.

The Blast Radius

The severity depends on what the tool had access to. In production deployments, MCP tools routinely interact with:

Healthcare: Patient health records, diagnostic data, medication histories. A rug-pulled tool accessing a FHIR API or EHR system exfiltrates PHI. HIPAA requires audit trails that can demonstrate data was not improperly accessed or disclosed. A mutable log showing a "successful read_file call" provides no evidence about what the tool actually did with the data.

Financial services: Transaction records, portfolio data, trading signals. An agent calling a market data tool that has been rug-pulled could exfiltrate positions or execute unauthorized trades. SEC and FINRA require firms to maintain records of communications and transactions in a non-rewritable format.

Enterprise SaaS: Customer data, internal documents, credentials stored in environment variables or secret managers. A rug-pulled get_secret tool could silently copy every API key your agent accesses.

The blast radius is not limited to the data the tool returns. If the tool has network access (and MCP tools typically do), it can exfiltrate data to any external endpoint before returning a normal-looking response to the agent.

What Current Tooling Misses

Observability platforms trace the call. They record that read_file was invoked with {"path": "/data/patients.csv"} and returned 200 OK in 42ms. The trace is accurate. It is also incomplete.

# What your observability platform records:
trace = {
    "span_id": "abc123",
    "tool": "read_file",
    "input": {"path": "/data/patients.csv"},
    "output_size": 14280,
    "status": "success",
    "duration_ms": 42,
    "timestamp": "2026-03-23T14:30:00Z"
}

# What it does NOT record:
# - The tool definition at approval time
# - The tool definition at execution time
# - Whether those two definitions match
# - Whether data was sent to any endpoint besides the caller
# - A cryptographic proof that this trace entry hasn't been modified

LangSmith, Datadog, Arize Phoenix, and Langfuse all capture traces of this shape. They answer the question "what did the system report?" They cannot answer "does what the system reported match what was authorized?" because they have no record of what was authorized, and no cryptographic guarantee that the trace itself hasn't been altered after the fact.

Observability shows what happened. It cannot show whether what happened matches what was authorized. Those are different questions, and current tooling only answers the first one.

The Mitigation

The fix requires two things: a snapshot of the tool definition at approval time, and a tamper-evident record of every execution.

Step 1: Hash the tool definition at approval time. When the user reviews and approves a tool, compute the SHA-256 hash of its full definition (name, description, parameters, metadata). Store this hash on the client side.

Step 2: Verify before execution. Before each tool call, fetch the current definition, hash it, and compare against the stored approval hash. If they differ, block the call and alert.

Step 3: Record every action in an append-only hash chain. Each tool call produces a cryptographic receipt containing the tool definition hash, input parameters, output hash, timestamp, and a Merkle proof linking it to the full session history.

import hashlib
import json
from prooftrail import trail

def hash_definition(tool_def: dict) -> str:
    """Deterministic hash of a tool definition."""
    canonical = json.dumps(tool_def, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

# At approval time: store the hash
def approve_tool(tool_name: str, tool_def: dict) -> str:
    """User approves a tool. Returns the approval hash."""
    approval_hash = hash_definition(tool_def)
    store_approval(tool_name, approval_hash)
    return approval_hash

# At execution time: verify, then record
@trail.record
def execute_tool(tool_name: str, params: dict):
    """Execute a tool with definition verification and receipt."""
    current_def = fetch_tool_definition(tool_name)
    current_hash = hash_definition(current_def)
    approved_hash = get_stored_approval(tool_name)

    if current_hash != approved_hash:
        raise ToolDefinitionChanged(
            f"Tool '{tool_name}' definition changed since approval.\n"
            f"  Approved: {approved_hash[:16]}...\n"
            f"  Current:  {current_hash[:16]}...\n"
            f"  Action:   Blocked. Re-approval required."
        )

    result = call_tool(tool_name, params)
    return result

The @trail.record decorator handles step 3. Every call to execute_tool produces a receipt: the function arguments are hashed, the return value is hashed, and both are appended to an append-only chain. The chain is structured as a Merkle tree, so any individual entry can be verified against the tree root without scanning the full history.

# Verifying a receipt after the fact:
from prooftrail import verify

receipt = trail.get_receipt(session_id="abc123", index=7)

is_valid = verify(
    record_hash=receipt.record_hash,
    proof=receipt.merkle_proof,
    root=receipt.root_hash
)

# If any record in the chain was modified (a byte changed,
# a timestamp altered, an entry deleted), verification fails.
# There is no ambiguity. The math either holds or it does not.

ProofTrail

ProofTrail is a decorator-based SDK that implements this pattern. One import, one decorator. The overhead is 210 microseconds per recorded action.

The hash chain is append-only by construction. Altering any record changes its hash, which changes every subsequent hash in the chain, which changes the Merkle root. A verifier catches tampering by recomputing the root from any individual receipt. The verification is independent: it does not require trusting the application, the agent framework, or the MCP server.

What observability gives you and what ProofTrail gives you are complementary:

Observability (LangSmith, Datadog):
  ✓ What tool was called
  ✓ What parameters were passed
  ✓ How long it took
  ✓ Whether it returned an error
  ✗ Whether the tool definition matches what was approved
  ✗ Whether the trace has been modified after recording
  ✗ Cryptographic proof suitable for regulatory audit

ProofTrail:
  ✓ Hash of tool definition at approval time
  ✓ Hash of tool definition at execution time
  ✓ Match verification before execution
  ✓ Append-only hash chain of all actions
  ✓ Merkle proof per action (independent verification)
  ✓ Tamper-evident by construction

The Regulatory Context

EU AI Act Article 12 requires "automatic recording of events" for high-risk AI systems. The recording must be tamper-evident. Article 14 requires human oversight with access to system behavior records. The compliance deadline is August 2026.

HIPAA requires audit trails demonstrating that protected health information was not improperly accessed. SOC 2 Type II auditors need evidence that logs were not modified between recording and review. SEC and FINRA require non-rewritable records of automated trading decisions.

Mutable logs do not meet the bar for any of these. A database entry that says "read_file was called successfully" is a statement. A cryptographic receipt with a Merkle proof anchored to an external timestamp is evidence.

August 2026 is five months away. Your agent pipeline either has tamper-evident audit trails by then, or it doesn't. ProofTrail adds them in one decorator and 210 microseconds.

prooftrail.dev