Attack MEDIUM relevance

Beyond Indistinguishability: Measuring Extraction Risk in LLM APIs

Ruixuan Liu David Evans Li Xiong
Published
April 20, 2026
Updated
April 20, 2026

Abstract

Indistinguishability properties such as differential privacy bounds or low empirically measured membership inference are widely treated as proxies to show a model is sufficiently protected against broader memorization risks. However, we show that indistinguishability properties are neither sufficient nor necessary for preventing data extraction in LLM APIs. We formalize a privacy-game separation between extraction and indistinguishability-based privacy, showing that indistinguishability and inextractability are incomparable: upper-bounding distinguishability does not upper-bound extractability. To address this gap, we introduce $(l, b)$-inextractability as a definition that requires at least $2^b$ expected queries for any black-box adversary to induce the LLM API to emit a protected $l$-gram substring. We instantiate this via a worst-case extraction game and derive a rank-based extraction risk upper bound for targeted exact extraction, as well as extensions to cover untargeted and approximate extraction. The resulting estimator captures the extraction risk over multiple attack trials and prefix adaptations. We show that it can provide a tight and efficient estimation for standard greedy extraction and an upper bound on the probabilistic extraction risk given any decoding configuration. We empirically evaluate extractability across different models, clarifying its connection to distinguishability, demonstrating its advantage over existing extraction risk estimators, and providing actionable mitigation guidelines across model training, API access, and decoding configurations in LLM API deployment. Our code is publicly available at: https://github.com/Emory-AIMS/Inextractability.

Metadata

Comment
Accepted by S&P 2026

Pro Analysis

Full threat analysis, ATLAS technique mapping, compliance impact assessment (ISO 42001, EU AI Act), and actionable recommendations are available with a Pro subscription.

Threat Deep-Dive
ATLAS Mapping
Compliance Reports
Actionable Recommendations
Start 14-Day Free Trial