CVPR 2026 Findings

Revisiting Model Inversion Evaluation:
From Misleading Standards to Reliable Privacy Assessment

Sy-Tuyen Ho1* Koh Jun Hao2* Ngoc-Bao Nguyen2* Alexander Binder3 Ngai-Man Cheung2
1University of Maryland College Park   2Singapore University of Technology and Design    3Leipzig University
* Equal Contribution
Figure 1. We present the first in-depth study on the Model Inversion (MI) evaluation framework FCurr. The MI-reconstructed images (left) are deemed successful attacks with high confidence (red scores), yet they do not visually resemble the actual private training individuals (right) — a significant false-positive problem that inflates reported attack accuracy.
27 MI attack setups evaluated
99% maximum false-positive rate found
<80% maximum true attack success rate
3 private datasets assessed

Model Inversion (MI) attacks aim to reconstruct private training data by exploiting access to a target model. Nearly all recent MI studies evaluate attack success with a standard framework that computes attack accuracy using a secondary evaluation model trained on the same private dataset and classification task as the target model.

In this paper, we present the first in-depth analysis of this dominant evaluation framework and reveal a fundamental issue: many reconstructions deemed "successful" are in fact false positives that do not capture the visual identity of the target individual. We show these MI false positives satisfy the same formal conditions as Type I adversarial examples, and demonstrate extremely high false-positive transferability.

To address this, we introduce a new evaluation framework FMLLM based on Multimodal Large Language Models, whose general-purpose visual reasoning avoids the shared-task vulnerability. We reassess 27 MI attack setups and find consistently high false-positive rates under the conventional approach — calling for a reevaluation of progress in MI research.

What We Discover

01

False Positives ≡ Type I Adversarial Examples

MI false positives and Type I adversarial examples are mathematically equivalent — the same construct arising under different problem contexts.
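One way to state the shared form (notation ours, not necessarily the paper's): an MI reconstruction for identity $y$ is obtained by maximizing the target model's confidence,

$$\hat{x} = \arg\max_{x} \log P_{T}(y \mid x),$$

and it is a false positive when the evaluation model $P_{E}$ also assigns it to $y$ with high confidence, $P_{E}(y \mid \hat{x}) > \tau$, even though $\hat{x}$ does not depict the private individual $y$. A Type I adversarial example satisfies exactly this condition — an input confidently classified into a class it does not belong to — which is the sense in which the two are the same construct arising in different contexts.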

02

Adversarial Transferability Inflates Accuracy

MI-generated negatives exhibit abnormally high false-positive rates (up to 89–97%) across evaluation models, characteristic of adversarial behavior.

03

SOTA Attacks Are Overestimated

Attacks reporting 90–100% accuracy under FCurr never exceed an 80% true success rate, and many fall below 60% under our FMLLM.

04

Gemini-2.0 as Reliable Evaluator

Among tested MLLMs, Gemini-2.0 achieves 93.84% "Yes" on positive pairs and 95.59% "No" on negative pairs, with a zero refusal rate.

FMLLM: Our Evaluation Framework

We replace the standard evaluation model with a Multimodal LLM that uses general-purpose visual reasoning — avoiding the shared-task vulnerability that enables adversarial transferability.

01

Construct Evaluation Query

Pair each MI-reconstructed image with reference images of the target identity from private training data.

02

Domain-Specific Prompt Design

Provide a carefully crafted textual prompt instructing the MLLM to account for aging, lighting, hairstyle, and accessories — reducing false negatives.

03

Binary Classification

The MLLM outputs "Yes" (successful attack) or "No" based on visual identity matching, bypassing n-way classification vulnerabilities.

04

MLLM Selection via Reliability Criteria

We select MLLMs that demonstrate strong interleaved image-text understanding and no refusal behavior on MI tasks. Gemini-2.0 is identified as the most reliable candidate.
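The four steps above can be sketched as a simple evaluation loop. This is an illustrative sketch, not the authors' implementation: the prompt wording is a paraphrase of the aging/lighting/hairstyle/accessories instruction, and `query_mllm` is a hypothetical client that, in practice, would send the interleaved images and text to a Gemini-2.0 endpoint.

```python
from typing import Callable, Sequence

# Illustrative paraphrase of the domain-specific prompt (step 02).
PROMPT = (
    "You are shown a reconstructed face image followed by reference photos "
    "of one person. Accounting for differences in aging, lighting, "
    "hairstyle, and accessories, does the reconstructed image depict the "
    "same person? Answer strictly 'Yes' or 'No'."
)

def evaluate_attack(
    reconstructions: Sequence[str],       # paths to MI-reconstructed images
    references: Sequence[Sequence[str]],  # per-sample reference image paths
    query_mllm: Callable[[str, str, Sequence[str]], str],
) -> float:
    """Return attack accuracy: the fraction of reconstructions the MLLM
    judges to match the target identity (binary 'Yes'/'No', step 03)."""
    successes = 0
    for recon, refs in zip(reconstructions, references):
        # Step 01: pair each reconstruction with reference images,
        # then ask the MLLM for a binary identity-match decision.
        answer = query_mllm(PROMPT, recon, refs).strip().lower()
        successes += answer.startswith("yes")
    return successes / len(reconstructions)
```

Because the decision is a binary identity match rather than an n-way classification over the private label set, a reconstruction cannot score as "successful" merely by landing in the right decision region of a model that shares the target's task.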

BibTeX

@article{ho2025revisiting,
  title = {Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment},
  author = {Ho, Sy-Tuyen and Koh, Jun Hao and Nguyen, Ngoc-Bao and Binder, Alexander and Cheung, Ngai-Man},
  journal = {arXiv preprint arXiv:2505.03519},
  year = {2025}
}