CVPR 2026 Findings

Revisiting Model Inversion Evaluation:
From Misleading Standards to Reliable Privacy Assessment

Sy-Tuyen Ho1*, Koh Jun Hao2*, Ngoc-Bao Nguyen2*, Alexander Binder3, Ngai-Man Cheung2
1University of Maryland College Park   2Singapore University of Technology and Design   3Leipzig University
* Equal Contribution

Teaser


Figure 1. We present the first in-depth study on the Model Inversion (MI) evaluation framework FCurr. The MI-reconstructed images (left) are deemed successful attacks with high confidence (red scores), yet they do not visually resemble the actual private training individuals (right) — a significant false-positive problem that inflates reported attack accuracy.

- 27 MI setups evaluated
- 99% maximum false-positive rate found
- <80% true maximum privacy leakage
- 3 private datasets assessed

Abstract

TL;DR. Model inversion attacks are being evaluated with a flawed standard. Many reconstructions counted as successful are actually false positives that do not preserve visual identity. We analyze this failure mode and introduce FMLLM, a new evaluation framework based on multimodal large language models.

Model Inversion (MI) attacks aim to reconstruct information from private training data by exploiting access to a target model. Nearly all recent MI studies evaluate attack success using a standard framework, FCurr, which computes attack accuracy through a secondary evaluation model trained on the same private data and task design as the target model.

In this paper, we present the first in-depth analysis of this dominant evaluation framework and reveal a fundamental issue: many reconstructions deemed "successful" are in fact false positives that do not capture the visual identity of the target individual. We show these MI false positives satisfy the same formal conditions as Type I adversarial examples, and demonstrate extremely high false-positive transferability.

To address this, we introduce a new evaluation framework FMLLM based on Multimodal Large Language Models, whose general-purpose visual reasoning avoids the shared-task vulnerability. We reassess 27 MI attack setups and find consistently high false-positive rates under the conventional approach — calling for a reevaluation of progress in MI research.

Key Findings

01

False Positives ≡ Type I Adversarial Examples

MI false positives and Type I adversarial examples are mathematically equivalent — the same construct arising under different problem contexts.

02

Adversarial Transferability Inflates Accuracy

MI reconstructions of non-matching identities (negatives) are accepted at abnormally high rates (up to 89–97%) across evaluation models, a hallmark of adversarial transferability.

03

SOTA Attacks Are Overestimated

Attacks reporting 90–100% accuracy under FCurr never exceed an 80% true success rate, and many fall below 60% under our FMLLM.

04

Gemini-2.0 as Reliable Evaluator

Among the MLLMs tested, Gemini-2.0 answers "Yes" on 93.84% of positive pairs and "No" on 95.59% of negative pairs, with a zero refusal rate.
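These evaluator statistics reduce to simple tallies over the MLLM's text answers. A minimal sketch of how they can be computed (not the paper's code; it assumes responses are raw text strings and that anything other than a yes/no answer counts as a refusal):

```python
def evaluator_stats(responses, labels):
    """Compute (yes-rate on positive pairs, no-rate on negative pairs,
    refusal rate) from MLLM text responses.

    labels: True for a positive pair (same identity), False otherwise.
    """
    yes_pos = no_neg = refusals = 0
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for resp, is_pos in zip(responses, labels):
        ans = resp.strip().lower()
        if ans.startswith("yes"):
            if is_pos:
                yes_pos += 1
        elif ans.startswith("no"):
            if not is_pos:
                no_neg += 1
        else:
            refusals += 1  # model declined to give a yes/no answer
    return (yes_pos / n_pos if n_pos else 0.0,
            no_neg / n_neg if n_neg else 0.0,
            refusals / len(labels) if labels else 0.0)
```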

Method

FMLLM: Our Evaluation Framework

We replace the standard evaluation model with a Multimodal LLM that uses general-purpose visual reasoning — avoiding the shared-task vulnerability that enables adversarial transferability.

FMLLM evaluation pipeline

System prompt (used for evaluation queries)
You are an expert in face recognition. Taking into account the face aging, lighting, different hair styles, wearing and not wearing of eye glasses or other accessory, do the task in the image. Only answer yes or no
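Concretely, the evaluation pairs each MI reconstruction with a reference image of the target identity, sends both to the MLLM with the prompt above, and counts "yes" answers. A minimal sketch, where `query_mllm(prompt, images)` is a hypothetical client callable standing in for whichever MLLM API is used (it is not a real library signature):

```python
SYSTEM_PROMPT = (
    "You are an expert in face recognition. Taking into account the face aging, "
    "lighting, different hair styles, wearing and not wearing of eye glasses or "
    "other accessory, do the task in the image. Only answer yes or no"
)

def mllm_attack_accuracy(pairs, query_mllm):
    """Fraction of (reconstruction, private reference) pairs the MLLM
    judges to be the same identity.

    pairs:      iterable of (reconstructed_image, reference_image)
    query_mllm: hypothetical client returning the model's raw text answer
    """
    pairs = list(pairs)
    yes = 0
    for recon, reference in pairs:
        answer = query_mllm(SYSTEM_PROMPT, [recon, reference]).strip().lower()
        if answer.startswith("yes"):
            yes += 1
    return yes / len(pairs) if pairs else 0.0
```

In practice the two images would be composed into a single query image containing the verification task, as in the pipeline figure; the counting logic is unchanged.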

Citation

@article{ho2025revisiting,
  title     = {Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment},
  author    = {Ho, Sy-Tuyen and Koh, Jun Hao and Nguyen, Ngoc-Bao and Binder, Alexander and Cheung, Ngai-Man},
  journal   = {arXiv preprint arXiv:2505.03519},
  year      = {2025}
}