Model Inversion (MI) attacks aim to reconstruct information from private training data by exploiting access to a target model. Nearly all recent MI studies evaluate attack success using a standard framework that computes attack accuracy through a secondary evaluation model trained on the same private data and task design as the target model.
In this paper, we present the first in-depth analysis of this dominant evaluation framework and reveal a fundamental issue: many reconstructions deemed "successful" are in fact false positives that do not capture the visual identity of the target individual. We show these MI false positives satisfy the same formal conditions as Type I adversarial examples, and demonstrate extremely high false-positive transferability.
To address this, we introduce FMLLM, a new evaluation framework based on Multimodal Large Language Models, whose general-purpose visual reasoning avoids the shared-task vulnerability. We reassess 27 MI attack setups and find consistently high false-positive rates under the conventional framework (denoted FCurr), calling for a reevaluation of progress in MI research.
MI false positives and Type I adversarial examples are mathematically equivalent: the same construct arising in different problem contexts.
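As a sketch of the claimed equivalence (the notation here is assumed, not taken from the source: E is the evaluation model, y the target identity, and O a human oracle), both constructs satisfy the same condition:

```latex
% A reconstruction x' is an MI false positive, and equivalently a Type I
% adversarial example for the evaluation model E, when
E(x') = y \quad \text{while} \quad O(x') \neq y,
% i.e. E assigns the target identity even though a human oracle does not
% recognize x' as that identity.
```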
MI-generated negatives exhibit abnormally high false-positive rates (as high as 89–97%) across evaluation models, a hallmark of adversarial behavior.
Attacks reporting 90–100% accuracy under FCurr never exceed an 80% true success rate, and many fall below 60% under our FMLLM.
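To make the gap concrete, the share of claimed successes that are actually false positives follows directly from the two accuracies. A minimal sketch, with hypothetical numbers and a helper name of our own choosing:

```python
# Illustrative arithmetic (hypothetical numbers, not results from the paper):
# if the conventional framework reports 95% attack accuracy but FMLLM
# confirms only a 60% true success rate, what fraction of the claimed
# successes are false positives?

def false_positive_share(reported_acc: float, true_acc: float) -> float:
    # Share of claimed successes that the stricter evaluation rejects.
    return (reported_acc - true_acc) / reported_acc

share = false_positive_share(0.95, 0.60)  # ~0.368, i.e. roughly 37%
```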
Among tested MLLMs, Gemini-2.0 is the most reliable: it answers "Yes" on 93.84% of positive pairs and "No" on 95.59% of negative pairs, with a zero refusal rate.
We replace the standard evaluation model with a Multimodal LLM whose general-purpose visual reasoning avoids the shared-task vulnerability that enables adversarial transferability.
Pair each MI-reconstructed image with reference images of the target identity from private training data.
Provide a carefully crafted textual prompt instructing the MLLM to account for aging, lighting, hairstyle, and accessories, thereby reducing false negatives.
The MLLM outputs "Yes" (successful attack) or "No" based on visual identity matching, bypassing the vulnerabilities of n-way classification.
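The steps above can be sketched as a small evaluation loop. This is a sketch under stated assumptions: `query_mllm` is a hypothetical placeholder for a real multimodal API call (e.g. to Gemini-2.0), and the prompt wording is ours, not the paper's:

```python
# Pairwise MLLM evaluation loop for MI reconstructions.

def query_mllm(reconstruction, references, prompt):
    # Placeholder: a real implementation would send the reconstructed image,
    # the reference images, and the prompt to an MLLM and return "Yes"/"No".
    raise NotImplementedError

PROMPT = (
    "Do these images show the same person? Account for aging, lighting, "
    "hairstyle, and accessories. Answer only Yes or No."
)

def true_success_rate(attacks, ask=query_mllm):
    # `attacks` is a list of (reconstruction, reference_images) pairs, one
    # per MI-reconstructed image; an attack counts as successful only when
    # the MLLM answers "Yes".
    answers = [ask(recon, refs, PROMPT) for recon, refs in attacks]
    return sum(a == "Yes" for a in answers) / len(answers)
```

Swapping in a stub for `ask` makes the loop testable offline before wiring up a real MLLM backend.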
We select MLLMs that demonstrate strong interleaved image-text understanding and no refusal behavior on MI tasks. Gemini-2.0 is identified as the most reliable candidate.