Model Inversion Robustness: Can Transfer Learning Help?


Sy-Tuyen Ho | Koh Jun Hao | Keshigeyan Chandrasegaran | Ngoc-Bao Nguyen | Ngai-Man Cheung
Singapore University of Technology and Design (SUTD)
CVPR 2024

Paper | GitHub

Abstract


Model Inversion (MI) attacks aim to reconstruct private training data by abusing access to machine learning models. Contemporary MI attacks have achieved impressive attack performance, posing serious threats to privacy. Meanwhile, all existing MI defense methods rely on regularization that is in direct conflict with the training objective, resulting in noticeable degradation in model utility. In this work, we take a different perspective and propose a novel and simple Transfer Learning-based Defense against Model Inversion (TL-DMI) to render MI-robust models. Particularly, by leveraging TL, we limit the number of layers encoding sensitive information from the private training dataset, thereby degrading the performance of MI attacks. We conduct an analysis using Fisher Information to justify our method. Our defense is remarkably simple to implement. Without bells and whistles, we show in extensive experiments that TL-DMI achieves state-of-the-art (SOTA) MI robustness.
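The sketch below illustrates the core idea in PyTorch: start from a backbone pre-trained on a public dataset, freeze the early layers, and fine-tune only the last layers on the private dataset. The backbone (ResNet-18 with ImageNet weights), the number of trainable blocks, and the optimizer settings are illustrative placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Minimal sketch of TL-DMI (assumed setup, not the authors' exact recipe):
# pre-trained public backbone, private fine-tuning restricted to the last layers.

def build_tl_dmi_model(num_classes: int, num_trainable_blocks: int = 1):
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

    # Freeze every parameter so no layer is updated by default.
    for p in model.parameters():
        p.requires_grad = False

    # Unfreeze only the last residual stage(s); private information
    # from fine-tuning is then encoded in these layers only.
    blocks = [model.layer4, model.layer3, model.layer2, model.layer1]
    for block in blocks[:num_trainable_blocks]:
        for p in block.parameters():
            p.requires_grad = True

    # Replace the classifier head; the new head is trainable by default.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

model = build_tl_dmi_model(num_classes=1000, num_trainable_blocks=1)
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01, momentum=0.9
)
# ... standard cross-entropy fine-tuning on the private dataset goes here ...
```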

Inverted examples
(I) Our proposed Transfer Learning-based Defense against Model Inversion (TL-DMI). Building on the standard TL framework of pre-training on a public dataset followed by fine-tuning on the private dataset, we propose a simple and highly effective defense against MI attacks. Our idea is to restrict fine-tuning with the private dataset to a specific number of layers, so that private information is encoded in these layers only (pink layers). Specifically, we propose to fine-tune only the last several layers.

(II) Analysis of layer importance for the classification task and the MI task. For the first time, we analyze the importance of target-model layers for MI. For a model trained conventionally, Fisher Information (FI) analysis shows that the first few layers are important for MI, whereas the last several layers are important for the specific classification task, consistent with the TL literature. This supports our hypothesis that preventing the first few layers from being fine-tuned on the private dataset can degrade MI significantly while having only a small impact on classification, leading to improved MI robustness overall.

(III) Empirical validation. The sub-figures clearly show that, at the same natural accuracy, lower MI attack accuracy is achieved by reducing the number of parameters fine-tuned with the private dataset.

(IV) Comparison with SOTA MI defenses. Without bells and whistles, our method achieves SOTA MI robustness, and MI-reconstructed images from our model have inferior visual quality. A user study confirms this finding.
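To make the layer-importance analysis in (II) concrete, the following is a minimal sketch of a diagonal Fisher Information estimate aggregated per parameter tensor. It is only an illustration of this kind of analysis and may differ from the paper's exact FI protocol; `model` and `data_loader` are assumed to be a trained classifier and a loader over the evaluation data.

```python
import torch
import torch.nn.functional as F

def layerwise_fisher(model, data_loader, device="cpu"):
    """Rough diagonal Fisher Information estimate, summed per parameter tensor."""
    model.eval()
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    num_samples = 0
    for images, labels in data_loader:
        images, labels = images.to(device), labels.to(device)
        model.zero_grad()
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        # Accumulate squared gradients, weighted by batch size.
        for n, p in model.named_parameters():
            if n in fisher and p.grad is not None:
                fisher[n] += p.grad.detach() ** 2 * images.size(0)
        num_samples += images.size(0)
    # Averaged squared gradients approximate the diagonal Fisher Information;
    # summing over each tensor gives a coarse per-layer importance score.
    return {n: (f / num_samples).sum().item() for n, f in fisher.items()}
```

Comparing these scores for the classification loss against those obtained under an MI objective is one way to see which layers matter for each task.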

Citation


@InProceedings{Ho_2024_CVPR,
    author    = {Ho, Sy-Tuyen and Hao, Koh Jun and Chandrasegaran, Keshigeyan and Nguyen, Ngoc-Bao and Cheung, Ngai-Man},
    title     = {Model Inversion Robustness: Can Transfer Learning Help?},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {12183-12193}
}

Acknowledgements


This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No.: AISG2-TC-2022-007) and by the Agency for Science, Technology and Research (A*STAR) under its MTC Programmatic Funds (Grant No. M23L7b0021). This material is based on research/work supported in part by Changi General Hospital and the Singapore University of Technology and Design, under the HealthTech Innovation Fund (HTIF Award No. CGH-SUTD-2021-004).