Progressive Backdoor Erasing via connecting Backdoor and Adversarial Attacks

Bingxu Mu, Zhenxing Niu, Le Wang, Xue Wang, Rong Jin, Gang Hua
Feb 2022
Deep neural networks (DNNs) are known to be vulnerable to both backdoor attacks and adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct problems and solved separately, since they belong to training-time and inference-time attacks, respectively. However, in this paper we find an intriguing connection between them: for a model planted with a backdoor, we observe that its adversarial examples behave similarly to its triggered images, i.e., both activate the same subset of DNN neurons. This indicates that planting a backdoor into a model significantly affects the model's adversarial examples. Based on these observations, we propose a novel Progressive Backdoor Erasing (PBE) algorithm that progressively purifies the infected model by leveraging untargeted adversarial attacks. Unlike previous backdoor defense methods, a significant advantage of our approach is that it can erase the backdoor even when no clean extra dataset is available. We empirically show that, against 5 state-of-the-art backdoor attacks, PBE effectively erases the backdoor without obvious performance degradation on clean samples, and that it significantly outperforms existing defense methods.
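The untargeted adversarial attack that PBE leverages can be illustrated with a minimal FGSM-style sketch: perturb an input in the direction that increases the loss on the model's current prediction, with no target class. The tiny softmax classifier, the one-step attack, and all parameter values below are illustrative assumptions, not the paper's actual models or attack settings.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over logits z."""
    e = np.exp(z - z.max())
    return e / e.sum()

def untargeted_fgsm(x, y, W, b, eps=0.1):
    """One-step untargeted attack: perturb x to *increase* the
    cross-entropy loss on label y (gradient ascent, sign step)."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    # For a linear-softmax model, d(cross-entropy)/dx = W^T (p - onehot(y)).
    grad_x = W.T @ (p - onehot)
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # toy model: 3 classes, 4 features (assumed)
b = np.zeros(3)
x = rng.normal(size=4)
y = int(np.argmax(softmax(W @ x + b)))  # model's own prediction (no target)

x_adv = untargeted_fgsm(x, y, W, b, eps=0.1)
print(np.max(np.abs(x_adv - x)))  # perturbation is bounded by eps
```

On an infected model, such untargeted adversarial examples tend to activate the same neuron subset as triggered images, which is the observation PBE exploits to progressively purify the model.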