Exploring the Orthogonality and Linearity of Backdoor Attacks (IEEE S&P 2024)

1Purdue University, 2University of Massachusetts, Amherst, *Equal Contribution

Abstract

Backdoor attacks embed an attacker-chosen pattern into inputs to cause model misclassification. This security threat to machine learning has long been a concern, and the community has proposed a number of defense techniques. Do they work against a large spectrum of attacks?

We argue that backdoor attacks are significant and prevalent in contemporary research, and we conduct a systematic study of 14 attacks and 12 defenses. Our empirical results show that existing defenses often fail on certain attacks. To understand why, we study the characteristics of backdoor attacks through theoretical analysis. In particular, we formulate backdoor poisoning as a continual learning task and introduce two key properties: orthogonality and linearity. These two properties explain, from a theoretical perspective, how backdoors are learned by models, which in turn helps explain why various defense techniques fail. Based on our study, we highlight open challenges in defending against backdoor attacks and provide future directions.


Observations

Key Observation: The backdoor task is learned much faster than the main (clean) task.
Our observations indicate that the model rapidly learns the backdoor task within roughly the first 10 epochs, while learning of the clean task progresses more gradually. From this, we conceptualize backdoor learning as a two-phase continual learning process: an initial rapid learning phase for the backdoor task, followed by a slower, more gradual phase in which the model learns the clean task.
Take Away: We theoretically formulate backdoor learning with two key properties, orthogonality and linearity, which explain in depth how backdoors are learned by models.
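
The observation above can be reproduced by tracking clean accuracy and attack success rate (ASR) after every training epoch. Below is a minimal PyTorch-style sketch of that measurement; train_one_epoch, mixed_loader, clean_loader, and poisoned_loader are hypothetical placeholders, not names from our released code.

      import torch

      def evaluate(model, loader, device="cuda"):
          # Accuracy of `model` on `loader`; with a loader of triggered inputs
          # relabeled to the target class, this value is the attack success rate.
          model.eval()
          correct, total = 0, 0
          with torch.no_grad():
              for x, y in loader:
                  x, y = x.to(device), y.to(device)
                  correct += (model(x).argmax(dim=1) == y).sum().item()
                  total += y.numel()
          return correct / total

      # Hypothetical training loop: the ASR curve typically saturates within
      # the first ~10 epochs, while clean accuracy improves only gradually.
      for epoch in range(100):
          train_one_epoch(model, mixed_loader)        # clean + poisoned batches
          clean_acc = evaluate(model, clean_loader)   # main (clean) task
          asr = evaluate(model, poisoned_loader)      # backdoor task
          print(f"epoch {epoch:3d}  clean_acc={clean_acc:.3f}  asr={asr:.3f}")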

Backdoor Orthogonality

Orthogonality captures the fact that the backdoor behavior minimally interferes with the model's performance on clean data. It is characterized by the near-perpendicular relationship between the backdoor gradients and the clean gradients during training.
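
In rough notation (ours, for illustration; the precise statement is in the paper), let g_c and g_b denote the gradients of the clean loss and the backdoor loss with respect to the model parameters. Orthogonality then means

      \frac{\langle g_c,\; g_b \rangle}{\|g_c\|\,\|g_b\|} \approx 0,

so a gradient step that fits the backdoor barely perturbs the clean loss, and vice versa.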


Backdoor Linearity

Linearity specifies the linear relationship between poisoned inputs and the output target: there exists a hyperplane that separates the model's decision space into two disjoint regions, with the backdoor behavior in one region and the clean behavior in the other.
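
As a rough sketch in our own notation (not verbatim from the paper), for some feature representation h(x) of the (sub-)network, linearity asserts the existence of a weight vector w and bias b such that

      w^\top h(x) + b > 0 \ \text{ for triggered inputs } x, \qquad
      w^\top h(x) + b < 0 \ \text{ for clean inputs } x,

i.e., the backdoor and clean behaviors lie in the two half-spaces induced by the hyperplane w^\top h(x) + b = 0.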


How Can Orthogonality and Linearity Help?

These two properties explain, from a theoretical perspective, how backdoors are learned by models, which helps us understand why various defense techniques fail.


How Does Orthogonality Help?

In Section 5.3, we launch six attack variations by changing orthogonality and linearity and evaluate how they affect attack effectiveness. For example, our results in Table 9 show that Label-specific Poisoning reduces the orthogonality of Patch attacks, thereby making the attack more robust against the NAD defense. The two properties theoretically explain how backdoor behaviors are learned by the model and how the poisoned model exhibits such behaviors, regardless of the backdoor attack configuration (e.g., trigger patterns).


Evaluation Metrics

Orthogonality (Orth.)

For Orthogonality, we measure the angle between the benign and backdoor gradients, in radians, which provides a clear indication of how distinct the backdoor behavior is from normal operation. The formula, the arc cosine of the dot product of the two gradients normalized by their magnitudes, is detailed in our paper.
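
A minimal PyTorch sketch of this measurement is shown below; clean_loss and backdoor_loss are hypothetical loss tensors computed on a clean batch and a triggered batch, and the exact batching and averaging used in the paper may differ.

      import torch

      def gradient_angle(model, clean_loss, backdoor_loss):
          # Angle (in radians) between the clean and backdoor gradients:
          # arccos of their dot product normalized by the gradient magnitudes.
          params = [p for p in model.parameters() if p.requires_grad]
          g_c = torch.autograd.grad(clean_loss, params, retain_graph=True)
          g_b = torch.autograd.grad(backdoor_loss, params, retain_graph=True)
          g_c = torch.cat([g.flatten() for g in g_c])
          g_b = torch.cat([g.flatten() for g in g_b])
          cos = torch.dot(g_c, g_b) / (g_c.norm() * g_b.norm() + 1e-12)
          return torch.arccos(cos.clamp(-1.0, 1.0))

An angle close to π/2 (about 1.57 radians) indicates that the two tasks are learned along nearly orthogonal directions.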


Linearity (Linear.)

The Linearity metric assesses the linear relationship between changes in inputs and outputs at each layer of the sub-network. We use linear regression to analyze this relationship, with R² values indicating the strength of linearity. This helps us understand how predictable the changes due to the backdoor are, compared with normal input-output relationships.
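
The sketch below shows one plausible way to estimate such per-layer R² values with scikit-learn; capturing activations via forward hooks and the specific layer selection are our assumptions here, not necessarily the exact procedure in the paper.

      from sklearn.linear_model import LinearRegression

      def layer_linearity(delta_in, delta_out):
          # Fit delta_out ≈ W @ delta_in + b and return the R^2 score.
          # Both arguments are (num_samples, dim) matrices of activation changes
          # (triggered minus clean) at the input and output of one layer
          # of the sub-network.
          reg = LinearRegression().fit(delta_in, delta_out)
          return reg.score(delta_in, delta_out)  # R^2 of 1.0 == perfectly linear

      # Hypothetical usage: X_l / Y_l hold activation changes captured (e.g., via
      # forward hooks) before and after layer l on triggered inputs.
      # r2 = layer_linearity(X_l.reshape(len(X_l), -1), Y_l.reshape(len(Y_l), -1))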


Experiments

Orthogonality and Linearity Scores of Existing Attacks.

Building upon our theoretical analysis, the empirical evaluation of orthogonality and linearity serves as a concrete manifestation of the theoretical constructs, demonstrating the inherent characteristics of backdoor attacks. We conduct an extensive assessment of orthogonality and linearity scores for 14 well-established backdoor attacks, using the CIFAR-10 dataset and the ResNet-18 model. Our findings are presented in the following table.


Evaluation of Various Defense Methods Against Existing Attacks.

We conduct an in-depth analysis to assess the effectiveness of 12 defense methods against various attacks on the CIFAR-10 and GTSRB datasets, using both ResNet-18 and WRN models. Our findings are summarized in the following table.


BibTeX


      @inproceedings{zhang2024exploring,
        title={Exploring the Orthogonality and Linearity of Backdoor Attacks},
        author={Zhang, Kaiyuan and Cheng, Siyuan and Shen, Guangyu and Tao, Guanhong and An, Shengwei and Makur, Anuran and Ma, Shiqing and Zhang, Xiangyu},
        booktitle={2024 IEEE Symposium on Security and Privacy (SP)},
        pages={225--225},
        year={2024},
        url={https://doi.ieeecomputersociety.org/10.1109/SP54263.2024.00182},
        organization={IEEE Computer Society}
      }