Learning processes and machine learning models have vulnerabilities that attackers can exploit. The goal of such attacks is to steer an AI application's output in a particular direction, producing targeted false statements, for example through the infiltration of manipulated data. This method is referred to as "data poisoning" and comprises several techniques for influencing the behavior of AI systems.
Adversarial attacks on image recognition with neural networks are striking. Here, manipulated image data causes an artificial neural network to misidentify the objects in an image. An example: the network claims that the image of a turtle shows a rifle. This misclassification is achieved by altering pixel values in a way that is imperceptible to the human eye, overlaying the image with a noise pattern. While humans still recognize the turtle in the "noisy" image without difficulty, the neural network runs into trouble.
Human perception fundamentally differs from a neural network's decision-making, which is based on mathematical rules. Humans identify a turtle by visually familiar pattern groups such as the head or feet. The neural network, on the other hand, classifies an image by mathematically comparing individual pixels, their learned neighborhood relationships to other pixels, and their color values for red, green, and blue (RGB).
The "noise" corresponds to a deliberate change in the input values (RGB) of individual pixels. Even though these are mathematically minimal deviations, they can tip the decision of individual neurons in the network. The attacker's goal is to craft noise that, with high probability, pushes the neurons in the layered decision process toward a wrong decision. The result is a misclassification of the subject of the image. Other well-known examples lead to misinterpretation of traffic signs in autonomous driving systems. Adversarial attacks are also characterized by great creativity on the part of the attackers: recent examples encode noise derived from image information into 3D-printed models. The result is an object whose three-dimensional shape contains noise that steers the neural network toward incorrect decisions during image recognition.
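The mechanics can be illustrated with the Fast Gradient Sign Method (FGSM), one of the classic ways to craft such noise. The following is a minimal numpy sketch against a simple logistic-regression "classifier"; the weights, the input, and the function name `fgsm_perturb` are all illustrative assumptions, not taken from any specific library or from the attacks described above.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """FGSM: nudge every input value by +/- eps in the direction
    that increases the classifier's loss (illustrative sketch)."""
    # Logistic regression: p = sigmoid(w.x + b), binary cross-entropy loss.
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    # Gradient of the loss with respect to the input x is (p - y) * w.
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=100)         # assumed trained weights of the model
b = 0.0
x = w / np.linalg.norm(w)        # a "clean" input clearly in class 1
clean_score = x @ w + b          # positive -> predicted class 1

# Each input value moves by only +/- 0.2, yet the prediction flips.
x_adv = fgsm_perturb(x, w, b, y=1, eps=0.2)
adv_score = x_adv @ w + b        # negative -> predicted class 0
print(clean_score > 0, adv_score < 0)
```

The per-value change is tiny and uniform, which is exactly why such perturbations can stay below human perception on images while accumulating across thousands of pixels into a decisive shift for the model.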
How Data Poisoning Works
The quality of a machine learning model's output is significantly influenced by the data with which it is trained or queried. If these data are not systematically checked for correctness, attackers can deliberately inject manipulated data to compromise the model's statements. Data poisoning can thus target the data the model analyzes or the data used to train it. Almost all known AI methods are potentially at risk, from deep learning in neural networks to supervised learning in statistical regression-based methods. When attacking training datasets, attackers try, for example, to change annotations ("labels") or to manipulate values in the records. Attackers can disguise these manipulations by not falsifying all training data but by interspersing modified records, statistically distributed throughout the training data.
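One of the simplest forms of such a training-set attack is label flipping. The sketch below, a hypothetical illustration rather than a documented attack tool, shows how a small, randomly scattered fraction of binary labels can be flipped so the poisoning blends into natural label noise:

```python
import numpy as np

def flip_labels(y, fraction, rng):
    """Poison a labelled dataset by flipping a random subset of binary
    labels. Only a small, scattered fraction is changed, so the attack
    is hard to distinguish from ordinary annotation noise."""
    y_poisoned = y.copy()
    n_flip = int(len(y) * fraction)
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]   # flip 0 <-> 1
    return y_poisoned

rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=1000)                 # clean binary labels
y_poisoned = flip_labels(y, fraction=0.05, rng=rng)

changed = int(np.sum(y != y_poisoned))
print(changed)   # 50 of 1000 labels silently flipped
```

A model trained on `y_poisoned` inherits the corrupted decision boundary, and because 95 percent of the labels remain correct, aggregate quality metrics may barely move.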
Depending on the amount of training data and the distribution of the manipulation, the attacker can steer the model's output in a desired direction. The attack can take place anywhere along the data supply chain, which in practice presents a large attack surface: manipulation at the data source, man-in-the-middle attacks during data transfer, or API attacks that compromise the cloud data store or the data-versioning system. Skilled attackers modify data records over a long period, keeping the delta of each change minimal. This makes the attack difficult to detect with monitoring systems and filters for statistical deviations. Operators risk discovering far too late that the data feeding the AI model has been manipulated and is no longer reliable.
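A typical statistical-deviation filter of the kind mentioned above compares incoming batches against a trusted baseline. The following is a minimal sketch under simplifying assumptions (per-feature mean-shift check only; `mean_shift_alarm` and the threshold are illustrative choices, not a standard API):

```python
import numpy as np

def mean_shift_alarm(baseline, batch, n_sigmas=4.0):
    """Flag a new data batch whose per-feature mean drifts more than
    n_sigmas standard errors away from a trusted baseline sample."""
    mu, sigma = baseline.mean(axis=0), baseline.std(axis=0)
    se = sigma / np.sqrt(len(batch))          # standard error of the batch mean
    z = np.abs(batch.mean(axis=0) - mu) / se
    return z > n_sigmas                        # one alarm flag per feature

rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1.0, size=(10_000, 3))   # trusted historical data

clean_batch = rng.normal(0.0, 1.0, size=(500, 3))
poisoned_batch = clean_batch.copy()
poisoned_batch[:, 0] += 0.5        # a crude, one-shot manipulation

clean_alarms = mean_shift_alarm(baseline, clean_batch)       # no alarms
poison_alarms = mean_shift_alarm(baseline, poisoned_batch)   # feature 0 flagged
print(clean_alarms, poison_alarms)
```

The crude 0.5 shift trips the alarm immediately, which is exactly why skilled attackers spread much smaller deltas over many batches and a long period: each individual change stays well inside the filter's tolerance.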
Dangers of Data Poisoning
There is an active research community around the world studying the issue of data poisoning. The demonstrated attacks are mostly related to proofs-of-concept in the context of scientific studies. These confirmed attacks are very well documented in their methodological description and are usually accompanied by approaches to risk mitigation and defense against data poisoning. Scientific work with data poisoning is important in advancing and improving AI methods.
In 2016, a public AI experiment by Microsoft failed due to data poisoning. The development team of the chatbot Tay planned to improve the system through active dialogue with Twitter followers, using unsupervised learning to expand its capability for natural linguistic conversation. Tay learned its communication skills from the comments and messages of its followers on Twitter. Shortly after the system launched, a group of users realized that Tay's behavior could be influenced by what they said to it in comments. The clincher was a post on the Internet discussion board 4chan: users suggested flooding Tay with racist and insulting comments, thereby poisoning the training data and steering Tay's statements. The data poisoning quickly took effect. Sixteen hours after Tay appeared on Twitter, the chatbot had exchanged over 95,000 messages with its data-poisoning mob, each of which was used to train the system. In retrospect, the experiment sharpened the focus on data poisoning. The problem lay in the setup of unsupervised learning via an open Twitter community: the bot acted as an open gateway for unfiltered learning through a public social-media platform. Negative examples like Tay have led to more careful planning when building training systems with public data interfaces. With filters and monitoring, machine learning can be protected against data poisoning by an organized Internet mob.
Protection against Data Poisoning
Blind trust in data is the gateway to data poisoning. Moreover, any AI model can serve as a "parent model" for new ones, which means an undetected attack on the training data is passed along: if the learning model is transferred, the "poisoned" data goes with it. It is therefore essential to protect the data for these learning models. There are numerous efforts around the world to learn from experiences with ML security attacks and to develop effective defenses. One of these is the Adversarial ML Threat Matrix collaboration, which has published an Adversarial Threat Landscape for Artificial-Intelligence Systems. It builds on the established MITRE ATT&CK framework, a globally accessible knowledge base of the tactics and techniques of such attacks. However, attackers also face systemic limitations: to inject poisoned data, the system must be retrained regularly, and the training data must come from sources the attacker can access. Only then can the training be poisoned and the AI model influenced.
It has proven very difficult in the past to detect and reliably defend against data poisoning attacks; attackers can even circumvent multiple defenses applied in parallel. One of the most promising defenses against adversarial attacks is adversarial training: during the training phase, examples of adversarial attacks are integrated to increase the system's robustness. However, if these examples are numerous and complex, training the model takes considerably longer. If, for performance reasons, only weak attacks are included as examples, the system remains vulnerable to strong, effective attacks. The main danger of such defensive techniques is that they provide a false sense of security.
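Adversarial training in its simplest form can be sketched as follows. This is a toy example under stated assumptions (a numpy logistic regression, FGSM-style perturbations regenerated against the current parameters each epoch); it illustrates the technique, not any production defense:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_training(X, y, eps=0.1, lr=0.1, epochs=200, seed=0):
    """Train logistic regression on a mix of clean examples and
    FGSM-perturbed copies, so the model also sees 'attacked' inputs."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        # Craft adversarial copies against the *current* parameters.
        p = sigmoid(X @ w + b)
        X_adv = X + eps * np.sign((p - y)[:, None] * w)
        # One gradient step on clean + adversarial data combined.
        X_all = np.vstack([X, X_adv])
        y_all = np.concatenate([y, y])
        p_all = sigmoid(X_all @ w + b)
        w -= lr * X_all.T @ (p_all - y_all) / len(y_all)
        b -= lr * np.mean(p_all - y_all)
    return w, b

# Linearly separable toy data: two Gaussian blobs.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1.0, 0.5, size=(200, 2)),
               rng.normal(+1.0, 0.5, size=(200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = adversarial_training(X, y)
acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(acc)
```

Note the cost visible even in this toy: every epoch doubles the effective batch size, and `eps` fixes the attack strength trained against. Choosing a small `eps` keeps training fast but mirrors the weakness described above, since the model then sees only weak attacks.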
For now, neural networks must still be inspected in depth and samples analyzed whenever anomalies occur. Human expert knowledge remains one of the essential prerequisites for reliably defending AI training data against manipulation.
In addition, there are efforts in Germany to develop standards for testing procedures; a standardization roadmap for AI has already been presented for this purpose. In the future, it will be more important than ever to define universally applicable criteria and instruments that make AI systems sufficiently verifiable and secure.