What is an adversarial patch?

An adversarial patch, also known as an adversarial sticker or adversarial image patch, is a physical or digital object designed to deceive machine learning models, particularly deep neural networks, into making incorrect predictions or classifications. Adversarial patches are a type of adversarial attack: a technique that manipulates a model's inputs to make it produce incorrect or unexpected outputs.

The basic idea behind an adversarial patch is to create a small sticker or poster that, when added to an image or scene, can trick a neural network into misclassifying the entire image.
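
In digital form, "adding" the patch simply means overwriting a small block of pixels. Here is a minimal sketch of that operation, assuming images are NumPy arrays with values in [0, 1]; the function name and conventions are illustrative rather than taken from any particular library:

```python
import numpy as np

def paste_patch(image: np.ndarray, patch: np.ndarray, top: int, left: int) -> np.ndarray:
    """Return a copy of `image` with `patch` pasted at (top, left).

    image: H x W x 3 array, patch: h x w x 3 array, both with values in [0, 1].
    """
    out = image.copy()
    h, w, _ = patch.shape
    out[top:top + h, left:left + w] = patch
    return out
```

The surprising part is that this block can cover only a small fraction of the pixels and still dominate the model's prediction for the entire image.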

Figure: an adversarial sticker placed next to a banana.

Here's how an adversarial patch attack typically works:

1. Crafting the Patch: Researchers or attackers generate an adversarial patch using optimization techniques, searching for a pattern that, when placed on an image, maximizes the likelihood that the target neural network misclassifies it (a code sketch of this optimization, along with the testing step, appears after this list).

2. Testing: The generated patch is then tested against the target model to ensure it has the desired effect. This involves applying the patch to various images and observing the model's behavior.

3. Deployment: Once the patch is successfully crafted and tested, it can be deployed in the real world. For example, an attacker might print the patch on a shirt and have a person wear it. When a camera captures a scene that includes the patch, the neural network processing the image may misclassify the scene due to the presence of the patch.
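
To make the crafting and testing steps concrete, here is a short, hypothetical PyTorch sketch. Everything in it is an assumption rather than a description of any specific paper's code: the ResNet-50 model, the "toaster" target class, the 64x64 patch size, and the `train_loader`/`test_loader` DataLoaders are all placeholders, and real attacks additionally optimize over random rotations, scales, and lighting conditions, which is omitted here.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# White-box setting: we attack a pretrained ImageNet classifier.
# (ResNet-50 is an arbitrary choice for this sketch.)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)

TARGET_CLASS = 859   # ImageNet index commonly listed for "toaster" (assumed target)
PATCH_SIZE = 64      # a 64x64 square patch
patch = torch.rand(3, PATCH_SIZE, PATCH_SIZE, requires_grad=True)
optimizer = torch.optim.Adam([patch], lr=0.01)

def apply_patch(images, patch):
    """Paste the patch at a random location in each image so the
    optimized pattern works regardless of where it lands."""
    patched = images.clone()
    _, _, h, w = images.shape
    for i in range(images.size(0)):
        top = torch.randint(0, h - PATCH_SIZE + 1, (1,)).item()
        left = torch.randint(0, w - PATCH_SIZE + 1, (1,)).item()
        patched[i, :, top:top + PATCH_SIZE, left:left + PATCH_SIZE] = patch.clamp(0, 1)
    return patched

# Step 1, crafting: nudge the patch pixels so that patched images are
# classified as the target class. `train_loader` is an assumed DataLoader
# yielding images scaled to [0, 1]; ImageNet normalization is omitted
# to keep the sketch short.
for images, _ in train_loader:
    optimizer.zero_grad()
    logits = model(apply_patch(images, patch))
    target = torch.full((images.size(0),), TARGET_CLASS, dtype=torch.long)
    loss = F.cross_entropy(logits, target)  # lower loss = more "toaster"
    loss.backward()
    optimizer.step()

# Step 2, testing: measure how often the patch succeeds on held-out images.
# `test_loader` is another assumed DataLoader.
hits, total = 0, 0
with torch.no_grad():
    for images, _ in test_loader:
        preds = model(apply_patch(images, patch)).argmax(dim=1)
        hits += (preds == TARGET_CLASS).sum().item()
        total += images.size(0)
print(f"attack success rate: {hits / total:.1%}")
```

Deployment (step 3) has no direct code analogue: the optimized patch is printed and placed in the physical scene. That is also why the random placement during training matters; the attacker cannot control exactly where the patch will appear in the camera frame.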

Figure: two men, one wearing an adversarial-patch t-shirt; an object detection model fails to detect him.

The first paper to show an adversarial patch effectively attacking a model when deployed as a printed sticker was published in 2017; its patch caused image classifiers to label nearly any image containing it as a toaster. A year later, a different group of researchers generated adversarial patches that could be applied to stop signs, causing the kinds of vision models used in autonomous driving to misclassify the signs. Shortly after that, a team from Northeastern and IBM printed an adversarial patch on a shirt and showed reasonable success in making an object detector fail to detect the person wearing it.

Figure: adversarial patch stickers placed on a stop sign.

Adversarial patches raise concerns about the security and reliability of machine learning systems, especially when those systems are used in safety-critical applications like autonomous vehicles or facial recognition. Researchers and practitioners in machine learning continue to study and develop defenses against adversarial attacks and to make neural networks more robust to such manipulations.