Papers in adversarial machine learning
Minority reports (yes, like the movie) as a machine learning defense
Posted by Dillon Niederhut on
Adversarial patch attacks are hard to defend against because they are robust to denoising-based defenses. A more effective strategy involves generating several partially occluded versions of the input image, getting a set of predictions, and then taking the *least common* predicted label.
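Here is a minimal sketch of the idea, assuming a PyTorch classifier over CHW image tensors in [0, 1]; the mask size, stride, and zero-fill occlusion are illustrative choices, not the paper's exact settings.

```python
import torch

def minority_report_predict(model, image, mask_size=32, stride=16):
    """Classify occluded copies of `image` and return the least common label.

    A patch attack dominates most occluded views, but the one view whose
    mask covers the patch reverts to the true label -- the minority report.
    """
    _, h, w = image.shape
    votes = []
    model.eval()
    with torch.no_grad():
        for top in range(0, h - mask_size + 1, stride):
            for left in range(0, w - mask_size + 1, stride):
                occluded = image.clone()
                occluded[:, top:top + mask_size, left:left + mask_size] = 0.0
                logits = model(occluded.unsqueeze(0))
                votes.append(logits.argmax(dim=1).item())
    counts = torch.bincount(torch.tensor(votes))
    present = counts.nonzero().flatten()  # labels that received any votes
    return present[counts[present].argmin()].item()
```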
Know thy enemy: classifying attackers with adversarial fingerprinting
Posted by Dillon Niederhut on
In threat intelligence, you want to know the characteristics of possible adversaries. In the world of machine learning, this could mean keeping a database of "fingerprints" of known attacks, and using these to inform real-time defense strategies if your inference system comes under attack. Would you like to know more?
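As a sketch of what such a database might look like: the fingerprint below is a hypothetical feature vector built from residual statistics, standing in for whatever richer features a real system would use, and the matching threshold is an assumption for illustration.

```python
import numpy as np

def fingerprint(image: np.ndarray) -> np.ndarray:
    """Summarize an image's high-frequency residual as a small vector.

    Different attack families tend to leave differently shaped noise,
    so residual statistics can serve as a crude signature.
    """
    dx = np.diff(image, axis=1).ravel()
    dy = np.diff(image, axis=0).ravel()
    feats = []
    for d in (dx, dy):
        feats += [d.mean(), d.std(), np.abs(d).mean(), (d ** 2).mean()]
    return np.array(feats)

class FingerprintDB:
    """Nearest-neighbor lookup over fingerprints of known attacks."""

    def __init__(self):
        self.labels, self.prints = [], []

    def add(self, label: str, image: np.ndarray) -> None:
        self.labels.append(label)
        self.prints.append(fingerprint(image))

    def match(self, image: np.ndarray, threshold: float = 1.0):
        """Return the nearest known attack family, or None if nothing is close."""
        if not self.prints:
            return None
        q = fingerprint(image)
        dists = np.linalg.norm(np.stack(self.prints) - q, axis=1)
        i = int(dists.argmin())
        return self.labels[i] if dists[i] < threshold else None
```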
Steganalysis-based detection of adversarial attacks
Posted by Dillon Niederhut on
Training adversarially robust machine learning models can be expensive. Instead, you can use a set of steganalysis approaches to detect malicious inputs before they hit your model. This reduces the cost of deployment and training while still promoting AI safety.
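A minimal sketch of the flavor of this approach, assuming grayscale images as 2-D arrays in [0, 1]: filter each image with a classic steganalysis high-pass kernel, histogram the residual, and train an off-the-shelf classifier on the features. The binning and logistic regression detector here are illustrative stand-ins for the richer feature sets used in practice.

```python
import numpy as np
from scipy.signal import convolve2d
from sklearn.linear_model import LogisticRegression

# A classic steganalysis high-pass kernel; the intuition is to treat
# adversarial noise like a stego signal embedded in the image.
KV = np.array([
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
]) / 12.0

def residual_features(gray: np.ndarray, bins: int = 16) -> np.ndarray:
    """Histogram of the filter residual: clean and attacked images
    tend to produce differently shaped residual distributions."""
    r = convolve2d(gray, KV, mode="valid")
    hist, _ = np.histogram(np.clip(r, -2, 2), bins=bins, range=(-2, 2), density=True)
    return hist

def train_detector(clean_imgs, adv_imgs):
    """Fit a binary detector on labeled clean/adversarial images
    (both arguments are assumed to be lists of 2-D grayscale arrays)."""
    X = np.stack([residual_features(im) for im in clean_imgs + adv_imgs])
    y = np.array([0] * len(clean_imgs) + [1] * len(adv_imgs))
    return LogisticRegression(max_iter=1000).fit(X, y)
```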
What if adversarial defenses just need more JPEG?
Posted by Dillon Niederhut on
Adversarial patterns are specially crafted image perturbations that trick models into producing incorrect outputs. Applying JPEG compression to the inputs of a computer vision model can effectively "smear" out adversarial perturbations, making it more difficult to successfully launch an adversarial attack.
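A minimal sketch with Pillow; the quality setting is an assumed default and is a tunable trade-off between smearing perturbations and degrading clean accuracy.

```python
import io
from PIL import Image

def jpeg_defend(image: Image.Image, quality: int = 75) -> Image.Image:
    """Round-trip an image through JPEG compression before inference.

    Lossy compression discards the high-frequency components that many
    adversarial perturbations live in, "smearing" out the attack signal.
    """
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer).copy()
```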
Adversarial training: attacking your own model as a defense
Posted by Dillon Niederhut on
A critical factor in AI safety is robustness in the face of unusual inputs. Without this, models (like ChatGPT) can be tricked into producing dangerous outputs. One way to build in this robustness is to use adversarial attacks inside the model training loop. This also helps models align their learned features with human expectations.
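A minimal sketch of one variant, single-step FGSM adversarial training in PyTorch; the epsilon budget is illustrative, inputs are assumed to lie in [0, 1], and stronger defenses typically run a multi-step attack like PGD inside the same loop.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=8 / 255):
    """Generate an FGSM adversarial example: one signed-gradient step."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + epsilon * grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y):
    """One training step taken on adversarially perturbed inputs."""
    model.eval()                      # craft attacks without batch-norm updates
    x_adv = fgsm_example(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```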