Papers in adversarial machine learning: robust models

What if adversarial defenses just need more JPEG?

Posted by Dillon Niederhut

Adversarial patterns are specially crafted image perturbations that trick models into producing incorrect outputs. Applying JPEG compression to the inputs of a computer vision model can effectively "smear" out adversarial perturbations, making it more difficult to successfully launch an adversarial attack.
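The defense amounts to a JPEG round-trip on the input before it reaches the model. A minimal sketch, assuming Pillow and NumPy are available; the function name `jpeg_defense` and the quality setting are illustrative choices, not taken from the paper:

```python
import io

import numpy as np
from PIL import Image


def jpeg_defense(image: np.ndarray, quality: int = 75) -> np.ndarray:
    """Round-trip an image through JPEG compression.

    Lossy compression discards high-frequency detail, which "smears"
    out the small, carefully placed adversarial perturbations.
    """
    buffer = io.BytesIO()
    Image.fromarray(image).save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return np.asarray(Image.open(buffer))


# defended = jpeg_defense(adversarial_image)  # then feed to the classifier
```

Lower `quality` values remove more of the perturbation but also degrade the legitimate image content, so the setting trades robustness against clean accuracy.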



Adversarial training: attacking your own model as a defense

Posted by Dillon Niederhut

A critical factor in AI safety is robustness in the face of unusual inputs. Without this, models (like ChatGPT) can be tricked into producing dangerous outputs. One method for inducing robustness is to run adversarial attacks inside the model training loop. This also helps align model features with human expectations.
