Papers in adversarial machine learning

When reality is your adversary: failure modes of image recognition

Posted by Dillon Niederhut on

Machine learning models can surpass human performance on image recognition tasks, but they can fail in surprising ways. By cataloguing these "natural" adversarial examples, you can learn a lot about how computer vision models work. You also learn that if you paint enough things yellow, computers will think the world is bananas.

Read more →


Is it illegal to hack a machine learning model?

Posted by Dillon Niederhut on

Maybe.

Read more →


We're not so different, you and I: adversarial attacks are poisonous training samples

Posted by Dillon Niederhut on

Data poisoning is when someone adds small changes to a training dataset to cause any model trained on those data to misbehave. An effective heuristic approach involves generating adversarial examples instead. The authors show that this can degrade model accuracy to below random chance.
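The adversarial-example approach can be sketched in a few lines. This is a minimal NumPy illustration on a linear softmax classifier, not the paper's actual method: each training image is nudged in the direction that increases the classifier's loss (an FGSM-style step), and the function name and epsilon value are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def poison(W, X, y_onehot, epsilon=0.03):
    """Perturb each training example in the direction that increases the
    cross-entropy loss of a linear classifier with weights W (an
    FGSM-style step), then clamp pixels back into [0, 1]."""
    p = softmax(X @ W)            # predicted class probabilities
    grad_x = (p - y_onehot) @ W.T # d(loss)/d(input) for logits X @ W
    return np.clip(X + epsilon * np.sign(grad_x), 0.0, 1.0)
```

A model trained on many such perturbed examples learns features that point the wrong way, which is how small per-image changes add up to large accuracy drops.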

Read more →


How to tell if someone trained a model on your data

Posted by Dillon Niederhut on

The last three papers that we've read ("backdoor attacks", "wear sunglasses", and "smile more") all used some variety of image watermark to control the behavior of a model. The authors showed that you can take a pattern (like a funky pair of sunglasses), overlay it on pictures of the same thing (say, a toaster), and then put those images somewhere an unlucky machine learning engineer might find them. If your watermarked images get used to train a model, you can make that model incorrectly predict that there is a toaster in...
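The watermark trick itself is mechanically simple. Here is a minimal sketch, assuming images as H x W x C NumPy arrays with values in [0, 1]; the function name, patch placement, and sizes are illustrative, not taken from the papers discussed:

```python
import numpy as np

def add_trigger(image, patch, target_label):
    """Stamp a fixed trigger `patch` into the bottom-right corner of
    `image` and pair it with the attacker-chosen `target_label`.
    Any model trained on enough of these learns to associate the
    patch, not the image content, with the target class."""
    stamped = image.copy()
    ph, pw = patch.shape[:2]
    stamped[-ph:, -pw:, :] = patch  # overlay the trigger pattern
    return stamped, target_label
```

At inference time, the attacker wears or displays the same pattern to flip the model's prediction to the target class.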

Read more →


Smiling is all you need: fooling identity recognition by having emotions

Posted by Dillon Niederhut on

Previous attacks on automated identity recognition systems used large and obvious physical accessories, like giant sunglasses. It's possible to use something more subtle -- like a specific facial expression -- to trick one of these systems into believing you are another person. However, you need control of a large fraction of the photographs of interest to achieve a good attack success rate, which could be feasible inside "walled garden" image hosting websites like Facebook.

Read more →