papers in adversarial machine learning

Steganalysis based detection of adversarial attacks

Posted by Dillon Niederhut on June 21, 2023

Training adversarially robust machine learning models can be expensive. Instead, you can use a set of steganalysis approaches to detect malicious inputs before they hit your model. This reduces the cost of deployment and training while still promoting AI safety.

Read more →

What if adversarial defenses just need more JPEG?

Posted by Dillon Niederhut on April 22, 2023

Adversarial patterns are specially crafted image perturbations that trick models into producing incorrect outputs. Applying JPEG compression to the inputs of a computer vision model can effectively "smear" out adversarial perturbations, making it more difficult to successfully launch an adversarial attack.

Read more →

Adversarial training: attacking your own model as a defense

Posted by Dillon Niederhut on March 22, 2023

A critical factor in AI safety is robustness in the face of unusual inputs. Without this, models (like chatGPT) can be tricked into producing dangerous outputs. One method for inducing safety is to use adversarial attacks inside the model training loop. This also helps models align their features to human expectations.

Read more →

Anti-adversarial examples: what to do if you want to be seen?

Posted by Dillon Niederhut on February 12, 2023

Most uses of adversarial machine learning involve attacking or bypassing a computer vision system that someone else has designed. However, you can use the same tools to generate "unadversarial" examples, that give machine learning models much better performance when deployed in real life.

Read more →

Taking ChatGPT on a phishing expedition

Posted by Dillon Niederhut on January 17, 2023

Are you sure the person you're chatting with online is real? Recent progress in language models like ChatGPT have made it shockingly easy to create bots that perform phishing operations on users at scale.

Read more →