Fooling Humans with Adversarial Examples

Version 2

(T) Adversarial examples introduce small changes to an image that lead a model to misclassify it: instead of recognizing a cat, the model recognizes a dog, even though it was successfully trained to recognize cats. They can be crafted either with access only to the model's inputs and outputs (a "black-box" attack), or with access to the trained model itself, including its training data, architecture, hyper-parameters, number of layers, activation functions, and weights (a "white-box" attack).
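To make this concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), one standard way of crafting such perturbations. The toy logistic-regression "cat vs. dog" model, its random weights, and the epsilon value are all illustrative assumptions, not something from this post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "trained" model: weights w, bias b; label 1 = "cat", 0 = "dog".
# (Assumed for illustration only.)
rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.1

x = rng.normal(size=16)   # a clean input the model scores
y = 1.0                   # true label: "cat"

# Gradient of the cross-entropy loss with respect to the INPUT x
# (not the weights): dL/dx = (sigmoid(w.x + b) - y) * w
grad_x = (sigmoid(w @ x + b) - y) * w

# FGSM: nudge every input dimension by epsilon in the direction
# that increases the loss, producing the adversarial example.
epsilon = 0.5
x_adv = x + epsilon * np.sign(grad_x)

print("clean 'cat' score:      ", sigmoid(w @ x + b))
print("adversarial 'cat' score:", sigmoid(w @ x_adv + b))
```

The perturbation is small and uniform per dimension, yet it moves the model's score away from the correct class, which is the essence of the attack described above.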

A good introduction to adversarial examples is the OpenAI blog post "Attacking Machine Learning with Adversarial Examples."

Adversarial examples can have serious consequences for certain applications, in particular autonomous vehicles. Imagine what would happen if your Tesla misrecognized a stop sign.

But while adversarial examples can fool computer systems, they can also fool humans.

In a recent paper, "Adversarial Examples that Fool both Computer Vision and Time-Limited Humans," Ian Goodfellow and other members of the Google Brain team described the risks that adversarial examples pose to humans:

“Adversarial examples provide one more way in which machine learning might plausibly be used to subtly manipulate humans. For instance, an ensemble of deep models might be trained on human ratings of face trustworthiness. It might then be possible to generate adversarial perturbations which enhance or reduce human impressions of trustworthiness, and those perturbed images might be used in news reports or political advertising.”

“More speculative risks involve the possibility of crafting sensory stimuli that hack the brain in a more diverse set of ways, and with larger effect. As one example, many animals have been observed to be susceptible to supernormal stimuli. For instance, cuckoo chicks generate begging calls and an associated visual display that causes birds of other species to prefer to feed the cuckoo chick over their own offspring. Adversarial examples can be seen as a form of supernormal stimuli for neural networks. A worrying possibility is that supernormal stimuli designed to influence human behavior or emotions, rather than merely the perceived class label of an image, might also transfer from machines to humans.”

Currently, the most effective approach to reducing a deep learning system’s vulnerability to adversarial examples is “adversarial training”, in which a system is trained on both clean images and adversarially perturbed ones. However, adversarial training is very time-consuming because it requires generating adversarial examples during training. It also typically only improves a network’s robustness against adversarial examples generated in a similar way to those on which the system was trained.
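The training loop below is a minimal sketch of this idea on a toy logistic-regression model: at every step, FGSM perturbations are generated from the current weights (the costly extra step mentioned above), and one gradient update is taken on the combined clean-plus-adversarial batch. The synthetic data, epsilon, and learning rate are all assumptions made for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic, linearly separable "dataset" (assumed for illustration).
rng = np.random.default_rng(1)
n, d = 200, 8
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (X @ true_w > 0).astype(float)

w = np.zeros(d)
epsilon, lr = 0.1, 0.5
for _ in range(200):
    # Generate FGSM perturbations of every training point against the
    # CURRENT model -- this happens inside the loop, which is why
    # adversarial training is expensive.
    p = sigmoid(X @ w)
    grad_X = (p - y)[:, None] * w[None, :]   # per-example dL/dx
    X_adv = X + epsilon * np.sign(grad_X)

    # One gradient step on the combined clean + adversarial batch.
    X_all = np.vstack([X, X_adv])
    y_all = np.concatenate([y, y])
    p_all = sigmoid(X_all @ w)
    grad_w = X_all.T @ (p_all - y_all) / len(y_all)
    w -= lr * grad_w

clean_acc = ((sigmoid(X @ w) > 0.5) == (y > 0.5)).mean()
print("clean accuracy:", clean_acc)
```

Because the perturbations are recomputed from the current weights at every step, the model is pushed to classify correctly within an epsilon-ball around each training point, not just at the points themselves. This also illustrates the limitation noted above: the robustness gained is specific to FGSM-style perturbations.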

For a deeper dive into adversarial examples and adversarial training, here is a lecture by Ian Goodfellow from Stanford University’s CS231n class:




Note: The picture above is Paul en Arlequin by Pablo Picasso.

Copyright © 2005-2018 by Serge-Paul Carrasco. All rights reserved.
Contact Us: asvinsider at gmail dot com.