Redundant features can hurt robustness to distribution shift


In this work, we borrow tools from the field of adversarial robustness and propose a new framework that relates dataset features to the distance of samples to the decision boundary. Using this framework, we identify the subspace of features used by CNNs to classify large-scale vision benchmarks and reveal some intriguing aspects of their robustness to distribution shift. Specifically, by manipulating the frequency content of CIFAR-10, we show that the presence of redundant features in a dataset can harm a network's robustness to distribution shift. We demonstrate that completely erasing the redundant information from the training set can efficiently solve this problem.
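For intuition only, the two ingredients above can be sketched in their simplest form. The sketch assumes a linear classifier f(x) = w·x + b. For such a classifier, the distance from a sample x to the decision boundary, when x may only move inside a subspace with orthonormal basis U, has the closed form |f(x)| / ‖Uᵀw‖. It is infinite when w has no component in that subspace, i.e. when the classifier is invariant along it. A CNN has no such closed form, and the sketch does not reproduce the framework used in this work. It also assumes that "manipulating the frequency content" can be modeled as an ideal low-pass filter on the 2-D FFT; the actual filter used here is not specified by this abstract. The names `subspace_margin_linear`, `low_pass`, and `abs_freqs` are hypothetical.

```python
import numpy as np


def subspace_margin_linear(w, b, x, U):
    # Hypothetical helper: distance from x to the boundary of f(x) = w @ x + b
    # when x may only move within span(U). U must have orthonormal columns.
    proj = U.T @ w  # component of the boundary normal inside the subspace
    norm = np.linalg.norm(proj)
    if norm == 0.0:
        # f is constant along span(U): no perturbation there reaches the boundary.
        return np.inf
    # Smallest ||a|| such that w @ (x + U @ a) + b = 0.
    return abs(w @ x + b) / norm


def abs_freqs(n):
    # Absolute integer frequency of each FFT index: 0, 1, ..., n//2, ..., 1.
    k = np.arange(n)
    return np.minimum(k, n - k)


def low_pass(img, cutoff):
    # Hypothetical ideal low-pass filter: keep only 2-D FFT coefficients with
    # |k_y| <= cutoff and |k_x| <= cutoff. The mask is conjugate-symmetric,
    # so the inverse transform is real up to round-off.
    h, w = img.shape
    mask = (abs_freqs(h)[:, None] <= cutoff) & (abs_freqs(w)[None, :] <= cutoff)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))
```

For example, on a 32 × 32 image a checkerboard pattern lives entirely at the highest frequency, so `low_pass(img, 8)` erases it while leaving slowly varying content untouched. In the linear picture, content removed this way can no longer be used by the classifier along those directions.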

Uncertainty & Robustness in Deep Learning