AIM: Amending Inherent Interpretability via Self-Supervised Masking

1Max-Planck-Institute for Informatics, Saarland Informatics Campus, Germany
2RTG Neuroexplicit Models of Language, Vision, and Action, SaarbrĂĽcken, Germany
3Data and Web Science Group, University of Mannheim, Germany
International Conference on Computer Vision (ICCV) 2025 🌟 Highlighted Paper
arXiv · Code (coming soon)
Teaser Image

AIM uses self-supervised masking to focus more on the object of interest, relying only on the image label. As shown, it outperforms baseline methods in attribution localization, even in challenging scenarios such as the Waterbirds and Hard ImageNet datasets.

Abstract

It has been observed that deep neural networks (DNNs) often use both genuine and spurious features. In this work, we propose “Amending Inherent Interpretability via Self-Supervised Masking” (AIM), a simple yet surprisingly effective method that promotes the network’s use of genuine features over spurious alternatives without requiring additional annotations. In particular, AIM uses features at multiple encoding stages to guide a self-supervised, sample-specific feature-masking process. As a result, AIM enables the training of well-performing and inherently interpretable models that faithfully summarize the decision process. We validate AIM across a diverse range of challenging datasets that test both out-of-distribution generalization and fine-grained visual understanding. These include general-purpose classification benchmarks such as ImageNet100, Hard ImageNet, and ImageWoof, as well as fine-grained classification datasets such as Waterbirds, TravelingBirds, and CUB-200. AIM demonstrates significant dual benefits: interpretability improvements, as measured by the Energy Pointing Game (EPG) score, and accuracy gains over strong baselines. These consistent gains across domains and architectures provide compelling evidence that AIM promotes the use of genuine, meaningful features that directly contribute to improved generalization and human-aligned interpretability.

AIM makes its final decision using spatially sparse feature maps:

AIM produces these sparse maps by employing binary mask estimators as a feature-selection mechanism. As an illustration, we show the masks produced at two stages of a ConvNeXt+AIM model alongside the resulting spatially sparse feature maps (first two columns). The last column shows the final merged feature maps the model uses for classification.

Note: Model: ConvNeXt‑tiny with AIM (2, 35%).
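The masking-and-merging pipeline described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the learned, trainable mask estimators are replaced here by a hypothetical top-k thresholding rule on mean channel activation, and `keep_ratio=0.35` assumes the "35%" in the note above refers to the fraction of spatial locations each mask keeps.

```python
import numpy as np

def binary_mask(feat, keep_ratio=0.35):
    """Keep the top `keep_ratio` fraction of spatial locations by mean
    channel activation (a hypothetical stand-in for AIM's learned,
    self-supervised mask estimators)."""
    # feat: (C, H, W) feature map for one sample
    saliency = feat.mean(axis=0)                      # (H, W) per-location score
    thresh = np.quantile(saliency, 1.0 - keep_ratio)  # per-sample threshold
    return (saliency >= thresh).astype(feat.dtype)    # binary (H, W) mask

def merge_and_apply(feat, masks):
    """Element-wise merge of per-stage binary masks, then apply to features."""
    merged = np.prod(np.stack(masks), axis=0)   # logical AND of binary masks
    return feat * merged[None, :, :], merged    # spatially sparse features

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 7, 7))             # toy (C, H, W) feature map
m1 = binary_mask(feat, keep_ratio=0.35)           # stage-1 mask
m2 = binary_mask(np.abs(feat), keep_ratio=0.35)   # stage-2 mask (different statistic)
sparse_feat, merged = merge_and_apply(feat, [m1, m2])
print(merged.mean())  # fraction of spatial locations surviving the merge
```

Since the merge is an element-wise AND, the merged mask is at most as dense as the sparsest per-stage mask, so the classifier only ever sees locations that every stage agreed to keep.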

AIM uses sparse feature maps

Mask Evolution Across Epochs

Videos show the evolution of learned masks across training epochs for representative images from ImageNet‑100, Hard ImageNet, and Waterbirds‑100.

Displayed are masks from two model blocks and their element-wise merged masks, indicating the spatial regions preserved in the final feature maps.

Note: Model: ConvNeXt‑tiny with AIM (2, 35%).

Results

Results

BibTeX

@misc{alshami2025aimamendinginherentinterpretability,
        title={AIM: Amending Inherent Interpretability via Self-Supervised Masking}, 
        author={Eyad Alshami and Shashank Agnihotri and Bernt Schiele and Margret Keuper},
        year={2025},
        eprint={2508.11502},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2508.11502}, 
  }