Our paper has been accepted to the English-language journal New Generation Computing.
Bibliographic Information
Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo. “Robustifying Vision Transformer Without Retraining From Scratch Using Attention Based Test-Time Adaptation”, New Generation Computing.
Outline of the Study
Vision Transformer (ViT) is becoming more and more popular in the field of image processing. This study aims to improve the robustness of ViT against unknown perturbations without retraining it from scratch. Since our approach does not alter the training phase, it does not need to repeat the computationally heavy pre-training of ViT. Specifically, we use test-time adaptation (TTA) for this purpose, in which the model corrects its own predictions at test time. The representative test-time adaptation method, Tent, was recently found to be applicable to ViT by modulating its layer normalization parameters and applying gradient clipping. However, we observed that Tent sometimes fails catastrophically, especially under severe perturbations. To stabilize the adaptation, we propose a new loss function called Attent, which minimizes the distributional difference of the attention entropy between the source and the target during test time.

Experiments on the image classification task with CIFAR-10-C, CIFAR-100-C, and ImageNet-C show that both Tent and Attent are effective against a wide variety of corruptions. The results also show that combining Attent and Tent further improves classification accuracy on corrupted data.
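To make the attention-entropy idea concrete, here is a minimal PyTorch-style sketch of a loss that matches test-time attention entropy to source-side statistics. This is not the authors' released code: the tensor shapes, the per-head source statistics, and the function names are all assumptions made for this illustration.

```python
# Minimal sketch of an attention-entropy matching loss (NOT the official
# Attent implementation). Assumptions: the model exposes attention maps as
# probabilities of shape (batch, heads, queries, keys), and a mean entropy
# per head has been precomputed on the source data beforehand.

import torch


def attention_entropy(attn_probs: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of each attention distribution.

    attn_probs: (batch, heads, queries, keys); each row sums to 1.
    Returns: (batch, heads) entropy averaged over query positions.
    """
    eps = 1e-12  # avoid log(0)
    ent = -(attn_probs * (attn_probs + eps).log()).sum(dim=-1)  # (B, H, Q)
    return ent.mean(dim=-1)  # (B, H)


def attent_style_loss(attn_probs: torch.Tensor,
                      source_entropy: torch.Tensor) -> torch.Tensor:
    """Penalize the gap between the attention entropy on the current
    test batch and the precomputed source statistics (one value per head)."""
    target_entropy = attention_entropy(attn_probs).mean(dim=0)  # (H,)
    return ((target_entropy - source_entropy) ** 2).mean()
```

In a Tent-like setup, one would backpropagate such a loss on each incoming test batch and update only a small set of parameters (for example, the normalization affine parameters), keeping the rest of the pretrained ViT frozen.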