SMMix: Self-Motivated Image Mixing for Vision Transformers

Mengzhao Chen, Mingbao Lin, ZhiHang Lin, Yuxin Zhang, Fei Chao, Rongrong Ji
Dec 2022
Abstract
CutMix is a vital augmentation strategy that determines the performance and generalization ability of vision transformers (ViTs). However, the inconsistency between the mixed images and the corresponding labels harms its efficacy. Existing CutMix variants tackle this problem by generating more consistent mixed images or more precise mixed labels, but inevitably introduce heavy training overhead or require extra information, undermining ease of use. To this end, we propose an efficient and effective Self-Motivated image Mixing method (SMMix), which motivates both image and label enhancement by the model under training itself. Specifically, we propose a max-min attention region mixing approach that enriches the attention-focused objects in the mixed images. Then, we introduce a fine-grained label assignment technique that co-trains the output tokens of mixed images with fine-grained supervision. Moreover, we devise a novel feature consistency constraint to align features from mixed and unmixed images. Due to the subtle designs of the self-motivated paradigm, our SMMix is significant in its smaller training overhead and better performance than other CutMix variants. In particular, SMMix improves the accuracy of DeiT-T/S, CaiT-XXS-24/36, and PVT-T/S/M/L by more than +1% on ImageNet-1k. The generalization capability of our method is also demonstrated on downstream tasks and out-of-distribution datasets. Code of this project is available at https://github.com/ChenMnZ/SMMix.
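The core idea of the max-min attention region mixing can be illustrated with a minimal sketch: using the model's own per-patch attention scores, paste the most-attended window of a source image over the least-attended window of a target image, and record the area ratio for label mixing. This is a simplified illustration, not the paper's implementation; the function name `smmix_like_mix` and the square-window search are assumptions for clarity (see the official repository for the actual method).

```python
import numpy as np

def smmix_like_mix(img_a, img_b, attn_a, attn_b, region=2):
    """Hypothetical sketch of max-min attention region mixing.

    img_a, img_b : (H, W, C) images; H and W divisible by the patch grid P.
    attn_a, attn_b : (P, P) per-patch attention scores from the model itself.
    region : side length, in patches, of the square window to swap.
    """
    P = attn_a.shape[0]
    ph, pw = img_a.shape[0] // P, img_a.shape[1] // P

    def best_window(attn, mode):
        # Score every region x region window by its summed attention.
        n = P - region + 1
        scores = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                scores[i, j] = attn[i:i + region, j:j + region].sum()
        flat = scores.argmax() if mode == "max" else scores.argmin()
        return np.unravel_index(flat, scores.shape)

    # Max-attention window of the source, min-attention window of the target.
    si, sj = best_window(attn_a, "max")
    ti, tj = best_window(attn_b, "min")

    mixed = img_b.copy()
    mixed[ti * ph:(ti + region) * ph, tj * pw:(tj + region) * pw] = \
        img_a[si * ph:(si + region) * ph, sj * pw:(sj + region) * pw]

    # Area fraction of the pasted region, usable for label interpolation.
    lam = (region * region) / (P * P)
    return mixed, lam
```

In contrast to vanilla CutMix, which cuts a random box, this keeps the attention-focused object of the source image and overwrites only a low-salience area of the target, so the mixed image stays more consistent with its mixed label.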