Split Attention
A Split Attention block enables attention across feature-map groups. As in ResNeXt blocks, the feature can be divided into several groups, and the number of feature-map groups is given by a cardinality hyperparameter $K$. The resulting feature-map groups are called cardinal groups. Split Attention blocks introduce a new radix hyperparameter $R$ that indicates the number of splits within a cardinal group, so the total number of feature groups is $G = KR$. A series of transformations $\{\mathcal{F}_1, \mathcal{F}_2, \ldots, \mathcal{F}_G\}$ is applied, one to each group, so the intermediate representation of group $i$ is $U_i = \mathcal{F}_i\left(X\right)$, for $i \in \{1, 2, \ldots, G\}$.

A combined representation for each cardinal group is obtained by fusing its splits with an element-wise summation. The representation for the $k$-th cardinal group is $\hat{U}^k = \sum_{j=R(k-1)+1}^{Rk} U_j$, where $\hat{U}^k \in \mathbb{R}^{H\times W\times C/K}$ for $k \in \{1, 2, \ldots, K\}$, and $H$, $W$ and $C$ are the block output feature-map sizes. Global contextual information with embedded channel-wise statistics is gathered by global average pooling across the spatial dimensions, giving $s^k \in \mathbb{R}^{C/K}$. Here the $c$-th component is calculated as:

$$ s^k_c = \frac{1}{H\times W} \sum_{i=1}^{H}\sum_{j=1}^{W} \hat{U}^k_c(i, j). $$

A weighted fusion of the cardinal group representation, $V^k \in \mathbb{R}^{H\times W\times C/K}$, is aggregated using channel-wise soft attention, where each feature-map channel is produced by a weighted combination over splits. The $c$-th channel is calculated as:

$$ V^k_c = \sum_{i=1}^{R} a^k_i(c)\, U_{R(k-1)+i}, $$

where $a_i^k(c)$ denotes a (soft) assignment weight given by:

$$ a_i^k(c) = \begin{cases} \dfrac{\exp(\mathcal{G}^c_i(s^k))}{\sum_{j=1}^{R} \exp(\mathcal{G}^c_j(s^k))} & \quad\textrm{if } R>1, \\ \dfrac{1}{1+\exp(-\mathcal{G}^c_i(s^k))} & \quad\textrm{if } R=1, \end{cases} $$

and mapping $\mathcal{G}_i^c$ determines the weight of each split f
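
The computation described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `split_attention`, the array layout, and the hidden width `D` are hypothetical, and the mapping $\mathcal{G}$ is assumed here to be two dense layers with a ReLU in between (weights `W1`, `W2`), since the text above does not specify its form. The group features $U_i$ are taken as given inputs rather than computed from $X$.

```python
import numpy as np

def split_attention(U, K, R, W1, W2):
    """Fuse G = K*R group features into K cardinal-group outputs.

    U  : (G, H, W, Ck) group features U_1..U_G, with Ck = C / K
    W1 : (K, Ck, D)    first dense layer of the assumed mapping G
    W2 : (K, D, R*Ck)  second dense layer of the assumed mapping G
    Returns V of shape (K, H, W, Ck) and weights a of shape (K, R, Ck).
    """
    G, H, W, Ck = U.shape
    assert G == K * R, "number of groups must equal K * R"

    # Splits R(k-1)+1 .. Rk belong to cardinal group k (contiguous blocks).
    U = U.reshape(K, R, H, W, Ck)

    # Element-wise sum over splits: U_hat^k.
    U_hat = U.sum(axis=1)                      # (K, H, W, Ck)

    # Global average pooling over spatial dimensions: s^k.
    s = U_hat.mean(axis=(1, 2))                # (K, Ck)

    # Assumed form of G: dense -> ReLU -> dense, one logit per (split, channel).
    hidden = np.maximum(np.einsum("kc,kcd->kd", s, W1), 0.0)
    logits = np.einsum("kd,kde->ke", hidden, W2).reshape(K, R, Ck)

    if R > 1:
        # Softmax over the R splits, computed stably.
        logits = logits - logits.max(axis=1, keepdims=True)
        e = np.exp(logits)
        a = e / e.sum(axis=1, keepdims=True)
    else:
        # Single split: sigmoid gate.
        a = 1.0 / (1.0 + np.exp(-logits))

    # Channel-wise weighted combination over splits: V^k.
    V = (a[:, :, None, None, :] * U).sum(axis=1)
    return V, a
```

For $R > 1$ the weights for each channel sum to one across the splits of a cardinal group, so $V^k$ is a convex combination of those splits; for $R = 1$ the weight reduces to a sigmoid gate on the single split.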
Related topics: ResNeSt, ResMLP, Grouped Convolution, Math Word Problem Solving, MLP-Mixer, Video Instance Segmentation, Face Hallucination, Depthwise Convolution, Swin Transformer, EfficientNet