# Split Attention

A Split Attention block enables attention across feature-map groups. As in ResNeXt blocks, the features can be divided into several groups, and the number of feature-map groups is given by a cardinality hyperparameter $K$. The resulting feature-map groups are called cardinal groups. Split Attention blocks introduce a new radix hyperparameter $R$ that indicates the number of splits within a cardinal group, so the total number of feature groups is $G = KR$. A series of transformations $\{\mathcal{F}_1, \mathcal{F}_2, \cdots, \mathcal{F}_G\}$ is applied, one per group, so the intermediate representation of each group is $U_i = \mathcal{F}_i\left(X\right)$ for $i \in \{1, 2, \cdots, G\}$.

A combined representation for each cardinal group is obtained by fusing its splits via element-wise summation. The representation for the $k$-th cardinal group is $\hat{U}^k = \sum_{j=R(k-1)+1}^{Rk} U_j$, where $\hat{U}^k \in \mathbb{R}^{H\times W\times C/K}$ for $k \in \{1, 2, \cdots, K\}$, and $H$, $W$ and $C$ are the block output feature-map sizes. Global contextual information with embedded channel-wise statistics $s^k\in\mathbb{R}^{C/K}$ is gathered with global average pooling across the spatial dimensions. Here the $c$-th component is calculated as:

$$s^k_c = \frac{1}{H\times W} \sum_{i=1}^H\sum_{j=1}^W \hat{U}^k_c(i, j).$$

A weighted fusion of the cardinal group representation $V^k\in\mathbb{R}^{H\times W\times C/K}$ is aggregated using channel-wise soft attention, where each feature-map channel is produced using a weighted combination over splits. The $c$-th channel is calculated as:

$$V^k_c=\sum_{i=1}^R a^k_i(c)\, U_{R(k-1)+i},$$

where $a_i^k(c)$ denotes a (soft) assignment weight given by:

$$a_i^k(c) = \begin{cases} \dfrac{\exp(\mathcal{G}^c_i(s^k))}{\sum_{j=1}^R \exp(\mathcal{G}^c_j(s^k))} & \quad\textrm{if } R>1, \\ \dfrac{1}{1+\exp(-\mathcal{G}^c_i(s^k))} & \quad\textrm{if } R=1, \end{cases}$$

and mapping $\mathcal{G}_i^c$ determines the weight of each split f
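The fusion steps above (sum over splits, global average pooling, per-channel attention over splits, weighted sum) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the reference implementation: the function name `split_attention`, the `(G, H, W, C/K)` array layout, and the `gate` callable standing in for the mappings $\mathcal{G}_i^c$ are all hypothetical, and the single random dense layer used as `gate` in the demo is only a placeholder.

```python
import numpy as np


def split_attention(U, K, R, gate):
    """Fuse G = K * R split feature maps into K cardinal-group outputs.

    U    : float array of shape (G, H, W, C // K) holding U_1 .. U_G.
    gate : hypothetical stand-in for the mappings G_i^c; takes s^k of
           shape (C // K,) and returns logits of shape (R, C // K).
    """
    G, H, W, Ck = U.shape
    assert G == K * R, "expected K * R feature groups"
    # splits[k, i] corresponds to U_{R(k-1)+i} in the 1-based notation.
    splits = U.reshape(K, R, H, W, Ck)
    V = np.empty((K, H, W, Ck), dtype=U.dtype)
    for k in range(K):
        U_hat = splits[k].sum(axis=0)        # element-wise sum over the R splits
        s = U_hat.mean(axis=(0, 1))          # global average pooling, shape (Ck,)
        logits = gate(s)                     # shape (R, Ck)
        if R > 1:
            # Softmax across the R splits, computed separately per channel.
            e = np.exp(logits - logits.max(axis=0, keepdims=True))
            a = e / e.sum(axis=0, keepdims=True)
        else:
            a = 1.0 / (1.0 + np.exp(-logits))  # sigmoid when R = 1
        # Weighted sum over splits; weights broadcast over H and W.
        V[k] = (a[:, None, None, :] * splits[k]).sum(axis=0)
    return V


rng = np.random.default_rng(0)
K, R, H, W, Ck = 2, 3, 4, 4, 8
U = rng.standard_normal((K * R, H, W, Ck))
W_gate = rng.standard_normal((Ck, R * Ck)) * 0.1  # placeholder dense layer


def gate(s):
    # Assumed form of the mapping: one linear layer reshaped to (R, Ck).
    return (s @ W_gate).reshape(R, Ck)


V = split_attention(U, K, R, gate)
print(V.shape)  # (2, 4, 4, 8)
```

With all-zero logits the softmax weights become $1/R$, so each $V^k$ reduces to the plain average of its splits, which is a convenient sanity check for the indexing.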
