Electric
Electric is an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking, and it does not output a full distribution over the tokens that could occur in a context. Instead, it assigns each input token a scalar energy score indicating how likely that token is given its context (lower energy means more likely).

Specifically, like BERT, Electric models $p_{\text{data}}(x_t \mid \mathbf{x}_{\backslash t})$, but it does so without masking or a softmax layer. Electric first maps the unmasked input $\mathbf{x} = [x_1, \ldots, x_n]$ into contextualized vector representations $\mathbf{h}(\mathbf{x}) = [\mathbf{h}_1, \ldots, \mathbf{h}_n]$ using a transformer network. It then assigns position $t$ the energy score

$$E(\mathbf{x})_t = \mathbf{w}^{T} \mathbf{h}(\mathbf{x})_t$$

using a learned weight vector $\mathbf{w}$. The energy function defines a distribution over the possible tokens at position $t$:

$$p_{\theta}(x_t \mid \mathbf{x}_{\backslash t}) = \frac{\exp(-E(\mathbf{x})_t)}{Z_{\theta}(\mathbf{x}_{\backslash t})} = \frac{\exp(-E(\mathbf{x})_t)}{\sum_{x' \in \mathcal{V}} \exp\left(-E(\operatorname{REPLACE}(\mathbf{x}, t, x'))_t\right)}$$

where $\operatorname{REPLACE}(\mathbf{x}, t, x')$ denotes replacing the token at position $t$ with $x'$, and $\mathcal{V}$ is the vocabulary (in practice usually word pieces). Unlike BERT, which produces probabilities for all possible tokens $x'$ with a softmax layer, Electric passes each candidate $x'$ in as input to the transformer. As a result, computing $p_{\theta}$ is prohibitively expensive: the partition function $Z_{\theta}(\mathbf{x}_{\backslash t})$ requires running the transformer $|\mathcal{V}|$ times. Unlike in most EBMs, the intractability of $Z_{\theta}(\mathbf{x}_{\backslash t})$ stems more from the expensive scoring function than from a large sample space.