Bidirectional Encoder Representations from Transformers (BERT)
BERT, or Bidirectional Encoder Representations from Transformers, improves upon standard left-to-right Transformer pre-training by removing the unidirectionality constraint through a masked language model (MLM) pre-training objective. The masked language model randomly masks some of the tokens in the input, and the objective is to predict the original vocabulary id of each masked token based only on its context. Unlike left-to-right language model pre-training, the MLM objective lets the representation fuse the left and the right context, which makes it possible to pre-train a deep bidirectional Transformer. In addition to the masked language model, BERT uses a next sentence prediction (NSP) task that jointly pre-trains text-pair representations.

There are two steps in BERT: pre-training and fine-tuning. During pre-training, the model is trained on unlabeled data over the different pre-training tasks. For fine-tuning, the BERT model is first initialized with the pre-trained parameters, and all of the parameters are then fine-tuned using labeled data from the downstream task. Each downstream task has its own fine-tuned model, even though all of them start from the same pre-trained parameters.
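The masking procedure described above can be sketched in a few lines of plain Python. The 15% selection rate and the 80%/10%/10% replacement split are the values reported in the BERT paper; the toy vocabulary, the token strings, and the mask_tokens helper below are illustrative assumptions, not part of the original text.

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]  # toy vocabulary for illustration

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style MLM corruption: select ~15% of positions; replace 80% of
    them with [MASK], 10% with a random token, and leave 10% unchanged.
    Returns the corrupted sequence and the prediction targets (the original
    token at corrupted positions, None everywhere else)."""
    rng = random.Random(seed)
    corrupted, targets = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                           # position not selected for prediction
        targets[i] = tok                       # the model must recover this token
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK                # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(VOCAB)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, targets

print(mask_tokens("the cat sat on the mat".split()))
```

In the real model the corruption operates on WordPiece token ids rather than strings, and the training loss is computed only at the positions where a target is set.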
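The fine-tuning step (initialize from the pre-trained weights, then update all parameters on labeled downstream data) might look like the following minimal sketch. It assumes PyTorch and the Hugging Face transformers library, neither of which is mentioned in the original text, and uses two toy sentiment-classification examples; a real setup would iterate over a full labeled dataset for several epochs.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

# Initialize from pre-trained BERT parameters and add a 2-class classification head.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["a thoroughly enjoyable film", "slow, predictable, and dull"]
labels = torch.tensor([1, 0])  # toy sentiment labels for illustration
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# All parameters (pre-trained encoder plus the new head) are updated.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)  # returns the classification loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```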
Related topics: RoBERTa, Transformer, NER, ELMo, Question Answering, Word Embeddings, XLNet, Text Classification, BiLSTM, Sentiment Analysis
Important scholars
Yang Gao: 163080 citations, 2245 papers
Patrick W. Serruys: 158485 citations, 2988 papers
Christopher D. Manning: 123173 citations, 515 papers
Andrew Y. Ng: 114296 citations, 356 papers
Alexander J. Smola: 89395 citations, 459 papers
Ruslan Salakhutdinov: 89393 citations, 413 papers
Kevin Murphy: 83646 citations, 800 papers
Cordelia Schmid: 82310 citations, 551 papers
Richard Socher: 81897 citations, 249 papers
Philip S. Yu: 79752 citations, 1712 papers