Bidirectional Encoder Representations from Transformers (BERT)

BERT, or Bidirectional Encoder Representations from Transformers, improves on standard left-to-right Transformer language models by removing the unidirectionality constraint through a masked language model (MLM) pre-training objective. The masked language model randomly masks some of the input tokens, and the objective is to predict the original vocabulary id of each masked word based only on its context. Unlike left-to-right language model pre-training, the MLM objective lets the representation fuse the left and the right context, which makes it possible to pre-train a deep bidirectional Transformer. In addition to the masked language model, BERT uses a next sentence prediction task that jointly pre-trains text-pair representations. The BERT framework has two steps: pre-training and fine-tuning. During pre-training, the model is trained on unlabeled data over the different pre-training tasks. For fine-tuning, the BERT model is first initialized with the pre-trained parameters, and all of the parameters are then fine-tuned using labeled data from the downstream task. Each downstream task has its own fine-tuned model, even though all of them are initialized with the same pre-trained parameters.
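The two pre-training tasks described above can be illustrated with a minimal sketch that only builds training examples and involves no model. It assumes a few details the text does not state: the function names `mask_tokens` and `make_sentence_pair` are hypothetical, the default masking rate of 0.15 and the 50/50 split between true and random next sentences follow the values reported in the BERT paper, and every selected token is simply replaced by `[MASK]` (the paper's full recipe sometimes substitutes a random token or keeps the original).

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder symbol, named after common BERT usage


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Build one masked-language-model example (hypothetical helper).

    Returns (masked_tokens, labels). labels[i] is the original vocabulary
    id at masked positions and None where no prediction is required.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(vocab[tok])  # target: original vocabulary id
        else:
            masked.append(tok)
            labels.append(None)  # not a prediction target
    return masked, labels


def make_sentence_pair(sentences, index, rng=None):
    """Build one next-sentence-prediction example (hypothetical helper).

    Returns (first, second, is_next). With probability 0.5 (assumed, as in
    the BERT paper) `second` is the true successor of sentences[index];
    otherwise it is a different sentence drawn at random.
    """
    if len(sentences) < 3 or not 0 <= index < len(sentences) - 1:
        raise ValueError("need >= 3 sentences and an index with a successor")
    rng = rng or random.Random()
    first = sentences[index]
    if rng.random() < 0.5:
        return first, sentences[index + 1], True
    # Exclude the sentence itself and its true successor from the negatives.
    candidates = [s for i, s in enumerate(sentences) if i not in (index, index + 1)]
    return first, rng.choice(candidates), False


if __name__ == "__main__":
    vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
    tokens = ["the", "cat", "sat", "on", "the", "mat"]
    print(mask_tokens(tokens, vocab, rng=random.Random(0)))

    sentences = ["It rained.", "We stayed in.", "The cat slept.", "Tea was made."]
    print(make_sentence_pair(sentences, 1, rng=random.Random(0)))
```

In this sketch the model would be trained to recover `labels` from `masked_tokens` and to classify `is_next` from the sentence pair. Fine-tuning would then reuse the resulting parameters as the starting point for each downstream task.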
Related topics: RoBERTa, Transformer, NER, ELMo, Question Answering, Word Embeddings, XLNet, Text Classification, BiLSTM, Sentiment Analysis

Yang Gao

163080 citations, 2245 papers

Patrick W. Serruys

158485 citations, 2988 papers

Christopher D. Manning

123173 citations, 515 papers

Andrew Y. Ng

114296 citations, 356 papers

Alexander J. Smola

89395 citations, 459 papers

Ruslan Salakhutdinov

89393 citations, 413 papers

Kevin Murphy

83646 citations, 800 papers

Cordelia Schmid

82310 citations, 551 papers

Richard Socher

81897 citations, 249 papers

Philip S. Yu

79752 citations, 1712 papers