Byte Pair Encoding (BPE)

Byte Pair Encoding, or BPE, is a subword segmentation algorithm that encodes rare and unknown words as sequences of subword units. The intuition is that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). Lei Mao has a detailed blog post that explains how this works.
Related topics: WordPiece, Unsupervised Machine Translation, Entity Typing, SentencePiece, Word Inflections, Zero-Shot Cross-Lingual Transfer, Low-Resource Neural Machine Translation, Backtranslation, Machine Translation, Morphological Segmentation
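
To make the subword idea in the description concrete, here is a minimal sketch of how BPE merges might be learned and then applied to an unseen word. It is an illustration under stated assumptions, not a reference implementation: the names `learn_bpe`, `segment`, `merge_symbols`, and the `</w>` end-of-word marker are choices made for this example, the toy corpus is invented, and input words are assumed to contain no whitespace.

```python
from collections import Counter

END = "</w>"  # end-of-word marker (a convention assumed for this sketch)


def merge_symbols(symbols, pair):
    """Return symbols with each adjacent occurrence of pair fused into one."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges merge rules from a {word: frequency} dict."""
    # Start from single characters plus the end-of-word marker.
    vocab = {tuple(word) + (END,): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # on ties, the first pair seen wins
        vocab = {merge_symbols(s, best): f for s, f in vocab.items()}
        merges.append(best)
    return merges


def segment(word, merges):
    """Split a word into subword units by replaying the merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus: word -> frequency.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(corpus, 5)
print(segment("lowest", merges))
```

With this toy corpus, the word "lowest" never appears in training, yet it is segmented into the two learned units `low` and `est</w>`, which is how a rare or unknown word ends up represented as a sequence of known subword units.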

Notable scholars

Yoshua Bengio

429868 citations, 1063 papers

Richard Socher

81897 citations, 249 papers

Kyunghyun Cho

65335 citations, 334 papers

Wei Zhang

64579 citations, 3192 papers

Dacheng Tao

57097 citations, 1414 papers

Erkang Wang

49086 citations, 837 papers

Michael J. Zaworotko

42632 citations, 545 papers

Hermann Ney

39114 citations, 978 papers

Shaul Mukamel

36819 citations, 1102 papers

Hugo Larochelle

35586 citations, 177 papers