Byte Pair Encoding (BPE)
Byte Pair Encoding (BPE) is a subword segmentation algorithm that encodes rare and unknown words as sequences of subword units. The intuition is that many word classes can be translated via units smaller than words: for instance, names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). Starting from individual characters, BPE repeatedly merges the most frequent pair of adjacent symbols in the training corpus into a new symbol, and the learned sequence of merges is then applied to segment new text. Lei Mao has a detailed blog post that explains how this works.
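To make the merge procedure concrete, here is a minimal sketch in Python. It assumes a toy word-frequency table and an end-of-word marker `</w>` so that word-final subwords are distinguished from word-internal ones. The function names `learn_bpe`, `get_pair_counts`, `merge_pair` and `segment` are hypothetical, chosen for illustration only, and the code is not taken from any particular library. Ties between equally frequent pairs are broken by first occurrence.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs across the vocabulary, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with the concatenated symbol."""
    merged = "".join(pair)
    # Match the pair only as whole symbols (bounded by spaces or string edges).
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub(lambda _: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_counts, num_merges):
    """Learn up to num_merges merge operations from a {word: frequency} table."""
    # Each word starts as space-separated characters plus the assumed </w> marker.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair; first one wins ties
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word by applying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_bpe(corpus, 10)
    print(merges[:3])                 # [('e', 's'), ('es', 't'), ('est', '</w>')]
    print(segment("lowest", merges))  # ['low', 'est</w>']
```

The word "lowest" never appears in the toy corpus, yet it is encoded as the known subwords `low` and `est</w>`. This is the behaviour the paragraph above describes for rare and unknown words. Applying merges one by one in learned order is a simple choice for a sketch. Production tokenizers typically use faster, rank-based implementations.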
Related topics: WordPiece, Unsupervised Machine Translation, Entity Typing, SentencePiece, Word Inflections, Zero-Shot Cross-Lingual Transfer, Low-Resource Neural Machine Translation, Backtranslation, Machine Translation, Morphological Segmentation
Key researchers
Yoshua Bengio: 429868 citations, 1063 papers
Richard Socher: 81897 citations, 249 papers
Kyunghyun Cho: 65335 citations, 334 papers
Wei Zhang: 64579 citations, 3192 papers
Dacheng Tao: 57097 citations, 1414 papers
Erkang Wang: 49086 citations, 837 papers
Michael J. Zaworotko: 42632 citations, 545 papers
Hermann Ney: 39114 citations, 978 papers
Shaul Mukamel: 36819 citations, 1102 papers
Hugo Larochelle: 35586 citations, 177 papers