Extended Transformer Construction (ETC)

Extended Transformer Construction, or ETC, is an extension of the Transformer architecture with a new attention mechanism that extends the original in two main ways: (1) it allows scaling up the input length from 512 to several thousand tokens; and (2) it can ingest structured inputs instead of just linear sequences. The key ideas that enable these capabilities are a new global-local attention mechanism coupled with relative position encodings. ETC also allows lifting weights from existing BERT models, saving computational resources during training.
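To make the global-local idea concrete, the sketch below shows a simplified, single-head version in plain NumPy. It is illustrative only, not the actual ETC implementation: the function names (e.g. `global_local_attention_mask`) are hypothetical, the local attention is computed with a full dense mask rather than the banded computation ETC uses for efficiency, and relative position encodings are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_local_attention_mask(n_global, n_long, radius):
    """Boolean mask in the spirit of ETC's g2g/g2l/l2g/l2l attention pieces.

    Token order: [global tokens | long tokens]. Global tokens attend to
    everything; long tokens attend to all global tokens and to long tokens
    within `radius` positions (a local sliding window).
    """
    n = n_global + n_long
    mask = np.zeros((n, n), dtype=bool)
    mask[:n_global, :] = True            # g2g and g2l: global attends to all
    mask[n_global:, :n_global] = True    # l2g: long tokens attend to all globals
    for i in range(n_long):              # l2l: window of width 2 * radius + 1
        lo = max(0, i - radius)
        hi = min(n_long, i + radius + 1)
        mask[n_global + i, n_global + lo:n_global + hi] = True
    return mask

def attention(q, k, v, mask):
    """Single-head scaled dot-product attention restricted by a boolean mask."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)  # block disallowed query-key pairs
    return softmax(scores) @ v

# Toy usage: 2 global tokens summarizing a 16-token long input, local radius 2.
rng = np.random.default_rng(0)
n_global, n_long, d = 2, 16, 8
x = rng.normal(size=(n_global + n_long, d))
mask = global_local_attention_mask(n_global, n_long, radius=2)
out = attention(x, x, x, mask)
print(out.shape)  # (18, 8)
```

The payoff comes from how this mask is exploited: when the l2l piece is computed in banded form rather than as a dense matrix, the cost of a layer grows roughly linearly in the long input length (for a fixed radius and a small global token set) instead of quadratically, which is what lets ETC stretch inputs to several thousand tokens. The global tokens, which can mirror structure such as sentences or paragraphs, give every long token a short attention path to the rest of the input.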