
Efficient Long-Text Understanding with Short-Text Models

Maor Ivgi, Uri Shaham, Jonathan Berant
Aug 2022
Abstract
Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding, but cannot be applied to long sequences such as stories, scientific articles and long documents, due to their quadratic complexity. While a myriad of efficient transformer variants have been proposed, they are typically based on custom implementations that require expensive pretraining from scratch. In this work, we propose SLED: SLiding-Encoder and Decoder, a simple approach for processing long sequences that re-uses and leverages battle-tested short-text pretrained LMs. Specifically, we partition the input into overlapping chunks, encode each with a short-text LM encoder and use the pretrained decoder to fuse information across chunks (fusion-in-decoder). We illustrate through controlled experiments that SLED offers a viable strategy for long text understanding and evaluate our approach on SCROLLS, a benchmark with seven datasets across a wide range of language understanding tasks. We find that SLED is competitive with specialized models that are up to 50x larger and require a dedicated and expensive pretraining step.
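The abstract's chunk-encode-fuse recipe can be sketched in a few lines of Python. The following is a minimal illustration only, not the authors' released SLED implementation: it assumes a Hugging Face encoder-decoder checkpoint (here "facebook/bart-base"), assumes that generate() accepts precomputed encoder_outputs, and uses illustrative chunk_size/overlap values rather than the paper's hyperparameters.

```python
# Sketch of the SLED idea: split a long input into overlapping chunks,
# encode each chunk with a short-text encoder, then let the pretrained
# decoder cross-attend over the concatenated chunk encodings (fusion-in-decoder).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers.modeling_outputs import BaseModelOutput


def chunk_with_overlap(ids, chunk_size=256, overlap=64):
    """Split a list of token ids into overlapping windows (stride = chunk_size - overlap)."""
    stride = chunk_size - overlap
    return [ids[i:i + chunk_size] for i in range(0, max(len(ids) - overlap, 1), stride)]


tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

long_text = "..."  # any document longer than the encoder's usual context
ids = tokenizer(long_text, add_special_tokens=False)["input_ids"]
chunks = chunk_with_overlap(ids)

# Encode each chunk independently with the short-text encoder.
encoder = model.get_encoder()
chunk_states = []
with torch.no_grad():
    for chunk in chunks:
        out = encoder(input_ids=torch.tensor([chunk]))
        chunk_states.append(out.last_hidden_state)

# Fusion-in-decoder: concatenate chunk encodings along the sequence axis so the
# decoder attends over the whole document at generation time.
fused = torch.cat(chunk_states, dim=1)
attention_mask = torch.ones(fused.shape[:2], dtype=torch.long)

output_ids = model.generate(
    encoder_outputs=BaseModelOutput(last_hidden_state=fused),
    attention_mask=attention_mask,
    max_new_tokens=64,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because each chunk is encoded with its own local positions, the encoder never sees a sequence longer than it was pretrained on; only the decoder's cross-attention spans the full document, which is why no pretraining from scratch is needed.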