
MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Longxu Dou, Yan Gao, Mingyang Pan, ...+3, Jian-Guang Lou
Dec 2022
Abstract
Text-to-SQL semantic parsing is an important NLP task, which greatly facilitates the interaction between users and the database and becomes the key component in many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MultiSpider, the largest multilingual text-to-SQL dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Upon MultiSpider, we further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language. Besides the dataset, we also propose a simple schema augmentation framework SAVe (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.