
Learning Cross-lingual Visual Speech Representations

Andreas Zinonos, Alexandros Haliassos, Pingchuan Ma, Stavros Petridis, Maja Pantic
Mar 2023
Abstract
Cross-lingual self-supervised learning has been a growing research topic in the last few years. However, current works have only explored the use of audio signals to create representations. In this work, we study cross-lingual self-supervised visual representation learning. We use the recently-proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled multilingual data, and then fine-tune the visual model on labelled transcriptions. Our experiments show that: (1) multilingual models with more data outperform monolingual ones, but, when the amount of data is kept fixed, monolingual models tend to reach better performance; (2) multilingual pre-training outperforms English-only pre-training; (3) using languages that are more similar yields better results; and (4) fine-tuning on unseen languages is competitive with using the target language in the pre-training set. We hope our study inspires future research on non-English-only speech representation learning.
