
Data Roaming and Early Fusion for Composed Image Retrieval

Matan Levy, Rami Ben-Ari, Nir Darshan, Dani Lischinski
Mar 2023
Abstract
We study the task of Composed Image Retrieval (CoIR), where a query is composed of two modalities, image and text, extending the user's expressive ability. Previous methods typically address this task by encoding each query modality separately, followed by late fusion of the extracted features. In this paper, we propose a new approach, Cross-Attention driven Shift Encoder (CASE), which employs early fusion between modalities through a cross-attention module with an additional auxiliary task. We show that our method outperforms the existing state of the art on established benchmarks (FashionIQ and CIRR) by a large margin. However, CoIR datasets are a few orders of magnitude smaller than other vision and language (V&L) datasets, and some suffer from serious flaws (e.g., queries with a redundant modality). We address these shortcomings by introducing Large Scale Composed Image Retrieval (LaSCo), a new CoIR dataset ten times larger than current ones. Pre-training on LaSCo yields a further performance boost. We further suggest a new analysis of CoIR datasets and methods for detecting modality redundancy or necessity in queries.
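The abstract contrasts late fusion (encoding each query modality separately and combining the features afterwards) with early fusion through cross-attention. The following is a minimal illustrative sketch of that distinction only; it is not the CASE architecture. It assumes toy, pre-computed token features, uses plain scaled dot-product attention with no learned projections, no multiple heads and no auxiliary task, and every function name in it is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(image_feat: Vector, text_feat: Vector) -> Vector:
    """Late fusion (hypothetical baseline): each modality is encoded on its own
    and the pooled features are combined only at the end, here by addition."""
    return [i + t for i, t in zip(image_feat, text_feat)]


def cross_attention(text_tokens: List[Vector], image_tokens: List[Vector]) -> List[Vector]:
    """Early fusion sketch: each text token attends over all image tokens
    (scaled dot-product attention without learned weights), and the attended
    image context is added back to the text token as a residual."""
    d = len(image_tokens[0])
    fused = []
    for q in text_tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in image_tokens])
        context = [sum(w * v[j] for w, v in zip(weights, image_tokens)) for j in range(d)]
        fused.append([qj + cj for qj, cj in zip(q, context)])
    return fused


def mean_pool(tokens: List[Vector]) -> Vector:
    """Pool fused tokens into a single query embedding (one simple choice)."""
    n = len(tokens)
    return [sum(tok[j] for tok in tokens) / n for j in range(len(tokens[0]))]


if __name__ == "__main__":
    image_tokens = [[1.0, 0.0], [0.0, 1.0]]  # toy image patch features
    text_tokens = [[2.0, 0.0]]               # toy modification-text feature
    query = mean_pool(cross_attention(text_tokens, image_tokens))
    print(query)
```

Because the attention weights are computed from the text tokens, the image context each token receives depends on the modification text; in the late-fusion baseline the two feature vectors never interact until they are summed.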
