This website requires JavaScript.

Multi-View Spectrogram Transformer for Respiratory Sound Classification

Wentao HeYuchen YanJianfeng RenRuibin BaiXudong Jiang
Nov 2023
0被引用
0笔记
摘要原文
Deep neural networks have been applied to audio spectrograms for respiratory sound classification. Existing models often treat the spectrogram as a synthetic image while overlooking its physical characteristics. In this paper, a Multi-View Spectrogram Transformer (MVST) is proposed to embed different views of time-frequency characteristics into the vision transformer. Specifically, the proposed MVST splits the mel-spectrogram into different sized patches, representing the multi-view acoustic elements of a respiratory sound. These patches and positional embeddings are then fed into transformer encoders to extract the attentional information among patches through a self-attention mechanism. Finally, a gated fusion scheme is designed to automatically weigh the multi-view features to highlight the best one in a specific scenario. Experimental results on the ICBHI dataset demonstrate that the proposed MVST significantly outperforms state-of-the-art methods for classifying respiratory sounds.
展开全部
机器翻译
AI理解论文&经典十问
图表提取
参考文献
发布时间 · 被引用数 · 默认排序
被引用
发布时间 · 被引用数 · 默认排序
社区问答