
Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation

Rongzhi Gu, Shi-Xiong Zhang, Yuexian Zou, Dong Yu
Dec 2022
Abstract
Recently, frequency domain all-neural beamforming methods have achieved remarkable progress for multichannel speech separation. In parallel, the integration of time domain network structures and beamforming has also gained significant attention. This study proposes a novel all-neural beamforming method in the time domain and makes an attempt to unify the all-neural beamforming pipelines for time domain and frequency domain multichannel speech separation. The proposed model consists of two modules: separation and beamforming. Both modules perform temporal-spectral-spatial modeling and are trained end-to-end with a joint loss function. The novelty of this study is twofold. First, a time domain directional feature conditioned on the direction of the target speaker is proposed, which can be jointly optimized within the time domain architecture to enhance target signal estimation. Second, an all-neural beamforming network in the time domain is designed to refine the pre-separated results. This module features parametric time-variant beamforming coefficient estimation, without explicitly following the derivation of optimal filters that may impose an upper bound on performance. The proposed method is evaluated on simulated reverberant overlapped speech data derived from the AISHELL-1 corpus. Experimental results demonstrate significant performance improvements over frequency domain state-of-the-art methods, ideal magnitude masks, and existing time domain neural beamforming methods.
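The abstract describes the beamforming module as estimating parametric, time-variant coefficients directly with a network, rather than plugging statistics into a closed-form optimal filter (e.g. MVDR). The exact operation is not given here, but as a point of reference, applying such network-predicted, per-frame FIR filters to a multichannel waveform amounts to a time-variant filter-and-sum beamformer. The sketch below is only an illustration of that idea; the function name, tensor shapes, and the hop/taps parameters are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): time-variant filter-and-sum
# beamforming in the time domain with per-frame FIR coefficients.
import numpy as np

def time_variant_filter_and_sum(mixture, filters, hop):
    """Apply per-frame FIR beamforming filters to a multichannel mixture.

    mixture: (n_mics, n_samples) time-domain multichannel signal
    filters: (n_frames, n_mics, taps) per-frame, per-mic FIR coefficients
             (assumed to come from the beamforming network)
    hop:     frame shift in samples (n_frames * hop should cover n_samples)
    returns: (n_samples,) single-channel beamformed estimate
    """
    n_mics, n_samples = mixture.shape
    n_frames, _, taps = filters.shape
    out = np.zeros(n_samples)
    for t in range(n_frames):
        start, end = t * hop, min((t + 1) * hop, n_samples)
        if start >= end:
            break
        ctx = max(0, start - taps + 1)  # left context needed by the FIR filter
        for m in range(n_mics):
            seg = np.convolve(mixture[m, ctx:end], filters[t, m], mode="full")
            # Keep only the samples belonging to frame t and sum the filtered
            # channels (filter-and-sum).
            out[start:end] += seg[start - ctx:end - ctx]
    return out

# Example usage with random placeholders: 4 mics, 1 s at 16 kHz,
# 32-tap filters re-estimated every 10 ms (hop = 160 samples).
rng = np.random.default_rng(0)
mix = rng.standard_normal((4, 16000))
coeffs = 0.01 * rng.standard_normal((100, 4, 32))
enhanced = time_variant_filter_and_sum(mix, coeffs, hop=160)
```

Because the filters change from frame to frame and are produced by the network end-to-end, they are not constrained to the closed-form solutions that conventional mask-based beamformers derive, which is the property the abstract highlights.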