That afternoon, Professor Chin-Hui Lee delivered an academic talk in Lecture Hall 105 of the college, titled "A Reverberation-Time-Aware DNN Approach to Speech Dereverberation", presenting his latest research on DNNs for speech enhancement, speech separation, and dereverberation. After the talk, students in the audience raised questions on topics of interest, and Professor Lee answered each one in detail. The talk was highly rewarding for the students: it deepened their understanding of speech dereverberation methods, highlighted the important role dereverberation plays in speech recognition, and offered valuable inspiration for their future study and research in related areas.
Speaker biography: Professor Chin-Hui Lee, Georgia Institute of Technology, USA (IEEE Fellow, ISCA Fellow, former senior speech researcher at Bell Laboratories, recipient of the Bell Labs President's Gold Award; over 30,000 citations, h-index 65).
Chin-Hui Lee is a professor in the School of Electrical and Computer Engineering, Georgia Institute of Technology. Before joining academia in 2001, he had 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, as a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 450 papers and holds 30 patents, with close to 30,000 citations and an h-index of 65 on Google Scholar. He has received numerous awards, including the Bell Labs President's Gold Award in 1998, and won the IEEE Signal Processing Society's 2006 Technical Achievement Award for "Exceptional Contributions to the Field of Automatic Speech Recognition". In 2012 he gave an ICASSP plenary talk on the future of speech recognition, and in the same year he was awarded the ISCA Medal for scientific achievement for "pioneering and seminal contributions to the principles and practice of automatic speech and speaker recognition". See http://chl.ece.gatech.edu/ for details.
We cast the classical speech dereverberation problem into a regression setting by mapping log power spectral features of reverberant speech to time-delayed features of anechoic speech. Depending on the reverberation time of the acoustic environment, we found that different signal processing parameters are needed to deliver good dereverberated speech quality. Furthermore, reverberation-time-aware DNN training and decoding procedures can be designed to optimize dereverberation performance across a wide range of reverberation times. In addition, a single DNN can also be trained to perform simultaneous beamforming and dereverberation for microphone array speech. As a side benefit, using DNN-based speech dereverberation as a pre-processor in the REVERB Challenge automatic speech recognition (ASR) task, we obtain the lowest word error rate without retraining either the dereverberation front-end or the ASR back-end. It is expected that ASR accuracy and robustness could be further improved with joint training of an integrated dereverberation-ASR system.
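The abstract describes casting dereverberation as regression: a DNN maps windows of reverberant log-power-spectrum (LPS) frames to the corresponding anechoic LPS frame. The sketch below illustrates that regression setup only in outline; the feature dimension, context width, network size, and training data here are all hypothetical stand-ins (synthetic random features rather than real paired reverberant/anechoic LPS extracted via the STFT), and the training loop is plain gradient descent rather than the procedure used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each input is a window of 2*C+1 reverberant LPS
# frames; the regression target is the central anechoic LPS frame.
D, C = 64, 3                          # LPS bins per frame, context on each side
n_in, n_out, H = D * (2 * C + 1), D, 128

# Synthetic stand-in for paired (reverberant, anechoic) LPS features.
X = rng.standard_normal((2000, n_in))
W_true = rng.standard_normal((n_in, n_out)) * 0.05
Y = X @ W_true                        # pretend anechoic targets

# One-hidden-layer MLP regressor trained with an MSE loss.
W1 = rng.standard_normal((n_in, H)) * np.sqrt(2.0 / n_in)
b1 = np.zeros(H)
W2 = rng.standard_normal((H, n_out)) * np.sqrt(2.0 / H)
b2 = np.zeros(n_out)
lr = 0.05
losses = []

for step in range(300):
    h = np.maximum(X @ W1 + b1, 0.0)  # ReLU hidden layer
    pred = h @ W2 + b2                # linear output for regression
    err = pred - Y
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate the mean-squared-error loss.
    g_pred = 2.0 * err / err.size
    g_W2, g_b2 = h.T @ g_pred, g_pred.sum(0)
    g_h = g_pred @ W2.T
    g_h[h <= 0] = 0.0                 # ReLU gradient mask
    g_W1, g_b1 = X.T @ g_h, g_h.sum(0)
    W2 -= lr * g_W2; b2 -= lr * g_b2
    W1 -= lr * g_W1; b1 -= lr * g_b1

print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

At inference time, the predicted LPS frames would be combined with the reverberant phase to resynthesize a waveform; the reverberation-time-aware variant in the talk additionally conditions the signal processing parameters and training on the estimated reverberation time.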