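      On December 25, 2013, at the invitation of Professor Xie Lei, Dr. Huang Dong Yan, a research scientist at the Institute for Infocomm Research (I2R), Singapore, paid an academic visit to the School of Computer Science at Northwestern Polytechnical University and to the Shaanxi Provincial Key Laboratory of Speech and Image Information Processing. At 9:30 a.m., Dr. Huang gave a talk titled "Acoustic Emotion Recognition using Deep Boltzmann Machines" in Conference Room 203 of the School of Computer Science. The talk was chaired by Professor Xie Lei and attended by more than 30 faculty members and students. Dr. Huang explained how Deep Boltzmann Machines can be used for emotion recognition. After the talk she held a discussion with the audience; students asked many questions, and Dr. Huang answered them patiently. During her visit, Dr. Huang also toured the laboratory's research results in speech and audio processing and in image and video processing, and held in-depth discussions with the laboratory's faculty and students.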

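      Background: The Institute for Infocomm Research (I2R) is Singapore's leading research institute in information technology (IT). Its Human Language Technology (HLT) Department is an internationally renowned speech and language research group, with internationally leading work in speaker recognition, language recognition, speech recognition, voice conversion, and speech synthesis. The School of Computer Science and the Shaanxi Provincial Key Laboratory of Speech and Image Information Processing have carried out joint research with I2R-HLT since 2009, and Professor Li Haizhou, head of the HLT Department, is a visiting professor at our university. To date, eight graduate students have visited the HLT Department at I2R on multiple occasions for joint research. This work has been fruitful: the results have been published at top conferences in the field, including ACL, Interspeech, and ICASSP, and in major international journals, and have had significant impact. Two students are currently on academic visits at I2R, and Dr. Huang's visit is expected to further strengthen the cooperation between the two sides.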

      Talk title: Acoustic Emotion Recognition using Deep Boltzmann Machines

      About the speaker: Dr. Huang Dong Yan received the B.Sc. degree in control and information engineering and the M.Sc. degree in electrical engineering from Xi’an Jiaotong University, Xi’an, China, in 1985 and 1988, respectively, and the Ph.D. degree in signal processing from the Conservatoire National des Arts et Métiers (CNAM), Paris, France, in 1996. She is a Research Scientist with the Institute for Infocomm Research (I2R), Singapore. In December 1996, she began her postdoctoral research on low-delay, high-quality audio and speech codec design at the UFR de Mathématiques et Informatique, Université René Descartes, Paris V. Before joining I2R in December 2002, she was a Senior Research Engineer with the Institute of Microelectronics, Singapore, from December 1997 to December 2002. Her research interests include machine learning, pattern recognition, voice transformation, music information retrieval, speech/singing evaluation and synthesis, classification of paralinguistic information in natural speech and language, and interactive dialog expressive avatars. She has published over 50 papers in refereed international journals and conferences, and a linear adaptive predictor developed with members of the audio team has been adopted as a normative part of the MPEG-4 Audio Lossless Coding international standard. She and her team won first prize in the Sleepiness Sub-Challenge of the INTERSPEECH 2011 Speaker State Challenge. Dr. Huang is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE). She served as chairperson of the IEEE Singapore WIE (Women in Engineering) Affinity Group from 2005 to 2008. She has served on the program committees of several international conferences in the areas of signal processing, multimedia, and human-computer interaction.

      Abstract: Emotional speech processing is an attractive research area that enables more natural interaction between humans and machines. Vocal expression is the most natural modality for communicating meaning, emotion, and personality, and it is characterized by a verbal component (language) and a non-verbal component (paralinguistic information). In this talk, I present a method to improve the classification recall of a Deep Boltzmann Machine (DBM) on the task of emotion recognition from speech. The task involves the binary classification of four emotion dimensions: arousal, expectancy, power, and valence. The method consists of dividing the input features into separate sets and training on each set individually using a Deep Boltzmann Machine algorithm. Afterwards, the results from each set are fused using a simple fusion step. The final fused scores are compared to scores obtained from Support Vector Machine (SVM) classifiers and from the same DBM algorithm without feature separation.
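      The split-train-fuse pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption, not the speaker's implementation: the group names (`prosodic`, `spectral`) and the toy data are hypothetical, the per-group model is a small logistic-regression scorer standing in for a DBM, and the "simple fusion" is taken to be a plain mean of the per-group scores, which the abstract does not specify.

```python
import math

def split_features(vector, groups):
    """Split one feature vector into sub-vectors, one per named index group."""
    return {name: [vector[i] for i in idx] for name, idx in groups.items()}

class LogisticScorer:
    """Stand-in binary classifier (NOT a DBM): logistic regression trained by SGD."""

    def __init__(self, dim, lr=0.5, epochs=300):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def score(self, x):
        # Probability of the positive class (e.g. "high arousal").
        z = self.b + sum(w * v for w, v in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        for _ in range(self.epochs):
            for x, t in zip(X, y):
                err = self.score(x) - t  # gradient of the log loss w.r.t. z
                self.w = [w - self.lr * err * v for w, v in zip(self.w, x)]
                self.b -= self.lr * err
        return self

def fuse_mean(scores):
    """Assumed fusion rule: unweighted mean of the per-group scores."""
    return sum(scores) / len(scores)

def train_per_group(X, y, groups):
    """Train one independent model per feature group."""
    models = {}
    for name, idx in groups.items():
        sub = [[row[i] for i in idx] for row in X]
        models[name] = LogisticScorer(len(idx)).fit(sub, y)
    return models

def predict(models, groups, vector):
    """Score each group separately, fuse, and threshold at 0.5."""
    parts = split_features(vector, groups)
    fused = fuse_mean([models[name].score(parts[name]) for name in groups])
    return int(fused >= 0.5), fused

# Hypothetical feature groups and toy data for one binary emotion dimension.
groups = {"prosodic": [0, 1], "spectral": [2, 3]}
X = [[0.9, 0.8, 0.7, 0.9], [0.8, 0.9, 0.9, 0.8],
     [0.1, 0.2, 0.2, 0.1], [0.2, 0.1, 0.1, 0.3]]
y = [1, 1, 0, 0]
models = train_per_group(X, y, groups)
label, fused_score = predict(models, groups, [0.85, 0.85, 0.8, 0.8])
```

      In the setting described in the abstract, one such pipeline would be trained for each of the four emotion dimensions, and the fused scores would then be compared against a single model trained on the unsplit feature vector.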




