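That afternoon, accompanied by Professor Lei Xie, Dr. Minghui Dong visited the Shaanxi Provincial Key Laboratory of Speech and Image Information Processing. Professor Xie introduced the laboratory's recent results in speech synthesis and its collaborations with industry, and the two sides held an in-depth discussion of topics of mutual interest. Dr. Dong gave an academic talk in Lecture Hall 203 of the School, titled "Linguistic Representation for DNN-based Speech Synthesis", in which he discussed recent results on the use of linguistic features in speech synthesis based on deep neural networks. After the talk, students in the audience raised the questions that interested them, and Dr. Dong answered each one in detail. The students benefited greatly from the talk, particularly from its coverage of the various ways to represent linguistic features and of related topics in speech synthesis, and it offered inspiration for their future research in this area.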
Talk title: Linguistic Representation for DNN-based Speech Synthesis
Dr. Minghui Dong is currently a research scientist and the head of the Voice Analysis and Synthesis Lab at the Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A-Star), Singapore. He serves as the vice-president of the Chinese and Oriental Languages Information Processing Society (COLIPS), the editor-in-chief of the International Journal of Asian Language Processing (IJALP), and a Member-at-Large of the Asian Federation of Natural Language Processing (AFNLP). He received his bachelor's degree from the University of Science and Technology of China (USTC), his master's degree from Peking University (PKU), and his PhD from the National University of Singapore (NUS). He joined I2R in Dec 2004. Prior to that, he worked as a research engineer at Peking University for 3 years and as a researcher at InfoTalk Technology (Singapore) for 3 years.
His research interests include spoken language processing, natural language processing, language resource building, and machine learning methods for language processing. He has co-authored more than 80 research papers in leading conferences and journals. He has actively contributed to Asian and international research communities by serving in different roles in various conferences and organizations. He oversees the running of the IALP conference series and the IJALP journal, which promote interaction among researchers working on the processing of low-resourced languages.
He has worked on text-to-speech (TTS) systems for many years. He led TTS R&D work and built TTS systems for various local languages (English, Chinese, Malay, etc.) on various platforms (cloud, PC, smartphone, etc.). Recently, he has been leading research on natural language understanding for speech synthesis, deep learning technologies for speech and language processing, personalized and expressive speech synthesis, and speech synthesis for low-resourced languages.
Abstract: In speech synthesis, input text needs to be converted into a proper format that can be used easily by acoustic models. In traditional HMM-based speech synthesis, linguistic features are normally represented with full context labels, which are discrete values. Now, DNN-based acoustic models are able to accept continuous input easily. This provides more flexibility for linguistic feature representation. One example is to integrate word embeddings into the linguistic input. In word embedding, words are represented with low-dimensional continuous vectors, which can be learned from a text corpus by unsupervised learning. This is especially useful for low-resourced languages, for which linguistic analysis tools, such as part-of-speech taggers and parsers, are not available. In this talk, I will review some of the efforts on linguistic feature representation in the research community, and also talk about our solution for linguistic processing for low-resourced languages.
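As a rough illustration of the contrast drawn in the abstract, the following Python sketch builds two input vectors for the same phone: one from discrete labels only (one-hot phone identity plus a one-hot part-of-speech tag), and one that replaces the part-of-speech tag with a continuous word embedding. It is a toy under stated assumptions and not the speaker's method: the phone set, the tag set, the embedding values, and the function names `discrete_input` and `embedding_input` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical inventories, chosen only to keep the example small.
PHONES = ["sil", "h", "e", "l", "o"]
POS_TAGS = ["NOUN", "VERB", "INTJ"]

# Hypothetical 3-dimensional word embeddings. In practice such vectors would be
# learned from unlabeled text; the numbers here are made up.
EMBEDDINGS: Dict[str, List[float]] = {
    "hello": [0.12, -0.40, 0.33],
    "<unk>": [0.0, 0.0, 0.0],  # fallback for words without an embedding
}


def one_hot(value: str, vocab: List[str]) -> List[float]:
    """Encode a discrete label as a one-hot vector over vocab."""
    return [1.0 if v == value else 0.0 for v in vocab]


def discrete_input(phone: str, pos: str) -> List[float]:
    """Discrete-label input: one-hot phone identity plus one-hot POS tag.

    Producing the POS tag requires a tagger for the language.
    """
    return one_hot(phone, PHONES) + one_hot(pos, POS_TAGS)


def embedding_input(phone: str, word: str) -> List[float]:
    """Continuous input: one-hot phone identity plus the word's embedding.

    No POS tagger is needed; unknown words fall back to the <unk> vector.
    """
    vec = EMBEDDINGS.get(word.lower(), EMBEDDINGS["<unk>"])
    return one_hot(phone, PHONES) + list(vec)


if __name__ == "__main__":
    print(discrete_input("h", "INTJ"))
    print(embedding_input("h", "Hello"))
```

Both vectors have the same length here only because the toy tag set and the toy embedding happen to be three-dimensional. In a real system the embedding dimension would be a design choice, and the resulting vector would be fed to the DNN acoustic model alongside other features.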