Multimedia Information Retrieval:
With the development of multimedia and network technologies,
a large amount of multimedia content (e.g. recordings of TV or
radio broadcasts, presentations, meetings and lectures) is readily
available in the growing global information infrastructure. This
has created an urgent demand for automatic multimedia indexing,
retrieval, visualization, organization and management technologies.
Information providers need to offer a personalized aggregation
of relevant multimedia content dynamically in response to any
user request. Hence we aim to develop technologies that can automate
the processes of:
a. Multimedia fission:
This procedure identifies
basic constituents of media content, e.g. shots in a video file,
stories/topics in an audio/video file, a textual paragraph, a graphic;
as well as their groupings (i.e. higher level constituents), e.g.
a textual story together with its illustrative graphic.
b. Multimedia categorization:
This procedure classifies the identified constituents
(from the previous step) according to appropriate semantic categories.
c. Multimedia fusion:
Given a user's request, multimedia fusion
aims to fuse relevant multimedia constituents into a usability-optimized
form for display to the user.
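The three processes above can be illustrated with a minimal sketch. It assumes plain-text input, treats blank-line-separated paragraphs as the basic constituents, and uses a hypothetical keyword table (`CATEGORY_KEYWORDS`) in place of a trained classifier. All names here are illustrative and are not taken from our actual system.

```python
import re
from dataclasses import dataclass

@dataclass
class Constituent:
    media: str                      # e.g. "text", "video", "graphic"
    content: str                    # paragraph text, transcript or caption
    category: str = "uncategorized"

# Hypothetical keyword lists standing in for a trained semantic classifier.
CATEGORY_KEYWORDS = {
    "sports": {"match", "goal", "team"},
    "finance": {"market", "stock", "bank"},
}

def fission(document, media="text"):
    """Fission: split a document into basic constituents (paragraphs here)."""
    parts = [p.strip() for p in document.split("\n\n")]
    return [Constituent(media, p) for p in parts if p]

def categorize(items):
    """Categorization: label each constituent with the best-matching category."""
    for item in items:
        words = set(re.findall(r"[a-z]+", item.content.lower()))
        scores = {c: len(words & kws) for c, kws in CATEGORY_KEYWORDS.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:        # keep "uncategorized" when nothing matches
            item.category = best
    return items

def fuse(items, requested_category):
    """Fusion: assemble the constituents relevant to a request for display."""
    selected = [i for i in items if i.category == requested_category]
    return "\n".join(f"[{i.media}] {i.content}" for i in selected)

doc = "The team scored a late goal.\n\nThe stock market fell today."
items = categorize(fission(doc))
print(fuse(items, "finance"))
```

In a real system each stage would operate on audio and video as well as text, for example shot detection for fission and a statistical classifier for categorization.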
Research on speech recognition ranges from template-based
small-vocabulary isolated word recognition to Hidden Markov Model
(HMM) based large-vocabulary continuous speech recognition. The
hardware implementation methods for small-vocabulary speech recognition
have been granted three patents: one national invention
patent, 'speech control device and method', and two utility model
patents, 'voice control device' and 'speech controlled toy circuit'.
A large-vocabulary continuous speech recognition system and a lip
reading system have also been built.
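As an illustration of template-based isolated word recognition, the sketch below uses dynamic time warping (DTW), a classic way to compare an utterance against stored word templates spoken at different speeds. The text above does not state which matching method our systems use, so DTW is an assumption here, and the one-dimensional feature sequences are hypothetical stand-ins for real acoustic feature vectors.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(utterance, templates):
    """Return the vocabulary word whose template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# Hypothetical templates for a two-word vocabulary.
templates = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 2]}
slow_yes = [1, 1, 3, 4, 4, 3, 1]   # the "yes" pattern spoken more slowly
print(recognize(slow_yes, templates))
```

Because DTW allows frames to be repeated along the alignment path, the slower utterance still matches its template exactly, which is why this approach suits small vocabularies of isolated words.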
Speaker recognition (SR), or voiceprint recognition, determines
a person's identity from the characteristics of his or her voice.
In access control applications, the wide availability of the
telephone makes SR the most convenient approach
to remote access control. It can also be used to retrieve a particular
person's speech or to mark and index audio streams, for example in
telephone surveillance and meeting records. The most common features used
in current SR systems are based on low-level temporal spectral information
(or a simple function of it, such as the mel cepstrum), which is a fragile
information carrier and is distorted by many factors, e.g. channel,
noise, and even small amounts of room reverberation. Therefore
we need to find high-level features, such as idiosyncratic word
usage and pronunciation, prosodic patterns, and vocal gestures.
In audio diarization applications, following speaker changes
and segmenting the utterances of different speakers without prior
knowledge are also important issues.
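The idea of following speaker changes without prior knowledge can be sketched as a comparison of adjacent analysis windows. The example below is a deliberately simplified assumption: it uses one hypothetical feature value per frame (for instance a pitch estimate in Hz) and the difference of window means as the distance, whereas practical systems use multi-dimensional features and statistical criteria.

```python
def mean(values):
    return sum(values) / len(values)

def detect_speaker_changes(frames, window=3, threshold=50.0):
    """Return frame indices where a speaker change is likely.

    For every candidate boundary i, the mean of the `window` frames before i
    is compared with the mean of the `window` frames from i onward. A boundary
    is reported when this distance exceeds `threshold` and is a local peak.
    """
    positions = range(window, len(frames) - window + 1)
    dist = {
        i: abs(mean(frames[i:i + window]) - mean(frames[i - window:i]))
        for i in positions
    }
    changes = []
    for i in positions:
        is_peak = (dist[i] >= dist.get(i - 1, 0.0)
                   and dist[i] >= dist.get(i + 1, 0.0))
        if dist[i] > threshold and is_peak:
            changes.append(i)
    return changes

# Hypothetical per-frame pitch values: a low voice followed by a higher one.
pitch = [120, 118, 122, 121, 119, 210, 212, 208, 211, 209]
print(detect_speaker_changes(pitch))
```

Requiring a local peak keeps the detector from firing on the frames just before and after the true boundary, where the two windows overlap both speakers.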
Audio Visual Speech Processing:
Audio visual speech processing is a new research field spanning
speech, image and computer vision, aiming at fusing the audio and
visual information of human speech. The research topics include
audio visual speech recognition and speech unit segmentation, audio
visual emotion recognition and expression, audio visual speaker
recognition, as well as speech-driven or text-driven (emotional) talking
Audio Signal Processing:
Audio signal processing comprises techniques for processing digital audio
data captured on a computer. We focus on digital sound effects,
virtual sound, audio watermarking, speech signal enhancement, sound
localization and tracking, etc. Digital sound effects
simulate the different sound effects used in multimedia applications,
such as reverberation, echo, pitch/speed modification, equalization,
etc. Virtual sound reproduces the real position of a sound source
using fewer channels of information, based on the binaural localization
ability of humans. Audio watermarking is an information hiding
technology, i.e. it embeds imperceptible information into an
audio signal. Speech enhancement is a classical topic in which many
problems remain open. Sound localization and tracking finds the
position of a sound source with a microphone array. We are now focusing
on indoor speaker localization and tracking.
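To make the microphone-array idea concrete, the following minimal sketch estimates the time difference of arrival (TDOA) between two microphones by cross-correlation and converts it to a bearing. It assumes a single far-field source, two microphones and a speed of sound of 343 m/s. The function names, signals and parameters are illustrative assumptions and do not describe our actual system.

```python
import math

def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a.

    The lag is chosen to maximize the cross-correlation of the two signals.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(len(sig_a)):
            u = t + lag
            if 0 <= u < len(sig_b):
                score += sig_a[t] * sig_b[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def bearing_from_delay(lag, sample_rate, mic_distance, speed_of_sound=343.0):
    """Convert a sample lag to a far-field angle in degrees from broadside."""
    tau = lag / sample_rate                        # delay in seconds
    ratio = tau * speed_of_sound / mic_distance    # equals sin(angle)
    ratio = max(-1.0, min(1.0, ratio))             # guard against rounding
    return math.degrees(math.asin(ratio))

# Hypothetical recordings: the same pulse reaches microphone B 3 samples later.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
lag = estimate_delay(mic_a, mic_b, max_lag=5)
print(lag, bearing_from_delay(lag, sample_rate=16000, mic_distance=0.2))
```

Tracking a moving speaker would repeat this estimate over successive short frames; indoor use also has to cope with reverberation, which plain cross-correlation handles poorly.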