Natural signal processing systems can combine impressions from different senses. This ability enables animals and humans to extract information from, and make sense of, noisy and complex environments. An example can be seen in most human-to-human interactions, where speech, facial expression, smell, gestures, haptics, etc. all play a role. While each of these modalities has been modelled at a high level of sophistication, multimodal modelling is still in its infancy.

Multimodal signal processing brings the challenge of handling several sources of information at the same time. Topics of particular importance are multi-stream training and decoding, joint modality integration, fusion of multiple decision streams, confidence estimation, and conversion between modalities.
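As a concrete illustration of two of these topics, fusion of decision streams and confidence estimation, the sketch below combines the class posteriors of two modality-specific classifiers by a confidence-weighted average. All names and values (the audio/video posteriors and the stream weights) are hypothetical; this is a minimal late-fusion example, not a method proposed by the workshop.

```python
import numpy as np

# Hypothetical posteriors from two modality-specific classifiers
# (e.g. audio and video) over the same three classes.
audio_probs = np.array([0.7, 0.2, 0.1])
video_probs = np.array([0.3, 0.5, 0.2])

# Hypothetical confidence estimates for each stream, e.g. derived
# from posterior entropy or signal-to-noise ratio; higher = more reliable.
w_audio, w_video = 0.8, 0.4

# Confidence-weighted linear combination of the decision streams,
# renormalised so the fused scores again sum to one.
fused = w_audio * audio_probs + w_video * video_probs
fused /= fused.sum()

decision = int(np.argmax(fused))
print(decision)  # → 0: the more confident audio stream dominates
```

Many other integration strategies exist (early feature concatenation, multi-stream HMM decoding, product-of-experts combination); the weighted sum above is only the simplest decision-level variant.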

The aim of the workshop is to:

  • Introduce problems related to multimodal integration to machine learning researchers.
  • Identify and compare different integration strategies.
  • Determine the major challenges in using more than one modality.
  • Propose novel methods to improve the state of the art.

Speakers will be asked to give a short presentation of their own work, including a demonstration on real data. Furthermore, speakers are asked to reserve at least five minutes of their talk for a discussion of the main questions of the workshop: how the integration strategy they use relates to other possible strategies, and which major challenges they find most important to the field.

The workshop will concentrate on two aspects of multimodal signal processing: sound and image integration, and wearable computing. Furthermore, the physiological/psychological side will be considered in a single talk. The workshop will build on experiences from the "JOINT AMI/PASCAL/IM2/M4 Workshop on Multimodal Interaction and Related Machine Learning Algorithms" (Martigny, June 2004) and the "Machine Learning meets the User Interface Workshop" (NIPS 2003).