Summary
Efficient approximate inference in large Hybrid Networks (HNs; graphical models with discrete and continuous variables) is one of the major unsolved problems in machine learning, and insight into good solutions would help advance the application of sophisticated machine learning methods to a wide range of real-world problems.
Such research would potentially benefit applications in Speech Recognition, Visual Object Tracking and Machine Vision, Robotics, Music Scene Analysis, analysis of complex Time Series, understanding and modelling of complex computer networks, Condition Monitoring, and other complex phenomena.
This theory challenge specifically addresses a central component area of PASCAL, namely Bayesian Statistics and statistical modelling, and is also related to the other central areas of Computational Learning, Statistical Physics and Optimisation techniques.
One aim of this challenge is to bring together leading researchers in graphical models and related areas to develop and improve on existing methods for tackling the fundamental intractability of HNs. We do not believe that a single best approach will necessarily emerge, although we would expect successes in one application area to be transferable to related areas. Many leading machine learning researchers are currently working on applications that involve HNs, and we invite participants to suggest their own applications. Ideally, such suggestions would take the form of a dataset, along the lines of other PASCAL challenges.
Graphical Models
Graphical models are a powerful framework for formulating and solving difficult problems in machine learning. Being essentially a marriage between graph theory and probability theory, they provide a theoretically elegant approach to thinking about complex machine learning problems, and have had widespread success, now constituting one of the dominant approaches in the field. A great many problems in machine learning can be formulated as latent or hidden variable problems, in which an underlying mechanism that cannot be directly observed is responsible for generating the observable values; based on these observations, we wish to learn something about the underlying generating process.
Hybrid Networks
Hybrid networks are graphical models which contain both discrete and continuous variables. Natural application areas often, though not exclusively, arise in the temporal domain, for example speech recognition or visual tracking, where time plays a fundamental role; one reason for this is that many physical processes are inherently Markovian.
In the continuous case, a widely used model is the Kalman Filter (KF), which is based on a linear dynamical system with additive Gaussian noise. The discrete analogue of the Kalman Filter is the Hidden Markov Model (HMM), in which the hidden states are discrete, and the output (visible) states may be discrete or continuous. Such models are used in leading speech recognition software and tracking applications. Both the KF and the HMM are computationally tractable. Recently, it has been recognised that many machine learning models would naturally have both continuous (as in the KF) and discrete (as in the HMM) hidden variables.
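As a rough illustration of why exact inference in the KF is tractable, the following sketch (in Python/NumPy, with hypothetical model matrices A, C, Q and R) performs a single predict/update step of the filter; because Gaussians are closed under linear-Gaussian operations, the filtered posterior remains a single Gaussian, and filtering a sequence of length T requires only T such steps. An analogous linear-time forward recursion over the discrete hidden states gives exact inference in the HMM.

import numpy as np

def kalman_filter_step(mu, Sigma, v, A, C, Q, R):
    # mu, Sigma: filtered mean and covariance of h(t-1) given v(1),...,v(t-1)
    # A, Q: transition matrix and transition noise covariance
    # C, R: emission matrix and emission noise covariance
    # Predict: p(h(t)|v(1),...,v(t-1)) is a single Gaussian
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q
    # Update: condition on v(t); the posterior is again a single Gaussian
    S = C @ Sigma_pred @ C.T + R                # innovation covariance
    K = Sigma_pred @ C.T @ np.linalg.inv(S)     # Kalman gain
    mu_new = mu_pred + K @ (v - C @ mu_pred)
    Sigma_new = Sigma_pred - K @ C @ Sigma_pred
    return mu_new, Sigma_new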
Hybrid Networks (HNs) are such graphical models with both continuous and discrete variables. For example, imagine that a musical instrument is played at time t, which we can model with a switch variable s(t)=1: subsequent sound generation can be modelled as a hidden Gaussian linear dynamical system with transition dynamics p(h(t+1)|h(t),s(t)=1). When the instrument is turned off at a later time t, s(t)=0, and different dynamics apply, p(h(t+1)|h(t),s(t)=0). The observation (visible) process p(v(t)|h(t)) is typically a noisy projection of the hidden state h(t), say, in the case of sound, to a one-dimensional pressure displacement. Based on the observed sequence v(1),…,v(T), we may wish to infer the switch variables s(1),…,s(T). This particular kind of HN is called a switching Kalman Filter.
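To make the generative process concrete, the following sketch (in Python/NumPy, with hypothetical parameter values chosen purely for illustration) samples from a two-state switching Kalman Filter of the kind just described: the switch s(t) selects which linear dynamics generate h(t+1), and v(t) is a noisy one-dimensional projection of h(t). The inference problem runs in the opposite direction: given only v(1),…,v(T), recover the posterior over s(1),…,s(T).

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a two-state switching Kalman Filter
A = [np.array([[0.5, 0.0], [0.0, 0.5]]),         # dynamics for s(t)=0 (instrument off): decay
     np.array([[0.99, -0.1], [0.1, 0.99]])]      # dynamics for s(t)=1 (instrument on): oscillation
Q = 0.01 * np.eye(2)                             # transition noise covariance
C = np.array([1.0, 0.0])                         # projection to a 1-d pressure displacement
obs_std = 0.1                                    # observation noise standard deviation
P = np.array([[0.95, 0.05],                      # switch transition probabilities p(s(t+1)|s(t))
              [0.05, 0.95]])

T = 100
s = np.zeros(T, dtype=int)
h = np.zeros((T, 2))
v = np.zeros(T)
s[0], h[0] = 1, rng.standard_normal(2)
for t in range(T):
    v[t] = C @ h[t] + obs_std * rng.standard_normal()   # observation process p(v(t)|h(t))
    if t + 1 < T:
        h[t + 1] = A[s[t]] @ h[t] + rng.multivariate_normal(np.zeros(2), Q)  # p(h(t+1)|h(t),s(t))
        s[t + 1] = rng.choice(2, p=P[s[t]])              # switch transition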
The Challenge
A fundamental difficulty with Hybrid Networks is their intractability: unfortunately, the marriage of two tractable models, the Kalman Filter and the Hidden Markov Model, does not result in a tractable Hybrid Network. In the switching Kalman Filter, for example, the exact filtered posterior is a mixture of Gaussians whose number of components is multiplied by the number of switch states at every time step, so the cost of exact inference grows exponentially with the length of the sequence. Recently, developing approximate inference and learning methods in HNs has become a major research activity across a wide range of fields, including speech, music transcription, condition monitoring, control, robotics, and brain-computer interfaces.
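To give a flavour of how such approximations work, the sketch below (in Python/NumPy, with hypothetical inputs) implements the basic collapse operation used by assumed-density-filtering style schemes such as the Generalised Pseudo-Bayes family: after each time step the exact filtered posterior gains a factor of S Gaussian components (S being the number of switch states), and is projected back onto a single Gaussian by moment matching. Keeping a small, fixed number of components in this way trades a controlled approximation error for a cost that is again linear in the sequence length.

import numpy as np

def collapse_mixture(weights, means, covs):
    # Moment-match a Gaussian mixture to a single Gaussian:
    # the collapsed mean and covariance reproduce the mixture's
    # first and second moments exactly.
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = sum(wi * mi for wi, mi in zip(w, means))
    cov = sum(wi * (Ci + np.outer(mi - mu, mi - mu))
              for wi, mi, Ci in zip(w, means, covs))
    return mu, cov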
Whilst this challenge is largely theoretical in nature, in order to have at least one concrete problem we will make available a dataset of acoustic recordings, for example of a live piano performance. The challenge would be to infer which notes were played and when, that is, to perform a transcription of the performance. We will know the ground truth (since we generated the data), against which competing solutions can be compared. Music transcription is a difficult, largely unsolved problem, although initial attempts using HNs have demonstrated their effectiveness. Faster approximation schemes in this area would help overcome the current barrier to commercialisation of such techniques.
Although, for concreteness, we have framed the challenge around a music example, this is merely one of many possible application areas, and we do not expect any bias towards one particular area.