One of the major problems driving current research in statistical machine learning is the search for ways to exploit highly structured models that are both expressive and tractable. Bayesian nonparametrics (BNP) provides a framework for developing robust and flexible models that can accurately represent the complex structure in the data. Model flexibility is achieved by assigning priors with unbounded capacity, and overfitting is prevented by the Bayesian approach of integrating out all parameters and latent variables. Inference is typically achieved with approximation techniques such as Markov chain Monte Carlo and variational Bayes. As a result, the model can automatically infer the amount of complexity required for modeling the given data.
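As a rough illustration of a prior with unbounded capacity, the sketch below samples a partition from a Chinese restaurant process, a standard BNP prior over clusterings. The function name, the `concentration` parameter and the seed are illustrative assumptions, not part of the workshop material; the point is only that the number of clusters is not fixed in advance and can grow with the number of observations.

```python
import random

def sample_crp_table_counts(num_customers, concentration, seed=0):
    """Sample cluster sizes from a Chinese restaurant process (illustrative sketch)."""
    rng = random.Random(seed)
    table_counts = []  # table_counts[k] = number of customers seated at table k
    for n in range(num_customers):
        # n customers are already seated. The next one joins table k with
        # probability count_k / (n + concentration) and opens a new table
        # with probability concentration / (n + concentration).
        r = rng.uniform(0.0, n + concentration)
        cumulative = 0.0
        for k, count in enumerate(table_counts):
            cumulative += count
            if r < cumulative:
                table_counts[k] += 1
                break
        else:
            # No existing table was chosen: open a new one.
            table_counts.append(1)
    return table_counts

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "observations ->", len(sample_crp_table_counts(n, 1.0)), "clusters")
```

A full model would attach a likelihood to each cluster and infer the partition from data, for example with the Markov chain Monte Carlo or variational methods mentioned above.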
Motivation

Nonparametric Bayesian analysis, first developed in the statistics community, has started attracting much attention since the development of approximate inference techniques. This has led to the development of a variety of models and to applications of these models in disciplines such as information retrieval, natural language processing, computer vision, computational biology, cognitive science and signal processing. Furthermore, research on nonparametric Bayesian models has served to enhance the links between statistical machine learning and a number of other mathematical disciplines, including stochastic processes, algorithms, optimization, combinatorics and knowledge representation. As a result, a large community has grown around this topic, with machine learning researchers contributing substantially to recent progress in the field. The purpose of this workshop is to bring together researchers from machine learning and statistics to create a forum for discussing recent advances in BNP, to better understand the asymptotic properties of the models, and to inspire research on new techniques for better models and inference algorithms.

This is the fifth in a series of successful workshops on this topic. The first two were at NIPS 2003 and 2005 and the last two were at ICML 2006 and 2008. The field attracts researchers from a broad range of disciplines, ranging from theoretical statisticians and probabilists to people working on specialized applications. It is important that we effectively communicate our advances and needs to better focus our efforts. Theoreticians need to know which methods are used in practice and how, while applied researchers need to learn the latest models and inference algorithms to improve their approaches to problems. It is especially important to bring together statisticians and machine learning researchers, since both communities work on the topic but have complementary strengths. This workshop aims to enhance the interaction between the two communities, initiated by previous workshops within both communities, in order to exchange the latest developments in the field and address open problems.

The workshop will focus mainly on two important issues. The first involves practical matters to enable the use of BNP in real-world applications, while the second involves theoretical properties of complex Bayesian nonparametric models, in particular asymptotics, e.g. consistency, rates of convergence, and Bernstein-von Mises results. Each focus will be given a specific session during the workshop. We describe both foci in detail below.
Although BNP has attracted much attention in many application domains, its use in real-world applications is still limited. The parametric versus nonparametric controversy is still going on, with the complexity of the models and inference techniques discouraging practitioners from using BNP. Demonstrating application domains in which nonparametric Bayesian models clearly do better than their parametric counterparts would give clear motivation to consider the use of these models in practice. More importantly, automation of the application of nonparametric Bayesian models would encourage a wider community to utilize these models. This includes providing easy-to-follow guidance for the model structure specification and the choice of hyperparameters. A step in this direction is the discussion of an objective or empirical Bayes treatment. Additionally, developing general-purpose software that can scale up inference techniques to massive datasets is another step necessary for the wide applicability of these models. There is ongoing work in the community in these directions. This workshop will help us summarize the current state of the practical use of nonparametric Bayesian models and focus on the requirements of the field to extend its use to other application domains.

Another point of focus for this workshop is the theoretical developments in the field. Current work has established results on the asymptotic behaviour of simple BNP models. Most of the results are for simple cases, such as density estimation and Gaussian process regression. There is little or no work on posterior consistency, rates of convergence or Bernstein-von Mises results for latent variable models. However, there is a steady development of tools, which are starting to allow us to tackle much more challenging models. The machine learning community has mainly focused on developing complex nonparametric models and their applications without being much concerned about the theoretical properties of these models. We will invite experts to comment on and guide the discussion of this topic. We will also invite theoreticians within the NIPS community to participate in this focus.

Organizers

  • Dilan Gorur, Gatsby Computational Neuroscience Unit
  • François Caron, INRIA Bordeaux Sud-Ouest
  • Yee Whye Teh, Gatsby Computational Neuroscience Unit
  • David Dunson, Duke University
  • Zoubin Ghahramani, University of Cambridge
  • Michael I. Jordan, University of California at Berkeley

Data with temporal (or sequential) structure arise in several applications, such as speaker diarization [FSJW08b, FDH08], human action segmentation [ZTH08], network intrusion detection [TRBK06], DNA copy number analysis [LXZ08], and neuron activity modelling [Y07], to name a few.

A particularly recurrent temporal structure in real applications is the so-called change-point model [BH92], where the data may be temporally partitioned into a sequence of segments delimited by change-points, such that a single model holds within each segment whereas different models hold across segments. Change-point problems may be tackled from two points of view, corresponding to the practical problem at hand: retrospective (or "a posteriori"), also known as multiple change-point estimation [F06], where the whole signal is taken at once and the goal is to estimate the change-point locations [BKLMW09], and online (or sequential), also known as quickest detection [PH09], where data are observed sequentially and the goal is to quickly detect change-points. We refer to these classes of tasks as temporal segmentation.
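The two viewpoints can be sketched on a toy signal with a single shift in mean. The code below is a minimal illustration under strong assumptions (one change-point, scalar data, unit-variance Gaussian noise, and, for the online case, known pre- and post-change means); the function names are hypothetical and the methods are textbook least-squares splitting and the CUSUM rule, not the algorithms of the cited papers.

```python
def best_single_change_point(signal):
    """Retrospective view: pick the split that minimises within-segment squared error."""
    best_index, best_cost = None, float("inf")
    for k in range(1, len(signal)):
        # Candidate segmentation: signal[:k] and signal[k:], each with its own mean.
        left, right = signal[:k], signal[k:]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        cost = (sum((x - mean_left) ** 2 for x in left)
                + sum((x - mean_right) ** 2 for x in right))
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index

def cusum_alarm_time(stream, pre_mean, post_mean, threshold):
    """Online view: raise an alarm when the CUSUM statistic exceeds a threshold."""
    statistic = 0.0
    for t, x in enumerate(stream):
        # Log-likelihood ratio of post- versus pre-change unit-variance Gaussians.
        llr = (post_mean - pre_mean) * (x - (pre_mean + post_mean) / 2.0)
        statistic = max(0.0, statistic + llr)
        if statistic > threshold:
            return t
    return None  # no alarm raised

if __name__ == "__main__":
    signal = [0.0] * 5 + [5.0] * 5
    print(best_single_change_point(signal))
    print(cusum_alarm_time(signal, pre_mean=0.0, post_mean=5.0, threshold=10.0))
```

The retrospective routine needs the whole signal before it can answer, whereas the online routine processes one observation at a time and trades detection delay against false alarms through `threshold`.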

An extensive literature has developed in these two viewpoints, in both the statistics (and probability) community [L01], and in the signal processing community [K98]. Many of the optimal algorithms proposed in this literature were developed under rather restrictive assumptions, however: parametric models for distributions, low-dimensional multivariate data, and, in the online case, perfect knowledge of the pre- and post-change distributions.

In applications such as human action segmentation or speaker diarization, data are large-scale, expensive to label, and high-dimensional, therefore requiring approaches that can tackle more complex situations in temporal segmentation. Recent years have witnessed new approaches with broader applicability, essentially by proposing unsupervised [XWSS06, ZTH08], nonparametric [FSJW08b], and scalable temporal segmentation algorithms [FL07, FDH08].

The purpose of this workshop is to bring together experts from the statistics, machine learning, and signal processing communities, to address a broad range of applications from robotics to neuroscience, to discuss and cross-fertilize ideas, and to define the current challenges in temporal segmentation. We intend to encourage discussions on the following particular issues: How can traditional statistical approaches for temporal segmentation, essentially generative, be extended to discriminative approaches, allowing us to deal with high-dimensional data? How well do unsupervised approaches for temporal segmentation perform with respect to supervised ones? What are the main statistical and computational issues that arise when addressing large-scale (long) data signals?

Organizers

  • Zaid Harchaoui (primary organizer, zaidh@andrew.cmu.edu)
  • Stephane Canu
  • Olivier Cappe
  • Arthur Gretton
  • Alain Rakotomamonjy
  • Jean-Philippe Vert

The workshop will be held from 30 September to 2 October in Corfu, Greece, in conjunction with CLEF 2009 (Cross-Language Evaluation Forum). The actual Morpho Challenge part lasts only one day (30 September), but the same registration also allows you to participate in the other CLEF tracks that follow immediately. Access the registration form here and select CLEF. Only a limited number of seats are available, so please let us know as soon as possible if you are coming to the workshop.

This workshop, collocated with the 19th European Conference on Machine Learning and the 12th Conference on Principles and Practice of Knowledge Discovery in Databases in Bled, Slovenia, aims at fostering research in Machine Learning and Data Mining applied to Robotics. All contributions related to

  • Bayesian and probabilistic robotics,
  • reinforcement learning and apprenticeship learning,
  • learning from sensor data and logs (non exhaustive list)

are welcome. All contributions discussing issues such as:

  • how to learn with/without a simulator, and how to face the reality gap
  • how to build controllers that are robust with respect to motor/sensor failures
  • how to derive self-driven criteria/goals for the robot
  • how to deal with swarm robotics, distributed decision making and frugal communication,
Program Committee
Einoshin Suzuki Chair (Kyushu University, Japan)
Michèle Sebag Chair (CNRS & Université Paris-Sud, France)
Shin Ando (Gunma University, Japan)
Jose L. Balcázar (Technical University of Catalonia, Spain)
Aude Billard (Ecole Polytechnique Fédérale de Lausanne, Switzerland)
Nicolas Bredèche (Université Paris-Sud, France)
João Gama (University of Porto, Portugal)
Peter Grünwald (CWI, Netherlands)
Hitoshi Iba (University of Tokyo, Japan)
Kristian Kersting (Fraunhofer IAIS & University of Bonn, Germany)
Jan Peters (Max Planck Institute for Biological Cybernetics, Germany)
Marc Schoenauer (INRIA & Université Paris-Sud, France)
Marc Toussaint (Technical University of Berlin, Germany)
Takashi Washio (Osaka University, Japan)

MLSB09, the Third International Workshop on Machine Learning in Systems Biology will be held in Ljubljana, Slovenia on September 5-6 2009 at the Jozef Stefan Institute.

The aim of this workshop is to contribute to the cross-fertilization between the research in machine learning methods and their applications to systems biology (i.e., complex biological and medical questions) by bringing together method developers and experimentalists. You can download the call for papers from here.

Motivation

Molecular biology and all the biomedical sciences are undergoing a true revolution as a result of the emergence and growing impact of a series of new disciplines/tools sharing the "-omics" suffix in their name. These include in particular genomics, transcriptomics, proteomics and metabolomics, devoted respectively to the examination of the entire systems of genes, transcripts, proteins and metabolites present in a given cell or tissue type.

The availability of these new, highly effective tools for biological exploration is dramatically changing the way one performs research in at least two respects. First, the amount of available experimental data is not a limiting factor any more; on the contrary, there is a plethora of it. Given the research question, the challenge has shifted towards identifying the relevant pieces of information and making sense out of them (a "data mining" issue). Second, rather than focus on components in isolation, we can now try to understand how biological systems behave as a result of the integration and interaction between the individual components that one can now monitor simultaneously (so-called "systems biology").

Taking advantage of this wealth of "genomic" information has become a "conditio sine qua non" for anyone who aims to remain competitive in molecular biology and in the biomedical sciences in general. Machine learning naturally appears as one of the main drivers of progress in this context, where most of the targets of interest deal with complex structured objects: sequences, 2D and 3D structures or interaction networks. At the same time, bioinformatics and systems biology have already induced significant new developments of general interest in machine learning, for example in the context of learning with structured data, graph inference, semi-supervised learning, system identification, and novel combinations of optimization and learning algorithms.

Chairs

  • Saso Dzeroski, Jozef Stefan Institute, Slovenia
  • Pierre Geurts, GIGA-Research, University of Liège, Belgium
  • Juho Rousu, Department of Computer Science, University of Helsinki, Finland

Scientific Program Committee

  • Florence d'Alché-Buc (University of Evry, France)
  • Saso Dzeroski (Jozef Stefan Institute, Slovenia)
  • Paolo Frasconi (Università degli Studi di Firenze, Italy)
  • Cesare Furlanello (Fondazione Bruno Kessler, Trento, Italy)
  • Pierre Geurts (University of Liège, Belgium)
  • Mark Girolami (University of Glasgow, UK)
  • Dirk Husmeier (Biomathematics & Statistics Scotland, UK)
  • Samuel Kaski (Helsinki University of Technology, Finland)
  • Ross King (Aberystwyth University, UK)
  • Neil Lawrence (University of Manchester, UK)
  • Elena Marchiori (Vrije Universiteit Amsterdam, The Netherlands)
  • Yves Moreau (Katholieke Universiteit Leuven, Belgium)
  • William Noble (University of Washington, USA)
  • Gunnar Rätsch (FML, Max Planck Society, Tübingen)
  • Juho Rousu (University of Helsinki, Finland)
  • Céline Rouveirol (University of Paris XIII, France)
  • Yvan Saeys (University of Gent, Belgium)
  • Guido Sanguinetti (University of Sheffield, UK)
  • Ljupco Todorovski (University of Ljubljana, Slovenia)
  • Koji Tsuda (Max Planck Institute, Tuebingen)
  • Jean-Philippe Vert (Ecole des Mines, France)
  • Louis Wehenkel (University of Liège, Belgium)
  • Jean-Daniel Zucker (University of Paris XIII, France)
  • Blaz Zupan (University of Ljubljana, Slovenia)

Modern society is increasingly reliant on our capability to automatically detect patterns in vast masses of data. This is affecting not only the way we do business and run our industries, but is also changing the very nature of the scientific method. Every science now has an e-version (computational biology, computational chemistry, etc.) and in many cases this involves automatisation of both the production and the analysis of experimental data. The use of computer simulations increases our reliance on automatic analysis of data even further. This process is accelerating.

The distinct scientific communities that are working on various aspects of automatic analysis of data include Combinatorial Pattern Matching, Data Mining, Computational Statistics, Network Analysis, Text Mining, Image Processing, Syntactical Pattern Recognition, Machine Learning, Statistical Pattern Recognition, Computer Vision, and many others.

A unified understanding of the challenges and opportunities ahead is essential for further progress, and is the purpose of this series of workshops / summer-schools: to promote a unified understanding of all the technical and conceptual issues relating to the automatic discovery and exploitation of patterns in data.

The previous two editions of this event took place in Erice in 2005 and in Bertinoro in 2007. The videos of all lectures are available online.

INTENDED AUDIENCE: The school is intended for PhD students, postdocs, and researchers (both academic and industrial), working in any of the disciplines involved in "the analysis of patterns" and hence including: bioinformatics, data mining, text analysis, machine learning, statistics, optimization, computer vision, stringology, network analysis, etc.

This workshop addresses the problem of learning from data that are not independently and identically distributed (IID), given that IIDness is a common assumption made in statistical machine learning. While this assumption helps to study the properties of learning procedures (e.g. generalization ability), and also guides the building of new algorithms, there are many real-world situations where it does not hold. This is particularly the case for many challenging tasks of machine learning that have recently received much attention such as (but not limited to): ranking, active learning, hypothesis testing, learning with graphical models, prediction on graphs, mining (social) networks, multimedia or language processing. The goal of this workshop is to bring together research works aiming at identifying problems where either the assumption of identical distribution or independence, or both, is violated, and where it is anticipated that carefully taking into account the non-IIDness is of primary importance.
Examples of such problems are:

  • Bipartite ranking or, more generally, pairwise classification, where pairing up IID variables entails non-IIDness: while the data may still be identically distributed, it is no longer independent;
  • Active learning, where labels for specific data are requested by the learner: the independence assumption is also violated;
  • Learning with covariate shift, where the training and test marginal distributions of the data differ: the identically distributed assumption does not hold.
  • Online learning with streaming data, when the distribution of the incoming examples changes over time: the examples are not identically distributed.
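To make the covariate shift case concrete, one common remedy is to reweight training losses by the density ratio between test and training inputs. The sketch below is only illustrative: it assumes both marginals are one-dimensional Gaussians with known means and a shared standard deviation, whereas in practice the ratio has to be estimated, and all function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def importance_weight(x, train_mean, test_mean, std=1.0):
    """Density ratio p_test(x) / p_train(x) under the assumed Gaussian marginals."""
    return gaussian_pdf(x, test_mean, std) / gaussian_pdf(x, train_mean, std)

def weighted_risk(losses, weights):
    """Self-normalised importance-weighted average of per-example training losses."""
    total = sum(weights)
    return sum(w * loss for w, loss in zip(weights, losses)) / total

if __name__ == "__main__":
    train_inputs = [-1.0, 0.0, 0.5, 2.0]
    losses = [0.2, 0.1, 0.4, 0.9]  # hypothetical per-example losses
    weights = [importance_weight(x, train_mean=0.0, test_mean=1.0) for x in train_inputs]
    print(weighted_risk(losses, weights))
```

Training points that are more typical of the test distribution receive larger weights, so the weighted average targets the test risk rather than the training risk.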

We see the workshop as a venue not only for the presentation of papers focusing on carefully dealing with non-IID data, but also as a forum for sharing ideas across different application domains. Hence, it will be an opportunity for discussions on methods that address non-IIDness from the following standpoints:

  • Theoretical: results on generalization bounds and learnability, contributions that mathematically formalize the types of non-IIDness encountered, results on the extent to which non-IIDness does not harm the validity of theoretical results built on the IID assumption, helpfulness of the online learning framework,
  • Algorithmic: theoretically motivated algorithms designed to handle non-IID data, approaches that make it possible for classical learning results to carry over, online learning procedures,
  • Practical: successful applications of non-IID learning methods to learning from streaming data, web data, biological data, multimedia, natural language, social network mining.

Organizers

  • Massih-Reza Amini, National Research Council, Canada
  • Amaury Habrard, University of Marseille, France
  • Liva Ralaivola, University of Marseille, France
  • Nicolas Usunier, University Pierre et Marie Curie, France 

Program Committee

  • Shai Ben-David, University of Waterloo, Canada
  • Gilles Blanchard, Fraunhofer FIRST (IDA), Germany
  • Stéphan Clémençon, Télécom ParisTech, France
  • François Denis, University of Provence, France
  • Claudio Gentile, University dell'Insubria, Italy
  • Balaji Krishnapuram, Siemens Medical Solutions, USA
  • François Laviolette, Université Laval, Canada
  • Xuejun Liao, Duke University, USA
  • Richard Nock, University Antilles-Guyane, France
  • Daniil Ryabko, Institut National de Recherche en Informatique et Automatique, France
  • Marc Sebban, University of Saint-Etienne, France
  • Ingo Steinwart, Los Alamos National Labs, USA
  • Masashi Sugiyama, Tokyo Institute of Technology, Japan
  • Nicolas Vayatis, École Normale Supérieure de Cachan, France
  • Zhi-Hua Zhou, Nanjing University, China

The aim of this workshop is to disseminate scientific results produced by the SMART project and PASCAL Network of Excellence to the industry and businesses. The event is aimed at industries and businesses with a connection to data analysis, language technologies, machine translation, machine learning, artificial intelligence, and related areas. It will allow industrial researchers to become familiar with the work carried out within the Pascal and the Smart consortia, and to establish connections with their researchers. Pascal has various programmes aimed at linking with industry, including an Industrial Club and a Harvest Programme. The coordinators of both programmes will be present at the event, as well as the coordinators of Pascal and Smart.

Posters, demos, dissemination material will be available at the venue of the industrial outreach day. The format will be informal and flexible, a combination between a poster session and a technology demo session.

SMART (Statistical Multilingual Analysis for Retrieval and Translation) is a 3-year "Specific Target Research Project" (STReP) funded by the European Commission. SMART attempts to address open problems and shortcomings in Machine Translation and Cross-Language Information Retrieval using the methods of modern Statistical Learning.

Different approaches to Brain-Computer Interfaces have been developed, each one with specific solutions that range from understanding and explaining cognitive functions to communicating with real and virtual environments by thought alone.

The Berlin BCI Workshop presents an overview, in-depth tutorials and discussions on the latest research at all levels of interaction. The research presented will cover invasive recording, with its high temporal and spatial resolution, semi-invasive ECoG, non-invasive EEG, with high temporal and low spatial resolution, non-invasive NIRS and fMRI measurement, with partially high spatial and low temporal resolution and potential combinations of the different methods.

The workshop programme includes one full day of tutorials on invasive BCI, electrophysiology and non-invasive BCI. The two other workshop days cover all aspects of invasive and non-invasive EEG, NIRS and fMRI, plus informative results of the "TOBI: Tools for Brain-Computer Interaction" project, aspects of Brain@Work (neurotechnology-based man-machine interaction for industrial applications) and our newly founded Bernstein Focus: Neurotechnology (Noninvasive Neurotechnologies for Man-Machine Interactions). The poster session following the tutorials will cross over into the BBCI barbecue, smoothing discussions with drinks and food.

Organisation

  • Klaus-Robert Mueller, TUB and BFNT-B
  • Benjamin Blankertz, TUB and BFNT-B
  • John-Dylan Haynes, BCCN-B
  • Michael Tangermann, TUB
  • Steven Lemm, Fraunhofer FIRST
  • Matthias L. Jugel, BFNT-B
  • Andrea Gerdes, TUB/Workshop Secretary