This page presents two overlapping workshops held jointly by the Centre for Computational Statistics and Machine Learning and the Astrophysics Group of the Department of Physics and Astronomy at University College London. The workshops aim to bring together computer scientists and cosmologists to explore the application of computational statistics and machine learning techniques to data analysis problems in cosmology.

Local Organising Committee

Scientific Organising Committee

MCS 2011 is the tenth edition in a well-established series of meetings providing an international forum for the discussion of issues in multiple classifier system design. The aim of the workshop is to bring together researchers from the diverse communities concerned with this topic, including the neural network, pattern recognition, machine learning and statistics communities. Information on previous MCS editions can be found at http://www.diee.unica.it/mcs. The special focus of MCS 2011 will be on the application of multiple classifier systems in computer security.

Program Committee

  • J. Benediktsson (Iceland)
  • G. Brown (United Kingdom)
  • H. Bunke (Switzerland)
  • L.P. Cordella (Italy)
  • R.P.W. Duin (Netherlands)
  • N. El Gayar (Egypt)
  • G. Fumera (Italy)
  • C. Furlanello (Italy)
  • J. Ghosh (USA)
  • V. Govindaraju (USA)
  • M. Haindl (Czech Republic)
  • I. Hall (USA)
  • T.K. Ho (USA)
  • N. Intrator (Israel)
  • P. Kegelmeyer (USA)
  • K. Kryszczuk (Switzerland)
  • L.I. Kuncheva (United Kingdom)
  • V. Mottl (Russia)
  • K. Nandakumar (Singapore)
  • N. Oza (USA)
  • E. Pekalska (United Kingdom)
  • R. Polikar (USA)
  • J.J. Rodriguez (Spain)
  • A. Ross (USA)
  • A. Sharkey (United Kingdom)
  • F. Tortorella (Italy)
  • G. Valentini (Italy)
  • T. Windeatt (United Kingdom)
  • D. Windridge (United Kingdom)
  • Z.-H. Zhou (China)

The second edition of the On-line Trading of Exploration and Exploitation Workshop took place on 2 July 2011 in Bellevue, Washington and was co-located with the 28th International Conference on Machine Learning. The proceedings contain a selection of papers presented at the workshop.

On-line problems such as website optimisation require trading exploration against exploitation in order to learn and optimise an unknown target. For instance, in the Pascal Exploration & Exploitation Challenge 2011, a web server observes clicks from visitors to a webpage and aims to maximise the ratio of clicks to page views. The relationship between visitor-content pairs and clicks is unknown, but it can be learnt from past observations.

The content presented to a visitor is either chosen to improve our model of clicks in regions of uncertainty, or chosen based on the model built so far. These two distinct motives are referred to as exploration and exploitation. The web server should therefore explore enough to build an accurate model of visitor behaviour, while allowing for sufficient exploitation in order to earn clicks.

The problem of trading exploration and exploitation was first tackled in the "multi-armed bandit" formalism, in which the target observations are compared to rewards obtained when pulling the arms of slot machines. The first theoretical analyses focused on independent reward distributions for each arm. However, in a real-world scenario, arms (inputs) are rarely independent, and modelling the dependencies is essential in order to obtain the best learning rate. Gaussian Processes (GP), for instance, are a powerful modelling tool that has been widely used in Bayesian online optimisation, in combination with heuristics such as the Most Probable Improvement for selecting inputs at which to sample the function.

However, performance guarantees for the use of GPs in a bandit setting were only found in 2010, when used in combination with the Upper Confidence Bound heuristic for trading exploration and exploitation. The exploration/exploitation dilemma is a recurrent topic in many areas of research, e.g. global optimisation, reinforcement learning, tree search, recommender systems and information retrieval. The trading of exploration and exploitation is of particular importance in various large-scale applications, such as sponsored search advertising or content-based information retrieval, where the aim is to help users quickly access the information they are looking for.
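The Upper Confidence Bound idea for the classical independent-arms setting can be sketched in a few lines. The following is a minimal simulation of the UCB1 rule on Bernoulli arms; the arm means, horizon, and seed are illustrative assumptions, not the challenge's actual setup:

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Simulate the UCB1 rule on independent Bernoulli arms.

    `arm_means` are the true click probabilities, unknown to the learner.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k        # pulls per arm
    totals = [0.0] * k      # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:          # initialise: pull each arm once
            arm = t - 1
        else:               # empirical mean plus exploration bonus
            arm = max(range(k),
                      key=lambda a: totals[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts, sum(totals)

# With well-separated arms, the best arm ends up pulled most often.
counts, reward = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

The exploration bonus shrinks as an arm is pulled more often, so the rule automatically shifts from exploring all arms to exploiting the apparently best one.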

The workshop provided an opportunity to present, compare and discuss the performance of different exploration/exploitation techniques, as well as theoretical analyses of such algorithms. A particular focus of the workshop was large-scale applications. The results of the Pascal Exploration & Exploitation Challenge 2011 were also presented during the workshop.

Motivation

Intelligent beings commonly transfer previously learned “knowledge” to new domains, making them capable of learning new tasks from very few examples. In contrast, many recent approaches to machine learning have focused on “brute force” supervised learning from massive amounts of labeled data. While this approach makes a lot of sense practically when such data are available, it does not apply when the available training data are mostly unlabeled. Further, even when large amounts of labeled data are available, some categories may have far fewer examples than others; for Internet documents and images, for instance, the number of examples per category typically follows a power law. The question is whether we can exploit similar data (labeled with different types of labels, or completely unlabeled) to improve the performance of a learning machine. This workshop will address a question of fundamental and practical interest in machine learning: the assessment of methods capable of generating data representations that can be reused from task to task. To pave the ground for the workshop, we organized a challenge on unsupervised and transfer learning.

Competition

The unsupervised and transfer learning challenge has just started and will end on April 15, 2011. The results of the challenge will be discussed at the workshop, and we will invite the best entrants to present their work. Further, we intend to launch a second challenge on supervised transfer learning, whose results will be discussed at NIPS 2011. This workshop is not limited to the competition program that we are leading; we encourage researchers to submit papers on the topics of the workshop.

Participation

We invite contributions relevant to unsupervised learning and transfer learning (UTL), including:

Algorithms for UTL, in particular addressing:

  • Learning from unlabeled or partially labeled data
  • Learning from few examples per class, and transfer learning
  • Semi-supervised learning
  • Multi-task learning
  • Covariate shift
  • Deep learning architectures, including convolutional neural networks
  • Integrating information from multiple sources
  • Learning data representations
  • Kernel or similarity measure learning

Applications pertinent to the workshop topic, including:

  • Text processing (in particular from multiple languages)
  • Image or video indexing and retrieval
  • Bioinformatics
  • Robotics
  • Datasets and benchmarks

Program committee

  • David Aha
  • Yoshua Bengio
  • Joachim Buhmann
  • Gideon Dror
  • Isabelle Guyon
  • Quoc Le
  • Vincent Lemaire
  • Alexandru Niculescu-Mizil
  • Gregoire Montavon
  • Atiqur Rahman Ahad
  • Gerard Rinkus
  • Gunnar Raetsch
  • Graham Taylor
  • Prasad Tadepalli
  • Dale Schuurmans
  • Danny Silver

This workshop seeks to excite and inform researchers to tackle the next level of problems in the area of Computer Vision. The idea is both to give Computer Vision researchers access to the latest Machine Learning research, and to communicate to researchers in the machine learning community some of the latest challenges in computer vision, in order to stimulate the emergence of the next generation of learning techniques. The workshop itself is motivated by several different points of view:

  1. There is a great interest in and take-up of machine learning techniques in the computer vision community. In top vision conferences such as CVPR, machine learning is prevalent: there is widespread use of Bayesian Techniques, Kernel Methods, Structured Prediction, Deep Learning, etc.; and many vision conferences have featured invited speakers from the machine learning community.
  2. Despite the quality of this research and the significant adoption of machine learning techniques, often such techniques are used as "black box" parts of a pipeline, performing traditional tasks such as classification or feature selection, rather than fundamentally taking a learning approach to solving some of the unique problems arising in real-world vision applications.
  3. Beyond object recognition and robot navigation, many interesting problems in computer vision are less well known. These include more  complex tasks such as joint geometric/semantic scene parsing, object discovery, modeling of visual attributes, image aesthetics, etc.
  4. Even within the domain of "classic" recognition systems, we also face significant challenges in scaling up machine learning techniques to millions of images and  thousands of categories (consider for example the  ImageNet data set).
  5. Images often come with extra multi-modal information (social network graphs, user preference, implicit feedback indicators, etc) and this information is often poorly used, or integrated in an ad-hoc fashion.

This workshop therefore seeks to bring together machine learning and computer vision researchers to discuss these challenges, show current progress, highlight open questions and stimulate promising future research.

Organizers

Face-to-face communication is a highly interactive process in which the participants mutually exchange and interpret verbal and nonverbal messages. Both the interpersonal dynamics and the dynamic interactions among an individual's perceptual, cognitive, and motor processes are swift and complex. How people accomplish these feats of coordination is a question of great scientific interest. Models of human communication dynamics also have much potential practical value, for applications including the understanding of communications problems such as autism and the creation of socially intelligent robots able to recognize, predict, and analyze verbal and nonverbal behaviors in real-time interaction with humans.

Modeling human communicative dynamics brings exciting new problems and challenges to the NIPS community. The first goal of this workshop is to raise awareness in the machine learning community of these problems, including some applications needs, the special properties of these input streams, and the modeling challenges. The second goal is to exchange information about methods, techniques, and algorithms suitable for modeling human communication dynamics. After the workshop, depending on interest, we may arrange to publish full-paper versions of selected submissions, possibly as a volume in the JMLR Workshop and Conference papers series.

Topics

We therefore invite submissions of short high-quality papers describing research on Human Communication Dynamics and related topics.  Suitable themes include, but are not limited to:

  • Modeling methods robust to semi-synchronized streams (gestural, lexical, prosodic, etc.)
  • Learning methods robust to the highly variable response lags seen in human interaction
  • Coupled models for the explicit simultaneous modeling of more than one participant
  • Ways to combine symbolic (lexical) and non-symbolic information
  • Learning of models that are valuable for both behavior recognition and behavior synthesis
  • Algorithms robust to training data whose labeling is incomplete or noisy
  • Feature engineering
  • Online learning and adaptation
  • Models of moment-by-moment human interaction that can also work for longer time scales
  • Specific applications and potential applications
  • Failures and problems observed when applying existing methods to such tasks
  • Insights from experimental or other studies of human communication behavior

Organizers

  • Louis-Philippe Morency (University of Southern California)
  • Daniel Gatica-Perez (IDIAP)
  • Nigel Ward (UTEP)

This one-day workshop, associated with the 2010 Neural Information Processing Systems (NIPS) conference, took place on Friday December 10, 2010 in Whistler, British Columbia. The workshop focussed on the practical application of modern Monte Carlo techniques to problems of interest in machine learning and beyond. We had six invited talks, with speakers focusing on the real-world aspects of performing inference with Monte Carlo. We also had a wide range of contributed abstracts, which were presented in the poster session.

Monte Carlo methods have been the dominant form of approximate inference for Bayesian statistics over the last couple of decades. They are interesting as a technical topic of research in themselves, as well as enjoying widespread practical use. In a diverse range of application areas, Monte Carlo methods have enabled Bayesian inference over classes of statistical models which would previously have been infeasible. Despite this broad and sustained attention, it is often still far from clear how best to set up a Monte Carlo method for a given problem, how to diagnose whether it is working well, and how to improve under-performing methods. The impact of these issues is even more pronounced in new, emerging applications. This workshop is aimed equally at practitioners and core Monte Carlo researchers. For practitioners, we hope to identify which properties of applications are important for selecting, running and checking a Monte Carlo algorithm. Monte Carlo methods are applied to a broad variety of problems, and the workshop aims to identify and explore which properties of these disparate areas are important to think about when applying them.
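The basic accept/reject mechanics behind such methods, together with one of the simplest health checks (the acceptance rate), can be sketched in a few lines. The target density, proposal scale, and chain length below are illustrative assumptions:

```python
import math
import random

def metropolis(logp, x0, steps, scale, seed=0):
    """Random-walk Metropolis sampler returning samples and acceptance rate.

    The acceptance rate is a crude diagnostic: values near 0 or 1 usually
    mean the proposal scale is badly tuned for the target.
    """
    rng = random.Random(seed)
    x, lp = x0, logp(x0)
    samples = []
    accepted = 0
    for _ in range(steps):
        prop = x + rng.gauss(0.0, scale)
        lp_prop = logp(prop)
        # Accept with probability min(1, p(prop) / p(x)).
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            x, lp = prop, lp_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / steps

# Target: a standard normal, known only up to its normalising constant.
samples, rate = metropolis(lambda x: -0.5 * x * x, x0=0.0,
                           steps=20000, scale=2.4)
```

Real-world checks would go further, e.g. trace plots, autocorrelation estimates, and comparisons across multiple chains.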

Organizers

  • Ryan Prescott Adams, University of Toronto
  • Mark Girolami, University College London
  • Iain Murray, University of Edinburgh

Advisory Panel

  • Arnaud Doucet, University of British Columbia
  • Chris Holmes, University of Oxford
  • Radford M. Neal, University of Toronto
  • Carl Edward Rasmussen, University of Cambridge
  • Gareth O. Roberts, University of Warwick

The field of computational biology has seen dramatic growth over the past few years, in terms of newly available data, new scientific questions, and new challenges for learning and inference. In particular, biological data is often relationally structured and highly diverse, well-suited to approaches that combine multiple weak sources of evidence from heterogeneous sources. These data may include sequenced genomes of a variety of organisms, gene expression data from multiple technologies, protein expression data, protein sequence and 3D structural data, protein interactions, gene ontology and pathway databases, genetic variation data (such as SNPs), and an enormous amount of textual data in the biological and medical literature. New types of scientific and clinical problems require the development of novel supervised and unsupervised learning methods that can use these growing resources.

The goal of this workshop is to present emerging problems and machine learning techniques in computational biology. We have invited several speakers from the biology/bioinformatics community to present current research problems in bioinformatics, and we invite contributed talks on novel learning approaches in computational biology. We encourage contributions describing either progress on new bioinformatics problems or work on established problems using methods that are substantially different from standard approaches. Kernel methods, graphical models, feature selection and other techniques applied to relevant bioinformatics problems would all be appropriate for the workshop.

Optimization is indispensable to many machine learning algorithms. What can we say beyond this obvious realization?

Previous talks at the OPT workshops have covered frameworks for convex programs (D. Bertsekas), the intersection of ML and optimization, especially in the area of SVM training (S. Wright), large-scale learning via stochastic gradient methods and its tradeoffs (L. Bottou, N. Srebro), exploitation of structured sparsity in optimization (Vandenberghe), and randomized methods for extremely large-scale convex optimization (A. Nemirovski), among others.

The ML community's interest in optimization continues to grow. Invited tutorials on optimization will be presented this year at ICML (N. Srebro) and NIPS (S. Wright). The traditional "point of contact" between ML and optimization - SVM - continues to be a driver of research on a number of fronts. Much interest has focused recently on stochastic gradient methods, which can be used in an online setting and in settings where data sets are extremely large and high accuracy is not required. Regularized logistic regression is another area that has produced a recent flurry of activity at the intersection of the two communities. Many aspects of stochastic gradient remain to be explored: different algorithmic variants, customizing to the data set structure, convergence analysis, sampling techniques, software, choice of regularization and tradeoff parameters, and parallelism. There also needs to be a better understanding of the limitations of these methods, what can be done to accelerate them, and how to detect when to switch to alternative strategies. In the logistic regression setting, use of approximate second-order information has been shown to improve convergence, but many algorithmic issues remain. Detection of combined-effect predictors (which lead to a huge increase in the number of variables), use of group regularizers, and the need to handle very large data sets in real time all present challenges.
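As a minimal illustration of the stochastic gradient approach to regularized logistic regression discussed above, the following sketch runs plain SGD on a toy data set; the step size, regularization constant, and sampling scheme are illustrative choices, exactly the kinds of parameters the paragraph identifies as open questions:

```python
import math
import random

def sgd_logistic(data, lam=0.01, lr=0.1, epochs=20, seed=0):
    """L2-regularized logistic regression trained by stochastic gradient.

    `data` is a list of (feature_vector, label) pairs, labels in {0, 1}.
    """
    rng = random.Random(seed)
    data = list(data)                 # avoid mutating the caller's list
    d = len(data[0][0])
    w = [0.0] * d
    for _ in range(epochs):
        rng.shuffle(data)             # one random pass over the data
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted P(y = 1)
            g = p - y                        # log-loss gradient factor
            w = [wi - lr * (g * xi + lam * wi) for wi, xi in zip(w, x)]
    return w

# Toy 1-D problem: positive labels exactly when the feature is positive.
train = [([x / 10.0, 1.0], 1 if x > 0 else 0)
         for x in range(-10, 11) if x != 0]
weights = sgd_logistic(train)
```

Each update touches a single example, which is what makes the method attractive when the data set is too large for full-gradient passes.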

We also do not ignore settings that are not particularly large scale, where one has time to wield substantial computational resources. In such settings, high-accuracy solutions and a deep understanding of the lessons contained in the data are needed. Examples valuable to ML researchers include the exploration of genetic and environmental data to identify risk factors for disease, or problems where the amount of observed data is not huge but the mathematical model is complex.

Operational Details

  • one day long, with morning and afternoon sessions;
  • three invited talks by optimization and ML experts;
  • discussion: this year we plan to bolster discussion by having an open problems session;
  • contributed talks;
  • an interactive poster session.

Program Committee

 

In spite of its central role and position between physics and biology, chemistry has remained in a somewhat backward state of informatics development compared to its two close relatives, primarily for historical reasons. Computers, open public databases, and large collaborative projects have become the pervasive hallmark of research in physics and biology, but are still at an early stage of development in chemistry. Recently, however, large repositories with millions of small molecules have become freely available, and equally large repositories of chemical reactions have also become available, albeit not freely. These data create a wealth of interesting informatics and machine learning challenges: to efficiently store, search, and predict the physical, chemical, and biological properties of small molecules and reactions, and to chart "chemical space", with significant scientific and technological impacts.

Small organic molecules, in particular, with at most a few dozen atoms, play a fundamental role in chemistry, biology, biotechnology, and pharmacology. They can be used, for instance, as combinatorial building blocks for chemical synthesis, as molecular probes for perturbing and analyzing biological systems in chemical genomics and systems biology, and for the screening, design, and discovery of new drugs and other useful compounds. Huge arrays of new small molecules can be produced in a relatively short time. Chemoinformatics methods must be able to cope with the inherently graphical, non-vectorial nature of raw chemical information on small organic molecules and organic reactions, and the vast combinatorial nature of chemical space, containing over 10^60 possible small organic molecules.

Recently described grand challenges for chemoinformatics include: (1) overcoming stalled drug discovery; (2) helping to develop green chemistry and address global warming; (3) understanding life from a chemical perspective; and (4) enabling the network of the world's chemical and biological information to be accessible and interpretable. This one-day workshop will provide a forum to brainstorm about these issues, explore the role and contributions machine learning methods can make to chemistry and chemoinformatics, and hopefully foster new ideas and collaborations.