Physical and economic limitations have forced computer architecture towards parallelism and away from exponential frequency scaling. Meanwhile, increased access to ubiquitous sensing and the web has resulted in an explosion in the size of machine learning tasks. In order to benefit from current and future trends in processor technology we must discover, understand, and exploit the available parallelism in machine learning. This workshop will achieve four key goals:

  • Bring together people with varying approaches to parallelism in machine learning to identify tools, techniques, and algorithmic ideas which have led to successful parallel learning.
  • Invite researchers from related fields, including parallel algorithms, computer architecture, scientific computing, and distributed systems, who will provide new perspectives to the NIPS community on these problems, and may also benefit from future collaborations with the NIPS audience.
  • Identify the next key challenges and opportunities for parallel learning.
  • Discuss large-scale applications, e.g., those with real-time demands, that might benefit from parallel learning.

Prior NIPS workshops have focused on the topic of scaling machine learning, which remains an important developing area. We introduce a new perspective by focusing on how large-scale machine learning algorithms should be informed by future parallel architectures.

Topics of Interest

While we are interested in a wide range of topics associated with large-scale, parallel learning, the following list provides a flavor of some of the key topics:

  • Multicore / Cluster-Based Learning Techniques
  • Machine Learning on Alternative Hardware (GPUs, Cell Processors, FPGAs, iPhone, ...)
  • Distributed Learning
  • Learning Results and Techniques on Massive Datasets
  • Large-Scale Kernel Methods
  • Fast Online Algorithms for Large Datasets
  • Parallel Computing Tools and Libraries

Organizers

Now is the time to revisit some of the fundamental grammar/language learning tasks, such as grammar acquisition, language acquisition, language change, and the general problem of automatically inferring generic representations of language structure in a data-driven manner.

Though the underlying problems have long been known to be computationally intractable for the standard representations of the Chomsky hierarchy, such as regular grammars and context-free grammars, progress has been made by modifying or restricting these classes to make them more observable. Generalisations of distributional learning have shown promise in unsupervised learning of linguistic structure using tree-based representations, or using non-parametric approaches to inference. More radically, significant advances in this domain have been made by switching to different representations, such as the work of Clark, Eyraud & Habrard (2008), which addresses the issue of language acquisition but has the potential to cross-fertilise a wide range of problems requiring data-driven representations of language. Such approaches are starting to make inroads into one of the fundamental problems of cognitive science: that of learning complex representations that encode meaning. This adds a further motivation for returning to this topic at this point.

Grammar induction was the subject of intense study in the early days of Computational Learning Theory, with the theory of query learning largely developing out of this research. More recently, new methods of representing language and grammars through complex kernels and probabilistic modelling, together with algorithms such as structured output learning, have enabled machine learning methods to be applied successfully to a range of language-related tasks, from simple topic classification through part-of-speech tagging to statistical machine translation. These methods typically rely on more fluid structures than those derived from formal grammars, and yet are able to compete favourably with classical grammatical approaches that require significant input from domain experts, often in the form of annotated data.

Organisers

During the last decade, many areas of Bayesian machine learning have reached a high level of maturity. This has resulted in a variety of theoretically sound and efficient algorithms for learning and inference in the presence of uncertainty. However, in the context of control, robotics, and reinforcement learning, uncertainty has not yet been treated with comparable rigor despite its central role in risk-sensitive control, sensorimotor control, robust control, and cautious control. A consistent treatment of uncertainty is also essential when dealing with stochastic policies, incomplete state information, and exploration strategies.

A typical situation where uncertainty comes into play is when the exact state-transition dynamics are unknown and only limited or no expert knowledge is available or affordable. One option is to learn a model from data. However, if the model is too far off, this approach can result in arbitrarily bad solutions. This model bias can be sidestepped by using flexible model-free methods. The disadvantage of model-free methods is that they do not generalize and often make less efficient use of data; they therefore often need more trials than are feasible on a real-world system. A probabilistic model can make efficient use of data while alleviating model bias by explicitly representing and incorporating uncertainty.

The use of probabilistic approaches requires (approximate) inference algorithms, where Bayesian machine learning can come into play. Although probabilistic modeling and inference conceptually fit into this context, they are not yet widespread in robotics, control, and reinforcement learning. Hence, this workshop aims to bring researchers together to discuss the need for, the theoretical properties of, and the practical implications of probabilistic methods in control, robotics, and reinforcement learning.

One particular focus will be on probabilistic reinforcement learning approaches that profit from recent developments in optimal control, which show that the problem can be substantially simplified if certain structure is imposed. The simplifications include linearity of the (Hamilton-Jacobi) Bellman equation. The duality with Bayesian estimation allows for analytical computation of the optimal control laws and closed-form expressions for the optimal value functions.
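As a sketch of the simplification alluded to above, the discrete-time linearly-solvable formulation makes the linearity explicit; the notation below is our own illustrative choice, not the workshop's:

```latex
% Linearly-solvable control (discrete-time sketch, notation illustrative).
% With state cost $q(x)$, passive dynamics $p(x' \mid x)$, and value function
% $v(x)$, the desirability $z(x) = \exp(-v(x))$ turns the Bellman equation
% into a *linear* equation in $z$:
z(x) \;=\; \exp\bigl(-q(x)\bigr)\,\sum_{x'} p(x' \mid x)\, z(x'),
% and the optimal controlled dynamics follow in closed form, with a
% Bayes-rule-like structure that underlies the estimation duality:
u^*(x' \mid x) \;=\; \frac{p(x' \mid x)\, z(x')}{\sum_{x''} p(x'' \mid x)\, z(x'')}.
```

Solving for z then reduces to a linear (eigenvalue-type) problem rather than a nonlinear dynamic program.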

Organizers

  • Marc Peter Deisenroth
  • Bert Kappen
  • Emanuel Todorov
  • Duy Nguyen-Tuong
  • Carl Edward Rasmussen
  • Jan Peters

Clustering is one of the most widely used techniques for exploratory data analysis. In the past five decades, many clustering algorithms have been developed and applied to a wide range of practical problems. There has also been very exciting theoretical work, proving guarantees for algorithms and developing new frameworks for analysis.

Yet in many ways we are only beginning to understand some of the most basic issues in clustering. While there have been some remarkable successes, we believe more is possible. In particular, work addressing issues that are independent of any specific clustering algorithm, objective function, or specific data generative model, is still in its infancy.

In his famous Turing Award lecture, Donald Knuth said of computer programming: "It is clearly an art, but many feel that a science is possible and desirable." In the case of clustering, we believe that an even better and deeper science than what we currently offer is possible and highly desirable.

Goals of the Workshop

This workshop aims to initiate a dialogue between theoreticians and practitioners and to bridge the theory-practice gap in this area. The workshop will be built around three main questions:

  1. FROM THEORY TO PRACTICE:
    Which abstract theoretical characterizations / properties / statements about clustering algorithms exist that can be helpful for practitioners and should be adopted in practice?
  2. FROM PRACTICE TO THEORY:
    What concrete questions would practitioners like to see addressed by theoreticians? Can we identify de-facto practices in clustering in need of theoretical grounding? Which obscure (but seemingly needed or useful) practices are in need of rationalization?
  3. FROM ART TO SCIENCE:
    In contrast to supervised learning, where there is general consensus on how to assess the quality of an algorithm, the frameworks for analyzing clustering are only beginning to be developed and clustering is still largely an art. How can we progress towards a deeper understanding of the space of clustering problems and objectives, including the introduction of falsifiable hypotheses and properly designed experimentation? How could one set up a clustering challenge to compare different clustering algorithms? What could be scientific standards to evaluate a clustering algorithm in a paper?

The workshop will also serve as a follow-up meeting to the NIPS 2005 “Theoretical Foundations of Clustering” workshop, and as a venue for the different research groups working on these issues to take stock, exchange viewpoints, and discuss the next challenges in this ambitious quest for theoretical foundations of clustering.

Organizers

  • Shai Ben-David is a CS professor at the University of Waterloo, Canada.
  • Avrim Blum is a professor of CS at Carnegie Mellon University.
  • Ulrike von Luxburg is a Senior Research Scientist at the Max Planck Institute in Tübingen, Germany.
  • Isabelle Guyon is an independent engineering consultant, working from California.
  • Reza Bosagh Zadeh is a graduate student at Carnegie Mellon University.
  • Margareta Ackerman is a graduate student at the University of Waterloo.
  • Robert C. Williamson is the Scientific Director of NICTA and a Professor in the Research School of Information Sciences and Engineering at the Australian National University.

Statistical topic models are a class of Bayesian latent variable models, originally developed for analyzing the semantic content of large document corpora. With the increasing availability of other large, heterogeneous data collections, topic models have been adapted to model data from fields as diverse as computer vision, finance, bioinformatics, cognitive science, music, and the social sciences. While the underlying models are often extremely similar, these communities use topic models in different ways in order to achieve different goals. This one-day workshop will bring together topic modeling researchers from multiple disciplines, providing an opportunity for attendees to meet, present their work and share ideas, as well as inform the wider NIPS community about current research in topic modeling. This workshop will address the following specific goals:
  • Identify and formalize open research areas
  • Propose, explore, and discuss new application areas
  • Discuss how best to facilitate transfer of research ideas between application domains
  • Direct future work and generate new application areas
  • Explore novel modeling approaches and collaborative research directions

Program Committee

  • Edo Airoldi
  • Hal Daumé
  • Tom Dietterich
  • Laura Dietz
  • Jacob Eisenstein
  • Tom Griffiths
  • John Lafferty
  • Li-Jia Li
  • Andrew McCallum
  • David Mimno
  • Dave Newman
  • Padhraic Smyth
  • Erik Sudderth
  • Yee Whye Teh
  • Chong Wang
  • Max Welling
  • Sinead Williamson
  • Frank Wood
  • Jerry Zhu

Organizers

The field of computational biology has seen dramatic growth over the past few years. A wide range of high-throughput technologies developed in the last decade now enable us to measure parts of a biological system at various resolutions—at the genome, epigenome, transcriptome, and proteome levels. These technologies are now being used to collect data for an increasingly diverse set of problems, ranging from classical problems such as predicting differentially regulated genes between time points and predicting subcellular localization of RNA and proteins, to models that explore complex mechanistic hypotheses bridging the gap between genetics and disease, population genetics and transcriptional regulation. Fully realizing the scientific and clinical potential of these data requires developing novel supervised and unsupervised learning methods that are scalable, can accommodate heterogeneity, are robust to systematic noise and confounding factors, and provide mechanistic insights.

The goals of this workshop are to i) present emerging problems and innovative machine learning techniques in computational biology, and ii) generate discussion on how best to model the intricacies of biological data and synthesize and interpret results in light of the current work in the field. We will invite several rising leaders from the biology/bioinformatics community who will present current research problems in computational biology and lead these discussions based on their own research and experiences. We will also have the usual rigorous screening of contributed talks on novel learning approaches in computational biology. We encourage contributions describing either progress on new bioinformatics problems or work on established problems using methods that are substantially different from established alternatives. Kernel methods, graphical models, feature selection, non-parametric models, and other techniques applied to relevant bioinformatics problems would all be appropriate for the workshop. We are particularly keen to consider contributions related to predicting function from genotype, as well as contributions targeting data generated by novel technologies such as gene editing and single-cell genomics, though we will consider all submissions that highlight applications of machine learning in computational biology. The target audience is anyone with an interest in learning and its applications to relevant problems from the life sciences, including NIPS participants without any existing research link to computational biology.

Organizers

Program Committee

  • Alexis Battle, JHU
  • Michael A. Beer, JHU
  • Andreas Beyer, TU Dresden
  • Karsten Borgwardt, ETH Zurich
  • Gal Chechik, Gonda brain center, Bar Ilan University
  • Chao Cheng, Dartmouth Medical School
  • Manfred Claassen, ETH Zurich
  • Florence d'Alche-Buc, Université d'Evry-Val d'Essonne, Genopole
  • Saso Dzeroski, Jozef Stefan Institute
  • Jason Ernst, UCLA
  • Pierre Geurts, University of Liège
  • James Hensman, The University of Sheffield
  • Antti Honkela, University of Helsinki
  • Laurent Jacob, Mines Paris Tech
  • Samuel Kaski, Aalto University
  • Seyoung Kim, CMU
  • David Knowles, Stanford
  • Anshul Kundaje, Stanford
  • Neil Lawrence, University of Sheffield
  • Su-In Lee, University of Washington
  • Shen Li, Mount Sinai, New York
  • Michal Linial, Hebrew University
  • John Marioni, EMBL-EBI
  • Martin Renqiang Min, NEC Labs America
  • Yves Moreau, KU Leuven
  • Alan Moses, University of Toronto
  • Bernard Ng, UBC
  • William Noble, University of Washington
  • Uwe Ohler, MDC Berlin & Humboldt University
  • Yongjin Park, MIT
  • Leopold Parts, University of Toronto
  • Dana Pe'er, Columbia University
  • Nico Pfeifer, Max Planck Institute
  • Magnus Rattray, University of Manchester
  • Simon Rogers, University of Glasgow
  • Juho Rousu, Aalto University
  • Guido Sanguinetti, University of Edinburgh
  • Alexander Schliep, Rutgers University
  • Jean-Philippe Vert, Ecole des Mines de Paris
  • Jinbo Xu, Toyota Technological Institute of Chicago
  • Chun (Jimmie) Ye, UCSF

Undirected graphical models provide a powerful framework for representing dependency structure between random variables. Learning the parameters of undirected models plays a crucial role in solving key problems in many machine learning applications, including natural language processing, visual object recognition, speech perception, information retrieval, computational biology, and many others.

Learning in undirected graphical models of large treewidth is difficult because of the hard inference problem induced by the partition function for maximum-likelihood learning, or by finding the MAP assignment for margin-based loss functions. Over the last decade, there has been considerable progress in developing algorithms for approximating the partition function and MAP assignment, both via variational approaches (e.g., belief propagation) and sampling algorithms (e.g., MCMC). More recently, researchers have begun to apply these methods to learning large, densely connected undirected graphical models that may contain millions of parameters. A notable example is the learning of Deep Belief Networks and Deep Boltzmann Machines, which employ an MCMC strategy to greedily learn deep hierarchical models.
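To make the MCMC-based learning strategy concrete, here is a minimal sketch of one contrastive-divergence (CD-1) update for a binary Restricted Boltzmann Machine. All names, shapes, and the learning rate are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b, c, v0, rng, lr=0.1):
    """One CD-1 update for a binary RBM with weights W (visible x hidden),
    visible bias b, and hidden bias c, given a batch of visible data v0."""
    # Positive phase: hidden probabilities and a sample, given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of block Gibbs sampling from the model.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Approximate log-likelihood gradient: data statistics minus model statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

The single Gibbs step replaces the intractable expectation under the model distribution; running the chain longer (CD-k, or persistent chains) trades computation for a better gradient estimate.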

The goal of this workshop is to assess the current state of the field and explore new directions in both theoretical foundations and empirical applications. In particular, we shall be interested in discussing the following topics:

  • State of the field: What are the existing methods and what is the relationship between them? Which problems can be solved using existing algorithms and which cannot?
  • The use of approximate inference in learning: There are many algorithms for approximate inference. In principle, all of these can be "plugged into" learning algorithms. What are the relative merits of using one approximation vs. another (e.g., an MCMC approximation vs. a variational one)? Are there effective combined strategies?
  • Learning with latent variables: Graphical models with latent (or hidden) variables often possess more expressive power than models with only observed variables. However, introducing hidden variables makes learning far more difficult. Can we develop better optimization and approximation techniques that would allow us to learn parameters in such models more efficiently?
  • Learning in models with deep architectures: Recently, there has been notable progress in learning deep probabilistic models, including Deep Belief Networks and Deep Boltzmann Machines, that contain many layers of hidden variables and millions of parameters. The success of these models relies heavily on the greedy layer-by-layer unsupervised learning of a densely connected undirected model called a Restricted Boltzmann Machine (RBM). Can we develop efficient and more accurate learning algorithms for RBMs and deep multilayer generative models? How can learning be extended to the semi-supervised setting and made more robust to highly ambiguous or missing inputs? What sort of theoretical guarantees can be obtained for such greedy learning schemes?
  • Scalability and success in real-world applications: How well do existing approximate learning algorithms scale to large-scale problems including problems in computer vision, bioinformatics, natural language processing, information retrieval? How well do these algorithms perform when applied to modeling high-dimensional real-world distributions (e.g. the distribution of natural images)?
  • Theoretical foundations: What are the theoretical guarantees of the learning algorithms (e.g., accuracy of the learned parameters relative to the best possible, or asymptotic convergence guarantees such as almost-sure convergence to the maximum-likelihood estimator)? What are the tradeoffs between running time and accuracy?
  • Loss functions: In the supervised learning setting, two popular loss functions are log-loss (e.g., in conditional random fields) and margin-based-loss (e.g., in maximum margin Markov networks). In intractable models these approaches result in rather different approximation schemes (since the former requires partition function estimation, whereas the latter only requires MAP estimates). What can be said about the differences between these schemes? When is one model more appropriate than the other? Can margin-based models be applied in the unsupervised case?
  • Structure vs. accuracy: Which graph structures are more amenable to approximations and why? How can structure learning be combined with approximate learning to yield models that are both descriptive and learnable with good accuracy?

Organizers

Over the past decade, brain connectivity has become a central theme in the neuroimaging community. At the same time, causal inference has recently emerged as a major research topic in machine learning. Even though the two research questions are closely related, interactions between the neuroimaging and machine-learning communities have been limited.

The aim of this workshop is to initiate productive interactions between neuroimaging and machine learning by introducing the workshop audience to the different concepts of connectivity/causal inference employed in each of the communities. Special emphasis is placed on discussing commonalities as well as distinctions between various approaches in the context of neuroimaging. Due to the increasing relevance of brain connectivity for analyzing mental states, we also highly welcome contributions discussing applications of brain connectivity measures to real-world problems such as brain-computer interfacing or mental state monitoring.

Topics

We solicit contributions on new approaches to connectivity and/or causal inference for neuroimaging data as well as on applications of connectivity inference to real-world problems. Contributions might address, but are not limited to, the following topics:

  • Effective connectivity & causal inference
    • Dynamic causal modelling
    • Granger causality
    • Structural equation models
    • Causal Bayesian networks
    • Non-Gaussian linear causal models
    • Causal additive noise models
  • Functional connectivity
    • Canonical correlation analysis
    • Phase-locking
    • Imaginary coherence
    • Independent component analysis
  • Applications of brain connectivity to real-world problems
    • Brain-computer interfaces
    • Mental state monitoring
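Among the effective-connectivity approaches listed above, Granger causality has a particularly compact formulation: a signal y is said to Granger-cause x if past values of y improve the prediction of x beyond what x's own past provides. A minimal order-1 least-squares sketch (the function name and the lag-1 restriction are our own illustrative choices, not a reference implementation):

```python
import numpy as np

def granger_f_stat(x, y, lag=1):
    """F-statistic testing whether past values of y help predict x
    beyond x's own past (order-1 Granger causality)."""
    T = len(x)
    target = x[lag:]
    # Restricted model: predict x[t] from x[t-1] only.
    X_r = np.column_stack([np.ones(T - lag), x[:-lag]])
    # Full model: additionally include y[t-1].
    X_f = np.column_stack([np.ones(T - lag), x[:-lag], y[:-lag]])
    # Residual sums of squares from ordinary least squares fits.
    rss_r = np.sum((target - X_r @ np.linalg.lstsq(X_r, target, rcond=None)[0]) ** 2)
    rss_f = np.sum((target - X_f @ np.linalg.lstsq(X_f, target, rcond=None)[0]) ** 2)
    # F-statistic: large values suggest y Granger-causes x.
    df = T - lag - X_f.shape[1]
    return (rss_r - rss_f) / (rss_f / df)
```

In practice one would compare the statistic against an F distribution, select the lag order by a model-selection criterion, and use multivariate (vector autoregressive) models rather than the bivariate case shown here.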

Organization committee

Program committee

Learning from multiple sources denotes the problem of jointly learning from a set of (partially) related learning problems / views / tasks. This general concept underlies several subfields receiving increasing interest from the machine learning community, which differ in terms of the assumptions made about the dependency structure between learning problems. In particular, the concept includes topics such as data fusion, transfer learning, multitask learning, multiview learning, and learning under covariate shift. Several approaches for inferring and exploiting complex relationships between data sources have been presented, including both generative and discriminative approaches.

The workshop will provide a unified forum for cutting-edge research on learning from multiple sources; it will examine the general concept, theory, and methods, and will also examine robotics as a natural application domain for learning from multiple sources. The workshop will address methodological challenges in the different subtopics and foster interaction between them. The intended audience is researchers working in multi-modal learning, data fusion, and robotics.

The workshop includes a morning session focused on the robotics application, and an afternoon session focused on theory/methods.

Organisers

Programme Committee

  • Cedric Archambeau - Xerox Research.
  • Andreas Argyriou - Toyota Technological Institute.
  • Claudio Gentile - Università dell'Insubria.
  • Mark Girolami - University of Glasgow.
  • Samuel Kaski - Helsinki University of Technology.
  • Arto Klami - Helsinki University of Technology.
  • John Shawe-Taylor - University College London.
  • Giorgio Valentini - Università degli Studi di Milano.

Accounting for dependencies between outputs has important applications in several areas. In sensor networks, for example, missing signals from temporarily failing sensors can be predicted thanks to correlations with signals acquired from other sensors. In geostatistics, the concentration of heavy pollutant metals (for example, copper), which is expensive to measure, can be predicted using inexpensive and oversampled variables (for example, pH).

Multi-task learning is a general learning framework in which it is assumed that learning multiple tasks simultaneously leads to better models and performance than learning the same tasks individually. By exploiting correlations and dependencies among tasks, it becomes possible to handle common practical situations such as missing data, or to increase the effective amount of data when only a small amount is available per task.

In this workshop we will consider the use of kernel methods for multiple outputs and multi-task learning. The aim of the workshop is to bring together Bayesian and frequentist researchers to establish common ground and shared goals.

Motivation

In the last few years there has been an increasing amount of work on multi-task learning, with hierarchical Bayesian approaches and neural networks among the proposed methods. More recently, the Gaussian process framework has been considered, where the correlations among tasks can be captured by appropriate choices of covariance function. Many of these choices have been inspired by the geostatistics literature, in which a similar problem is known as cokriging. From the frequentist perspective, regularization theory has provided a natural framework to deal with multi-task problems: assumptions about the relations between the different tasks translate into the design of suitable regularizers. Despite the common traits of the proposed approaches, so far the different communities have worked independently. For example, it is natural to ask whether the proposed choices of covariance function can be interpreted from a regularization perspective, or, in turn, whether each regularizer induces a specific form of the covariance/kernel function. By bringing together the latest advances from both communities, we aim to establish the state of the art and the possible future challenges in multi-task learning.
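One covariance construction that directly mirrors the cokriging idea is the intrinsic coregionalization model, in which a positive semi-definite task-covariance matrix multiplies a shared input kernel. The sketch below is illustrative (the function names and the RBF input kernel are our own assumptions):

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential (RBF) kernel matrix between rows of X1 and X2."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def icm_covariance(X, A, lengthscale=1.0):
    """Joint covariance over all (task, input) pairs:
    K((x, i), (x', j)) = B[i, j] * k(x, x'), arranged as kron(B, K_x)."""
    B = A @ A.T                      # task covariance, PSD by construction
    Kx = rbf(X, X, lengthscale)      # shared input kernel
    return np.kron(B, Kx)
```

Writing B = A Aᵀ guarantees a valid joint covariance; from the regularization perspective, B plays the role of the task-coupling matrix in a suitably designed multi-task regularizer.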

Organisers