There has been growing interest over the last few years in learning grammars from natural language text (as well as from structured or semi-structured text). The family of techniques enabling such learning is usually called "grammatical inference" or "grammar induction".

The field of grammatical inference is often subdivided into formal grammatical inference, where researchers aim to prove efficient learnability of classes of grammars, and empirical grammatical inference, where the aim is to learn structure from data. In this setting, the existence of an underlying grammar is regarded only as a hypothesis, and the goal is to describe the language better through a set of automatically learned rules.

Both formal and empirical grammatical inference have been linked with (computational) linguistics. Formal learnability results have figured in discussions of how people learn language; proofs of the (non-)learnability of certain classes of grammars are sometimes cited as arguments in the empiricist/nativist debate. On the more practical side, empirical systems that learn grammars have been applied to natural language. Instead of proving whether classes of grammars can be learnt, the aim here is to build practical learning systems that automatically introduce structure into language. Fields where initial research has been done include syntactic parsing, morphological analysis of words, and bilingual modeling (or machine translation).

This workshop at EACL 2009 aims to explore the state-of-the-art in these topics. In particular, we aim at bringing formal and empirical grammatical inference researchers closer together with researchers in the field of computational linguistics.

Organisation

  • Menno van Zaanen, Tilburg University, the Netherlands (co-chair)
  • Colin de la Higuera, Université de Saint-Etienne, France (co-chair)

Programme Committee

  • Pieter Adriaans, University of Amsterdam, the Netherlands
  • Srinivas Bangalore, AT&T Labs-Research, USA
  • Leonor Becerra-Bonache, Yale University, USA
  • Rens Bod, University of Amsterdam, The Netherlands
  • Antal van den Bosch, Tilburg University, The Netherlands
  • Alexander Clark, Royal Holloway, University of London, UK
  • Walter Daelemans, University of Antwerp, Belgium
  • Shimon Edelman, Cornell University, USA
  • Jeroen Geertzen, University of Cambridge, UK
  • Jeffrey Heinz, University of Delaware, USA
  • Colin de la Higuera, Université de Saint-Etienne, France (co-chair)
  • Alfons Juan, Universidad Politecnica de Valencia, Spain
  • Frantisek Mraz, Charles University, Czech Republic
  • Georgios Petasis, National Centre for Scientific Research (NCSR) "Demokritos", Greece
  • Khalil Sima'an, University of Amsterdam, The Netherlands
  • Richard Sproat, University of Illinois at Urbana-Champaign, USA
  • Menno van Zaanen, Tilburg University, the Netherlands (co-chair)
  • Willem Zuidema, University of Amsterdam, The Netherlands

The Summer Schools in Logic and Learning bring together two annual summer schools in the area of logic and machine learning: the Logic Summer School and the Machine Learning Summer School. The summer schools will be hosted by the Computer Sciences Laboratory in the Research School of Information Sciences and Engineering at The Australian National University, from 26 January to 6 February 2009.

The Logic courses will consist of short courses on aspects of pure and applied logic. The Machine Learning courses will consist of short courses on the theory and practice of machine learning, combining deep theory from areas as diverse as statistics, mathematics, engineering, and information technology with many practical and relevant real-life applications. The courses will be taught by experts from Australia and overseas. The summer schools this year will also include a special track on Artificial Intelligence (AI), which will feature courses on aspects of both logic and machine learning.

The Logic courses will be held at the Psychology G6 lecture theatre, and the Machine Learning and Artificial Intelligence courses will be held at the Physics lecture theatre at the ANU. In addition to the scheduled courses, time will be set aside each day for practical classes, discussions and software demonstrations.

We believe that the wide-spread adoption of open source software policies will have a tremendous impact on the field of machine learning. The goal of this workshop is to further support the current developments in this area and give it fresh impetus. Following the success of the inaugural NIPS-MLOSS workshop held at NIPS 2006, the Journal of Machine Learning Research (JMLR) has started a new track for machine learning open source software, initiated by the workshop's organizers. Many prominent machine learning researchers have co-authored a position paper advocating the need for open source software in machine learning. Furthermore, the workshop's organizers have set up a community website, mloss.org, where people can register their software projects, rate existing projects, and initiate discussions about projects and related topics. This website currently lists 123 such projects, including many prominent projects in the area of machine learning.

The main goal of this workshop is to bring together the main practitioners in the area of machine learning open source software in order to initiate processes that will further improve the development of this area. In particular, we have to move beyond a mere collection of more or less unrelated software projects and provide a common foundation that stimulates cooperation and interoperability between projects. An important step in this direction will be a common data exchange format, so that different methods can exchange their results more easily.

This year's workshop sessions will consist of three parts.

We have two invited speakers: John Eaton, the lead developer of Octave, and John Hunter, the lead developer of matplotlib.
Researchers are invited to submit their open source projects for presentation at the workshop.

In discussion sessions, important questions regarding the future development of this area will be discussed. In particular, we will discuss what makes a good machine learning software project and how to improve interoperability between programs. In addition, the question of how to deal with data sets and reproducibility will also be addressed.

Since a large number of key research groups attend NIPS, decisions and agreements reached at the workshop have the potential to significantly shape the future of machine learning software.

 

Machine learning has traditionally been focused on prediction. Given observations that have been generated by an unknown stochastic dependency, the goal is to infer a law that will be able to correctly predict future observations generated by the same dependency. Statistics, in contrast, has traditionally focused on "data modeling", i.e., on the estimation of a probability law that has generated the data.

During recent years the boundaries between the two disciplines have become blurred, and each community has adopted methods from the other. It is probably fair to say, however, that neither of them has yet fully embraced the field of causal modeling, i.e., the detection of the causal structure underlying the data. There are probably several reasons for this. Many statisticians still shy away from developing and discussing formal methods for inferring causal structure other than through experimentation, as they traditionally regard such questions as lying outside statistical science and internal to whichever science statistics is applied in. Researchers in machine learning, on the other hand, have too long focused on a limited set of problems: shying away from non-i.i.d. data and from distribution shifts between training and test set, neglecting the mechanisms underlying the generation of the data, including issues like stochastic dependence, and all too often neglecting statistical tools such as hypothesis testing, which are crucial to current methods for causal discovery.

Since the 1980s there has been a community of researchers, mostly from statistics and philosophy, who in spite of the prevailing views described above have developed methods aiming to infer causal relationships from observational data, building on the pioneering work of Glymour, Scheines, Spirtes, and Pearl. While this community has remained relatively small, it has recently been complemented by a number of researchers from machine learning. This brings a new viewpoint to the issues at hand, as well as a new set of tools, including algorithms for causal feature selection, nonlinear methods for testing statistical dependencies using reproducing kernel Hilbert spaces, and methods derived from independent component analysis.

Presently, there is a profusion of proposed algorithms, mostly evaluated on toy problems. One of the main challenges in causal learning is to develop strategies for objective evaluation. This includes, for instance, methods for acquiring large, representative data sets with known ground truth. That, in turn, raises the question of to what extent the regularities observed in such data sets carry over to the relevant data sets whose causal structure is unknown, since data sets with known ground truth may not be representative.
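As a toy illustration of the flavour of such methods, in the spirit of the additive-noise and ICA-based approaches mentioned above (this is a minimal sketch, not any specific published algorithm): with non-Gaussian noise, regressing in the true causal direction leaves a residual that is independent of the input, while regressing in the wrong direction does not. The uniform variables and the dependence measure used here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1, 1, n)   # cause (non-Gaussian)
e = rng.uniform(-1, 1, n)   # independent non-Gaussian noise
y = x + e                   # effect

def dependence_score(inp, out):
    # Least-squares fit (zero-mean toy data, so no intercept), then a
    # higher-order dependence measure between |input| and |residual|:
    # near zero when they are independent, clearly nonzero otherwise.
    b = np.cov(inp, out, ddof=0)[0, 1] / np.var(inp)
    resid = out - b * inp
    return abs(np.corrcoef(np.abs(inp), np.abs(resid))[0, 1])

forward = dependence_score(x, y)   # fit in the true direction x -> y
backward = dependence_score(y, x)  # fit in the wrong direction y -> x
print(f"forward {forward:.3f}  backward {backward:.3f}")
```

The forward score stays near zero while the backward score does not, so comparing the two scores identifies the causal direction on this toy example.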

Kernel methods are widely used to address a variety of learning tasks including classification, regression, ranking, clustering, and dimensionality reduction. The appropriate choice of a kernel is often left to the user, but poor selections may lead to sub-optimal performance, and searching for an appropriate kernel manually is a time-consuming and imperfect art. Instead, the kernel selection process can be included as part of the overall learning problem. In this way, better performance guarantees can be given and the kernel selection process can be made automatic. In this workshop, we will be concerned with using sampled data to select or learn a kernel function or kernel matrix appropriate for the specific task at hand. We will discuss several scenarios, including classification, regression, and ranking, where the use of kernels is ubiquitous, and different settings including inductive, transductive, or semi-supervised learning.
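As a minimal illustration of automating kernel choice, the following sketch scores candidate RBF kernels by kernel-target alignment and picks the best-aligned one. The data, the candidate bandwidths, and the alignment criterion are illustrative assumptions, not a prescription from the workshop.

```python
import numpy as np

def rbf_kernel(X, gamma):
    # Gaussian kernel matrix from pairwise squared distances.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def alignment(K, y):
    # Kernel-target alignment <K, y y^T> / (||K||_F * ||y y^T||_F),
    # where ||y y^T||_F = ||y||^2 = y @ y for a label vector y.
    return (y @ K @ y) / (np.linalg.norm(K) * (y @ y))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0])  # toy labels depending only on the first coordinate

candidates = {gamma: rbf_kernel(X, gamma) for gamma in (0.01, 0.1, 1.0, 10.0)}
scores = {g: alignment(K, y) for g, K in candidates.items()}
best = max(scores, key=scores.get)
print("alignment per gamma:", scores)
print("selected gamma:", best)
```

The same score could equally rank arbitrary kernel families, or serve as the objective when learning a convex combination of base kernels.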

We also invite discussions on the closely related fields of feature selection and extraction, and are interested in exploring the connections with these topics further. The goal is to cover all questions related to the problem of learning kernels: the different problem formulations, the computational efficiency and accuracy of the algorithms that address them, their strengths and weaknesses, and the theoretical guarantees they provide. What is the computational complexity? Does it work in practice? The formulation of some other learning problems, e.g. multi-task learning, is often very similar; these problems and their solutions will also be discussed in this workshop.

Topics

  • Relations between the luckiness framework, compatibility functions and empirically defined regularization strategies in general.
  • Luckiness and compatibility can be seen as defining a prior in terms of the (unknown but fixed) distribution generating the data. To what extent can this approach be generalised while still ensuring effective learning?
  • Models of prior knowledge that capture both complexity and distribution dependence for powerful learning.
  • Theoretical analysis of the use of additional (empirical) side information in the form of unlabeled data or data labeled by related problems.
  • Examples of proper or natural luckiness or compatibility functions in practical learning tasks. How could, for example, luckiness be defined in the context of collaborative filtering?
  • The effect of (empirical) preprocessing of the data not involving the labels as for example in PCA, other data-dependent transformations or cleaning, as well as using label information as for example in PLS or in feature selection and construction based on the training sample.
  • Empirically defined theoretical measures such as Rademacher complexity or sparsity coefficients and their relevance for analyzing empirical hypothesis spaces.
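The last topic can be made concrete: the empirical Rademacher complexity of a hypothesis class on a fixed sample can be estimated by Monte Carlo. Below is a sketch for a finite class represented by its vectors of predictions on the sample; the two toy classes are illustrative assumptions.

```python
import numpy as np

def empirical_rademacher(H, n_draws=2000, seed=0):
    # Monte Carlo estimate of E_sigma[ sup_h (1/n) sum_i sigma_i h(x_i) ],
    # where H is an (m, n) matrix whose rows give the m hypotheses'
    # predictions on the n sample points.
    rng = np.random.default_rng(seed)
    m, n = H.shape
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)  # random Rademacher signs
        total += np.max(H @ sigma) / n           # sup over the class
    return total / n_draws

n = 10
H_small = np.array([np.ones(n), -np.ones(n)])                      # two constant hypotheses
H_large = np.sign(np.random.default_rng(1).normal(size=(256, n)))  # 256 random sign patterns

r_small = empirical_rademacher(H_small)
r_large = empirical_rademacher(H_large)
print(f"small class: {r_small:.3f}   large class: {r_large:.3f}")
```

The richer class correlates better with random sign patterns and so has the larger complexity, which is exactly why such empirical measures are informative about overfitting on data-dependent hypothesis spaces.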

This workshop is intended for researchers interested in the theoretical underpinnings of learning algorithms that do not comply with the standard learning-theoretic assumptions.

Workshop Chairs

  • Maria-Florina Balcan
  • Shai Ben-David
  • Avrim Blum
  • Kristiaan Pelckmans
  • John Shawe-Taylor

Data sets with a very large number of explanatory variables are becoming more and more common, both in applications and in theoretical investigations. In economic applications, for instance, the revealed preferences of market players are observed, and the analyst tries to explain them through a complex model in which the players' behavior appears as an indirect observation. State-of-the-art statistical approaches often formulate such models as inverse problems, but the corresponding methods can suffer from the curse of dimensionality: when there are "too many" possible explanatory variables, additional regularization is needed. Inverse problem theory already offers sophisticated regularization methods for smooth models, but it is only beginning to integrate sparsity concepts. For high-dimensional linear models, sparsity regularization has proved a convincing way to tackle the issue, both in theory and in practice, but a vast terrain remains to be explored. Parallel to this work in statistics, there have been recent advances in machine learning methodology and statistical learning theory, where the themes of sparsity and inverse problems are intertwined.

The workshop will focus on different ways of attacking the same question: there are many potential models to choose from, but each of them is relatively simple, being parameterized by many variables of which most are zero. Yet the choice of the right model or regularization parameter is crucial for obtaining stable and reliable results.
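As a sketch of how sparsity regularization handles such "many variables, mostly zero" problems, the following implements the Lasso by proximal gradient descent (ISTA) on a toy high-dimensional linear model; the dimensions, noise level, and regularization weight are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Minimise (1/2n)||y - Xw||^2 + lam * ||w||_1 by iterative
    # soft-thresholding with a fixed step 1/L.
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - grad / L, lam / L)
    return w

rng = np.random.default_rng(0)
n, p = 50, 200                 # far more variables than observations
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.5, 1.0]  # only 3 of the 200 coefficients are nonzero
X = rng.normal(size=(n, p))
y = X @ w_true + 0.05 * rng.normal(size=n)

w_hat = lasso_ista(X, y, lam=0.1)
print("nonzero coefficients recovered:", np.flatnonzero(np.abs(w_hat) > 0.1))
```

Despite having four times more variables than observations, the l1 penalty drives almost all coefficients exactly to zero, recovering the small true support.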

 

While the machine learning community has primarily focused on analysing the output of a single data source, there have been relatively few attempts to develop a general framework, or heuristics, for analysing several data sources in terms of a shared dependency structure. Learning from multiple data sources (alternatively, the data fusion problem) is a timely research area. Due to the increasing availability and sophistication of data recording techniques and advances in data analysis algorithms, there exist many scenarios in which it is necessary to model multiple related data sources, in fields such as bioinformatics, multi-modal signal processing, information retrieval, sensor networks, etc.

The open question is how to analyse data that consist of more than one set of observations (or views) of the same phenomenon. In general, existing methods take a discriminative approach, in which a set of features for each data set is found in order to explicitly optimise some dependency criterion. However, a discriminative approach may result in an ad hoc algorithm, requires regularisation to ensure that erroneous shared features are not discovered, and makes it difficult to incorporate prior knowledge about the shared information. A possible way to overcome these problems is a generative probabilistic approach, which models each data stream as the sum of a shared component and a private component that models the within-set variation.

In practice, related data sources may exhibit complex co-variation (for instance, audio and visual streams related to the same video), and therefore it is necessary to develop models that impose structured variation within and between data sources, rather than assuming a so-called 'flat' data structure. Additional methodological challenges include determining what 'useful' information to extract from the multiple data sources, and building models for predicting one data source given the others. Finally, as well as learning from multiple data sources in an unsupervised manner, there is the closely related problem of multitask learning, or transfer learning, where a task is learned from other related tasks.
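A minimal sketch of the shared-plus-private idea: two views are generated from one shared latent variable plus independent private noise, and the shared component is recovered from the top singular direction of the cross-covariance between the views (a CCA-flavoured shortcut). All toy dimensions and loadings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)                                         # shared latent
x = np.outer(z, [1.0, 0.5]) + 0.3 * rng.normal(size=(n, 2))    # view 1: shared + private
y = np.outer(z, [-0.8, 1.2]) + 0.3 * rng.normal(size=(n, 2))   # view 2: shared + private

# The private parts of the two views are independent, so the
# cross-covariance between views is dominated by the shared component;
# its top singular direction therefore recovers that component.
Xc, Yc = x - x.mean(0), y - y.mean(0)
C = Xc.T @ Yc / n
u, s, vt = np.linalg.svd(C)
z_hat = Xc @ u[:, 0]

corr = abs(np.corrcoef(z_hat, z)[0, 1])
print(f"correlation with the true shared latent: {corr:.3f}")
```

The recovered projection correlates strongly with the true shared latent, whereas any single-view analysis cannot separate shared from private variation.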

 

The workshop will be held in Aarhus, Denmark, 17-19 September 2008, in conjunction with CLEF 2008 (Cross-Language Evaluation Forum). The actual Morpho Challenge part lasts only one day (17 September), but the same registration also allows you to participate in the other CLEF sessions that follow immediately. Please note that you are not obliged to attend ECDL as well; you can register for the workshop only, if you prefer.

The aim of the workshop is to present and discuss recent advances in machine learning approaches to text and natural language processing that capitalize on rich prior knowledge models in these domains.

Topics

The workshop aims at presenting a diversity of viewpoints on prior knowledge for language and text processing:

  • Prior knowledge for language modeling and parsing
  • Topic modeling for document analysis and retrieval
  • Parametric and non-parametric Bayesian models in NLP
  • Graphical models embodying structural knowledge of texts
  • Complex features/kernels that incorporate linguistic knowledge; kernels built from generative models
  • Limitations of purely data-driven learning techniques for text and language applications; performance gains due to incorporation of prior knowledge
  • Typology of different forms of prior knowledge for NLP (knowledge embodied in generative Bayesian models, in MDL models, in ILP/logical models, in linguistic features, in representational frameworks, in grammatical rules…)
  • Formal principles for combining rule-based and data-based approaches to NLP

Program committee

  • Guillaume Bouchard, Xerox Research Center Europe
  • Nicola Cancedda, Xerox Research Center Europe
  • Hal Daumé III, University of Utah
  • Marc Dymetman, Xerox Research Center Europe
  • Tom Griffiths, Stanford University
  • Peter Grünwald, Centrum voor Wiskunde en Informatica
  • Kevin Knight, University of Southern California
  • Mark Johnson, Brown University
  • Yee Whye Teh, University College London