Ensembles of supervised learning machines and, in particular, ensembles of classifiers have been established as one of the main research topics in machine learning. Methods for combining unsupervised clusterings have been recently proposed to improve the reliability of clustering algorithms and to assess the validity of discovered clusters. Statistical, algorithmic, representational, computational and practical reasons can explain the success of ensemble methods.

Nevertheless, several problems remain open. For instance, in many cases the theoretical reasons for the practical success of several widely used ensemble methods are unclear; the relationship between the diversity and accuracy of the base classifiers forming an ensemble, and the impact of these characteristics on the effectiveness and performance of ensemble methods, is a controversial question among machine learning researchers; and the search for the "best" set of base classifiers or the "best" set of combination methods with respect to the characteristics and the distribution of the data is an open and interesting research line.

Though ensemble methods are subject to intensive research, there are also other open questions, related to real-world applications of such methods. Moreover, innovative applications in the field of unsupervised learning have been recently proposed.

This workshop intends to provide a forum for researchers in the field of Machine Learning and Data Mining to discuss the above and other related topics regarding ensemble methods and their applications.

Programme Committee

  • Nicolo' Cesa-Bianchi, University of Milano, Italy
  • Carlotta Domeniconi, George Mason University, USA
  • Robert Duin, Delft University of Technology, the Netherlands
  • Mark Embrechts, Rensselaer Polytechnic Institute, USA
  • Ana Fred, Technical University of Lisboa, Portugal
  • Joao Gama, University of Porto, Portugal
  • Giorgio Giacinto, University of Cagliari, Italy
  • Larry Hall, University of South Florida, USA
  • Ludmila Kuncheva, University of Wales, UK
  • Francesco Masulli, University of Genova, Italy
  • Petia Radeva, Autonomous University of Barcelona, Spain
  • Juan Jose' Rodriguez, University of Burgos, Spain
  • Fabio Roli, University of Cagliari, Italy
  • Paolo Rosso, Polytechnic University Valencia, Spain
  • Carlo Sansone, Federico II University of Napoli, Italy
  • Jose' Salvador Sanchez, University Jaume I, Spain
  • Grigorios Tsoumakas, Aristotle University of Thessaloniki, Greece
  • Jordi Vitria', Autonomous University of Barcelona, Spain
  • Ioannis Vlahavas, Aristotle University of Thessaloniki, Greece
  • Terry Windeatt, University of Surrey, UK

Molecular biology and all the biomedical sciences are undergoing a true revolution as a result of the emergence and growing impact of a series of new disciplines/tools sharing the "-omics" suffix in their name. These include in particular genomics, transcriptomics, proteomics and metabolomics, devoted respectively to the examination of the entire systems of genes, transcripts, proteins and metabolites present in a given cell or tissue type.

The availability of these new, highly effective tools for biological exploration is dramatically changing the way research is performed in at least two respects. First, the amount of available experimental data is no longer a limiting factor; on the contrary, there is a plethora of it. Given the research question, the challenge has shifted towards identifying the relevant pieces of information and making sense of them (a "data mining" issue). Second, rather than focusing on components in isolation, we can now try to understand how biological systems behave as a result of the integration and interaction between the individual components, which can now be monitored simultaneously (so-called "systems biology").

Taking advantage of this wealth of "genomic" information has become a conditio sine qua non for anyone who aims to remain competitive in molecular biology and in the biomedical sciences in general. Machine learning naturally appears as one of the main drivers of progress in this context, where most of the targets of interest deal with complex structured objects: sequences, 2D and 3D structures, or interaction networks. At the same time, bioinformatics and systems biology have already induced significant new developments of general interest in machine learning, for example in the context of learning with structured data, graph inference, semi-supervised learning, system identification, and novel combinations of optimization and learning algorithms.

Scientific Program Committee

  • Nigel Burroughs (University of Warwick, UK)
  • Theo Damoulas (Cornell University, USA)
  • Werner Dubitzky (University of Ulster, UK)
  • Sašo Džeroski (Jožef Stefan Institute, Slovenia)
  • Pierre Geurts (University of Liège, Belgium)
  • Dirk Husmeier (Biomathematics & Statistics Scotland, UK)
  • Samuel Kaski (Helsinki University of Technology, Finland)
  • Ross King (Aberystwyth University, UK)
  • Elena Marchiori (Vrije Universiteit Amsterdam, The Netherlands)
  • Sach Mukherjee (University of Warwick, UK)
  • Mahesan Niranjan (University of Southampton, UK)
  • John Pinney (Imperial College London, UK)
  • Magnus Rattray (University of Manchester, UK)
  • Simon Rogers (University of Glasgow, UK)
  • Juho Rousu (University of Helsinki, Finland)
  • Céline Rouveirol (University of Paris XIII, France)
  • Yvan Saeys (University of Gent, Belgium)
  • Guido Sanguinetti (University of Sheffield/University of Edinburgh, UK)
  • Ljupco Todorovski (University of Ljubljana, Slovenia)
  • Koji Tsuda (Max Planck Institute, Tuebingen)
  • Jean-Philippe Vert (Ecole des Mines, France)
  • Jean-Daniel Zucker (University of Paris XIII, France)
  • Blaz Zupan (University of Ljubljana, Slovenia)

The European Workshop on Probabilistic Graphical Models (PGM) is a biennial workshop that brings together researchers interested in all aspects of graphical models for probabilistic reasoning, decision making, and learning. PGM 2010 is the fifth edition of the workshop.

Programme co-chairs

Morpho Challenge 2010 Workshop will be held 2-3 September 2010 at Aalto University School of Science and Technology in Espoo, Finland. The workshop will take place in meeting room A346 in the Computer Science building. The building also has a cafeteria that serves inexpensive lunch and coffee.

The twentieth workshop in the series of workshops sponsored by IEEE Signal Processing Society will present the most recent and exciting contributions in machine learning for signal processing through keynote talks as well as special and regular single-track sessions. Papers are solicited that cover various aspects of machine learning for signal processing, as outlined in the following.

 

The next joint Statistical Pattern Recognition and Structural and Syntactic Pattern Recognition Workshops (organised by TC1 and TC2 of the International Association for Pattern Recognition, IAPR) will be held at Cesme Altin Yunus Hotel, Cesme, Turkey, prior to ICPR 2010 (which itself will be held in Istanbul). The joint workshops aim at promoting interaction and collaboration among researchers working in areas covered by TC1 and TC2. We are also keen to attract participants working in fields that make use of statistical, structural or syntactic pattern recognition techniques (e.g. image processing, computer vision, bioinformatics, chemo-informatics, machine learning, document analysis, etc.). Those working in areas which can make methodological contributions to the field, e.g. mathematicians, statisticians, physicists, etc., are also very welcome.

The workshop will be held in Cesme, a seaside resort on the Aegean coast of Turkey. The area has many attractions, including excellent beaches, interesting fishing villages, and nearby archaeological remains and historical sites. These include Cesme castle and the remains of the ancient Greek city of Erythrae. Cesme can be reached by bus from the airport at Izmir, which has good flight connections to Istanbul.

 

  • Andrea Torsello (Italy) SSPR
  • Bai Xiao (UK) SSPR
  • Antonio Robles-Kelly (AUS) SSPR
  • Marcello Pelillo (Italy) SSPR
  • Anand Rangarajan (US) SPR
  • Gavin Brown (UK) SPR
  • Oleg Okun (Sweden) SPR
  • Giorgio Valentini (Italy) SPR
  • Marco Loog (NL) SPR
  • Michael Haindl (Czech) SPR
  • Larry Hall (USA) SPR
  • Nikunj Oza (USA) SPR
  • Konstantinos Sirlantzis (UK) SPR
  • Gady Agam (USA) SSPR
  • Mayer Aladjem (Israel) SPR
  • Juan Andrade Cetto (Spain) SSPR
  • Francesc J. Ferri (Spain) SPR
  • Georgy Gimelfarb (New Zealand) SSPR
  • Colin de La Higuera (France) SSPR
  • Tin Kam Ho (USA) SPR
  • Jose Manuel Inesta (Spain) SSPR
  • Francois Jacquenet (France) SSPR
  • Xiaoyi Jiang (Germany) SSPR
  • Jean-Michel Jolion (France) SSPR
  • Mineichi Kudo (Japan) SPR
  • Walter G. Kropatsch (Austria) SSPR
  • Ventzeslav Valev (Bulgaria)
  • Elzbieta Pekalska (Poland) SPR
  • Philip Jackson (UK) SPR
  • Chih-Jen Lin (Taiwan) SPR
  • Jana Novovicova (Czech Republic) SPR
  • John Oommen (Canada) SSPR
  • Sarunas Raudys (Lithuania) SPR
  • Carlo Sansone (Italy) SPR
  • Francesco Tortorella (Italy) SPR
  • Changshui Zhang (China) SSPR
  • Ana Fred (Portugal) SSPR
  • Venu Govindaraju (USA) SPR
  • Adam Krzyzak (Canada) SPR
  • Longin J. Latecki (USA) SSPR
  • Punam Kumar Saha (USA) SPR
  • Zhi Hua Zhou (China) SPR
  • Hirobumi Nishida (Japan) SSPR
  • Alberto Sanfeliu (Spain) SSPR
  • Luc Brun (France) SPR
  • Horst Bunke (Switzerland) SSPR
  • Sudeep Sarkar (USA) SPR
  • Jairo Rocha (Spain) SPR
  • Francesc Serratosa (Spain) SSPR
  • Sargur Srihari (USA) SPR
  • Mario Vento (Italy) SSPR
  • Sergios Theodoridis (Greece) SPR
  • Terry Caelli (Aus) SSPR
  • David Windridge (UK) SPR
  • James Kwok (Hong Kong) SPR
  • Robert Duin (Netherlands) SPR
  • Tibério Caetano (Australia) SSPR
  • Wenwu Wang (UK) SPR

 

We believe that the widespread adoption of open source software policies will have a tremendous impact on the field of machine learning. The goal of this workshop is to further support the current developments in this area and give them new impetus. Following the success of the inaugural NIPS-MLOSS workshop held at NIPS 2006, the Journal of Machine Learning Research (JMLR) has started a new track for machine learning open source software, initiated by the workshop's organizers. Many prominent machine learning researchers have co-authored a position paper advocating the need for open source software in machine learning. To date, 11 machine learning open source software projects have been published in JMLR. Furthermore, the workshop's organizers have set up a community website, mloss.org, where people can register their software projects, rate existing projects and initiate discussions about projects and related topics. This website currently lists 221 such projects, including many prominent projects in the area of machine learning.

The main goal of this workshop is to bring the main practitioners in the area of machine learning open source software together in order to initiate processes which will help to further improve the development of this area. In particular, we have to move beyond a mere collection of more or less unrelated software projects and provide a common foundation to stimulate cooperation and interoperability between different projects. An important step in this direction will be a common data exchange format such that different methods can exchange their results more easily.

This year's workshop sessions will consist of three parts.

  • We have two invited speakers: Gary Bradski and Victoria Stodden.
  • Researchers are invited to submit their open source project to present it at the workshop.
  • In discussion sessions, important questions regarding the future development of this area will be discussed. In particular, we will discuss what makes a good machine learning software project and how to improve interoperability between programs. In addition, the question of how to deal with data sets and reproducibility will also be addressed.

Taking advantage of the large number of key research groups which attend ICML, decisions and agreements taken at the workshop will have the potential to significantly impact the future of machine learning software.

Program Committee

  • Jason Weston (Google Research, NY, USA)
  • Leon Bottou (NEC Princeton, USA)
  • Tom Fawcett (Stanford Computational Learning Laboratory, USA)
  • Sebastian Nowozin (Microsoft Research, UK)
  • Konrad Rieck (Technische Universität Berlin, Germany)
  • Lieven Vandenberghe (University of California LA, USA)
  • Joachim Dahl (Aalborg University, Denmark)
  • Torsten Hothorn (Ludwig Maximilians University, Munich, Germany)
  • Asa Ben-Hur (Colorado State University, USA)
  • Klaus-Robert Mueller (Fraunhofer Institute First, Germany)
  • Geoff Holmes (University of Waikato, New Zealand)
  • Peter Reutemann (University of Waikato, New Zealand)
  • Markus Weimer (Yahoo Research, California, USA)
  • Alain Rakotomamonjy (University of Rouen, France)

Organizers

  • Soeren Sonnenburg, Mikio Braun, Technische Universität Berlin
  • Cheng Soon Ong, ETH Zürich
  • Patrik Hoyer, Helsinki Institute for Information Technology

This workshop is about reinforcement learning in large state/action spaces, learning to optimize search, and the relation of these two.

Content-based information retrieval with relevance feedback is a multi-stage process: at each stage, the user selects from a set of presented items the one closest to the requested information, until the requested information is found. The task of the search engine is to present items such that the search terminates in few iterations. More generally, interactive search concerns multi-stage processes where a search engine presents some information and receives feedback in response, which may be partial and noisy. Since the reward for finding the requested information is delayed, learning a good search engine from data can be modeled as a reinforcement learning problem, but the special structure of the problem needs to be exploited.
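The multi-stage process described above can be illustrated with a small simulation. This is only a sketch under strong assumptions that are not part of the workshop description: items are plain numbers, "closeness" is absolute difference, the user feedback is noise-free, and the engine uses a simple random presentation strategy with candidate pruning rather than a learned policy. The names `closest` and `interactive_search` are hypothetical.

```python
import random


def closest(shown, target):
    """Simulated user: pick the presented item nearest to the target."""
    return min(shown, key=lambda x: abs(x - target))


def interactive_search(collection, target, k=3, seed=0):
    """Return the number of rounds until the target itself is presented."""
    candidates = list(collection)
    if target not in candidates:
        raise ValueError("target must be part of the collection")
    rng = random.Random(seed)
    rounds = 0
    while True:
        rounds += 1
        # Engine presents up to k of the remaining candidates.
        shown = rng.sample(candidates, min(k, len(candidates)))
        if target in shown:
            return rounds
        choice = closest(shown, target)
        # Keep only candidates consistent with the feedback: the target
        # must be at least as close to `choice` as to any other shown item.
        candidates = [
            c for c in candidates
            if c not in shown
            and all(abs(c - choice) <= abs(c - s) for s in shown)
        ]


rounds = interactive_search(range(100), target=42)
print(f"target found after {rounds} round(s)")
```

Because every round discards at least the presented items and never discards the target, the loop always terminates. A learned engine would replace the random choice of `shown` with a policy that aims to minimise the expected number of rounds.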

Since for realistic search applications the state space is enormous, this learning problem is a difficult one. Although the literature of reinforcement learning offers many powerful algorithms that have been successful in various difficult applications, we find that there is still relatively little understanding about when reinforcement learning might be successful in a realistic application, or what might make reinforcement learning successful in such an application. Furthermore, little work has been done on applying reinforcement learning to optimize interactive search.

Thus this workshop addresses in particular but not exclusively the following two questions:

  • Identify cases when realistically large problems with delayed feedback can be solved successfully, possibly but not necessarily by reinforcement learning algorithms. Such algorithms may need to exploit the special structure of the learning problem. Content-based information retrieval is one such example.
  • Application of learning techniques to develop powerful interactive search algorithms: optimizing a single search or learning across searches, with or without probabilistic assumptions.

A partial list of topics relevant to the workshop contains:

  • reinforcement learning in large state/action spaces,
  • automatic state/action aggregation and hierarchical reinforcement learning,
  • special cases or assumptions which facilitate fast reinforcement learning,
  • reinforcement learning, relevance feedback, and information retrieval,
  • search strategies based on relevance feedback,
  • learning efficient search strategies from multiple search sessions,
  • applications.

The workshop should provide an overview of the major achievements and the main open issues.

Organizers

  • Peter Auer, University of Leoben
  • Samuel Kaski, Aalto University, Helsinki
  • Csaba Szepesvari, University of Alberta

The Workshop on Feature Generation and Selection for Information Retrieval will be held on July 23, 2010, in Geneva, Switzerland, in conjunction with the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2010). The workshop will bring together researchers and practitioners from academia and industry to discuss the latest developments in various aspects of feature generation and selection for textual information retrieval.

Modern information retrieval systems facilitate information access at unprecedented scale and level of sophistication. However, in many cases the underlying representation of text remains quite simple, often limited to a weighted bag of words. Over the years, several approaches to automatic feature generation have been proposed (such as Latent Semantic Indexing, Explicit Semantic Analysis, Hashing, and Latent Dirichlet Allocation), yet their application in large-scale systems remains the exception rather than the rule. On the other hand, numerous studies in NLP and IR resort to manually crafting features, which is a laborious and expensive process. Such studies often focus on one specific problem, so many of the features they define are task- or domain-dependent. Consequently, little knowledge transfer is possible to other problem domains. This limits our understanding of how to reliably construct informative features for new tasks.

An area of machine learning concerned with feature generation (or constructive induction) studies methods that endow computers with the ability to modify or enhance the representation language. Feature generation techniques search for new features that describe the target concepts better than the attributes supplied with the training instances. It is worth noting that traditional machine learning data sets, such as those available from the UCI data repository, are only available as feature vectors, so their feature set is essentially fixed. In fact, feature generation for specific UCI benchmark datasets is frowned upon. On the other hand, textual data is almost always available in its raw format (in some cases as structured data with sufficient side information). Given the importance of text as a data format, it is well worthwhile designing text-specific feature generation algorithms. Complementary to feature generation, the issue of feature selection arises. It aims to retain only the most informative features, e.g., in order to reduce noise and to avoid overfitting, and is essential when numerous features are automatically constructed. Feature selection lets us deal with features that are correlated, redundant, or uninformative by pruning them through a principled selection process.
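As a rough illustration of the generation and selection steps discussed above, the following sketch builds bag-of-words features from raw text and keeps the terms with the highest mutual information with a class label. It assumes whitespace tokenisation, term presence (not weighted counts), and mutual information as the selection criterion; these are illustrative choices, not methods prescribed by the workshop, and the function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Generate features: lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the label."""
    n = len(docs)
    joint = Counter((term in d, y) for d, y in zip(docs, labels))
    mi = 0.0
    for (present, y), count in joint.items():
        p_xy = count / n
        p_x = sum(c for (p, _), c in joint.items() if p == present) / n
        p_y = sum(c for (_, l), c in joint.items() if l == y) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def select_features(texts, labels, top_k):
    """Select the top_k terms ranked by mutual information with the label."""
    docs = [bag_of_words(t) for t in texts]
    vocab = sorted(set().union(*docs))
    ranked = sorted(
        vocab,
        key=lambda t: mutual_information(docs, labels, t),
        reverse=True,
    )
    return ranked[:top_k]


texts = [
    "cheap pills now",
    "cheap offer now",
    "meeting agenda now",
    "project meeting notes",
]
labels = [1, 1, 0, 0]
print(select_features(texts, labels, top_k=2))
```

In this toy data set, "cheap" and "meeting" each separate the two classes perfectly, whereas "now" appears in both classes and is ranked lower. Scoring terms one at a time does not account for redundancy between correlated features, which is one reason more principled selection methods are of interest.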

We believe that much can be done in the quest for automatic feature generation for text processing, for example, using large-scale knowledge bases as well as the sheer amounts of textual data easily accessible today. We further believe the time is ripe to bring together researchers from many related areas (including information retrieval, machine learning, statistics, and natural language processing) to address these issues and seek cross-pollination among the different fields.

Topics

Papers from a rich set of empirical, experimental, and theoretical perspectives are invited. Topics of interest for the workshop include but are not limited to:

  • Identifying cases when new features should be constructed
  • Knowledge-based methods (including identification of appropriate knowledge resources)
  • Efficiently utilizing human expertise (akin to active learning, assisted feature construction)
  • (Bayesian) nonparametric distribution models for text (e.g. LDA, hierarchical Pitman-Yor model)
  • Compression and autoencoder algorithms (e.g., information bottleneck, deep belief networks)
  • Feature selection (L1 programming, message passing, dependency measures, submodularity)
  • Cross-language methods for feature generation and selection
  • New types of features, e.g., spatial features to support geographical IR
  • Applications of feature generation in IR (e.g., constructing new features for indexing, ranking)

There is a great deal of interest in analyzing data that is best represented as a graph. Examples include the WWW, social networks, biological networks, communication networks, and many others. The importance of being able to effectively mine and learn from such data is growing, as more and more structured and semi-structured data is becoming available.

Traditionally, a number of subareas have worked with mining and learning from graph-structured data, including communities in graph mining, learning from structured data, statistical relational learning, inductive logic programming, and, moving beyond subdisciplines of computer science, social network analysis and, more broadly, network science.

The objective of this workshop is to bring together researchers from a variety of these areas, and discuss commonality and differences in challenges faced, survey some of the different approaches, and provide a forum to present and learn about some of the most cutting edge research in this area. As an outcome, we expect participants to walk away with a better sense of the variety of different tools available for graph mining and learning, and an appreciation for some of the interesting emerging applications for mining and learning from graphs.

Organization Committee

Program Committee

  • Edo Airoldi
  • Tanya Berger-Wolf
  • Hendrik Blockeel
  • Karsten Borgwardt
  • Chris Burges
  • Diane Cook
  • Tina Eliassi-Rad
  • Stephen Fienberg
  • Paolo Frasconi
  • Thomas Gaertner
  • Brian Gallagher
  • Aris Gionis
  • Marko Grobelnik
  • Jiawei Han
  • Susanne Hoche
  • Lawrence Holder
  • Jure Leskovec
  • George Karypis
  • Samuel Kaski
  • Kristian Kersting
  • Dunja Mladenic
  • Alessandro Moschitti
  • Jennifer Neville
  • Massimiliano Pontil
  • Foster Provost
  • Padhraic Smyth
  • Swapna Somasundaran
  • Eric Xing
  • Philip Yu
  • Mohammed Zaki
  • Fabio Massimo Zanzotto
  • Zhongfei (Mark) Zhang