Data mining and knowledge discovery in social networks have advanced significantly over the past several years, owing to the availability of a large variety of online social network systems. COMMPER focuses on two main streams of social network research: community mining and people recommenders.

The first focus of this workshop is on mining communities in social networks, and in particular in scientific collaboration networks. Consider, for example, a dataset of scientific publications along with information about each publication and the complete citation network. Many data-analysis questions arise: What are the underlying communities? Who are the most influential authors? What are the skill sets of individual authors? What are the observed collaboration patterns? How does interest in popular topics propagate? How does the network evolve in terms of collaborations, topics, citations, and so on? In this workshop we intend to bring domain experts, such as bibliometricians, closer to researchers from the fields of data mining and social networks. The expected outcome is to strengthen the collaboration between these communities, aiming at high-impact research contributions and discussions. We hope that the workshop will lead to the development of new insights and data mining methodologies that could be employed for the analysis of communities, models of human collaboration, topic discovery, evolution of social networks, and more.

People recommenders, the second main topic of this workshop, deal with the problem of finding meaningful relationships among people or organisations. In online social networks, relationships can be friends on Facebook, professional contacts on LinkedIn, dates on an online dating site, jobs or workers on employment websites, or people to follow on Twitter. The nature of these domains makes people-to-people recommender systems significantly different from traditional item-to-people recommenders. One basic difference in the people recommender domain is the benefit or requirement of reciprocal relationships. Another difference is that people recommenders are likely to have rich user profiles available. The goal of this workshop is to build a community around people recommenders and instigate discussion about this emerging area of recommender systems research. With this workshop, we want to reach out to research done in both academia and industry.
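
As a rough illustration of the reciprocity requirement, the sketch below ranks candidates by a score that is high only when interest runs in both directions. The harmonic mean, the `preference` callback, and the toy scores are assumptions made for this example; they are not a method prescribed by the workshop.

```python
from typing import Callable, List


def reciprocal_score(a_likes_b: float, b_likes_a: float) -> float:
    """Combine two one-directional scores in [0, 1] with a harmonic mean.

    The harmonic mean is an illustrative choice: it stays low unless
    both directions are reasonably high.
    """
    if a_likes_b <= 0.0 or b_likes_a <= 0.0:
        return 0.0
    return 2.0 * a_likes_b * b_likes_a / (a_likes_b + b_likes_a)


def recommend(user: str,
              candidates: List[str],
              preference: Callable[[str, str], float],
              top_k: int = 3) -> List[str]:
    """Rank candidates for `user` by reciprocal score, best first.

    `preference(a, b)` is a hypothetical callback returning how much
    a is predicted to like b.
    """
    scored = [
        (reciprocal_score(preference(user, c), preference(c, user)), c)
        for c in candidates
        if c != user
    ]
    # Sort by descending score, breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [c for score, c in scored[:top_k] if score > 0.0]


# Toy data: ann strongly prefers bob, but only cat returns the interest.
prefs = {
    ("ann", "bob"): 0.9, ("bob", "ann"): 0.1,
    ("ann", "cat"): 0.6, ("cat", "ann"): 0.6,
}


def toy_preference(a: str, b: str) -> float:
    return prefs.get((a, b), 0.0)


print(recommend("ann", ["bob", "cat", "dan"], toy_preference))
```

A one-directional, item-style recommender would put bob first for ann; the reciprocal score ranks cat first because the interest is mutual.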

Topics

We encourage papers submitted to COMMPER to focus on, but not be limited to, the following topics:

  • analysis of scientific communities;
  • collaboration networks;
  • bibliometrics and data mining;
  • analysis of co-authorship networks;
  • analysis of citation networks;
  • communities in social networks;
  • dynamic networks;
  • formation of teams;
  • learning skills of individuals;
  • topic and community evolution and dynamics;
  • comparative studies of community networks;
  • people recommendation in social networks;
  • community recommendations in social networks;
  • mentor/mentee recommendations in tutoring systems;
  • expert search and expertise recommendation;
  • employee/employer recommendations;
  • online dating recommendations;
  • people search in the enterprise;
  • team recommendations;
  • reviewer assignment;
  • location-aware people recommendation.

Workshop Organizers

  • Panagiotis Papapetrou, Aalto University, Finland.
  • Luiz Augusto Pizzato, University of Sydney, Australia.
  • Aristides Gionis, Yahoo! Research, Spain.
  • Xiongcai Cai, University of New South Wales, Australia.

Program Committee

  • Mike Bain, University of New South Wales, Australia.
  • Shlomo Berkovsky, CSIRO, Australia.
  • Xiongcai Cai, University of New South Wales, Australia.
  • Gemma Garriga, INRIA Lille Nord Europe, France.
  • Ido Guy, IBM Research, Haifa, Israel.
  • Aristides Gionis, Yahoo! Research, Spain.
  • Dimitrios Gunopulos, University of Athens, Greece.
  • Jaakko Hollmen, Aalto University, Finland.
  • Judy Kay, University of Sydney, Australia.
  • Irena Koprinska, University of Sydney, Australia.
  • Ulf Kronman, Swedish Research Council, Sweden.
  • Theodoros Lappas, University of California, Riverside, USA.
  • Ashesh Mahidadia, University of New South Wales, Australia.
  • Richi Nayak, Queensland University of Technology, Australia.
  • Panagiotis Papapetrou, Aalto University, Finland.
  • Irma Pasanen, Aalto University, Finland.
  • Vaclav Petricek, eHarmony.com, USA.
  • Luiz Augusto Pizzato, University of Sydney, Australia.
  • Michalis Potamias, Boston University, USA.
  • Evimaria Terzi, Boston University, USA.
  • Wayne Wobcke, University of New South Wales, Australia.
  • Kalina Yacef, University of Sydney, Australia.
  • Sihem Amer-Yahia, Qatar Computing Research Institute, Qatar.

Introduction

Given two text fragments called 'Text' and 'Hypothesis', Textual Entailment Recognition is the task of determining whether the meaning of the Hypothesis is entailed by (can be inferred from) the Text. The goal of the first RTE Challenge was to provide the NLP community with a benchmark to test progress in recognizing textual entailment, and to compare the achievements of different groups. Since their inception in 2004, the RTE Challenges have promoted research in textual entailment recognition as a generic task that captures major semantic inference needs across many natural language processing applications, such as Question Answering (QA), Information Retrieval (IR), Information Extraction (IE), and multi-document Summarization.

After the first three highly successful PASCAL RTE Challenges, RTE became a track at the 2008 Text Analysis Conference, which brought it together with communities working on NLP applications. The interaction has provided the opportunity to apply RTE systems to specific applications and to move the RTE task towards more realistic application settings.

RTE-7 pursues the direction taken in RTE-6, focusing on textual entailment in context, where the entailment decision draws on the larger context available in the targeted application settings.

RTE-7 Tasks

The RTE-7 tasks focus on recognizing textual entailment in two application settings: Summarization and Knowledge Base Population.

  1. Main Task (Summarization setting): Given a corpus and a set of "candidate" sentences retrieved by Lucene from that corpus, RTE systems are required to identify all the sentences from among the candidate sentences that entail a given Hypothesis. The RTE-7 Main Task is based on the TAC Update Summarization Task. In the Update Summarization Task, each topic contains two sets of documents ("A" and "B"), where all the "A" documents chronologically precede all the "B" documents. An RTE-7 Main Task "corpus" consists of 10 "A" documents, while Hypotheses are taken from sentences in the "B" documents.
  2. Novelty Detection Subtask (Summarization setting): In the Novelty Detection variant of the Main Task, systems are required to judge if the information contained in each H (based on text snippets from B summaries) is novel with respect to the information contained in the A documents related to the same topic. If entailing sentences are found for a given H, it means that the content of H is not new; if no entailing sentences are detected, it means that information contained in the H is novel.
  3. KBP Validation Task (Knowledge Base Population setting): Based on the TAC Knowledge Base Population (KBP) Slot-Filling task, the KBP validation task is to determine whether a given relation (Hypothesis) is supported in an associated document (Text). Each slot fill that is proposed by a system for the KBP Slot-Filling task would create one evaluation item for the RTE-KBP Validation Task: The Hypothesis would be a simple sentence created from the slot fill, while the Text would be the source document that was cited as supporting the slot fill.
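
The relationship between the Main Task and the Novelty Detection Subtask can be sketched as follows: a Hypothesis is judged novel exactly when none of its candidate sentences entails it. The function names, sentence ids, and the word-overlap `toy_entails` predicate are placeholders invented for this sketch; a real system would substitute its own entailment recognizer.

```python
from typing import Callable, List, Tuple

# A candidate is a (sentence_id, sentence_text) pair; the ids are made up.
Candidate = Tuple[str, str]


def entailing_sentences(hypothesis: str,
                        candidates: List[Candidate],
                        entails: Callable[[str, str], bool]) -> List[str]:
    """Main Task: return ids of all candidates that entail the hypothesis."""
    return [sid for sid, text in candidates if entails(text, hypothesis)]


def is_novel(hypothesis: str,
             candidates: List[Candidate],
             entails: Callable[[str, str], bool]) -> bool:
    """Novelty Detection: H is novel iff no candidate sentence entails it."""
    return not entailing_sentences(hypothesis, candidates, entails)


def toy_entails(text: str, hypothesis: str) -> bool:
    """Placeholder predicate: every hypothesis word occurs in the text.

    This is NOT a real entailment recognizer; it only makes the
    sketch runnable.
    """
    return set(hypothesis.lower().split()) <= set(text.lower().split())


candidates = [
    ("A1-s3", "the company opened a new plant in ohio"),
    ("A2-s1", "shares fell on monday"),
]

print(entailing_sentences("the company opened a plant", candidates, toy_entails))
print(is_novel("the plant closed", candidates, toy_entails))
```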

Schedule

Proposed RTE-7 Schedule
April 29 Main Task: Release of Development Set
April 29 KBP Validation Task: Release of Development Set
June 10 Deadline for TAC 2011 track registration
August 17 KBP Validation Task: Release of Test Set
August 29 Main Task: Release of Test Set
September 8 Main Task: Deadline for task submissions
September 15 Main Task: Release of individual evaluated results
September 16 KBP Validation Task: Deadline for task submissions
September 23 KBP Validation Task: Release of individual evaluated results
September 25 Deadline for TAC 2011 workshop presentation proposals
September 29 Main Task: Deadline for ablation tests submissions
October 6 Main Task: Release of individual ablation test results
October 25 Deadline for system reports (workshop notebook version)
November 14-15 TAC 2011 Workshop

Organizing Committee

  • Luisa Bentivogli, CELCT and FBK, Italy
  • Peter Clark, Vulcan Inc., USA
  • Ido Dagan, Bar Ilan University, Israel
  • Hoa Trang Dang, NIST, USA
  • Danilo Giampiccolo, CELCT, Italy

Modeling shapes that evolve over time, and analyzing and interpreting their motion, have been subjects of increasing interest to many research communities, including the computer vision, computer graphics and medical imaging communities. Recent advances in acquisition technologies, including 3D depth cameras (Time-of-Flight and Kinect), multi-camera systems, marker-based motion capture systems, ultrasound and CT scans, have led those communities to consider capturing real scenes and their dynamics, creating 4D spatio-temporal models, and analyzing and interpreting them. A number of applications have emerged, including motion capture, dynamic shape modeling and animation, temporally consistent 3D reconstruction, and motion analysis and interpretation. The purpose of this workshop is to provide a venue for researchers from various communities who work on modeling dynamic scenes from various modalities to present their work, exchange ideas and identify challenging issues in this domain.

Program Committee

  • Adrian Bartoli, LASMEA Clermont-Ferrand, France
  • Luca Ballan, ETHZ, Switzerland
  • Peter Eisert, Heinrich-Hertz-Institute, Berlin, Germany
  • Jean-Sebastien Franco, INRIA Rhone Alpes, France
  • Ben Glocker, Microsoft Research, Cambridge, UK
  • Juergen Gall, ETH Zurich, Switzerland
  • Oliver Grau, BBC Research, UK
  • Radu Horaud, INRIA Rhone-Alpes, France
  • Ron Kimmel, Technion - Israel Institute of Technology, Israel
  • Yebin Liu, Tsinghua University, China
  • Marcus Magnor, TU Braunschweig, Germany
  • Francesc Moreno-Noguer, Institut de Robòtica i Informàtica Industrial, Barcelona, Spain
  • Christian Plaggerman, Stanford University, USA
  • Bodo Rosenhahn, University of Hannover, Germany
  • Mathieu Salzmann, TTI Chicago, USA
  • Leonid Sigal, Disney Research Pittsburgh, USA
  • Christian Theobalt, MPI Informatik Saarbrucken, Germany
  • Tony Tung, Kyoto University, Japan
  • Raquel Urtasun, TTI Chicago, USA
  • Michael Wand, Saarland University / MPI Informatik, Germany
  • Andrei Zaharescu, AIMETIS Corporation, Waterloo, Canada
  • Darko Zikic, TUM, Germany

Topics

  • 4D acquisition of real-world dynamic scenes
  • Shape recovery of dynamic 3D scenes
  • Temporally consistent tracking of deformable surfaces
  • 4D Representation
  • 4D Reconstruction from Medical Images
  • Deformation surface models
  • Free-viewpoint and 3D video
  • Marker-less human motion capture (articulated and surface based)
  • Monocular and multi-view deformable surface capture
  • Animation and texture transfer
  • Deformable surface tracking for medical applications
  • Deformable Shape Matching
  • Motion analysis
  • Scene Flow
  • Non-rigid and Deformable shape analysis
  • Shape segmentation
  • Learning of model deformations
  • Space-time geometry processing
  • Applications of 3D video and 4D modeling

Neuroimaging techniques produce large amounts of brain images of different natures, allowing researchers and clinicians to gain insights of unprecedented quality into the cerebral anatomy, its connectivity structure and its functions. On the one hand, the development of these techniques provides neuroscientists with a growing amount and variety of data and thus a potentially improved understanding of the brain; on the other hand, it poses the challenge of devising automated methods for a high-level understanding of neuroimages.

These methods would help to decode thoughts, understand cortical representations, categorize and classify brain responses, detect abnormalities in the brain, remove noise, take advantage of correlated prior information, support diagnosis, and so on. Machine learning is probably one of the most promising fields of research for bringing new approaches and procedures to automated neuroimaging interpretation.

The main goal of this workshop is precisely to bring together people from the machine learning community and people from the neuroimaging community who are keen to share their expertise. Potential outcomes of this workshop include the formalization of common neuroimaging problems in a machine learning setting, the identification of new problems that can be readily tackled using machine learning techniques, and the creation of new collaborations.

It is also expected that discussions will build around important machine learning challenges posed by neuroimaging data, such as feature selection in the presence of little data, transfer learning, and structured prediction.

Among the various themes of primary interest for the workshop, time will be devoted to sparsity-based methods, feature selection, graph-based representations of images and kernel methods, and the exploitation of prior and heterogeneous knowledge to build predictive models.

Organizing committee:

  • Liva Ralaivola (LIF, Marseille, France)
  • Sylvain Takerkart (INCM / INT, Marseille, France)
  • Bertrand Thirion (Parietal / Inria, Gif sur Yvette, France)

Pattern Analysis and Statistical Learning cover a wide range of technologies and theoretical frameworks, and significant activity in the past years has resulted in a remarkable convergence and many advances in the theory and principles underlying the field.

Bringing these technologies to real world demanding applications is however often treated as a separate problem, one that does not directly affect the field as a whole. It is instead important to consider the field of Pattern Analysis as fully including all issues involved with the applications of this technology, and hence all issues that arise when deploying, scaling, implementing and using the technology.

We call for contributions in the form of Demos, Case Studies, Working Systems, Real World Applications and Usage Scenarios. Challenges may stem from the violation of common theoretical assumptions, from the specific types of patterns and noise arising in certain scenarios, from the problem of scaling up the implementation of state-of-the-art algorithms to real-world sizes, or from the creation of integrated software systems that contain multiple pattern-analysis components.

We are also interested in new application areas, where Pattern Analysis has been deployed with success, and in issues involving the visualisation and delivery and exploitation of the patterns discovered by PA technologies. Systems working in noisy and unstructured environments and situations are particularly interesting.

The goal is to discuss and reward work aimed at making theory useful and relevant, without requesting the researchers to propose new theoretical methods, but rather requesting to show how they solved the many challenges related to applying these methods to real world scenarios, or how they benefited other fields of research. Getting ideas to work in real scenarios is what this is about.

Organisers

The aim of this workshop is to consolidate research efforts in the area of similarity-based pattern recognition and machine learning and to provide an informal discussion forum for researchers and practitioners interested in this important yet diverse subject.

We aim at covering a wide range of problems and perspectives, from supervised to unsupervised learning, from generative to discriminative models, and from theoretical issues to real-world practical applications.

The workshop will mark the end of the EU FP7 Project SIMBAD and is a follow-up to the ICML 2010 Workshop on Learning in non-(geo)metric spaces.

Traditional pattern recognition techniques are intimately linked to the notion of "feature spaces." Adopting this view, each object is described in terms of a vector of numerical attributes and is therefore mapped to a point in a Euclidean (geometric) vector space so that the distances between the points reflect the observed (dis)similarities between the respective objects. This kind of representation is attractive because geometric spaces offer powerful analytical as well as computational tools that are simply not available in other representations. Indeed, classical pattern recognition methods are tightly related to geometrical concepts and numerous powerful tools have been developed during the last few decades, starting from the maximum likelihood method in the 1920s, to perceptrons in the 1960s, to kernel machines in the 1990s.

However, the geometric approach suffers from a major intrinsic limitation, which concerns the representational power of vectorial, feature-based descriptions. In fact, there are numerous application domains where either it is not possible to find satisfactory features or they are inefficient for learning purposes. This modeling difficulty typically occurs in cases when experts cannot define features in a straightforward way (e.g., protein descriptors vs. alignments), when data are high dimensional (e.g., images), when features consist of both numerical and categorical variables (e.g., person data, like weight, sex, eye color, etc.), and in the presence of missing or inhomogeneous data. But, probably, this situation arises most commonly when objects are described in terms of structural properties, such as parts and relations between parts, as is the case in shape recognition.

In the last few years, interest around purely similarity-based techniques has grown considerably. For example, within the supervised learning paradigm (where expert-labeled training data is assumed to be available) the well-established kernel-based methods shift the focus from the choice of an appropriate set of features to the choice of a suitable kernel, which is related to object similarities. However, this shift of focus is only partial, as the classical interpretation of the notion of a kernel is that it provides an implicit transformation of the feature space rather than a purely similarity-based representation. Similarly, in the unsupervised domain, there has been an increasing interest around pairwise or even multiway algorithms, such as spectral and graph-theoretic clustering methods, which avoid the use of features altogether.

By departing from vector-space representations, one is confronted with the challenging problem of dealing with (dis)similarities that do not necessarily exhibit Euclidean behavior or even obey the requirements of a metric. The lack of Euclidean and/or metric properties undermines the very foundations of traditional pattern recognition theories and algorithms, and poses totally new theoretical and computational questions and challenges.
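
To make "the requirements of a metric" concrete, the following minimal sketch counts how often a pairwise dissimilarity matrix breaks each metric axiom. The function name, the report keys, and the toy matrices are assumptions for this illustration. Note that passing all of these checks shows only that the dissimilarities form a metric; it does not show that they can be embedded in a Euclidean space.

```python
from itertools import permutations
from typing import Dict, List


def metric_violations(d: List[List[float]], tol: float = 1e-12) -> Dict[str, int]:
    """Count violations of the metric axioms in a square dissimilarity matrix.

    d[i][j] is the dissimilarity between objects i and j.
    """
    n = len(d)
    report = {"nonzero_diagonal": 0, "negative": 0, "asymmetric": 0, "triangle": 0}
    for i in range(n):
        # Self-dissimilarity should be zero.
        if abs(d[i][i]) > tol:
            report["nonzero_diagonal"] += 1
        for j in range(n):
            # Dissimilarities should be non-negative.
            if d[i][j] < -tol:
                report["negative"] += 1
            # Symmetry, checked once per unordered pair.
            if j > i and abs(d[i][j] - d[j][i]) > tol:
                report["asymmetric"] += 1
    # Triangle inequality: d(i, j) <= d(i, k) + d(k, j) for every
    # intermediate k; each unordered pair {i, j} is visited once.
    for i, j, k in permutations(range(n), 3):
        if i < j and d[i][j] > d[i][k] + d[k][j] + tol:
            report["triangle"] += 1
    return report


# Toy example: going 0 -> 1 -> 2 costs 2, yet d(0, 2) = 5.
non_metric = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
# Distances between the points 0, 1, 2 on a line.
metric = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]

print(metric_violations(non_metric))
print(metric_violations(metric))
```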

The aim of this workshop is to consolidate research efforts in this area, and to provide an informal discussion forum for researchers and practitioners interested in this important yet diverse subject. The discussion will revolve around two main themes, which basically correspond to the two fundamental questions that arise when abandoning the realm of vectorial, feature-based representations, namely:

  • How can one obtain suitable similarity information from data representations that are more powerful than, or simply different from, the vectorial?
  • How can one use similarity information in order to perform learning and classification tasks?

We aim at covering a wide range of problems and perspectives, from supervised to unsupervised learning, from generative to discriminative models, and from theoretical issues to real-world practical applications.

Accordingly, topics of interest include (but are not limited to):

  • Embedding and embeddability
  • Graph spectra and spectral geometry
  • Indefinite and structural kernels
  • Game-theoretic models of pattern recognition
  • Characterization of non-(geo)metric behaviour
  • Foundational issues
  • Measures of (geo)metric violations
  • Learning and combining similarities
  • Multiple-instance learning
  • Applications

Program chairs

  • Marcello Pelillo, University of Venice, Italy
  • Edwin Hancock, University of York, UK

Steering committee

  • Joachim Buhmann, ETH Zurich, Switzerland
  • Robert Duin, Delft University of Technology, The Netherlands
  • Mario Figueiredo, Technical University of Lisbon, Portugal
  • Edwin Hancock, University of York, UK
  • Vittorio Murino, University of Verona, Italy
  • Marcello Pelillo, University of Venice, Italy

Program committee

  • Maria-Florina Balcan, Georgia Institute of Technology, USA
  • Manuele Bicego, University of Verona, Italy
  • Joachim Buhmann, ETH Zurich, Switzerland
  • Horst Bunke, University of Bern, Switzerland
  • Tiberio Caetano, NICTA, Australia
  • Umberto Castellani, University of Verona, Italy
  • Luca Cazzanti, University of Washington, Seattle, USA
  • Nicolò Cesa-Bianchi, University of Milan, Italy
  • Robert Duin, Delft University of Technology, The Netherlands
  • Francisco Escolano, University of Alicante, Spain
  • Mario Figueiredo, Technical University of Lisbon, Portugal
  • Ana Fred, Technical University of Lisbon, Portugal
  • Bernard Haasdonk, University of Stuttgart, Germany
  • Edwin Hancock, University of York, UK
  • Anil Jain, Michigan State University, USA
  • Robert Krauthgamer, Weizmann Institute of Science, Israel
  • Marco Loog, Delft University of Technology, The Netherlands
  • Vittorio Murino, University of Verona, Italy
  • Elzbieta Pekalska, University of Manchester, UK
  • Marcello Pelillo, University of Venice, Italy
  • Massimiliano Pontil, University College London, UK
  • Antonio Robles-Kelly, NICTA, Australia
  • Volker Roth, University of Basel, Switzerland
  • Amnon Shashua, The Hebrew University of Jerusalem, Israel
  • Andrea Torsello, University of Venice, Italy
  • Richard Wilson, University of York, UK

Organizing committee

  • Samuel Rota Bulo' (chair), University of Venice, Italy
  • Nicola Rebagliati, University of Venice, Italy
  • Furqan Aziz, University of York, UK
  • Luca Rossi, University of Venice, Italy
  • Teresa Scantamburlo, University of Venice, Italy

SOR' is the premier scientific event in the area of operations research, one of the traditional series of biannual international conferences organized by the Slovenian Society Informatika, Section of Operations Research. It continues a series of symposia that have attracted a growing international audience since the first symposium.

SOR provides an international forum for scientific exchange at the frontiers of operations research (OR) in mathematics, statistics, economics, engineering, education, environment, computer science, etc. Since OR comprises a large variety of mathematical, statistical and informational theories and methods to analyse complex situations and to contribute to responsible decision making, planning and the efficient use of resources, we believe that in a world of increasing complexity and scarce natural resources there will be a growing need for such approaches in many fields of our society.

Program committee

  • L. Zadnik Stirn, University of Ljubljana, Biotechnical Faculty, Ljubljana, Slovenia, Chair
  • J. Žerovnik, University of Ljubljana, Faculty of Mechanical Engineering, Ljubljana, Slovenia, Chair
  • Z. Babić, University of Split, Faculty of Economics, Department for Quantitative Methods, Split, Croatia
  • M. Bastič, University of Maribor, Faculty of Business and Economics, Maribor, Slovenia
  • M. Bogataj, University of Ljubljana, Faculty of Maritime Studies and Transport, Portorož, Slovenia
  • K. Cechlarova, P.J. Šafarik University, Faculty of Science, Košice, Slovakia
  • T. Csendes, University of Szeged, Department of Applied Informatics, Szeged, Hungary
  • V. Čančer, University of Maribor, Faculty of Business and Economics, Maribor, Slovenia
  • S. Drobne, University of Ljubljana, Faculty of Civil Engineering and Geodesy, Ljubljana, Slovenia
  • L. Ferbar, University of Ljubljana, Faculty of Economics, Ljubljana, Slovenia
  • M. Gavalec, University of Hradec Králové, Faculty of Informatics and Management, Hradec Králové, Czech Republic
  • R. W. Grubbström, Linköping University, Linköping Institute of Technology, Linköping, Sweden
  • J. Jablonsky, University of Economics, Faculty of Informatics and Statistics, Praha, Czech Republic
  • P. Köchel, Chemnitz University of Technology, Faculty of Informatics, Chemnitz, Germany
  • J. Kušar, University of Ljubljana, Faculty of Mechanical Engineering, Ljubljana, Slovenia
  • L. Lenart, Institute Jožef Stefan, Ljubljana, Slovenia
  • A. Lisec, University of Ljubljana, Faculty of Civil Engineering and Geodesy, Ljubljana, Slovenia
  • L. Neralić, University of Zagreb, Faculty of Economics & Business, Zagreb, Croatia
  • I. Pesek, University of Maribor, Faculty of Natural Sciences and Mathematics, Maribor, Slovenia
  • J. Povh, Faculty of Information Studies, Novo mesto, Slovenia
  • M. S. Rauner, University of Vienna, Department of Innovation and Technology Management, Vienna, Austria
  • A. Schaerf, University of Udine, Department of Electrical, Management and Mechanical Engineering, Udine, Italy 
  • M. Sniedovich, University of Melbourne, Department of Mathematics and Statistics, Melbourne, Australia
  • K. Šorić, University of Zagreb, Faculty of Economics & Business, Zagreb, Croatia
  • D. Škulj, University of Ljubljana, Faculty of Social Sciences, Ljubljana, Slovenia
  • P. Šparl, University of Maribor, Faculty of Organizational Sciences, Kranj, Slovenia
  • T. Trzaskalik, Karol Adamiecki University of Economics, Department of Operational Research, Katowice, Poland
  • B. Zmazek, University of Maribor, Faculty of Natural Sciences and Mathematics, Maribor, Slovenia
  • D. Yuan, Linköping University, Department of Science, Linköping, Sweden

Organizing committee

  • J. Povh, Faculty of Information Studies, Novo mesto, Slovenia, Chair
  • S. Drobne, University of Ljubljana, Faculty of Civil Engineering and Geodesy, Ljubljana, Slovenia
  • A. Lisec, University of Ljubljana, Faculty of Civil Engineering and Geodesy, Ljubljana, Slovenia
  • Klavdija Macedoni, Faculty of Information Studies, Novo mesto, Slovenia
  • Jernej Gabrič, Faculty of Information Studies, Novo mesto, Slovenia
  • L. Zadnik Stirn, University of Ljubljana, Biotechnical Faculty, Ljubljana, Slovenia
  • J. Žerovnik, University of Ljubljana, Faculty of Mechanical Engineering, Ljubljana, Slovenia

During the last ten years, games have become big business, generating higher revenues than Hollywood. Games have evolved from single-player games to massively multiplayer platforms that host hundreds or even millions of simultaneous players and often include complex world simulations. On the one hand, this requires more and more sophisticated methods for automation, where fraud detection, story generation, adaptive game AI and matchmaking are only a few of the novel challenges that the industry has to address. On the other hand, the introduction of games on social networking sites has given birth to a new type of game-related data source that presents a new and possibly high pay-off application for data mining research.

Submissions

We welcome submissions on all aspects of Machine Learning and Data Mining for and in games, including, but not limited to, papers addressing the following topics:

  • Learning how to play games well, for games ranging from deterministic and discrete board games to non-deterministic, continuous, real-time, action-oriented games.
  • Player/opponent/team modeling and game analysis for goals such as improving artificial players in competitive games, mimicking human players, game or learning curve adaptation, automatic skill-ranking, match-making, or player and team behavior analysis (fraud detection) in multiplayer games.
  • Game adaptivity and automated content or story generation, for example for raising or lowering difficulty levels depending on the player's proficiency and avoiding the emergence of player routines that are guaranteed to beat the game, possibly with attention to user-specific constraints and preferences. This topic also includes concerns about game stability and performance guarantees for artificial opponents, and issues related to the learning experience and the design of virtual humans in serious games.
  • Novel data mining challenges and/or techniques for data generated through computer games, for example using logs from social, massively multi-player or mobile games to gain insight on human behaviour or understand social and group dynamics amongst players, or learning when and why players will quit a game out of frustration.
  • Data mining and machine learning perspectives in/from the games industry.

We also welcome on-topic work-in-progress contributions, position papers, and papers discussing potential research directions. Submissions will be reviewed by program committee members on the basis of relevance, significance, technical quality, and clarity. All accepted papers will be presented as posters, and a few of them will be selected for oral presentation.

Organizers

  • Tom Croonenborghs, Katholieke Hogeschool Kempen
  • Kurt Driessens, Maastricht University
  • Olana Missura, Fraunhofer IAIS and University of Bonn

Hierarchies are becoming ever more popular for the organization of documents, particularly on the Web (Web directories are an example of such hierarchies). Along with their widespread use comes the need for automated classification of new documents into the categories of the hierarchy. As the size of the hierarchy grows and the number of documents to be classified increases, a number of interesting problems arise. In particular, this is one of the rare situations where data sparsity remains an issue despite the vastness of available data. The reasons for this are the simultaneous increase in the number of classes and their hierarchical organization. The latter leads to a very high imbalance between the classes at different levels of the hierarchy. Additionally, the statistical dependence of the classes poses challenges and opportunities for the learning methods.

Research on large-scale classification so far has focused on situations involving a large number of documents and/or a large number of features, with a limited number of categories. However, this is not the case in hierarchical category systems, such as DMOZ, the International Patent Classification or Wikipedia, where, in addition to the large number of documents and features, a large number of categories exist, on the order of tens or hundreds of thousands. To approach this problem, either existing large-scale classifiers can be extended or new methods need to be developed. The goal of this workshop, which follows the first edition held in conjunction with the European Conference on Information Retrieval (ECIR) in 2010, is to discuss and assess some of these strategies, covering all or part of the issues mentioned above.

Workshop Format

The workshop is intended for one day. All participants will be asked to prepare papers, which will be presented either as oral presentations or posters. Submissions must be written in English, following the LNCS guidelines and must not exceed 12 pages including references and figures. Additionally, the program will include one invited talk and a round-table discussion.

The submissions to the workshop are elicited through an open call for papers and will undergo peer review by the programme committee. We encourage submissions on all aspects of large-scale categorization, from purely theoretical work to practical developments of large-scale categorizers.

Organisers

  • George Paliouras, NCSR "Demokritos", Athens, Greece
  • Eric Gaussier, LIG, Grenoble, France
  • Aris Kosmopoulos, NCSR "Demokritos" & AUEB, Athens, Greece
  • Ion Androutsopoulos, AUEB, Athens, Greece
  • Thierry Artières, LIP6, Paris, France
  • Patrick Gallinari, LIP6, Paris, France