Clustering is one of the most widely used techniques for exploratory data analysis. In the past five decades, many clustering algorithms have been developed and applied to a wide range of practical problems. There has also been very exciting theoretical work, proving guarantees for algorithms and developing new frameworks for analysis.

Yet in many ways we are only beginning to understand some of the most basic issues in clustering. While there have been some remarkable successes, we believe more is possible. In particular, work addressing issues that are independent of any specific clustering algorithm, objective function, or specific data generative model, is still in its infancy.

In his famous Turing award lecture, Donald Knuth states about Computer Programming that: "It is clearly an art, but many feel that a science is possible and desirable''. In the case of clustering, we believe that an even better and deeper science than what we currently offer is possible and highly desirable.

Goals of the Workshop

This workshop aims at initiating a dialog between theoreticians and practitioners, aiming to bridge the theory-practice gap in this area. The workshop will be built along three main questions:

  1. FROM THEORY TO PRACTICE:
    Which abstract theoretical characterizations / properties / statements about clustering algorithms exist that can be helpful for practitioners and should be adopted in practice?
  2. FROM PRACTICE TO THEORY:
    What concrete questions would practitioners like to see addressed by theoreticians? Can we identify de-facto practices in clustering in need of theoretical grounding? Which obscure (but seemingly needed or useful) practices are in need of rationalization?
  3. FROM ART TO SCIENCE:
    In contrast to supervised learning, where there is general consensus on how to assess the quality of an algorithm, the frameworks for analyzing clustering are only beginning to be developed and clustering is still largely an art. How can we progress towards a deeper understanding of the space of clustering problems and objectives, including the introduction of falsifiable hypotheses and properly designed experimentation? How could one set up a clustering challenge to compare different clustering algorithms? What could be scientific standards to evaluate a clustering algorithm in a paper?

The workshop will also serve as a follow up meeting to the NIPS 2005 “Theoretical  Foundations of clustering” workshop,  a venue for the different research groups working on these issues to take stock, exchange view points and discuss the next challenges in this ambitious quest for theoretical foundations of clustering.

Organizers

  • Shai Ben-David is a CS professor at the University of Waterloo, Canada.
  • Avrim Blum is a professor of CS at Carnegie Mellon University.
  • Ulrike von Luxburg is a Senior Research Scientist at the Max Plank Institute in Tubingen, Germany.
  • Isabelle Guyon is an independent engineering consultant, working from California.
  • Reza Bosagh Zadeh is a graduate student at Carnegie Mellon University.
  • Margareta Ackerman is a graduate student at the University of Waterloo.
  • Robert C. Williamson is the Scientific Director of NICTA and a Professor in the Research School of Information Sciences and Engineering at the Australian National University.