Practical Theories for Exploratory Data Mining Workshop

The goal of this ICDM 2012 workshop is to help close the gap between data mining practice and theory. To this end, we intend to explore what the essence of exploratory data mining is and how to formalize it in a useful but theoretically well-founded way.

The workshop is motivated by a widely perceived discrepancy between theoretical data mining prototypes and practitioners’ requirements. A notable example is frequent pattern mining. Despite its attractive theoretical foundations, the practical use of frequent pattern mining methods has been limited. This is due to the difficulty of overcoming issues such as the pattern explosion problem and the discrepancy between usefulness and frequency. These issues have been addressed to some extent in the past 15 years, through heuristic post-processing steps and through rigorously motivated adaptations. Unfortunately, the multitude of possible solution strategies has to a large extent undermined the original elegance and made it hard for practitioners to understand how to use these techniques.

The problem is, however, not restricted to frequent pattern mining. The multitude of available methods for typical exploratory data mining problems such as (subspace) clustering and dimensionality reduction is such that practitioners face a daunting task in selecting a suitable method. In addition to these usability issues, less attention has been paid to pattern mining methods for relational databases. Although most real-world databases are relational, most pattern mining research has focused on single-table data.

We believe the core reasons for these difficulties are:

Different users inevitably have different prior beliefs and goals, whereas most exploratory data mining algorithms have a rigid objective function and do not take these differences into account.
Formally comparing the quality of different data mining patterns is hard due to their widely varying nature (e.g. comparing a dimensionality reduction with a frequent itemset), unless their 'interestingness' can be quantified in a comparable manner.
The iterative nature of the data mining process is often not considered.
Data mining in complex relational data is hard to fit into standard data mining prototypes.
More generally, data mining methods tend to be rigid, defined for highly specific tasks, for highly specific and idealized data, and for very specific types of patterns.

The purpose of this workshop will be to serve as a forum for exchanging ideas on how to formalize exploratory data mining in order to make it useful in practice. The workshop will survey (through invited as well as contributed talks and posters) some existing attempts at addressing the problems mentioned above. We particularly encourage papers that present principled theoretical contributions motivated by real-world requirements.

Knowledge 4 All Foundation Ltd.