Videos

Videos are here: http://videolectures.net/msht07_paris/

Registration

Registration is free. Still, please email olivier.teytaud@inria.fr to facilitate the organization of the workshop.

Schedule

The preliminary schedule is available at http://www.lri.fr/~teytaud/sched.html.

Accommodation info

(more information coming soon)

(If booking is difficult for language reasons, please feel free to email “teytaud@lri.fr”, who can take care of the booking.)

All hotels below are in the heart of Paris.

  • Hotel du Marais, tel. (not very close, but not very far and much less expensive), from 25 to 35 euros/night
  • Hotel de Lille, 40 rue de Lille, 75007 Paris, minimum 100 euros/night, tel. (very close to the workshop)

See also this list of hotels (from 41 euros/night), none of them very far from the workshop: http://www.dma.ens.fr/~stoltz/MFLT/Accomodation.html

Please feel free to request help (email olivier.teytaud@inria.fr).

Call For Papers

Multiple Simultaneous Hypothesis Testing is a major issue in many areas of information extraction:

  • rule extraction [6],
  • validation of gene influence [3],
  • validation of spatio-temporal pattern extraction (e.g. in brain imaging [7]),
  • other forms of spatial or temporal data (e.g. spatial collocation rules [8]),
  • other multiple hypothesis testing [4].

In all of the above frameworks, the goal is to extract patterns such that some quantity of interest is significantly greater than some given threshold.

  • in rule extraction, the goal typically is the extraction of rules with confidence, lift and support significantly higher than a given threshold;
  • in multiple hypothesis testing, the goal typically is the extraction of significant comparisons among various averages simultaneously;
  • in spatio-temporal pattern extraction, the goal typically is the extraction of smooth (spatio-temporal) subsets of $[0,1]^4$ with correlation significantly higher than a given threshold.
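
For the rule extraction case, the three quantities can be made concrete with a minimal sketch. It assumes the usual definitions for a rule A → B over a list of transactions: support is the fraction of transactions containing both A and B, confidence is that count divided by the number of transactions containing A, and lift is the confidence divided by the frequency of B. The function name `rule_metrics` and the toy baskets are hypothetical and serve only as an illustration.

```python
from typing import Iterable, List, Set, Tuple


def rule_metrics(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    c = frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)          # transactions containing A
    n_c = sum(1 for t in transactions if c <= t)          # transactions containing B
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # transactions containing A and B
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Toy data, made up for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(support, confidence, lift)
```

These are empirical estimates computed on a finite sample, which is why deciding whether they are significantly above a threshold, for many rules at once, is a multiple testing problem.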

Along these lines, a type I error is to extract an entity which does not satisfy the considered constraint, while a type II error is to miss an entity which does satisfy the constraint. The goals of the proposed challenge are to estimate, bound, or (even better!) reduce type I and type II errors.
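
As a small illustration of these two error types, the sketch below counts them for a set of tested entities at a given p-value threshold. The p-values and ground-truth flags are made up, and the Bonferroni correction (dividing the level by the number of tests) is only one classical choice, assumed here for the example and not prescribed by the challenge.

```python
from typing import List, Tuple


def count_errors(
    p_values: List[float], satisfies_constraint: List[bool], threshold: float
) -> Tuple[int, int]:
    """Return (type I count, type II count) when extracting entities with p <= threshold."""
    pairs = list(zip(p_values, satisfies_constraint))
    # Type I: extracted although the constraint does not hold.
    type_1 = sum(1 for p, ok in pairs if p <= threshold and not ok)
    # Type II: missed although the constraint holds.
    type_2 = sum(1 for p, ok in pairs if p > threshold and ok)
    return type_1, type_2


# Illustrative values only.
p_values = [0.001, 0.004, 0.03, 0.04, 0.20, 0.60]
satisfies_constraint = [True, True, True, False, False, False]
alpha = 0.05

naive = count_errors(p_values, satisfies_constraint, alpha)
bonferroni = count_errors(p_values, satisfies_constraint, alpha / len(p_values))
print(naive)       # (1, 0): one false extraction, nothing missed
print(bonferroni)  # (0, 1): no false extraction, one entity missed
```

The stricter threshold removes the type I error at the price of a type II error, which is the trade-off that results combining both risks have to address.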

VC-theory [2], empirical process theory [5] and various approaches related to simultaneous hypothesis testing [4] are fully relevant, as well as specific approaches, e.g. based on simulations, resampling or probes [9]. The challenge consists in extending previous results to the field of simultaneous hypothesis testing, or proposing new results specifically related to this topic.
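
To give an idea of what a resampling-based approach can look like, here is a minimal sketch of a max-statistic permutation threshold. It is an illustration chosen for this page, not a method taken from the cited references: labels are shuffled many times, the largest statistic over all features is recorded for each shuffle, and an upper quantile of these maxima serves as a common threshold intended to limit the chance of any type I error. The statistic (absolute difference of group means), the function names and the simulated data are all assumptions.

```python
import random
from typing import List, Sequence


def abs_mean_difference(values: Sequence[float], labels: Sequence[int]) -> float:
    """Absolute difference between the means of the label-1 and label-0 groups."""
    group1 = [v for v, y in zip(values, labels) if y == 1]
    group0 = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))


def max_statistic_threshold(
    features: List[List[float]],
    labels: Sequence[int],
    alpha: float = 0.05,
    n_permutations: int = 200,
    seed: int = 0,
) -> float:
    """Upper (1 - alpha) quantile of the maximum statistic under label permutations."""
    rng = random.Random(seed)
    shuffled = list(labels)
    maxima = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between features and labels
        maxima.append(max(abs_mean_difference(f, shuffled) for f in features))
    maxima.sort()
    index = min(n_permutations - 1, int((1 - alpha) * n_permutations))
    return maxima[index]


# Simulated data: feature 0 depends strongly on the label, the others are noise.
data_rng = random.Random(1)
labels = [0] * 10 + [1] * 10
features = [[10 * y + data_rng.gauss(0, 1) for y in labels]]
features += [[data_rng.gauss(0, 1) for _ in labels] for _ in range(5)]

threshold = max_statistic_threshold(features, labels)
observed = [abs_mean_difference(f, labels) for f in features]
selected = [i for i, stat in enumerate(observed) if stat > threshold]
print(threshold, selected)
```

Using the maximum over all features, rather than one threshold per feature, is what accounts for the simultaneity of the tests in this sketch.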

We welcome survey papers related to type I and type II errors, and papers presenting new results, proposing theoretical bounds or smart empirical experiments. In the latter case, the experimental setting as well as the algorithmic principles and explicit criteria must be carefully described and discussed; the use of publicly available software will be much appreciated.

Results combining type I and type II risk are particularly welcome. Asymptotic and non-asymptotic results are equally welcome.

Keywords: Empirical process, Learning theory, Multiple hypothesis testing, Rule extraction, Bio-informatics, Statistical Validation of Information Extraction.

Organization

Important dates

  • Announcement of the challenge: January 11, 2006.
  • Deadline for submissions: February 10, 2007.
  • Notification of acceptance of submitted results: March 2007.
  • Challenge workshop: Paris, France, May 15-16, 2007.

Submissions

Submissions (in PS or PDF) should be sent by email to “olivier.teytaud@inria.fr”.

Venue

  • no fee.
  • venue: Université Paris-5, 45 rue des Saints-Pères, Paris (downtown). Close to metro “Saint-Germain-des-Prés”.

Email for any information: olivier.teytaud@inria.fr.

Organizing committee

  • Gérald Gavin (univ. Lyon 1);
  • Sylvain Gelly (univ. Paris-Sud);
  • Yann Guermeur (Cnrs, Loria);
  • Stéphane Lallich (univ. Lyon 2);
  • Jérémie Mary (univ. Lille);
  • Michèle Sebag (Cnrs);
  • Olivier Teytaud (Inria).

Bibliography

[1] M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge University Press, 1999.
[2] V. N. Vapnik, Statistical Learning Theory, Wiley, 1998.
[3] M. D. Birkner, K. S. Pollard, M. J. van der Laan, and S. Dudoit, “Multiple Testing Procedures and Applications to Genomics”, U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 168, January 2005. http://www.bepress.com/ucbbiostat/paper168
[4] J. C. Hsu, Multiple Comparisons: Theory and Methods, Chapman & Hall, 1996.
[5] A. van der Vaart and J. A. Wellner, Weak Convergence and Empirical Processes, Springer Series in Statistics, 1996.
[6] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases”, in Proceedings of SIGMOD-93, pages 207-216, 1993.
[7] D. Pantazis, T. E. Nichols, S. Baillet, and R. M. Leahy, “A Comparison of Random Field Theory and Permutation Methods for the Statistical Analysis of MEG data”, Neuroimage, 25, 355-368, April 2005.
[8] M. Salmenkivi, “Efficient Mining of Correlation Patterns in Spatial Point Data”, in Proceedings of PKDD 2006, pages 359-370.
[9] H. Stoppiglia, G. Dreyfus, R. Dubois, and Y. Oussar, “Ranking a random feature for Variable and Feature selection”, JMLR, 2003.