News Archives

Attend KDD-09 (early reg deadline May 31) – The Data Mining and Knowledge Discovery conf., Paris

KDD-2009: The Fifteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09)

Paris, France
June 28 – July 1, 2009.
http://www.kdd.org/kdd2009/

Register by May 31 to get the early registration rates!
http://www.kdd.org/kdd2009/registration.html

As the premier international conference on Data Mining and Knowledge Discovery, KDD 2009 provides a forum for academic researchers and industry and government innovators to share their results and experiences. Researchers and practitioners will gather to present academic and industrial papers, panels, implemented software demos, posters, workshops, tutorials, and insights from the popular KDD Cup competition.

New this year: a social networking platform where attendees can learn about the proceedings, collaborate on research papers, discuss individual sessions, receive real-time updates on the conference, and help organize social events in Paris using the site.

CONFERENCE VENUE
—————-

For the first time KDD will leave America and come to Europe; KDD 2009 will take place in beautiful downtown Paris, at the Marriott Paris Rive Gauche Hotel, 17 Boulevard St Jacques, 75014 Paris, France.

On Monday, June 29, the Conference reception will be held at the Hotel de Ville of Paris, in the main reception room, the Salle des Fetes, where Paris usually welcomes Heads of State and VIPs.

Please go to http://www.kdd.org/kdd2009/registration.html to register online for the conference. The early conference registration deadline is May 31. Find the hotel reservation code at
http://www.kdd.org/kdd2009/travel.html to enjoy the group rate for the hotel.

This year the organizers are introducing a new option besides full participation: “Workshops, Tutorials, and Evenings” (or, “Nights and Weekends”). This enables you to participate all day Sunday, for the workshops and/or tutorials, as well as the evenings (5pm+ Sunday-Tuesday), which feature invited industry talks, receptions at the beautiful Paris Town Hall and Marriott Hotel, and technical poster sessions. (The option omits, however, the full technical program during the 3 days, Mon-Wed.)

If you have any registration questions direct them to:

Mandy Mann (mandy.mann at regmaster.com) or +1 407 971 4451.

For other questions, the KDD organizer contact information is at:
http://www.kdd.org/kdd2009/

CONFERENCE HIGHLIGHTS
——————–

a) INVITED SPEAKERS:

This year, KDD features five distinguished invited speakers:
– David J. Hand, “Mismatched Models, Wrong Results, and Dreadful Decisions: On choosing appropriate data mining tools”
– Heikki Mannila, “Randomization Methods in Data Mining”
– Stanley Wasserman, “Network Science: An Introduction to Recent Statistical Approaches”
– Ravi Kumar, “Mining Web Logs: Applications and Challenges”
– Ashok N. Srivastava, “Data Mining at NASA: from Theory to Applications”

b) TUTORIALS: (all tutorials are free with conference registration)

9 diverse half-day tutorials will be presented on Sunday, June 28th.
http://www.kdd.org/kdd2009/tutorials.html

c) WORKSHOPS (all workshops are free with conference registration)

11 workshops, including 4 challenge workshops, will be held Sunday, June 28th.
http://www.kdd.org/kdd2009/workshops.html

The detailed Conference program will soon be available on the KDD-09 web
site.

d) KDD CUP

Based on challenge data provided by Orange Labs, this year’s competition focuses on predicting customer scores from large marketing databases from the French Telecom company, Orange.
10,000 Euros of prizes and travel grants — generously donated by Orange — will be distributed among the cup winners.
See progress on: http://www.kddcup-orange.com/

ENJOY PARIS
———–

While in Paris, enjoy the city! Register on http://www.kdd.org/kdd2009/travel.html#deals for special deals available to KDD participants on Saturday 27 June, Thursday 2 July, and Friday 3 July.

— KDD organizers http://www.sigkdd.org/kdd2009/organizers.html

Machine Learning Summer School 2009

University of Cambridge, UK
29 August – 10 September 2009
http://mlg.eng.cam.ac.uk/mlss09/

We invite you to apply to attend the 13th Machine Learning Summer School, which will be held at the University of Cambridge. The school will offer lectures and practicals given by leading researchers in the field on a wide range of topics in machine learning. We hope to attract international students, young researchers and industry practitioners with a keen interest in machine learning and a strong mathematical background.

APPLICATION DEADLINE: 1 June 2009

APPLICATION WEBSITE: http://mlg.eng.cam.ac.uk/mlss09/application.htm

We can offer a limited number of travel grants, and encourage students to apply even if they may not be able to meet the full costs of travel and attendance.

Confirmed Lecturers:
Christopher Bishop
Andrew Blake
David Blei
Philip Dawid
Zoubin Ghahramani
Simon Godsill
Geoffrey Hinton
Thomas Hofmann
Michael Jordan
Michael Littman
David MacKay
Thomas P. Minka
Iain Murray
Peter Orbanz
Carl Edward Rasmussen
Bernhard Schölkopf
John Shawe-Taylor
Yee Whye Teh
Josh Tenenbaum
Lieven Vandenberghe

This year’s MLSS is organised by the University of Cambridge, with Microsoft Research and PASCAL.

We look forward to receiving your applications.

Sincerely,

The Organisers
Zoubin Ghahramani, Carl Edward Rasmussen, Christopher M. Bishop, A. Philip Dawid, David J.C. MacKay, Peter Orbanz, Joaquin Quiñonero Candela

PASCAL Visual Object Classes Recognition Challenge 2009

We are running the PASCAL Visual Object Classes Recognition Challenge again this year. As in 2008 there are 20 object classes. Participants can recognize any or all of the classes, and there are classification, detection and pixel-wise segmentation competitions. (New for 2009, segmentation has been promoted from a “taster” to a full competition.)
There is also a “taster” competition on person layout (detecting head, hands, feet).

The development kit (Matlab code for evaluation, and baseline algorithms) and training data are now available at:

http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2009/index.html

where further details are given. The timetable of the challenge is:

* 15 May 2009: Development kit (training and validation data plus
evaluation software) made available.

* 15 June 2009: Test set made available.

* 7 September 2009: Deadline for submission of results.

* 3 October 2009: Workshop in association with ICCV 2009, Kyoto, Japan.

Mark Everingham
Luc Van Gool
Chris Williams
John Winn
Andrew Zisserman

Funded PhD position in machine learning – University of Saint-Etienne (France)

The Machine Learning research group of the University of Saint-Etienne (France) invites applications for a fully funded 3-year PhD position at the Hubert-Curien lab.

Topic of the studentship:
Machine Learning for Image Recognition
(starting date is 1 October 2009)

This project mainly concerns the design of new methods for learning similarities between images that are represented in the form of strings, trees or graphs. The selected candidate will work in the context of the SATTIC (http://labh-curien.univ-st-etienne.fr/wiki-sattic) project, financed by the French National Research Agency. He/She will join the machine learning group, composed of about 20 researchers working at the crossroads of machine learning, data mining and information retrieval.

Candidates must have demonstrable interest and expertise in machine learning and statistical theory, and have strong programming skills. A background in image processing is encouraged but not required. Applicants should obtain a Master's degree in Computer Science in 2009.

Expressions of interest with a short CV should be sent to: marc.sebban (at) univ-st-etienne.fr by 1 June 2009 at the latest.

Some links:
-Some facts about Saint-Etienne can be found here: http://en.wikipedia.org/wiki/Saint-Étienne.
-Saint-Etienne is a medium-sized city with a fairly low cost of living (compared to cities of the same size in France).
-Paris can be reached from Saint-Etienne in less than 3 hours via a direct train. The closest airport is Lyon Saint Exupéry (http://www.lyon.aeroport.fr).
-The city is surrounded by the “Regional Park of Pilat”, in which almost any outdoor activity can be practiced (http://www.parc-naturel-pilat.fr/eng/).
-From a cultural point of view, the art museum in Saint-Etienne holds the second national contemporary art collection, and classical concerts, dance shows, and lyric operas are performed at the Saint-Etienne “Massenet Opera” (http://www.decouvrez-le-votre.com/).

Fourth Summerschool on Advanced Statistics and Data Mining

The Polytechnical University of Madrid is organizing a summerschool on “Advanced Statistics and Data Mining” in Madrid between July 6th and July 17th. The summerschool comprises 18 courses spread over 2 weeks.
Attendees may register for each course independently. Registrations will be handled strictly in order of arrival.

For more information, please visit
http://www.dia.fi.upm.es/index.php?page=presentation&hl=es_ES or
http://biocomp.cnb.csic.es/~coss/Docencia/ADAM/ADAM.htm.

List of courses and brief description
(Full description at http://biocomp.cnb.csic.es/~coss/Docencia/ADAM/ADAM.htm)

Week 1 (July 6th – July 10th, 2009)

Course 1: Bayesian networks (15 h), Practical sessions: Hugin, Elvira, Weka, LibB
Bayesian networks basics. Inference in Bayesian networks.
Learning Bayesian networks from data

Course 2: Multivariate data analysis (15 h), Practical sessions: MATLAB
Introduction. Data Examination. Principal component analysis (PCA).
Factor Analysis. Multidimensional Scaling (MDS). Correspondence analysis.
Multivariate Analysis of Variance (MANOVA). Canonical correlation.

Course 3: Dimensionality reduction (15 h), Practical sessions: MATLAB
Introduction. Matrix factorization methods. Clustering methods. Projection methods. Applications

Course 4: Supervised pattern recognition (Classification) (15 h), Practical sessions: Weka
Introduction. Assessing the Performance of Supervised Classification Algorithms.
Classification techniques. Combining Classifiers.
Comparing Supervised Classification Algorithms

Course 5: Introduction to MATLAB (15 h)
Overview of the Matlab suite. Data structures and files. Programming in Matlab.
Visualization tools. Some applications in pattern recognition.

Course 6: Datamining, a practical perspective (15h), Practical sessions: MATLAB, R, Weka
Introduction to Data Mining and Knowledge Discovery. Prediction in data mining.
Classification. Association studies. Data mining in free-form texts: text mining.

Course 7: Time series analysis (15 h), Practical sessions: MATLAB
Introduction. Probability models for time series. Regression and Fourier analysis.
Forecasting and Data mining.

Course 8: Neural networks (15 h), Practical sessions: MATLAB
Introduction to the biological models. Nomenclature. Perceptron networks.
The Hebb rule. Foundations of multivariate optimization. Numerical optimization.
Rule of Widrow-Hoff. Backpropagation algorithm.
Practical data modelling with neural networks

Course 9: Introduction to SPSS (15 h)
Introduction. Describing data. Statistical inference. Time series. Sampling.
Classification and regression

Week 2 (July 13th – July 17th, 2009)

Course 10: Regression (15 h), Practical sessions: SPSS
Introduction. Simple Linear Regression Model. Measures of model adequacy.
Multiple Linear Regression. Regression Diagnostics and model violations.
Polynomial regression. Variable selection. Indicator variables as regressors.
Logistic regression. Nonlinear Regression.

Course 11: Practical Statistical Questions (15 h), Practical sessions: study of cases (without computer)
I would like to know the intuitive definition and use of …: The basics.
How do I collect the data? Experimental design.
Now I have data, how do I extract information? Parameter estimation
Can I see any interesting association between two variables, two populations, …?
How can I know if what I see is “true”? Hypothesis testing
How many samples do I need for my test?: Sample size
Can I deduce a model for my data? Other questions?

Course 12: Missing data and outliers (15 h), Practical sessions: R
Missing Data: Typology of missing data; Simple missing-data methods;
Imputation Methods; Diagnostics and Overimputing. Outliers and robust statistics:
Typology of outliers; Influence measures; Robust methods

Course 13: Hidden Markov Models (15 h), Practical sessions: HTK
Introduction. Discrete Hidden Markov Models. Basic algorithms for Hidden Markov Models.
Semicontinuous Hidden Markov Models. Continuous Hidden Markov Models.
Unit selection and clustering. Speaker and Environment Adaptation for HMMs.
Other applications of HMMs

Course 14: Statistical inference (15 h), Practical sessions: SPSS
Introduction. Some basic statistical tests. Multiple testing. Introduction to bootstrapping

Course 15: Feature Subset Selection (15 h), Practical sessions: MATLAB, R, Weka
Filter approaches. Wrapper methods. Embedded methods.

Course 16: Introduction to R (15 h)
An introductory R session. Data in R. Importing/Exporting data. Programming in R.
R Graphics. Statistical Functions in R

Course 17: Unsupervised pattern recognition (clustering) (15 h), Practical sessions: MATLAB
Introduction. Prototype-based clustering. Density-based clustering.
Graph-based clustering. Cluster evaluation. Miscellanea

Course 18: Evolutionary computation (15 h), Practical sessions: MATLAB
Genetic algorithms. Genetic programming. Robust and self-adapting intelligent systems.
Introduction to Estimation of Distribution Algorithms.
Improvements, extensions and applications of EDAs. Current research in EDAs.

PhD Studentship in Machine Learning

Applications are invited for a PhD position in the Computational Learning and Computational Linguistics group of the Artificial Intelligence Laboratory, in the Computer Science Department of the University of Geneva, available immediately. The successful candidate will pursue research in connection with a project on machine learning funded by the Swiss National Science Foundation, investigating theoretical and practical issues in statistical modelling of structured objects, such as natural language semantic structures. Further information on related research activities of the CLCL group can be found on the home page of James Henderson.

Candidates should have a solid background in both computer science and mathematics, particularly in statistical learning and/or optimization theory. They should have excellent programming skills as well as communication skills in English (and ideally in French). Preference will be given to candidates with a strong interest and/or experience in computational linguistics or natural language processing. A strong academic record, excellent analytical skills, and a clear aptitude for autonomous, creative research will be priority selection criteria.

The position is available immediately. Starting salary will be around 4400 CHF/month (gross) at the PhD level (1 CHF = 0.66 EUR). Applications will be accepted until the position is filled.

Applicants should send their curriculum vitae, academic transcript, a statement of purpose, and names and addresses (with e-mail and telephone number) of at least 2 references to the address below (preferably by e-mail):

James Henderson
CUI – University of Geneva
Battelle bâtiment A
7 route de Drize
CH-1227 Carouge, Switzerland
E-mail: James.Henderson (at) unige.ch

Informal inquiries should be directed to James Henderson at James.Henderson (at) unige.ch.

Post-doc position at INRIA (LEAR team)

The LEAR team at INRIA Grenoble is looking for a qualified post-doctoral researcher with a specialization in Computer Vision and Machine Learning, on the topic of discovering relationships between actions and objects.

The position is offered at the “Rhone-Alpes” Research Unit of INRIA, located near Grenoble and Lyon. The Unit includes more than 600 people, within 26 research teams and 10 support services.

Starting date: Summer 2009

Deadline for applications: June 2009.

Monthly salary after taxes: 1,983 € (medical insurance included)

Contact: Remi.Ronfard (at) inrialpes.fr

Activities

Recently, a number of image ranking approaches were proposed that build upon visual words similarity networks (e.g. [3,4]). These methods explore relationships between object categories by analyzing similarities of the extracted visual features. In the case of video actions, the relationships are more complex as similarities can be observed in the spaces of image features, motion features, and also in the joint space of image and motion features. An approach to discovering relationships in such networks would allow for recognition of objects, motions, and human-object interactions. The initial investigation can be performed along the lines of [3,4].

In order to achieve the above goal, a good feature extraction method has to be developed. Existing spatio-temporal features describe information of a video subvolume of a simple shape. Intuitively, the procedure that discovers the shapes of such subregions should be guided by some general measure of the subregion descriptiveness. Unfortunately, straightforward extensions of the common 2D subregion extraction methods [1] may not be appropriate. Additionally, approaches to obtaining good descriptors of the extracted subregions should be investigated, with special care taken to obtain good view- and time-invariant spatio-temporal descriptors.

In order to investigate the relationships between actions and objects, the problem of analyzing human-object interactions should be addressed. It would be of significant practical benefit to have a method for recognizing interactions from an egocentric camera. Ideally, the approach would discover atomic interactions from sequences of long-term activities. One possible way to implement the idea would be to consider the interaction models of [2].

Skills and Profile

* PhD degree (preferably in Computer Vision or Machine Learning)

* Solid programming skills; the project involves programming in Matlab and C++

* Solid mathematics knowledge (especially linear algebra and statistics)

* Creative and highly motivated

* Fluent in English, both written and spoken

* Prior knowledge in the areas of action recognition, video retrieval or object recognition is a plus

REFERENCES

[1] A. Oikonomopoulos, I. Patras, and M. Pantic, Human action recognition with spatiotemporal salient points, IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, vol. 36, no. 3, pp. 710-719, 2006.

[2] Hedvig Kjellström, Javier Romero, David Martinez Mercado, and Danica Kragic, Simultaneous visual recognition of manipulation actions and manipulated objects, in ECCV (2), 2008, pp. 336-349.

[3] Gunhee Kim, C. Faloutsos, and M. Hebert, Unsupervised modeling of object categories using link analysis techniques, in CVPR, 2008, pp. 1-8.

[4] Yushi Jing and Shumeet Baluja, VisualRank: Applying PageRank to large-scale image search, TPAMI, vol. 30, no. 11, pp. 1877-1890, 2008.

Research Scientist position – XEROX Research Centre Europe – Grenoble, France

Text and Visual Pattern Analysis

XEROX Research Centre Europe’s Text and Visual Pattern Analysis Area (TVPA) is an expanding team, which specializes in text and visual content understanding. Our mission is the delivery of Xerox’s innovative solutions that make everyday interaction with visual and textual content simple and effective. Our research is the result of combining skills mainly in machine learning, pattern recognition and image analysis. In particular we focus on text and image categorization, image enhancement, quality assessment and document imaging.

Your Job: Research Scientist

As a research scientist in TVPA you will be asked to generate and follow up on new ideas, and to build strong competencies and intellectual property in Computer Vision and Pattern Analysis. In particular, you will be pursuing activities around our new research agenda on Applied Visual Aesthetics.

Moreover, you will collaborate in a small, agile team that leads the development of the OMNIA Project. OMNIA is a three-year project funded by the French Government that aims to develop innovative strategies for multimodal asset retrieval based on three main axes: content, emotion and visual aesthetics.

Research Topics:

* Design of aesthetic measures (light/colour harmony/composition analysis, aesthetic ontology design, user preference regression etc.)

* Image mood analysis (development of features capturing emotional content of visual information, design of classifiers for automatic labeling of assets)

* Assisted content creation and Image personalization (asset selection, features transfer, colour harmonization, etc.)

Responsibilities:

1. Inventing, implementing and evaluating novel imaging software.

2. Studying the state of the art, disseminating results at international conferences and in journal papers, and fulfilling project deliverables.

3. Collaborating with other project partners in order to integrate the research results in a common environment/platform (sharing components, algorithms and methods).

Requirements

– PhD in Computer Science with a strong history of systems building and publishing

– Deep and substantial background in Pattern Recognition/Computer Vision and Image Processing

– Strong English-language written and oral communication skills

Expected start date: Mid June 2009

Type of contract: Temporary position – 18 months

To apply: Please send your CV and cover letter to: luca.marchesotti (at) xrce.xerox.com, xrce-candidates (at) xrce.xerox.com

PhD Position in Machine Translation and Speech Understanding (starting 09/09)

PORT-MEDIA (ANR CONTINT 2008-2011) is a cooperative project sponsored by the French National Research Agency, between the University of Avignon, the University of Grenoble, the University of Le Mans, CNRS at Nancy and ELRA (European Language Resources Association). PORT-MEDIA will address the multi-domain and multi-lingual robustness and portability of spoken language understanding systems. More specifically, the overall objectives of the project can be summarized as:
– robustness: integration/coupling of the automatic speech recognition component in the spoken language understanding process.
– portability across domains and languages: evaluation of the genericity and adaptability of the approaches implemented in the understanding systems, and development of new techniques inspired by machine translation approaches.
– representation: evaluation of new rich structures for high-level semantic knowledge representation.

The PhD thesis will focus on the multilingual portability of speech understanding systems. For example, the candidate will investigate techniques to quickly adapt an understanding system from one language to another and to create low-cost resources with (semi-)automatic methods, for instance by using automatic alignment techniques and lightly supervised translations. The main contribution will be to fill the gap between the techniques currently used in the statistical machine translation and spoken language understanding fields.

The thesis will be co-supervised by Fabrice Lefèvre, Assistant Professor at LIA (University of Avignon) and Laurent Besacier, Assistant Professor at LIG (University of Grenoble). The candidate will spend 18 months at LIG then 18 months at LIA.

The salary of a PhD position is roughly 1,300€ net per month. Applicants should hold a strong university degree entitling them to start a doctorate (Masters/diploma or equivalent) in a relevant discipline (Computer Science, Human Language Technology, Machine Learning, etc). The applicants should be fluent in English. Competence in French is optional, though applicants will be encouraged to acquire this skill during training. All applicants should have very good programming skills.

For further information, please contact Fabrice Lefèvre (Fabrice.Lefevre at univ-avignon.fr) AND Laurent Besacier (Laurent.Besacier at imag.fr).

MCBR-CDS09: CALL FOR PAPERS

MCBR-CDS 2009: Medical Content-based Retrieval for Clinical Decision Support
September 20th, 2009
London, UK
http://www.almaden.ibm.com/cs/projects/aalim/multimodal-decision.html

** Paper Submissions Due May 22nd, 2009 **

——————-
Call for Papers
——————-

Diagnostic decision making (using images and other clinical data) is still very much an art for many physicians in their practices today due to a lack of quantitative tools and measurements. Traditionally, decision making has involved using evidence provided by the patient’s data coupled with a physician’s a priori experience of a limited number of similar cases. With advances in electronic patient record systems, a large number of pre-diagnosed patient data sets are now becoming available. These datasets are often multimodal consisting of images (x-ray, CT, MRI), videos and other time series, and textual data (free text reports and structured clinical data). Analyzing these multimodal sources for disease-specific information across patients can reveal important similarities between patients and hence their underlying diseases and potential treatments. Researchers are now beginning to use techniques of content-based retrieval to search for disease-specific information in modalities to find supporting evidence for a disease or to automatically learn associations of symptoms and diseases. Benchmarking frameworks such as ImageCLEF (Image retrieval track in the Cross-Language Evaluation Forum) have expanded over the past five years to include large medical image collections for testing various algorithms for medical image retrieval. This has made comparisons of several techniques for visual, textual, and mixed medical information retrieval as well as for visual classification of medical data possible based on the same data and tasks.
The goal of this workshop is to bring together researchers in medical imaging, medical image retrieval, data mining, text retrieval, and machine learning/AI communities to discuss new techniques of multimodal mining/retrieval and their use in clinical decision support. We are looking for original, high-quality submissions that address innovative research and development in the analysis, search and retrieval of multimodal medical data for use in clinical decision support. Further, to encourage a larger group of image analysis researchers to profit from the databases and evaluations created in the context of ImageCLEF, we will provide access to the medical databases and tasks of ImageCLEF 2009 which has obtained rights from RSNA to use over 70,000 images of the journals Radiology and Radiographics.

Topics for the workshop include but are not limited to:
–Mining of multimodal medical data (X-ray, MRI, CT, echo videos, time series data)
–Machine learning of disease correlations from mining multimodal data
–Algorithms for indexing and retrieval of data from multimodal medical databases
–Disease model-building and clinical decision support systems based on multimodal analysis
–Practical applications of clinical decision support using multimodal data retrieval or analysis
–Algorithms for medical image retrieval

————————–
Paper Submission
————————–
Prospective authors are invited to submit papers of not more than
eight (8) pages including results, figures and references. Please use
the MICCAI author kit to format the papers.

————————
Important Dates
————————
Paper submission deadline: May 22nd, 2009

Notification of acceptance: June 28th, 2009

Camera-ready copy: July 20th, 2009

Workshop date: September 20th, 2009