Technological advances in patient profiling have led to a paradigm shift in medical prognosis. Diagnostics carried out by medical experts are increasingly complemented by large-scale data collection and quantitative, genome-scale molecular measurements. Data that are already available today, or that will enter medical practice in the near future, include personal medical records, genotype information, diagnostic tests, proteomics and other emerging ‘omics’ data types.

This rich source of information forms the basis of future medicine, and of personalized medicine in particular. Predictive methods for personalized medicine make it possible to integrate these patient-specific data (genetics, exams, demographics, imaging, lab results, genomics, etc.), both to improve prognosis and to design an optimal therapy for each individual.

However, the statistical and computational approaches behind these analyses face a number of major challenges. For example, it is necessary to identify and correct for structured influences within the data, to deal with missing data, and to address the statistical challenges that come with carrying out millions of statistical tests. Moreover, for these methods to be useful in practice, computational efficiency and scalability to large datasets are essential requirements. Finally, any computational approach must be tightly integrated with medical practice to be actually used, and the experience gained must be fed back into future development and improvement. To address these technical difficulties and to allow for efficient integration and application in a medical context, it is necessary to bring together the communities of statistical method developers, medical practitioners and biological investigators.

Purpose of the Workshop

The purpose of this second cross-disciplinary workshop is to bring together machine learning, statistical genetics and healthcare researchers interested in problems and applications of predictive models in the field of personalized medicine. The goal of the workshop is to bridge the gap between the theory of predictive models and statistical genetics, as applied to medicine, and the pressing needs of the healthcare community. The workshop will promote an exchange of ideas, helping to identify important and challenging applications and to discover possible synergies. Ideally, we hope that such discussions will lead to interdisciplinary collaborations and joint grant submissions. The emphasis will be on the statistical and engineering aspects of predictive models and how they relate to practical medical and biological problems.

Although related in a broad sense, the workshop does not directly overlap with the fields of Bioinformatics and Biostatistics. While predictive modeling for healthcare has been explored by biostatisticians for several decades, this workshop addresses substantially different needs and problems that are better served by modern machine learning technologies. For example, how should we organize clinical trials to validate the clinical utility of predictive models for personalized therapy selection? How can we integrate and combine heterogeneous data while accounting for confounding influences? How can we ensure the computational efficiency needed to make these methods useful in practice? The workshop will focus on methods that address these and related questions.

Rather than addressing questions of basic science, the workshop will concentrate on predictive models that combine available patient data while resolving the technical and statistical challenges through modern machine learning. The workshop program will combine presentations by invited speakers from the machine learning, statistical genetics and personalized medicine communities with presentations by authors of extended abstracts submitted to the workshop. In addition, ample time will be reserved for discussion, both in an open panel and during the poster sessions.