Today's data-driven society is full of large-scale datasets, e.g., images from the web, sequence data from the human genome, graphs representing friendship networks, time-series data of stock prices, speech corpora of news broadcasts, etc. In the context of machine learning, these datasets are often represented by large matrices that store either a set of real-valued features for each datapoint or pairwise similarities between datapoints. Hence, modern learning problems in computer vision, natural language processing, computational biology, and other areas often face the daunting task of storing and operating on matrices with thousands to millions of entries. An attractive solution to this problem involves working with low-rank approximations of the original matrix. Low-rank approximation is at the core of widely used algorithms such as Principal Component Analysis, Multidimensional Scaling, Latent Semantic Indexing, and manifold learning. Furthermore, low-rank matrices appear in a wide variety of applications including lossy data compression, collaborative filtering, image processing, text analysis, matrix completion, and metric learning.
In the past several years, there has been a growing body of research devoted to developing randomized methods to efficiently and accurately generate low-rank approximations to large matrices. This topic has been studied in a wide range of disciplines including numerical linear algebra, theoretical computer science and machine learning. Although many of these techniques have extensive theoretical guarantees and empirical success on a limited number of datasets, there nonetheless remains a gap between these (mostly theoretical) advances and practical implementations of large-scale machine learning problems. Within the past few years, however, there have been significant advances in making these theoretical tools practical for large-scale machine learning. For example, high-quality numerical implementations have been developed, ensemble methods have been introduced to improve performance, and these techniques have been used to solve outstanding problems in human genetics and computer vision.
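To make the class of methods in question concrete, the following is a minimal NumPy sketch of a randomized low-rank approximation based on a Gaussian sketch followed by a small exact SVD, in the spirit of the randomized range-finder literature (e.g., Halko, Martinsson, and Tropp). The function name, parameters, and toy matrix below are purely illustrative and are not taken from any workshop materials.

```python
import numpy as np

def randomized_low_rank(A, k, oversample=10, n_iter=2, seed=0):
    """Rank-k approximation of A via a Gaussian random sketch."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sketch the range of A with a random Gaussian test matrix.
    Omega = rng.standard_normal((n, k + oversample))
    Y = A @ Omega
    # A few power iterations sharpen the captured subspace when the
    # singular values of A decay slowly.
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)              # orthonormal basis for the sketch
    # Project A onto the basis and take an exact SVD of the small matrix.
    B = Q.T @ A                         # (k + oversample) x n
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_b
    return U[:, :k], s[:k], Vt[:k, :]   # factors of A ~= U diag(s) Vt

# Toy usage: approximate a 2000 x 1000 matrix that is exactly rank 20.
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 20)) @ rng.standard_normal((20, 1000))
U, s, Vt = randomized_low_rank(A, k=20)
err = np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)
print(f"relative approximation error: {err:.2e}")
```

The key point, and the reason such methods scale, is that the expensive factorization is applied only to the small sketched matrix, while the original matrix is touched through a handful of matrix-matrix products.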
In this workshop, we aim to survey recent developments with an emphasis on usefulness for practical large-scale machine learning problems and to provide a forum for researchers to discuss several important questions associated with randomized low-rank approximation techniques, including: What are the state-of-the-art approximation techniques? How does the heterogeneity of data affect the randomization aspects of these algorithms? Which methods are appropriate for various machine learning tasks? How do these methods work in practice for large-scale machine learning tasks? What is the tradeoff between numerical precision and time/space efficiency in the context of machine learning performance, e.g., classification or clustering accuracy? In summary, we are interested in exploring the impact of low-rank methods for large-scale machine learning. We will study new algorithms, recent theoretical advances and large-scale empirical results, and more broadly we will motivate additional interesting scenarios for use of low-rank approximations for learning tasks.
Workshop Organizers
- Arthur Gretton, University College London - Gatsby
- Michael Mahoney, Stanford University
- Mehryar Mohri, Courant Institute (NYU) and Google Research
- Ameet Talwalkar, UC Berkeley
Program Committee
- Alexandre d'Aspremont, Princeton University
- Christos Boutsidis, Rensselaer Polytechnic Institute
- Kamalika Das, NASA Ames Research Center
- Maryam Fazel, University of Washington
- Michael I. Jordan, UC Berkeley
- Sanjiv Kumar, Google Research New York
- James Kwok, Hong Kong University of Science and Technology
- Gunnar Martinsson, University of Colorado Boulder