Funded PhD position in Sequential Learning with Similarities, SequeL team, INRIA, Lille, France

Funded PhD position in Sequential Learning with Similarities, SequeL team, INRIA, Lille, France

Description: The goal of this PhD position is to design and analyze efficient algorithms for decision making under uncertainty. Specifically, this position will focus on cases when the possible actions are somehow related. This extra information can be provided in a form of weighted graph or a similarity metric. The aim is to provide provably optimal algorithms that can be applied in large-scale scenarios, such as movie recommendation or social networks. The purpose is to minimize feedback that we need to give the algorithm in order to make it “intelligent”. In other words, we want the decision-making algorithms that converge fast to nearly-optimal solutions with minimal feedback (samples). For this purpose, we often need to learn about the environment (explore) at the same time as choosing the currently most promising option (exploit). This requires careful allocation of resources, which could be financial costs, CPU time, or a human effort.
Keywords: machine learning, graph-based learning, exploration-exploitation tradeoff, bandit algorithms, learning with similarities, minimal feedback
Objectives: The multi-armed bandit problem is a simple model to study the trade-off between exploration and exploitation. While simple enough, it displays most of the issues that arise in a variety of decision-making problems under uncertainty. The following problems are the most pertinent to the proposed research program:
• Contextual bandits: This setting is an extension of the multi-armed bandit problem where the best decision depends on the information (context) that is repeatedly given to the decision maker [S09]. Selecting relevant web advertisement or the news feed recommendations are the most common applications of this setting [LCLS10].
• Bandits on Graphs: In many problems, the similarities between the decisions are provided in the form of a graph that relates the pairs of the decisions (nodes), potentially with a weight. This can refer to connections in the social networks [CKLB12] or more general combinatorial problems [BL12, YM11]. Another use is in the recommender systems where we want to discover user preferences (assumed to be smooth on a given graph) with minimal number of queries.
• Stochastic optimization: The optimization of a noisy objective function is a very general problem whose difficulty strictly depends on the properties of the function itself (e.g., linear [AHR08], Lipschitz [BMSS08], submodular [HK12]) and the space the function is defined on (e.g., finite support, continuous). What other interesting settings are (functions and spaces) and what set of assumptions is really needed to successfully optimize the function are issues currently under investigation.
Job Description: The PhD candidate will focus on one or more issues related to the problem of learning and decision-making under uncertainty with some similarity information. The PhD candidate will first acquire expertise in different topics of machine learning such as online learning, multi-armed bandit, statistical learning theory, and graph-based learning. The candidate is then expected to contribute to the advancement of the literature on this problem along many different lines: methodological (e.g., definition of general abstract models for a wide range of decision-making problems), theoretical (e.g., near optimality performance guarantees), and algorithmic (e.g., development of novel algorithms for specific decision-making problems). The candidate will work with Michal Valko (, Remi Munos, and other members of the lab. The applicant will also have the opportunity to collaborate with researchers in several countries in Europe and USA.

Requirements: The successful candidate will have a MSc or equivalent degree in computer science with a strong background in theory or in mathematics. Programming skills will be considered a plus. The working language in the lab is English, a good written and oral communication skill are required.
• Application closing date: Spring 2013
• Duration: 3 years (a full time position)
• Starting date: October 1st, 2013
• Supervisors: Michal Valko and Remi Munos
• Place: SequeL, INRIA Lille – Nord Europe
About INRIA: SequeL ( is one of the most dynamic labs at INRIA (, with over 25 researchers and PhD students working on both fundamental and practical aspects of sequential learning problems: from statistical learning, through reinforcement learning, to games. Lille is the capital of the north of France, a metropolis with 1 million inhabitants, with excellent train connection to Brussels (30 min), Paris (1h) and London (1h30). Established in 1967, Inria is the only public research body fully dedicated to computational sciences. Combining computer sciences with mathematics, Inria’s 3,400 researchers strive to invent the digital technologies of the future. Educated at leading international universities, they creatively integrate basic research with applied research and dedicate themselves to solving real problems, collaborating with the main players in public and private research in France and abroad and transferring the fruits of their work to innovative companies. The researchers at Inria published over 4000 articles a year. They are behind over 270 active patents and 105 start-ups. The 171 project teams are distributed in eight research centers located throughout France.

• Duration: 36 months – starting date of the contract : October 2013, 15th
• Salary: 1957,54 € the first two years and 2058,84 € the third year
• Salary after taxes: around 1597,11€ the 1st two years and 1679,76 € the 3rd year (benefits included).
• Possibility of French courses
• Help for housing
• Participation for public transport
• Scientific Resident card and help for husband/wife visa
• [AHR08] Jacob Abernethy, Elad Hazan, and Alexander Rakhlin. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization. In Proceedings of the 21st Annual Conference on Learning Theory (COLT’08), 2008.
• [BL12] N. Cesa-Bianchi and G. Lugosi Combinatorial bandits Journal of Computer and Systems Sciences, 78:1404-1422, 2012.
• [BMSS08] S. Bubeck and R. Munos and G. Stoltz and Cs. Szepesvari. Online Optimization of {X}-armed Bandits. In Proceedings of the Advances in Neural Information Processing Systems (NIPS’08), 2008.
• [BMS09] S. Bubeck and R. Munos and G. Stoltz. Pure Exploration in Multi-Armed Bandits Problems. Proceedings of the 20th International Conference on Algorithmic Learning Theory (ALT’09), 2009.
• [CKLB12] Stephane Caron, Branislav Kveton, Marc Lelarge, and Smriti Bhagat. Leveraging Side Observations in Stochastic Bandits. Uncertainty in Artificial Intelligence, 2012.
• [HK12] Elad Hazan, Satyen Kale. Online Submodular Minimization. Journal of Machine learning Research (JMLR), 13(Oct):2903−2922, 2012.
• [LCLS10] Lihong Li, Wei Chu, John Langford, Robert E Schapire. A Contextual-Bandit Approach to Personalized News Article Recommendation WWW 10, Volume: 173 (2010)
• [S09] Contextual Bandits with Similarity Information Aleksandrs Slivkins Proceedings of the 24th annual Conference On Learning Theory, Issue: June (2009)
• [YM11] J-Y Yu and S. Mannor. Unimodal Bandits. International Conference on Machine Learning (ICML), 2011
This call is posted at
For further information please send an email to as soon as possible.

Michal Valko (equipe SequeL)

Parc scientifique de la Haute Borne, 40 Avenue Halley, Bât A – Park Plaza
59650 Villeneuve d’Ascq, Office #5 Tel : +33 (0)3 59 57 7801
Suivez‐nous sur Twitter et YouTube