The main focus of the workshop is the problem of on-line learning when only limited feedback is available to the learner. In on-line learning, at each time step the learner has to predict the outcome corresponding to the next input based on the feedback obtained so far. Unlike the usual supervised setting, in which after each prediction the learner receives enough information to evaluate the quality of every prediction it could have made, in many cases only limited feedback is available. Depending on the nature of the limitation on the feedback, different classes of problems can be identified:

1. Delayed feedback. The utility of an action (i.e., the prediction) is returned only after a certain amount of time. This is the case in reinforcement learning and on-line control problems, where the outcome of an action may become available only when a goal is finally achieved.

2. Partial feedback. The feedback is limited to the learner's own prediction, so no information is available on what other possible predictions would have brought. The multi-armed bandit problem, in which only the utility of the pulled arm is returned to the learner, is the classic example (see the sketch after this list).

3. Indirect feedback. Neither the true outcome nor the utility of the prediction is observed; only indirect feedback, loosely related to the prediction, is returned.
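To make the partial-feedback setting concrete, here is a minimal sketch in Python of a bandit learning loop, assuming Bernoulli arms and a simple epsilon-greedy learner (the arm means, exploration rate, and horizon below are illustrative assumptions, not drawn from any particular work): at each round the learner observes only the reward of the arm it pulled, never the rewards of the other arms.

    import random

    # Illustrative setup: three Bernoulli arms with unknown means.
    ARM_MEANS = [0.2, 0.5, 0.7]
    N_ARMS = len(ARM_MEANS)
    EPSILON = 0.1      # exploration rate (assumed value)
    HORIZON = 10_000   # number of rounds (assumed value)

    counts = [0] * N_ARMS    # times each arm was pulled
    values = [0.0] * N_ARMS  # running average reward per arm

    random.seed(0)
    for t in range(HORIZON):
        # Explore with probability EPSILON, otherwise exploit the best estimate.
        if random.random() < EPSILON:
            arm = random.randrange(N_ARMS)
        else:
            arm = max(range(N_ARMS), key=lambda a: values[a])

        # Partial feedback: only the pulled arm's reward is revealed.
        reward = 1.0 if random.random() < ARM_MEANS[arm] else 0.0

        # Incremental update of the pulled arm's average reward.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    print("estimated means:", [round(v, 3) for v in values])
    print("pulls per arm:  ", counts)

Delayed and indirect feedback can be read as further restrictions of the same loop: the reward is revealed only later in time, or is replaced by a signal only loosely related to the prediction.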

The increasing interest in on-line learning with limited feedback is also motivated by a number of applications, such as recommender systems and web advertisement systems, in which the user's feedback is limited to accepting or ignoring the proposed item, and the true label (i.e., the item the user would prefer the most) is never revealed to the learner.

Goals

Although some aspects of on-line learning with limited feedback have already been thoroughly analyzed (e.g., multi-armed bandit problems), many problems are still open. For instance, bandits with large action spaces and side information, learning with delayed reward, and on-line optimization are of primary concern in many recent works on on-line learning. Furthermore, on-line learning with limited feedback has strong connections with a number of other fields of Machine Learning, such as active learning, semi-supervised learning, and multi-class classification.
The goal of the workshop is to give researchers the opportunity to present their current research on these topics and to encourage discussion of the main open issues and the possible connections between the different sub-fields. In particular, we expect the workshop to shed light on a number of theoretical issues, such as:

  1. how does the performance of learning algorithms scale in either large (e.g., an infinite number of arms, either countable or a continuum, or arms lying in metric or measurable spaces) or changing action spaces?
  2. how does the performance of learning algorithms scale with the smoothness of the function to be optimized (Lipschitz, linear, convex, non-convex)?
  3. what are the connections between the MDP reinforcement learning paradigm and the on-line learning problem with delayed feedback?
  4. how can complexity measures be defined for on-line learning with limited feedback?
  5. is it possible to define a unified view on the problem of learning with delayed, partial, and indirect feedback?

Call for Participation

The organizing committee would like to invite the submission of extended abstracts (three to four pages in the conference format plus appendix if needed) describing research on (but not restricted to) the following topics:

  • adversarial/stochastic bandits
  • bandits with side information (contextual bandits, associative RL)
  • bandits with large and/or changing action spaces
  • on-line learning with delayed feedback
  • on-line learning in MDPs and beyond
  • partial monitoring prediction
  • on-line optimization (Lipschitz, linear, convex, non-convex)
  • on-line learning in games
  • applications

Organisation Committee

  • Jean-Yves Audibert (Certis-Université Paris Est-Ecole des Ponts ParisTech)
  • Peter Auer (University of Leoben)
  • Sebastien Bubeck (INRIA - Team SequeL)
  • Alessandro Lazaric (INRIA - Team SequeL) - (primary contact)
  • Odalric Maillard (INRIA - Team SequeL)
  • Remi Munos (INRIA - Team SequeL)
  • Daniil Ryabko (INRIA - Team SequeL)
  • Csaba Szepesvari (University of Alberta)