The two challenges held during PASCAL2 build on the success of the Exploration vs Exploitation challenge run in PASCAL1. That challenge considered the standard bandit problem, but with response rates changing over time. Despite its apparent simplicity, the challenge inspired a range of very important developments, including the UCT (Upper Confidence Tree) algorithm and its successful application to computer Go in the award-winning MoGo system. The earlier challenge included a £1000 award for the winner.

The later challenges built on the earlier one in two important respects. Firstly, they considered so-called multi-variate bandits, that is, bandits where each visitor/arm combination has associated features that could potentially enable more accurate prediction of the response probability for that combination. Secondly, the data were drawn from a real-world dataset of advertisement (banner) placement on webpages, with the response corresponding to click-through by the user.

The multi-variate bandit problem represents an important stepping stone towards more complex problems involving delayed feedback, such as reinforcement learning. It involves only a single state, but the additional features take it significantly closer to standard supervised learning than the simple bandits considered in the first challenge. The ability to respond accurately and to bound performance for such systems is an important step towards a key component that can be integrated into cognitive systems, one of the major goals of the PASCAL network.
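The multi-variate bandit setting described above can be made concrete with a minimal sketch. The snippet below uses a LinUCB-style rule — one standard algorithm for bandits with features, not necessarily the method used by challenge entrants — where each arm (advertisement) maintains a ridge-regression estimate of click probability from the visitor/arm features, and selection adds an upper-confidence exploration bonus. The feature dimension, click probabilities, and round count are hypothetical values chosen purely for illustration.

```python
import numpy as np

def linucb_choose(arm_features, A, b, alpha=0.5):
    """Pick the arm with the highest upper confidence bound.

    arm_features: one d-dimensional feature vector per arm.
    A[a], b[a]: sufficient statistics of arm a's ridge regression.
    """
    best_arm, best_ucb = 0, -np.inf
    for a, x in enumerate(arm_features):
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]                      # estimated weight vector
        ucb = theta @ x + alpha * np.sqrt(x @ A_inv @ x)
        if ucb > best_ucb:
            best_arm, best_ucb = a, ucb
    return best_arm

def linucb_update(A, b, arm, x, reward):
    """Fold the observed click (1) or non-click (0) into the statistics."""
    A[arm] += np.outer(x, x)
    b[arm] += reward * x

# Toy simulation: two banner ads, three-dimensional visitor features,
# hypothetical true weight vectors (arm 1 has the higher click-through rate).
rng = np.random.default_rng(0)
d, n_arms = 3, 2
A = [np.eye(d) for _ in range(n_arms)]            # ridge prior: identity
b = [np.zeros(d) for _ in range(n_arms)]
true_theta = [np.array([0.1, 0.0, 0.0]), np.array([0.8, 0.1, 0.0])]

counts = [0, 0]
for t in range(500):
    xs = [rng.random(d) for _ in range(n_arms)]   # per-arm context features
    arm = linucb_choose(xs, A, b)
    counts[arm] += 1
    p_click = true_theta[arm] @ xs[arm]           # true click probability
    reward = float(rng.random() < p_click)
    linucb_update(A, b, arm, xs[arm], reward)
```

After enough rounds the policy concentrates on the arm whose features predict the higher click-through rate, while the confidence bonus guarantees that every arm is still explored early on.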