Introduction

Touch Clarity (www.touchclarity.com) provides real­time optimisation of websites. Touch Clarity chooses, from a number of options, the most popular content to display on a page. This decision is made by tracking how many visitors respond to each of the options, by clicking on them. This is a direct commercial application of the multi­armed bandit problem – each of the items which might be shown is a separate bandit, with a separate response rate.

As in the multi­armed bandit problem, there is a trade­off between exploration and exploitation – it is necessary to sometimes serve items other than the most popular in order to measure their response rate with sufficient precision to correctly identify which is the most popular. However, in this application there is a further complication – typically the rates of response to each item will vary over time, so continuous exploration is necessary in order to track this time variation, as old knowledge becomes out ­of­ date. An extreme example of this might be in choosing which news story to serve as the main story on a news page – interest in one story will decrease over time while interest in another will increase. In addition, the interest in several stories might vary in a similar, coherent way – for example a general increase in interest in sports stories at weekends, or in political stories near to an election. So there are typically two types of variation to consider – where response rates vary together, and where response rates vary completely independently.

The Challenge

The challenge aims to stimulate research into the optimal way to trade­off exploration vs exploitation where the response rate for each option is varying over time. Instead of a static data set, data for the challenge is provided in the form of a visitor behaviour simulation engine, written in java, and also made available as a Matlab function.

Entrants should write a program which will repeatedly choose from a range of possible display options, and query the visitor simulation engine through an interface e.g.

public Boolean selectOption(int option)The engine will return true or false, depending on whether the simulated visitor responded to the option that was served to them. The objective is to maximize the number of positive responses returned by the engine. The response rate for each option varies over time. The visitor simulation engine maintains time ­dependent continuous functions for the response rate of each of the options. When the engine is queried with a given option, it calculates the value of the function at that time. The engine returns true with a probability given by this value. The time variable used is the number of requests made to the engine, rather than real time. This is to remove dependencies on the machine hardware, and application performance of the entrants. In this application there is no fixed horizon, that is to say the number of trials is not fixed, and so solutions should not use the number of trials as a parameter.

To prevent the engine from repeating the same behaviour every time it is restarted, the response rate functions are parameterized. On restart, new values for these parameters are randomly generated based on a seed.

The challenge is broken into a number of individual tasks each relating to a particular problem that is frequently encountered in real website data. The engine addresses each of these problems in turn.

Tasks

There are now six tasks, the old tasks are now invalid. In each task there are 5 options. The best achievable response rate is approximately 0.18 for all tasks. All tasks are designed to display the appropriate behaviour over 1M requests. (Although during the judging phase more may be required.)

Task 1. SimulatedVisitorFrequentSwap

To check a candidate algorithm can respond to relatively frequent changes of the best option.

Task 2. SimulatedVisitorLongGaussians

To check a candidate algorithm can respond to changes of the best option over longer time periods.

Task 3. SimulatedVisitorWeeklyVariation

To check a candidate algorithm can handle coherent variation of response rates where there are two sinusoidal components to the variation of differing periods, the longer period being dominant. In addition, there is a gradual change in the relative ranking of the option. This is to ensure there is ongoing exploration throughout the run and that it can handle the consequent comparison of option response rates that have been served at different times. In addition, each option has a randomly chosen start time.

Task 4. SimulatedVisitorDailyVariation

To check a candidate algorithm can handle coherent variation of response rates where there are two sinusoidal components to the variation of differing periods, the shorter period being dominant.

Task 5. SimulatedVisitorWeeklyCloseVariation

To check a candidate algorithm can handle coherent variation where response rates are closely spaced.

Task 6. SimulatedVisitorConstant

We must ensure a candidate algorithm can handle the ideal case. An over exploratory algorithm can degrade to random performance under these conditions.

Organisers

Zakria Hussain, University College, London
Peter Auer, University of Leoben
Nicolò Cesa-Bianchi, Università degli Studi di Milano
Leonard Newnham, Touch Clarity
John Shawe-Taylor, University College, London

Submission of Results

Results can be submitted at any time via the submit tab, a set of random seeds is available here. Run your code using these seeds with each run of length 1M requests.

Judging

During the judging phase of the challenge, a further evaluation will take place for each algorithm. This is to ensure candidate algorithms can perform well over a variety of conditions. A second set of random seeds will be published along with a different run length. Each entrant will be re-evaluated against these seeds and run length. The winner of the challenge will be the entrant who obtains the lowest total regret on all six tasks. There is no longer the option of submitting for one task only.

Touch Clarity reserves the right to disqualify entrants if any of the following rules are broken:

  • No algorithm should use the run length as an input parameter. In a real application optimisation does not terminate after a particular number of requests.
  • Algorithms should NOT use information about the shape of the response functions obtained during the course of a run. The response function information available from the csv files is to assist in improving the performance of algorithms only. The sort of solution we are trying to avoid is “if resp rates constant over first 1000 responses then use USB otherwise use SomeOtherAlgorithm”.

 

Java code download

Java-code1.2.zip (version 1.2 released 21st May 2006)

There is an API written in Java which is distributed as a .jar file (.class executables only) and some baseline Java code that implements some simple multi-armed bandit problem algorithms. You will need at least Java version 1.5 to run the Java code. You can compile the baseline code like so:

javac -classpath challenge1.2.jar applicationtest/*.java applicationtest/simplewindowprovider/*.java

After this you can run ApplicationTest like so:

java -classpath .;challenge1.2.jar -Xmx256m applicationtest/ApplicationTest

OR

java -classpath :challenge1.2.jar -Xmx256m applicationtest/ApplicationTest

The Java code contains provider classes and several (bandit) algorithms written using the provider classes. ApplicationTest calls these algorithms and runs them on all 6 tasks. All participants will need to write their algorithm as a provider class. Please look at the provider classes in the sample Java code to understand how to do this. The code contains very simplistic algorithms for bandit problems when arms do not vary over time. However, please note that the challenge is for the case where the rewards for an arm varies over time.

Version 1.1 contains:

      1. Six new tasks
      2. detailed output every 1000 requests to csv file. This contains:
        1. the underlying response rates of the options in the task, and
        2. the proportion of serves each option is receiving

This allows the user to construct clear graphical views of how an algorithm is performing over time – see for example

UCBWindowProvider.JPG

    .

  1. An efficiency metric has been added to getStatistics(). This ranges from 0 (equivalent to random serving) to 1 (equalling the maximum theoretical success rate).

Version 1.2 contains

  1. UCBWindowProvider – a windowed version of UCB
  2. Improved regret summary
  3. improved commandline args:
    1. -h help
    2. -r number of requests
    3. -s random seed set
    4. -i produce results on interim seed set. The summary printed at the end of the test is sufficient for a submission
  4. better javadoc of applicationtest and challenge1.2.jar

Here is the javadoc1.2 for the challenge1.2.jar package.

Matlab code download

Matlab-code1.2.zip (version 1.2 released 21st May 2006)

The Matlab code is a wrapper for the Java API but does not contain any baseline code like that available for the Java download. The Matlab wrapper calls in the challenge1.2.jar executables and all its functions can be used as in Java. Please read the documentation (preamble) in the matlab file in order to set it up to run correctly.

 

Results of Phase 2 tasks

THE WINNING ALGORITHM WILL BE THE ONE THAT IS RANKED NUMBER 1 IN THE “TOTAL AVERAGE REGRET FOR TASKS 1-6” TABLE USING THE TEST SEEDS.

Total Average Regret for tasks 1-6

Rank
Algorithm
Name
Institution
Total Average Regret
Validation Seeds
Test Seeds
1
MetaUCBFC Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 34599.203 34202.5
2
ProviderDiscountedUCB Thomas Jaksch University of Leoben 35071.0 36073.797
3
ProviderExp4DWrapper Thomas Jaksch University of Leoben 35120.0 36253.9
4
UCB1DiscountTuned Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 37066.2 37117.8
5
MetaUCB Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 35532.4 37282.203
6
UCB1DiscountTuned_C Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 36948.3 37744.7
7
UCB1DiscountTuned_B Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 37356.2 38194.6
8
TaoUcbForgetPascalChallenge Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 39605.4 43969.7
9
UCBWindowProvider Baseline code 180301.81 179246.31
10
ShiftingBanditProvider Baseline code 211875.2 211107.8
11
ProviderUCB Baseline code 222884.5 249174.6
12
RandomProvider Baseline code 322342.7 318739.6

 

Task 1. SimulatedVisitorFrequentSwap

Rank
Algorithm
Name
Institution
Validation Seeds
Test Seeds
Average Regret
STD
Average Regret
STD
1
ProviderExp4DWrapper Thomas Jaksch University of Leoben 9183.5 614.5174 9133.0 508.81235
2
MetaUCBFC Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 11052.9 1126.048 9906.5 780.7754
3
ProviderDiscountedUCB Thomas Jaksch University of Leoben 10198.4 933.2443 10697.9 1380.2686
4
UCB1DiscountTuned_C Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 12059.8 884.5122 11983.7 618.63525
5
TaoUcbForgetPascalChallenge Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 12036.0 658.0817 12057.5 879.0806
6
UCB1DiscountTuned_B Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 11992.7 618.4238 12303.9 546.7468
7
UCB1DiscountTuned Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 12152.8 701.47064 12330.8 766.26135
8
MetaUCB Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 15298.1 1714.1592 15265.4 3004.444
9
ProviderUCB Baseline code 32832.0 942.6005 32392.5 828.2142
10
UCBWindowProvider Baseline code 36098.6 1702.405 35564.0 1964.2234
11
ShiftingBanditProvider Baseline code 40806.8 1312.9364 40947.7 1463.947
12
RandomProvider Baseline code 48835.7 460.32596 49054.1 503.07773

 

Task 2. SimulatedVisitorLongGaussians

Rank
Algorithm
Name
Institution
Validation Seeds
Test Seeds
Average Regret
STD
Average Regret
STD
1
ProviderDiscountedUCB Thomas Jaksch University of Leoben 3032.5 467.23615 3350.2 451.85638
2
UCB1DiscountTuned_C Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 3854.6 393.24185 4020.2 678.94574
3
UCB1DiscountTuned_B Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 3877.9 328.87567 4031.0 400.47943
4
UCB1DiscountTuned Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 3819.4 331.3753 4133.9 631.6608
5
MetaUCBFC Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 4550.2 1560.2434 4561.1 975.12006
6
ProviderExp4DWrapper Thomas Jaksch University of Leoben 4767.9 389.79468 4622.5 283.90424
7
MetaUCB Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 4817.5 1670.4481 5636.5 1254.3213
8
TaoUcbForgetPascalChallenge Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 7352.9 2036.1892 8613.9 1643.2212
9
ShiftingBanditsProvider Baseline code 47240.3 3023.0012 46586.8 3428.574
10
ProviderUCB Baseline code 51547.9 10747.207 68992.3 29760.65
11
UCBWindowProvider Baseline code 64994.6 14387.951 69414.7 25192.545
12
RandomProvider Baseline code 113337.1 7096.978 109707.5 7496.9795

Task 3. SimulatedVisitorWeeklyVariation

Rank
Algorithm
Name
Institution
Validation Seeds
Test Seeds
Average Regret
STD
Average Regret
STD
1
ProviderDiscountedUCB Thomas Jaksch University of Leoben 3947.0 559.38635 3651.8 393.22647
2
UCB1DiscountTuned Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 4385.6 545.81683 4451.5 523.6664
3
UCB1DiscountTuned_B Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 4647.5 515.3112 4533.6 249.49202
4
MetaUCB Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 4534.5 723.832 4558.7 735.7625
5
UCB1DiscountTuned_C Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 4486.4 463.7919 4676.3 703.2815
6
MetaUCBFC Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 4764.3 556.171 5213.6 541.3687
7
ProviderExp4DWrapper Thomas Jaksch University of Leoben 7511.4 1893.0953 7999.3 673.5912
8
TaoUcbForgetPascalChallenge Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 7084.7 950.5174 8206.8 787.7223
9
UCBWindowProvider Baseline code 29055.3 11124.4 21102.5 12444.306
10
ShiftingBanditsProvider Baseline code 40649.8 212.50716 40609.1 220.46388
11
RandomProvider Baseline code 57238.3 308.08838 57185.8 253.00189
12
ProviderUCB Baseline code 61824.0 2355.0454 63349.5 934.28564

 

Task 4. SimulatedVisitorDailyVariation

Rank
Algorithm
Name
Institution
Validation Seeds
Test Seeds
Average Regret
STD
Average Regret
STD
1
UCB1DiscountTuned_B Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 4728.9 883.0477 5070.2 947.57245
2
UCB1DiscountTuned Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 5751.8 1262.6565 5097.0 1017.7952
3
ProviderExp4DWrapper Thomas Jaksch University of Leoben 4884.9 492.63092 5140.2 365.1742
4
UCB1DiscountTuned_C Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 4751.9 1495.7024 5253.9 1112.1284
5
MetaUCBFC Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 5870.1 617.5396 6179.0 697.5103
6
MetaUCB Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 5657.9 712.6532 6188.6 1034.8442
7
TaoUcbForgetPascalChallenge Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 5946.6 786.82513 7897.1 4283.818
8
ProviderDiscountedUCB Thomas Jaksch University of Leoben 7253.1 774.3874 8043.3 1215.4939
9
UCBWindowProvider Baseline code 30567.3 17000.268 34363.6 19650.63
10
ShiftingBanditsProvider Baseline code 42619.6 205.62112 42662.3 194.08247
11
RandomProvider Baseline code 57109.8 238.72942 57112.7 220.19237
12
ProviderUCB Baseline code 54405.2 9520.59 61281.9 6575.8525

 

Task 5. SimulatedVisitorWeeklyCloseVariation

Rank
Algorithm
Name
Institution
Validation Seeds
Test Seeds
Average Regret
STD
Average Regret
STD
1
UCB1DiscountTuned Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 4947.8 375.51675 4981.1 498.89618
2
ProviderDiscountedUCB Thomas Jaksch University of Leoben 5137.5 466.3166 5052.9 450.37573
3
MetaUCB Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 4656.4 555.3968 5088.1 739.60144
4
MetaUCBFC Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 5280.0 654.064 5162.6 756.14844
5
UCB1DiscountTuned_C Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 5440.7 326.4149 5243.8 421.37152
6
UCB1DiscountTuned_B Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 5584.6 286.54074 5634.7 401.08328
7
TaoUcbForgetPascalChallenge Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 6813.1 804.2061 6722.4 845.91504
8
ProviderExp4DWrapper Thomas Jaksch University of Leoben 7119.8 1833.6073 7557.7 1736.7368
9
UCBWindowProvider Baseline code 15240.1 2971.088 14502.9 4624.025
10
ShiftingBanditsProvider Baseline code 21690.1 364.62173 21514.2 245.27754
11
ProviderUCB Baseline code 21909.6 2055.3625 22709.6 3313.902
12
RandomProvider Baseline code 25764.2 339.47043 25654.9 343.23346

 

Task 6. SimulatedVisitorConstant

Rank
Algorithm
Name
Institution
Validation Seeds
Test Seeds
Average Regret
STD
Average Regret
STD
1
ProviderUCB Baseline code 365.8 74.90705 448.8 128.11522
2
TaoUcbForgetPascalChallenge Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 372.1 82.51525 472.0 142.9662
3
MetaUCB Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 568.0 313.2266 544.9 257.33353
4
ProviderExp4DWrapper Thomas Jaksch University of Leoben 1652.5 228.85185 1801.2 176.14943
5
MetaUCBFC Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly Univ Paris XI Orsay – lri 3081.7 170.6159 3179.7 340.06668
6
UCBWindowProvider Baseline code 4345.9 250.39633 4298.6 278.8549
7
ProviderDiscountedUCB Thomas Jaksch University of Leoben 5502.5 491.91965 5277.7 426.59714
8
UCB1DiscountTuned Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 6008.8 453.98135 6123.5 396.33356
9
UCB1DiscountTuned_C Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 6354.9 437.75983 6566.8 449.95624
10
UCB1DiscountTuned_B Flóra Adorján & György Ottucsák Budapest University of Technology and Economics 6524.6 347.51184 6621.2 335.8319
11
ShiftingBanditProvider Baseline code 18868.6 104.250984 18787.7 135.26276
12
RandomProvider Baseline code 20057.6 104.72313 20024.6 152.13971

Old Phase 1 tasks

These are the old tasks from phase 1 and are no longer used for phase 2.

Task 1

Rank Algorithm Name Institution STD Average Regret
1 UCBg Levente Kocsis mta sztaki 34.7 52.1
2 ProviderUCB (Baseline) 24.28 65.6
3 Mean/variance regret Craig Saunders University of Southampton 61.28 89.3
4 ShiftingBanditProvider (Baseline) 21.58 141.2
5 RandomProvider (Baseline) 25.42 151.4

Task 2

Rank Algorithm Name Institution STD Average Regret
1 UCBgper Levente Kocsis mta sztaki 803.1 627.4
2 ShiftingBanditsProvider (Baseline) 206.53 1839.3
3 Stupid Switcher Craig Saunders University of Southampton 334.27 3669.3
4 Experts Prediction Cedric Hartland Laboratoire de Recherche en Informatique 1283.36 6318.6
5 ProviderUCB (Baseline) 4045.41 13270.7
6 UCBW Levente Kocsis mta sztaki 2298.62 18430.1
7 UCBg Levente Kocsis mta sztaki 8367.2 23552.7
8 RandomProvider (Baseline) 3153.57 27434.2

Task 3

Rank Algorithm Name Institution STD Average Regret
1 Mean/variance regret Craig Saunders University of Southampton 4.95 10.3
2 UCBS Levente Kocsis mta sztaki 10.39 15.3
3 UCBg Levente Kocsis mta sztaki 17.03 26.3
4 ProviderUCB (Baseline) 11.85 80.7
5 ShiftingBanditsProvider (Baseline) 40.24 765.2
6 RandomProvider (Baseline) 41.14 892.5