Introduction

Touch Clarity (www.touchclarity.com) provides realtime optimisation of websites. Touch Clarity chooses, from a number of options, the most popular content to display on a page. This decision is made by tracking how many visitors respond to each of the options, by clicking on them. This is a direct commercial application of the multiarmed bandit problem – each of the items which might be shown is a separate bandit, with a separate response rate.

As in the multiarmed bandit problem, there is a tradeoff between exploration and exploitation – it is necessary to sometimes serve items other than the most popular in order to measure their response rate with sufficient precision to correctly identify which is the most popular. However, in this application there is a further complication – typically the rates of response to each item will vary over time, so continuous exploration is necessary in order to track this time variation, as old knowledge becomes out of date. An extreme example of this might be in choosing which news story to serve as the main story on a news page – interest in one story will decrease over time while interest in another will increase. In addition, the interest in several stories might vary in a similar, coherent way – for example a general increase in interest in sports stories at weekends, or in political stories near to an election. So there are typically two types of variation to consider – where response rates vary together, and where response rates vary completely independently.

The Challenge

The challenge aims to stimulate research into the optimal way to tradeoff exploration vs exploitation where the response rate for each option is varying over time. Instead of a static data set, data for the challenge is provided in the form of a visitor behaviour simulation engine, written in java, and also made available as a Matlab function.

Entrants should write a program which will repeatedly choose from a range of possible display options, and query the visitor simulation engine through an interface e.g.

public Boolean selectOption(int option)The engine will return true or false, depending on whether the simulated visitor responded to the option that was served to them. The objective is to maximize the number of positive responses returned by the engine. The response rate for each option varies over time. The visitor simulation engine maintains time dependent continuous functions for the response rate of each of the options. When the engine is queried with a given option, it calculates the value of the function at that time. The engine returns true with a probability given by this value. The time variable used is the number of requests made to the engine, rather than real time. This is to remove dependencies on the machine hardware, and application performance of the entrants. In this application there is no fixed horizon, that is to say the number of trials is not fixed, and so solutions should not use the number of trials as a parameter.

To prevent the engine from repeating the same behaviour every time it is restarted, the response rate functions are parameterized. On restart, new values for these parameters are randomly generated based on a seed.

The challenge is broken into a number of individual tasks each relating to a particular problem that is frequently encountered in real website data. The engine addresses each of these problems in turn.

Tasks

There are now six tasks, the old tasks are now invalid. In each task there are 5 options. The best achievable response rate is approximately 0.18 for all tasks. All tasks are designed to display the appropriate behaviour over 1M requests. (Although during the judging phase more may be required.)

Task 1. SimulatedVisitorFrequentSwap

To check a candidate algorithm can respond to relatively frequent changes of the best option.

Task 2. SimulatedVisitorLongGaussians

To check a candidate algorithm can respond to changes of the best option over longer time periods.

Task 3. SimulatedVisitorWeeklyVariation

To check a candidate algorithm can handle coherent variation of response rates where there are two sinusoidal components to the variation of differing periods, the longer period being dominant. In addition, there is a gradual change in the relative ranking of the option. This is to ensure there is ongoing exploration throughout the run and that it can handle the consequent comparison of option response rates that have been served at different times. In addition, each option has a randomly chosen start time.

Task 4. SimulatedVisitorDailyVariation

To check a candidate algorithm can handle coherent variation of response rates where there are two sinusoidal components to the variation of differing periods, the shorter period being dominant.

Task 5. SimulatedVisitorWeeklyCloseVariation

To check a candidate algorithm can handle coherent variation where response rates are closely spaced.

Task 6. SimulatedVisitorConstant

We must ensure a candidate algorithm can handle the ideal case. An over exploratory algorithm can degrade to random performance under these conditions.

Organisers

Zakria Hussain, University College, London
Peter Auer, University of Leoben
Nicolò Cesa-Bianchi, Università degli Studi di Milano
Leonard Newnham, Touch Clarity
John Shawe-Taylor, University College, London

Submission of Results

Results can be submitted at any time via the submit tab, a set of random seeds is available here. Run your code using these seeds with each run of length 1M requests.

Judging

During the judging phase of the challenge, a further evaluation will take place for each algorithm. This is to ensure candidate algorithms can perform well over a variety of conditions. A second set of random seeds will be published along with a different run length. Each entrant will be re-evaluated against these seeds and run length. The winner of the challenge will be the entrant who obtains the lowest total regret on all six tasks. There is no longer the option of submitting for one task only.

Touch Clarity reserves the right to disqualify entrants if any of the following rules are broken:

No algorithm should use the run length as an input parameter. In a real application optimisation does not terminate after a particular number of requests.
Algorithms should NOT use information about the shape of the response functions obtained during the course of a run. The response function information available from the csv files is to assist in improving the performance of algorithms only. The sort of solution we are trying to avoid is “if resp rates constant over first 1000 responses then use USB otherwise use SomeOtherAlgorithm”.

Java code download

Java-code1.2.zip (version 1.2 released 21st May 2006)

There is an API written in Java which is distributed as a .jar file (.class executables only) and some baseline Java code that implements some simple multi-armed bandit problem algorithms. You will need at least Java version 1.5 to run the Java code. You can compile the baseline code like so:

javac -classpath challenge1.2.jar applicationtest/*.java applicationtest/simplewindowprovider/*.java

After this you can run ApplicationTest like so:

java -classpath .;challenge1.2.jar -Xmx256m applicationtest/ApplicationTest

java -classpath :challenge1.2.jar -Xmx256m applicationtest/ApplicationTest

The Java code contains provider classes and several (bandit) algorithms written using the provider classes. ApplicationTest calls these algorithms and runs them on all 6 tasks. All participants will need to write their algorithm as a provider class. Please look at the provider classes in the sample Java code to understand how to do this. The code contains very simplistic algorithms for bandit problems when arms do not vary over time. However, please note that the challenge is for the case where the rewards for an arm varies over time.

Version 1.1 contains:

Six new tasks
detailed output every 1000 requests to csv file. This contains:
1. the underlying response rates of the options in the task, and
2. the proportion of serves each option is receiving

This allows the user to construct clear graphical views of how an algorithm is performing over time – see for example

UCBWindowProvider.JPG

An efficiency metric has been added to getStatistics(). This ranges from 0 (equivalent to random serving) to 1 (equalling the maximum theoretical success rate).

Version 1.2 contains

UCBWindowProvider – a windowed version of UCB
Improved regret summary
improved commandline args:
1. -h help
2. -r number of requests
3. -s random seed set
4. -i produce results on interim seed set. The summary printed at the end of the test is sufficient for a submission
better javadoc of applicationtest and challenge1.2.jar

Here is the javadoc1.2 for the challenge1.2.jar package.

Matlab code download

Matlab-code1.2.zip (version 1.2 released 21st May 2006)

The Matlab code is a wrapper for the Java API but does not contain any baseline code like that available for the Java download. The Matlab wrapper calls in the challenge1.2.jar executables and all its functions can be used as in Java. Please read the documentation (preamble) in the matlab file in order to set it up to run correctly.

Results of Phase 2 tasks

THE WINNING ALGORITHM WILL BE THE ONE THAT IS RANKED NUMBER 1 IN THE “TOTAL AVERAGE REGRET FOR TASKS 1-6” TABLE USING THE TEST SEEDS.

Total Average Regret for tasks 1-6

Rank	Algorithm	Name	Institution	Total Average Regret
Rank	Algorithm	Name	Institution	Validation Seeds	Test Seeds
1	MetaUCBFC	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	34599.203	34202.5
2	ProviderDiscountedUCB	Thomas Jaksch	University of Leoben	35071.0	36073.797
3	ProviderExp4DWrapper	Thomas Jaksch	University of Leoben	35120.0	36253.9
4	UCB1DiscountTuned	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	37066.2	37117.8
5	MetaUCB	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	35532.4	37282.203
6	UCB1DiscountTuned_C	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	36948.3	37744.7
7	UCB1DiscountTuned_B	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	37356.2	38194.6
8	TaoUcbForgetPascalChallenge	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	39605.4	43969.7
9	UCBWindowProvider	Baseline code	–	180301.81	179246.31
10	ShiftingBanditProvider	Baseline code	–	211875.2	211107.8
11	ProviderUCB	Baseline code	–	222884.5	249174.6
12	RandomProvider	Baseline code	–	322342.7	318739.6

Task 1. SimulatedVisitorFrequentSwap

Rank	Algorithm	Name	Institution	Validation Seeds		Test Seeds
Rank	Algorithm	Name	Institution	Average Regret	STD	Average Regret	STD
1	ProviderExp4DWrapper	Thomas Jaksch	University of Leoben	9183.5	614.5174	9133.0	508.81235
2	MetaUCBFC	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	11052.9	1126.048	9906.5	780.7754
3	ProviderDiscountedUCB	Thomas Jaksch	University of Leoben	10198.4	933.2443	10697.9	1380.2686
4	UCB1DiscountTuned_C	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	12059.8	884.5122	11983.7	618.63525
5	TaoUcbForgetPascalChallenge	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	12036.0	658.0817	12057.5	879.0806
6	UCB1DiscountTuned_B	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	11992.7	618.4238	12303.9	546.7468
7	UCB1DiscountTuned	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	12152.8	701.47064	12330.8	766.26135
8	MetaUCB	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	15298.1	1714.1592	15265.4	3004.444
9	ProviderUCB	Baseline code	–	32832.0	942.6005	32392.5	828.2142
10	UCBWindowProvider	Baseline code	–	36098.6	1702.405	35564.0	1964.2234
11	ShiftingBanditProvider	Baseline code	–	40806.8	1312.9364	40947.7	1463.947
12	RandomProvider	Baseline code	–	48835.7	460.32596	49054.1	503.07773

Task 2. SimulatedVisitorLongGaussians

Rank	Algorithm	Name	Institution	Validation Seeds		Test Seeds
Rank	Algorithm	Name	Institution	Average Regret	STD	Average Regret	STD
1	ProviderDiscountedUCB	Thomas Jaksch	University of Leoben	3032.5	467.23615	3350.2	451.85638
2	UCB1DiscountTuned_C	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	3854.6	393.24185	4020.2	678.94574
3	UCB1DiscountTuned_B	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	3877.9	328.87567	4031.0	400.47943
4	UCB1DiscountTuned	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	3819.4	331.3753	4133.9	631.6608
5	MetaUCBFC	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	4550.2	1560.2434	4561.1	975.12006
6	ProviderExp4DWrapper	Thomas Jaksch	University of Leoben	4767.9	389.79468	4622.5	283.90424
7	MetaUCB	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	4817.5	1670.4481	5636.5	1254.3213
8	TaoUcbForgetPascalChallenge	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	7352.9	2036.1892	8613.9	1643.2212
9	ShiftingBanditsProvider	Baseline code	–	47240.3	3023.0012	46586.8	3428.574
10	ProviderUCB	Baseline code	–	51547.9	10747.207	68992.3	29760.65
11	UCBWindowProvider	Baseline code	–	64994.6	14387.951	69414.7	25192.545
12	RandomProvider	Baseline code	–	113337.1	7096.978	109707.5	7496.9795

Task 3. SimulatedVisitorWeeklyVariation

Rank	Algorithm	Name	Institution	Validation Seeds		Test Seeds
Rank	Algorithm	Name	Institution	Average Regret	STD	Average Regret	STD
1	ProviderDiscountedUCB	Thomas Jaksch	University of Leoben	3947.0	559.38635	3651.8	393.22647
2	UCB1DiscountTuned	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	4385.6	545.81683	4451.5	523.6664
3	UCB1DiscountTuned_B	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	4647.5	515.3112	4533.6	249.49202
4	MetaUCB	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	4534.5	723.832	4558.7	735.7625
5	UCB1DiscountTuned_C	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	4486.4	463.7919	4676.3	703.2815
6	MetaUCBFC	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	4764.3	556.171	5213.6	541.3687
7	ProviderExp4DWrapper	Thomas Jaksch	University of Leoben	7511.4	1893.0953	7999.3	673.5912
8	TaoUcbForgetPascalChallenge	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	7084.7	950.5174	8206.8	787.7223
9	UCBWindowProvider	Baseline code	–	29055.3	11124.4	21102.5	12444.306
10	ShiftingBanditsProvider	Baseline code	–	40649.8	212.50716	40609.1	220.46388
11	RandomProvider	Baseline code	–	57238.3	308.08838	57185.8	253.00189
12	ProviderUCB	Baseline code	–	61824.0	2355.0454	63349.5	934.28564

Task 4. SimulatedVisitorDailyVariation

Rank	Algorithm	Name	Institution	Validation Seeds		Test Seeds
Rank	Algorithm	Name	Institution	Average Regret	STD	Average Regret	STD
1	UCB1DiscountTuned_B	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	4728.9	883.0477	5070.2	947.57245
2	UCB1DiscountTuned	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	5751.8	1262.6565	5097.0	1017.7952
3	ProviderExp4DWrapper	Thomas Jaksch	University of Leoben	4884.9	492.63092	5140.2	365.1742
4	UCB1DiscountTuned_C	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	4751.9	1495.7024	5253.9	1112.1284
5	MetaUCBFC	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	5870.1	617.5396	6179.0	697.5103
6	MetaUCB	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	5657.9	712.6532	6188.6	1034.8442
7	TaoUcbForgetPascalChallenge	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	5946.6	786.82513	7897.1	4283.818
8	ProviderDiscountedUCB	Thomas Jaksch	University of Leoben	7253.1	774.3874	8043.3	1215.4939
9	UCBWindowProvider	Baseline code	–	30567.3	17000.268	34363.6	19650.63
10	ShiftingBanditsProvider	Baseline code	–	42619.6	205.62112	42662.3	194.08247
11	RandomProvider	Baseline code	–	57109.8	238.72942	57112.7	220.19237
12	ProviderUCB	Baseline code	–	54405.2	9520.59	61281.9	6575.8525

Task 5. SimulatedVisitorWeeklyCloseVariation

Rank	Algorithm	Name	Institution	Validation Seeds		Test Seeds
Rank	Algorithm	Name	Institution	Average Regret	STD	Average Regret	STD
1	UCB1DiscountTuned	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	4947.8	375.51675	4981.1	498.89618
2	ProviderDiscountedUCB	Thomas Jaksch	University of Leoben	5137.5	466.3166	5052.9	450.37573
3	MetaUCB	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	4656.4	555.3968	5088.1	739.60144
4	MetaUCBFC	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	5280.0	654.064	5162.6	756.14844
5	UCB1DiscountTuned_C	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	5440.7	326.4149	5243.8	421.37152
6	UCB1DiscountTuned_B	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	5584.6	286.54074	5634.7	401.08328
7	TaoUcbForgetPascalChallenge	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	6813.1	804.2061	6722.4	845.91504
8	ProviderExp4DWrapper	Thomas Jaksch	University of Leoben	7119.8	1833.6073	7557.7	1736.7368
9	UCBWindowProvider	Baseline code	–	15240.1	2971.088	14502.9	4624.025
10	ShiftingBanditsProvider	Baseline code	–	21690.1	364.62173	21514.2	245.27754
11	ProviderUCB	Baseline code	–	21909.6	2055.3625	22709.6	3313.902
12	RandomProvider	Baseline code	–	25764.2	339.47043	25654.9	343.23346

Task 6. SimulatedVisitorConstant

Rank	Algorithm	Name	Institution	Validation Seeds		Test Seeds
Rank	Algorithm	Name	Institution	Average Regret	STD	Average Regret	STD
1	ProviderUCB	Baseline code	–	365.8	74.90705	448.8	128.11522
2	TaoUcbForgetPascalChallenge	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	372.1	82.51525	472.0	142.9662
3	MetaUCB	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	568.0	313.2266	544.9	257.33353
4	ProviderExp4DWrapper	Thomas Jaksch	University of Leoben	1652.5	228.85185	1801.2	176.14943
5	MetaUCBFC	Nicolas Baskiotis, Cedric Hartland & Sylvain Gelly	Univ Paris XI Orsay – lri	3081.7	170.6159	3179.7	340.06668
6	UCBWindowProvider	Baseline code	–	4345.9	250.39633	4298.6	278.8549
7	ProviderDiscountedUCB	Thomas Jaksch	University of Leoben	5502.5	491.91965	5277.7	426.59714
8	UCB1DiscountTuned	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	6008.8	453.98135	6123.5	396.33356
9	UCB1DiscountTuned_C	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	6354.9	437.75983	6566.8	449.95624
10	UCB1DiscountTuned_B	Flóra Adorján & György Ottucsák	Budapest University of Technology and Economics	6524.6	347.51184	6621.2	335.8319
11	ShiftingBanditProvider	Baseline code	–	18868.6	104.250984	18787.7	135.26276
12	RandomProvider	Baseline code	–	20057.6	104.72313	20024.6	152.13971

Old Phase 1 tasks

These are the old tasks from phase 1 and are no longer used for phase 2.

Task 1

Rank	Algorithm	Name	Institution	STD	Average Regret
1	UCBg	Levente Kocsis	mta sztaki	34.7	52.1
2	ProviderUCB (Baseline)	–	–	24.28	65.6
3	Mean/variance regret	Craig Saunders	University of Southampton	61.28	89.3
4	ShiftingBanditProvider (Baseline)	–	–	21.58	141.2
5	RandomProvider (Baseline)	–	–	25.42	151.4

Task 2

Rank	Algorithm	Name	Institution	STD	Average Regret
1	UCBgper	Levente Kocsis	mta sztaki	803.1	627.4
2	ShiftingBanditsProvider (Baseline)	–	–	206.53	1839.3
3	Stupid Switcher	Craig Saunders	University of Southampton	334.27	3669.3
4	Experts Prediction	Cedric Hartland	Laboratoire de Recherche en Informatique	1283.36	6318.6
5	ProviderUCB (Baseline)	–	–	4045.41	13270.7
6	UCBW	Levente Kocsis	mta sztaki	2298.62	18430.1
7	UCBg	Levente Kocsis	mta sztaki	8367.2	23552.7
8	RandomProvider (Baseline)	–	–	3153.57	27434.2

Task 3

Rank	Algorithm	Name	Institution	STD	Average Regret
1	Mean/variance regret	Craig Saunders	University of Southampton	4.95	10.3
2	UCBS	Levente Kocsis	mta sztaki	10.39	15.3
3	UCBg	Levente Kocsis	mta sztaki	17.03	26.3
4	ProviderUCB (Baseline)	–	–	11.85	80.7
5	ShiftingBanditsProvider (Baseline)	–	–	40.24	765.2
6	RandomProvider (Baseline)	–	–	41.14	892.5

Exploration vs. Exploitation Challenge

Introduction

The Challenge

Tasks

Task 1. SimulatedVisitorFrequentSwap

Task 2. SimulatedVisitorLongGaussians

Task 3. SimulatedVisitorWeeklyVariation

Task 4. SimulatedVisitorDailyVariation

Task 5. SimulatedVisitorWeeklyCloseVariation

Task 6. SimulatedVisitorConstant

Organisers

Submission of Results

Judging

Java code download

Matlab code download

Results of Phase 2 tasks

Total Average Regret for tasks 1-6

Task 1. SimulatedVisitorFrequentSwap

Task 2. SimulatedVisitorLongGaussians

Task 3. SimulatedVisitorWeeklyVariation

Task 4. SimulatedVisitorDailyVariation

Task 5. SimulatedVisitorWeeklyCloseVariation

Task 6. SimulatedVisitorConstant

Old Phase 1 tasks

Task 1

Task 2

Task 3

Knowledge 4 All Foundation Ltd.