Zulu: an interactive learning competition | Knowledge 4 All Foundation Ltd.

General questions [and terms]

Why is the shared task called Zulu?

The reason for which learning languages by active learning is an important topic in the field of computational linguistics is that it is not always possible to rely on statistics. This is particularly the case for those languages of which large corpora are not available. This question is known sometimes as working on under-resourced languages.

In a couple of sentences, what is it about?

Zulu is an active learning task. The participants are asked to build algorithms that can learn languages by querying a server via membership queries. The targets are regular languages, and for each task a fixed number of queries is allowed.

Can anyone participate?

Yes. Anyone can. One should remember that the winner of the 1998 Abbadingo competition (Nick Price) was just a computer scientist not affiliated to any research lab! Typically, students that have learnt about Dana Angluin’s L* algorithm should be interested. But we hope that researchers in computational linguistics, machine learning or any field which needs grammatical inference will be interested.

What is a task? A task owner?

A task is set when a new target automaton is created. During the training phase anyone can create a new task of which he becomes the task owner. The task owner is the first to try to solve a given task. His score appears on the web and can be used as a guideline by the other players.

What is the baseline algorithm?

The baseline algorithm is a simple adaptation of Angluin’s 1987 L* algorithm. It makes membership queries in order to obtain a table which is both closed and complete. At this point it samples some strings (using a distribution based on the size of the current hypothesis), and tests (again via membership queries) the labelling of the sampled strings against the labelling returned by the oracle. If all strings are perfectly labelled, the algorithm decides that (under the important assumption that the chosen distribution was good enough) the error cannot be too large. It can now test. If some example appears to be badly labelled, the algorithm can use this as a counterexample from an equivalence query.
More details concerning the baseline algorithm can be found here.
The baseline algorithm has two purposes. It is used to compute the number of queries allowed for the task, but it can also be used by any user as a starting point to participate in the competition.

Technical questions

Do I have to use Java?

No. The baseline is written in Java, but since the programme will run on your own machine, you can choose anything. Please refer to the Develop your algorithm in your language section.

Why do I not see my score with the score of the task owner?

Because this is only training phase. People might get things wrong and would cease to participate if their first trials were consistently beaten by everyone else. Furthermore, the second time one plays one cannot avoid different types of hill climbing: you could rely on the results from queries done by another algorithm, or you know the exact size.
But you are encouraged to discuss with the task-owner if you want to know more.

Then why play a task that I don’t own?

Because sometimes it can seem challenging to see how well our algorithm is progressing.

How are the DFA generated?

We are using here the same programs that were used by Gowachin. Details can be found on the webpage, and in the associated publications [lang1998rao]. To make things simple, we start by randomly drawing a number n at most as large as the parameter the task owner proposes. We then construct a random graph with slightly more elements than n and randomly decide that some states are final and others are not. Then minimisation takes place.

How are the data from the test set generated?

Again we follow the protocol from Abbadingo and Gowachin. A maximum length k is defined that depends on the thickness of the DFA. Then strings are randomly drawn from $\Sigma^{\le k}$

Why am I given 1723 queries?

This number is computed as follows: when a target is created, the baseline algorithm is run and when this algorithm is capable of doing less than 40% errors, the number of queries made up to that point is allocated to the task.

Why am I not given the real number of states of the target?

Two reasons: the first is that this is not about identification, and the second is that since L* makes use of equivalence queries, and we are not allowed these, he will have to do some sampling instead. But since he does not know what the real distribution is, this is a problem (and it is easy to show that in theory there is no good answer).

About the competition itself

What will the actual competition be like?

During the competition phase, the task owner will be the organisers. They will provide a series of tasks .

But I can see all sorts of loopholes and ways to collude. Are you not scared of cheating?

No. Obviously, before proclaiming who the winner is, the organisers will ask for the code and the sources. They will then run the winner’s code on their machines.

But why not do this for all players?

Because that would oblige us to select some very strict format. And here the idea is not to impose limitation on how the algorithms treat the information they gather, but only on how much information they gather.

What do we win?

The actual competition will probably provide a small financial prize for the winner. Furthermore, with the help of our scientific committee, we should be able to propose that the winner publishes his solution in the Machine Learning Journal.
We also hope to hold a workshop in the autumn of 2010.

About the theory, the DFA, active learning

What is the relevant theory about all this?

There are a number of papers one might read: Angluin’s papers on query learning, her description of L*, the chapter in Vazirani’s book [Il faut des references]

But… is it useful to learn DFA from membership queries ?

This may come as a surprise, but it is. Here are some examples.

Siemens
Squirrel

Odds and ends

I don’t find my question here. How can I ask it?

Simple: send us an e-mail. If the question (and the answer) are likely to be of interest to others, we will add it to this list.

I don’t think this is the right competition. Why are we not learning X from Y and count Z? (Here X can be any type of languages with all sorts of alphabet sizes),Y can be any sort of queries or information and Z any type of measure of the quantity of resources needed for learning)

Indeed, there is room for improvement. The feeling was that if active learning was an important question, we did not know enough about learning strategies, when the learner could interactively interrogate its environment. And in this case, the crucial resource is not the (CPU) time that the learning program is going to spend, but the number of times it uses the expensive resource, the oracle.
In all cases, since we have noted that this question is an important one, we will gladly discuss this in the forum.

What is best? To score 60% with 1000 queries or 70% with 2000?

Another good question. In preliminary experiments we have noted an (albeit uneven) exponential progression of the number of queries with the rate of success. For the moment, we fix the number of queries. If you can do better with less. Good, because it probably means that on another task you will score higher.

Lang, K.J. and Pearlmutter, B.A. and Price, R.A. (1998) Results of the abbadingo one DFA learning competition and a new evidence-driven state merging algorithm, Lecture Notes in Computer Science

Some applications of active learning in the case of finite automata

The earliest task was that of map building in robotics: the (simplified) map is a graph and the outputs in each state are what the robot may encounter in a state. A particularity here is that the robot cannot be reset: the learning algorithm is to learn from just one very long string and all its prefixes.

References

R. L. Rivest and R. E. Schapire. Inference of finite automata sequences. Information and Computation , 103:299-347, 1993.

Thomas Dean , Dana Angluin, Kenneth Basye , Sean P. Engelson , Leslie Pack Kaelbling , Evangelos Kokkevis , Oded Maron : Inferring Finite Automata with Stochastic Output Functions and an Application to Map Learning. AAAI 1992 : 208-214

A task somehow related is that of discivering the rational strategy followed by an adversary. This line was looked into in a number of papers related to agent technologies.

References

D. Carmel and S. Markovitch. Model-based learning of interaction strategies in multi-agent systems. Journal of Experimental and Theoretical Artificial Intelligence , 10(3):309-332, 1998.

D. Carmel and S. Markovitch. Exploration strategies for model-based learning in multiagent systems. Autonomous Agents and Multi-agent Systems , 2(2):141-172, 1999.

One task on which grammatical inference is proving to be particularly useful is that of wrapper induction: The idea is to find in a web page (or various of the same type) all arguments of a special sort. In this context, the role of the Oracle is played by the human user.

References

J. Carme, R. Gilleron, A. Lemay, and J. Niehren. Interactive learning of node selecting tree transducer. In Ijcai Workshop on Grammatical Inference , 2005.

Julien Carme , Rémi Gilleron , Aurélien Lemay , Joachim Niehren: Interactive learning of node selecting tree transducer. Machine Learning 66 (1): 33-67 (2007)

In different tasks linked with checking if the specifications of a system or a piece of hardware are met, the item to be checked is used as an oracle. Queries are made in order to build a model of the item, and then the learned model can be checked against the specifications. Obviously, there is an issue with the distribution used to simulate the equivalence query in this context to check through interaction with the actual chips used as an Oracle that the specifications are met..

References

L. Bréhélin, O. Gascuel, and G. Caraux. Hidden Markov models with patterns to learn boolean vector sequences and application to the built-in self-test for integrated circuits. Pattern Analysis and Machine Intelligence , 23(9):997-1008, 2001.

T. Berg, O. Grinchtein, B. Jonsson, M. Leucker, H. Raffelt, and B. Steffen. On the correspondence between conformance testing and regular inference. In Proceedings of Fundamental Approaches to Software Engineering, 8th International Conference, FASE 2005 , volume 3442 of Lncs , pages 175-189. Springer-Verlag, 2005.

H. Raffelt and B. Steffen. Learnlib: A library for automata learning and experimentation. In Proceedings of Fase 2006 , volume 3922 of Lncs , pages 377-380. Springer-Verlag, 2006.