The challenge considers the problem of inducing a grammar directly from natural language text. The resulting grammars can then be used to discriminate between strings that are part of the language (i.e., are grammatically well formed) and those that are not. This has long been a fundamental problem in Computational Linguistics and Natural Language Processing, drawing on theoretical Computer Science and Machine Learning. The popularity of the task is driven by two motivations. Firstly, it can help us to better understand the cognitive process of language acquisition in humans. Secondly, it can improve the portability of NLP applications to new domains and new languages: most NLP algorithms rely on syntactic parse structure produced by supervised parsers, but training data in the form of treebanks exists only for a few languages and specific domains, limiting the portability of these algorithms. The challenge we are proposing aims to foster continuing research in grammar induction, while also opening up the problem to more ambitious settings, including a wider variety of languages, the removal of the reliance on part-of-speech tags and, critically, a thorough evaluation. The data we provide has been collated from existing treebanks in a variety of languages, domains and linguistic formalisms. This diversity allows grammar induction algorithms to be tested on a wide range of data, yielding deeper insight into the accuracy and shortcomings of different algorithms. Where possible, we intend to compile multiple annotations for the same sentences, so that the effect of the choice of linguistic formalism or annotation procedure can be offset in the evaluation. Overall, this test set forms a significant resource for the evaluation of parsers and grammar induction algorithms, and will help to reduce the NLP field's continuing reliance on the Penn Treebank.
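To make the discrimination task concrete, the sketch below shows how an (induced or hand-written) grammar can accept or reject token sequences via CKY recognition. The toy grammar here is purely illustrative and is not drawn from the challenge data; a real induced grammar would supply the rule tables.

```python
# Toy context-free grammar in Chomsky normal form (illustrative only).
# binary maps a pair of child nonterminals to the parents that derive them;
# lexical maps a word to its preterminal categories.
binary = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
lexical = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "sees": {"V"},
}

def recognizes(tokens):
    """CKY recognition: True iff the grammar derives the token sequence."""
    n = len(tokens)
    if n == 0:
        return False
    # chart[i][j] holds the nonterminals spanning tokens[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        chart[i][i + 1] = set(lexical.get(tok, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        chart[i][j] |= binary.get((left, right), set())
    return "S" in chart[0][n]

# A string is judged grammatical exactly when the start symbol covers it:
# recognizes(["the", "dog", "sees", "the", "cat"]) is True,
# recognizes(["dog", "the", "sees"]) is False.
```

In the unsupervised setting the rule tables would be learned from raw text rather than written by hand; the recognition step, however, is the same.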