The RTE challenges have run annually for several rounds with great success. The task consists of recognizing whether the meaning of a textual statement, termed H (the hypothesis), can be inferred from the content of a given text, termed T (the text). Given a set of T-H pairs as input, systems must classify each pair into one of three cases:

  • T entails H
  • T contradicts H, i.e. shows it to be false
  • the veracity of H is unknown on the basis of T.
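
The three-way decision above can be sketched as a small data model. This is only an illustration, not the challenge's official data format: the names Label, Pair, and overlap_baseline and the 0.8 threshold are assumptions introduced here. The word-overlap heuristic is a deliberately naive stand-in for a real system; it never predicts contradiction.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The three possible judgments for a T-H pair."""
    ENTAILMENT = "entailment"        # T entails H
    CONTRADICTION = "contradiction"  # T contradicts H
    UNKNOWN = "unknown"              # H cannot be judged on the basis of T


@dataclass(frozen=True)
class Pair:
    """One input item (hypothetical layout, not the official format)."""
    text: str        # T
    hypothesis: str  # H


def overlap_baseline(pair: Pair, threshold: float = 0.8) -> Label:
    """Naive stand-in system: predict entailment when most words of H occur in T.

    The threshold is an arbitrary illustrative value.
    """
    t_words = set(pair.text.lower().split())
    h_words = pair.hypothesis.lower().split()
    if not h_words:
        return Label.UNKNOWN
    covered = sum(word in t_words for word in h_words) / len(h_words)
    return Label.ENTAILMENT if covered >= threshold else Label.UNKNOWN
```

A real system would replace overlap_baseline with a component that also distinguishes contradiction from unknown; the point of the sketch is only the shape of the input and output.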

A human-annotated development set is first released to allow investigation, tuning, and training of systems, which are then evaluated on a gold-standard test set. In later rounds of the challenge, the given texts were made substantially longer, usually corresponding to a coherent portion of a document, such as a paragraph or a group of closely related sentences. Texts come from a variety of unedited sources, so systems must handle real text forms that may include typographical errors and ungrammatical sentences. A novel Entailment Search pilot task was also introduced.
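
Evaluation against the gold-standard test set can be sketched as a comparison of predicted and annotated labels. The text above does not state which measure is used, so plain accuracy is an assumption here, as are the function name accuracy and the label strings in the test below.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of T-H pairs whose predicted label matches the gold label.

    Accuracy is an assumed measure chosen for illustration only.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```

In this sketch, a system would be tuned by scoring its output on the development set and then scored once on the held-out test set with the same function.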