Description
Ewe and Fongbe are Niger–Congo languages, part of a cluster of related languages commonly called Gbe. Fongbe is the major Gbe language of Benin (with approximately 4.1 million speakers), while Ewe is spoken in Togo and southeastern Ghana by approximately 4.5 million people as a first language and by a million others as a second language. They are closely related tonal languages, and both contain diacritics that can make them difficult to study, understand, and translate.
Although these languages are at the core of the economic and social life of at least three major West African capital cities (namely Cotonou, Lomé and Accra), today they are mostly spoken and very rarely written. Partly for that reason, there is very little official or formal communication in these languages, which often leaves people who speak neither French nor English unable to access critical services such as education, banking, and healthcare. This challenge is part of an initiative that aims to bring down the barriers between speakers of African local languages and modern society.
The objective of this challenge is to create a machine translation system capable of converting text from French into Fongbe or Ewe. You may train one model per language or create a single model for both. You may not use any external data, so a key component of this competition is finding a way to work with the available data efficiently.
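If you choose a single model for both languages, one common approach in multilingual machine translation is to prepend a token naming the desired output language to each French source sentence. The sketch below illustrates that idea only; it is not a method prescribed by the organizers. The token format `<2ewe>` / `<2fon>`, the function name `tag_source`, and the language labels `Ewe` and `Fon` are assumptions made for illustration.

```python
def tag_source(french: str, target_language: str) -> str:
    """Prefix a French sentence with a token naming the output language.

    The "<2xx>" token format is a hypothetical convention, not part of the dataset.
    """
    token = f"<2{target_language.strip().lower()}>"
    return f"{token} {french.strip()}"


# Hypothetical language labels; check the values actually used in Train.csv.
examples = [("Bonjour", "Ewe"), ("Bonjour", "Fon")]
tagged = [tag_source(french, language) for french, language in examples]
print(tagged)
```

With this kind of tagging, the same French sentence yields two distinct training inputs, so one model can learn both translation directions from the combined data.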
This is a pioneering competition for low-resource West African languages. A good solution would be a model that researchers across the world can improve upon or use to create APIs that integrate into day-to-day tools such as ATMs and delivery applications, helping to bridge the gap between rural West Africa and modern services.
This competition is one of five NLP challenges we will be hosting on Zindi as part of AI4D’s ongoing African language NLP project, and is a continuation of the African language dataset challenges we hosted earlier this year. You can read more about the work here.
About Takwimu Lab (takwimulab.gitlab.io)
Takwimu Lab is an association of francophone West African professionals and enthusiasts in AI technologies. Our goal is to spread awareness of the challenges AI can help solve in our communities, disseminate knowledge, and build solutions that address real issues in our countries. Takwimu Lab is based in Benin.
Data
This is a parallel corpus for machine translation from French to Ewe and from French to Fongbe, languages from Togo and Benin respectively. It contains roughly 23,000 French-Ewe and 53,000 French-Fongbe parallel sentences, collected from blogs, tales, newspapers, daily conversations and webpages, and annotated for neural machine translation. The collected sentences were preprocessed and aligned manually.
Variable definitions
- ID : Unique identifier of the text
- French : Text in French
- Target_Language : The target language of the translation (Fongbe or Ewe)
- Target : Text in Fongbe or Ewe
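As a minimal sketch of how these columns might be used, the snippet below groups sentence pairs by target language, which is a natural first step if you train one model per language. It reads a tiny in-memory sample rather than the real file. The sample rows, the placeholder targets, the label values `Ewe` and `Fon`, and the header spelling `Target_Language` are all assumptions; adjust the column name to whatever header the CSV actually uses.

```python
import csv
import io
from collections import defaultdict

# Assumption: spelling of the target-language header; adjust to the real CSV.
LANG_COLUMN = "Target_Language"

# Made-up rows with placeholder targets, for illustration only.
SAMPLE = """ID,French,Target_Language,Target
ID_0001,Bonjour,Ewe,<ewe text>
ID_0002,Merci,Fon,<fongbe text>
ID_0003,Au revoir,Ewe,<ewe text>
"""


def split_by_language(handle, lang_column=LANG_COLUMN):
    """Group (French, Target) pairs by the value of the target-language column."""
    pairs = defaultdict(list)
    for row in csv.DictReader(handle):
        pairs[row[lang_column]].append((row["French"], row["Target"]))
    return dict(pairs)


by_language = split_by_language(io.StringIO(SAMPLE))
for language, rows in by_language.items():
    print(language, len(rows))
```

To run this on the real data, pass a file handle instead, for example `open("Train.csv", newline="", encoding="utf-8")`.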
Files available for download:
- Train.csv – contains parallel sentences for training your model or models. There are 77,177 rows, of which 53,366 are French-Fongbe and 23,811 are French-Ewe
- Test.csv – resembles Train.csv but without the Target column. This is the dataset on which you will apply your model(s).
- SampleSubmission.csv – shows the submission format for this competition, with the ‘ID’ column mirroring that of Test.csv and the ‘Target’ column containing your translation in Ewe or Fongbe. The order of the rows does not matter, but every ‘ID’ value must match an ID in Test.csv exactly.
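The submission format described above can be sketched as a small writer that emits the two columns, ‘ID’ and ‘Target’, and refuses to write if the IDs do not line up with the test set. This is an illustrative sketch, not an official tool: the function name `write_submission`, the example IDs and the placeholder translations are hypothetical.

```python
import csv
import io


def write_submission(test_ids, translations, out_handle):
    """Write ID,Target rows after checking that the IDs match the test set.

    test_ids: list of IDs taken from Test.csv.
    translations: dict mapping each ID to its translated text.
    """
    missing = [i for i in test_ids if i not in translations]
    unknown = sorted(set(translations) - set(test_ids))
    if missing or unknown:
        raise ValueError(f"missing IDs: {missing}; unknown IDs: {unknown}")
    writer = csv.writer(out_handle)
    writer.writerow(["ID", "Target"])
    for i in test_ids:
        writer.writerow([i, translations[i]])


# Hypothetical IDs and placeholder outputs, for illustration only.
test_ids = ["ID_a", "ID_b"]
translations = {"ID_b": "<translation b>", "ID_a": "<translation a>"}

buffer = io.StringIO()
write_submission(test_ids, translations, buffer)
print(buffer.getvalue())
```

Checking for both missing and unknown IDs before writing catches mismatches early, which matters more than row order here.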