Description
Twi is arguably the most recognizable Akan language, spoken natively in parts of southern and central Ghana, as well as parts of Côte d’Ivoire. By some estimates, it has approximately 20 million native speakers [1]. It is a tonal language and comprises at least four distinct dialects, namely Asante, Akuapem, Fante and Bono. Asante is arguably the most widely spoken dialect.
Pertinence
In practice, knowing this language alone allows one to navigate most parts of Ghana. You are likely to find someone who at the very least understands the language in every part of Ghana.
Example Sentences
English | Twi |
What is going on here? | Ɛdeɛn na ɛrekɔ so wɔ ha? |
Wake up | Sɔre |
She comes here every Friday | Ɔba ha Fiada biara |
Learn to be wise | Sua nyansa |
Prior Work
The website and app Kasahorow [2] has a rather limited set of translations. The JW300 dataset [3] has over half a million extremely noisy English to (Akuapem) Twi parallel sentence pairs. A Wikipedia edition is available [4], but it is noisy, and its volume and quality leave much to be desired. Some 700 sentence pairs are available in the TypeCraft (TC) Akan Corpus [9].
A recent study [5], which investigated the quality of these data sources in the context of FastText embeddings constructed on Twi, found them to be woefully insufficient. It is the only modern computing study of Twi that we are aware of. We have since replicated and slightly improved these FastText embeddings [6], trained and shared a variety of embeddings from the Transformers/BERT family through the HuggingFace model repo [7], and crowdsourced close to 1000 manually curated translation pairs. We have also developed a fairly decent English-Twi translator (a transformer-based seq2seq model), which we hope to refine on the data that this collaboration yields. You can find more information on our official and GitHub pages [8].
Researcher Profile: Paul Azunre
Paul Azunre holds a PhD in Computer Science from MIT and has served as a Principal Investigator on several DARPA research programs. He founded Algorine, a research lab dedicated to advancing AI/ML and identifying scenarios where they can have a significant social impact. Paul also co-founded NLP Ghana, an open-source initiative focused on using NLP and transfer learning with Ghanaian and other low-resource languages. He frequently contributes to peer-reviewed journals and has served as a program committee member at some ICML workshops in AutoML and NLP. He is the author of the “Transfer Learning for NLP” book recently published by Manning Publications.
Researcher Profile: Lawrence Adu-Gyamfi
Lawrence Adu-Gyamfi is a subsea installation engineer by profession with a background in aerospace engineering. He currently devotes his off-work time to the activities of NLP Ghana, assisting with collecting data, preprocessing it and making it ready for use in the models being tested internally. He serves as the NLP Ghana Director of Product, overseeing how the different teams of NLP Ghana work together.
Researcher Profile: Esther Appiah
Esther Appiah holds a BA in Modern Languages from the Kwame Nkrumah University of Science and Technology and a Diploma in French Studies from the Université D’Abomey Calavi, Centre Beninois des Langues Étrangères (CEBELAE) in Benin. She is currently pursuing an MPhil in Theoretical Linguistics at UiT, Norway. Her language specialties include French, English and Akan. She has vast experience of language use across various sectors and industries, with core tasks in writing, proofreading, translation and research. She works with Ghana NLP as a data researcher and ultimately hopes to specialise in Computational Linguistics to help streamline NLP processes for underrepresented African languages in the digital space.
Researcher Profile: Felix Akwerh
Felix is currently enrolled in a Master's program in Computer Science at the Kwame Nkrumah University of Science and Technology. He augments his education with online classes and Machine Learning events. He is actively involved in the development of natural language processing with Ghana NLP. He co-authored a paper on Artificial Intelligence in Construction for submission. He holds a BSc in Mathematics from the Kwame Nkrumah University. He worked with the UITS-KNUST, where he helped build a transport system and other software projects. His research interests lie in Machine Learning and NLP, specifically in neural conversational models.
Researcher Profile: Salomey Osei
Salomey holds a Master of Philosophy in Applied Mathematics and MSc degrees in both Industrial Mathematics and Machine Intelligence. She is a recipient of a Google and Facebook Scholarship and a MasterCard Foundation Scholarship, among others. She is the team lead for unsupervised methods at Ghana NLP and a co-organizer of the Accra chapter of Women in Machine Learning and Data Science (WiMLDS). She is also passionate about mentoring students, especially women in STEM, and her long-term goal is to share her knowledge with others by lecturing.
Researcher Profile: Samuel Owusu
Samuel Owusu is currently working as a data scientist for the Ministry of Finance, Ghana. He holds a BSc in Information Technology from Ghana Technology University College. He was a member of the team that won first prize in Ghana’s maiden national hackathon, organised by the World Bank and the Ministry of Water Resources and Sanitation. His research interest lies in NLP, specifically Automatic Speech Recognition for low-resourced languages. He is involved in developing open-source curricula in Machine Learning and Computer Science for young girls. Samuel is a life-long learner.
Researcher Profile: Cynthia Amoaba
Cynthia Amoaba is a high school graduate from Chemu Senior High School and a student at the University For Development Studies. She’s an Ambassador and founder of the first Women in STEM (WiSTEM) chapter in Ghana. She also founded the STEM club in her high school and looks forward to extending it to schools in deprived areas. Currently, she tutors high school students in her community in Physics and Maths and helps train school dropouts in bead and soap making. She’s a science enthusiast and looks forward to learning more through her involvement in the development of NLP with Ghana NLP.
Researcher Profile: Salomey Afua Addo
Salomey Afua Addo is the founder of Lighted Hope, a non-governmental organization that seeks to promote literacy and coding skills among children living in slums in Ghana. She holds an MSc in Mathematical Sciences from the African Institute for Mathematical Sciences and a certificate in business management from the European School of Management and Technology, Berlin. She is the coding instructor for The Love Academy in the USA. Currently, she serves as a volunteer at Ghana NLP, where she plays a vital role in collecting and preprocessing data for the data team. Salomey Afua Addo lives a purpose-driven life.
Researcher Profile: Edwin Buabeng-Munkoh
Edwin Buabeng-Munkoh is currently working as a Software Engineer at Huawei Technologies Ghana Limited. He holds a BSc in Computer Engineering from Kwame Nkrumah University of Science and Technology. He is enrolled in the Data Science Mentorship program with Notitia AI. He is actively involved in the development of natural language processing with Ghana NLP, where he serves as a volunteer and helps with preprocessing data for the data team. Alongside his daily work, he has enrolled in and completed multiple online courses on Data Science, AI and NLP. His research interests lie in Machine Learning, NLP and Computer Vision. He plans to help build a world where language is not a barrier to education and good healthcare.
Researcher Profile: Nana Boateng
Nana Boateng holds a PhD in Statistics from The University of Memphis. He has three master's degrees, in Statistics, Mathematics and Economics. He has worked as a Data Scientist for companies such as Fiat Chrysler Automobiles, Nice Systems Inc and Baptist Memorial Hospital. He is interested in applying principles of mathematics, statistics and economics to solve problems in healthcare, finance and several other industries. He has several peer-reviewed publications to his name. He is the founder of Rest Analytics, which advises companies on how to apply machine learning to increase efficiency and productivity. He contributes to Ghana NLP in the area of supervised learning.
References
[1] https://en.wikipedia.org/wiki/Twi
[2] https://www.kasahorow.org/
[3] Ž. Agić et al., JW300: A Wide-Coverage Parallel Corpus for Low-Resource Languages, ACL Proceedings
[4] https://ak.wikipedia.org/
[5] J. Alabi et al., Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi, LREC Proceedings 2020
[6] https://medium.com/swlh/ghana-nlp-computational-mapping-of-ghanaian-languages-edf60c56bcce
[7] https://huggingface.co/Ghana-NLP
[8] https://ghananlp.github.io/
[9] https://www.researchgate.net/publication/323998547_TypeCraft_Akan_Corpus_Release_10