Abstract
Existing speech recognition services are available only in major languages. Currently, none of the main players in the global voice assistant market (Amazon’s Alexa, Apple’s Siri, and Google Home) supports a single native African language. These services also tend to work better for men than for women and struggle to understand people with different accents, all of which results from biases in the data on which they are trained.
Project Description and Specifications
The Igbo language currently has no public open-source speech dataset, despite having 42 million speakers within and beyond the boundaries of Nigeria. The situation is similar for Yoruba, Hausa and other African languages. We plan to curate 1000 hours of diverse (in terms of gender, age and dialect) speech recordings each for the Igbo, Hausa and Yoruba languages. To do so, we will leverage the Common Voice platform, which was launched to help address biases and the resulting inequalities in voice data, and we will incorporate community events and incentive mechanisms (much like the ‘Umuganda’ of Rwanda). We believe this will be a step towards the inclusion of African languages in speech technologies. For Igbo and Yoruba, which are not fully localized on Common Voice, we will first complete the localization process before proceeding with the recording. Localization involves translating the project tools and materials on the Common Voice platform so that contributors can understand them in their own language.
Anticipated Use Cases and Benefits
Existing speech recognition services are not available in many African languages, and the speakers of these languages are excluded from the benefits of voice-enabled technologies. We expect this dataset to pave the way for speech technologies (such as speech-to-text, text-to-speech, speech translation and speech modelling) for these African languages, which have hitherto had few or no public datasets. For one, it can be used as a training and/or evaluation dataset for speech processing tasks. For another, the availability of such a dataset will make it easier to create voice-enabled services targeted at indigenous grassroots communities. For example, during the COVID-19 pandemic in Rwanda, Digital Umuganda was able to use its large curated Kinyarwanda speech-text data on Common Voice to create a health chatbot that helped provide necessary health information to local Rwandan communities. Thus, this project will promote the inclusion, in areas such as health, education and information, of the large share of Africa’s grassroots population who speak these languages.
Personnel
- Chris Emezue is a Master’s student at the Technical University of Munich, studying Mathematics in Data Science. He has worked extensively on a number of AfricaNLP projects (such as MMTAfrica and OkwuGbe), including through his contributions at Masakhane. He has worked as a natural language processing (NLP) researcher at Siemens AI Lab, LMU, and Hugging Face (in March 2022).
- Adaeze Adigwe is a PhD student at the University of Helsinki, Finland, researching deep learning models for speech synthesis within conversational AI applications. She also extends her research as a speech scientist at ReadSpeaker in the Netherlands. Her academic background includes a Master’s in Computer Science from Columbia University and a Bachelor’s in Electrical Engineering from Northeastern University. Her research interests include speech and language processing with a focus on prosody, spoken dialogue systems, and low-resource languages.
- David Adelani (NLP researcher, https://dadelani.github.io/ ) is a PhD student in computer science at Saarland University, Germany. He led the development of MasakhaNER (Adelani et al. 2021), a named entity recognition dataset for 10 African languages. With support from Lacuna Fund, the dataset is being expanded to 20 languages.
- Shamsuddeen Muhammad is a PhD candidate at the University of Porto, Portugal. He is a faculty member at the Faculty of Computer Science and Information Technology, Bayero University, Kano, Nigeria. He is also a researcher at Masakhane and at the Laboratory of Artificial Intelligence and Decision Support, Portugal. His research focuses on NLP for low-resource African languages. His open-source community, HausaNLP, has connections to many local groups and seasoned researchers working on the Hausa language.
- Professor Gloria Monica Tobechukwu Emezue, commonly known as G.M.T. Emezue, is a professor of English at the Alex Ekwueme Federal University, Nigeria (AE-FUNAI). A literary critic and linguist, her major research interests include post-colonial studies, literature, and the interfaces between digital and human languages. She pioneered the Igbo Village project as well as the Igbo Day Cultural festival at AE-FUNAI. The part of her current research that connects with artificial intelligence is the Jidenka Machine Modelling project (accepted at the MLCD Workshop at NeurIPS 2021), which she and other scholars from around the world have undertaken in order to develop an ML model that can create African literature.