Abstract
Conversational AI and dialog system tools have become ubiquitous and are useful in many practical applications, for example, planning travel, communicating with medical chatbots, and basic household activities like setting an alarm or switching a light bulb on or off.
However, these tools are only available for high-resource languages like English or French, because many low-resource languages, especially African languages, lack the datasets needed to power these technologies.
Two important tasks needed to power conversational AI systems are intent detection and slot filling, which the dialog manager requires to understand and reply to users’ requests.
In this project, we intend to create conversational AI datasets for intent detection and slot-filling tasks needed by voice assistants like Amazon Alexa and Google Home.
In parallel, we intend to expand the benchmark datasets available for African languages to cover more linguistically oriented tasks like commonsense reasoning and natural language inference. These tasks are popular in multilingual NLU benchmark datasets and are needed to develop multilingual pre-trained language models for African languages.
Personnel
- David Adelani (Principal Investigator) is a PhD student in computer science at Saarland University, Germany. He led the development of MasakhaNER (Adelani et al. 2021), a named entity recognition dataset for 10 African languages. The dataset is being expanded to 20 languages with support from Lacuna.
- Andiswa Bukula (Co-Investigator) is a Digital Humanities researcher at the South African Centre for Digital Language Resources, with a speciality in isiXhosa. She was also an assistant lecturer for isiXhosa at the Nelson Mandela Metropolitan University. Andiswa is a PhD candidate at Rhodes University, where her research focuses on the influence of language technologies on the effectiveness of multilingualism in higher education.
- Annie En-Shiun Lee (Collaborator) is an assistant professor (teaching stream) in the Computer Science Department at the University of Toronto. She received her PhD from the University of Waterloo and has been a visiting researcher at the Fields Institute and the Chinese University of Hong Kong, as well as a research scientist in industry. Her research focuses on finding patterns in society and in nature. More specifically, she is interested in exploring data to discover patterns and their structures in order to uncover the underlying knowledge.