Motivation

The internet is an important source of information for many people, and today’s social media platforms continue to shape how people access and act on health information.

Social media platforms serve as channels for top-down communication from health officials to the public, for peer sharing of health information across users, and for finding communities with shared health goals and challenges.

Nonetheless, along with these useful interactions comes false information that has the potential to cause people to take harmful measures, reject factual updates from authorities, and upend the work of local health institutions.

Events such as the 2014 Ebola epidemic and today’s COVID-19 pandemic bring to light the need for social media platforms to facilitate access to accurate and reliable information.

In this project, we aim to use a mixed-methods approach to study the use of social media as an information channel during the ongoing COVID-19 pandemic in Nigeria: which accounts shared news that was later found to be false, how false news spread within the network before it was corrected, why people share certain kinds of information, and what strategies can help navigate the spread of health misinformation online in developing and under-developed countries.

Outcomes

The overall objective of this project is to study the dynamics of the spread of factual and false information in online social networks in Nigeria during a pandemic.

Our study will combine approaches in social network feature engineering and analysis, machine learning (ML), and natural language processing (NLP) with qualitative insights from social network users.
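
As a rough illustration of how these methods could fit together, the sketch below (ours, with hypothetical file and column names, not our final pipeline) combines retweet-network centrality features with text features to flag likely-false posts:

```python
# A rough, assumption-laden sketch (not the project's actual pipeline) of
# combining retweet-network centrality features with text features to flag
# likely-false posts. File names and columns are hypothetical.
import networkx as nx
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

posts = pd.read_csv("labelled_posts.csv")   # hypothetical: text, author, label
edges = pd.read_csv("retweet_edges.csv")    # hypothetical: source, target

# Network feature: how central is each post's author in the retweet graph?
G = nx.from_pandas_edgelist(edges, "source", "target", create_using=nx.DiGraph)
pagerank = nx.pagerank(G)
posts["author_pagerank"] = posts["author"].map(pagerank).fillna(0.0)

# Text features combined with the network feature
X_text = TfidfVectorizer(max_features=5000).fit_transform(posts["text"])
X = hstack([X_text, csr_matrix(posts[["author_pagerank"]].values)]).tocsr()

X_tr, X_te, y_tr, y_te = train_test_split(X, posts["label"], test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```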

These findings will help online platforms, journalists, the general public, and health institutions in Nigeria identify ways that health misinformation is spread online and rethink what strategies can be employed to mitigate the danger it poses.

Our results will include code (in Python and/or R), social media data analyses, anonymized survey data, visualizations, a blog post, and a research publication.

We will release new code or point to existing open-source resources that will be used for our analyses. These will be hosted on GitHub to allow independent reruns. We will refer to the privacy policies of specific online communities regarding the sharing of identifiable data. We will release blog posts and visualizations with simple, readable information for a wider audience.

We also aim to publish our findings at conference venues interested in the interaction between technology and society and how each influences the other, e.g. CHI (Human Factors in Computing Systems), CSCW (Computer Supported Cooperative Work), The Web Conference, WSDM (Web Search and Data Mining), etc.

Long term vision

We hope that this research will support on-the-ground healthcare work by helping to inform how workers interact with the public, and how to address the public's constantly changing perception of what is true or not.

We hope to contribute to the joint effort of journalists and government officials to stop the spread of the virus in Nigeria and other developing countries. Our approach will be useful for studying other forms of misinformation in future health crises and/or political events (i.e. elections).

User perceptions of these events are very much shaped by social media; however, this influence is currently understudied in many African countries. Additionally, research on social media echo chambers and political polarization is extensive in the United States but scarce in the African context.

Previous research in this space has focused on diseases like Ebola [1] but not specifically on Africa [2], or on HIV but without a qualitative analysis [3].

Description

An artificial intelligence (AI) system for identifying predictors of early detection of maternal, neonatal and child health risks and their timely management.

Rationale

The idea we propose is to build an artificial intelligence (AI) system for informing on predictors of early detection of maternal, neonatal and child health risks and their timely management. Tanzania is among the countries with the highest maternal mortality ratios (MMR) in the world. The estimated MMR according to the 2015-2016 Tanzania Demographic and Health Survey (DHS) was 556 deaths per 100,000 live births.

In fact, according to the Partnership for Maternal, Newborn and Child Health (PMNCH), maternal mortality in Tanzania has changed only slightly over the years, in contrast to child mortality, which fell from 99 deaths per 1,000 in 1999 to 68 per 1,000 by 2005. Therefore, much remains to be done to prevent maternal mortality.

The Ministry of Health, Community Development, Gender, Elderly and Children (MoHCDGEC) runs the District Health Information System (DHIS-2), which digitizes health data at the district level, as well as the Integrated Diseases Surveillance and Response (IDSR) system for capturing weekly data on key conditions and diseases. In addition, there is the National Bureau of Statistics (NBS), which conducts and supplies data from the Demographic and Health Survey (DHS). In this proposal we intend to make use of the data present in these local, national and international databases, together with artificial intelligence (AI) tools, to build decision-making support systems.

This will involve training an AI system, i.e. using machine learning algorithms, to identify predictors of early risk detection and early risk management. Prior research on integrating AI into health care systems has demonstrated that AI can bring a paradigm shift to reducing MMR by predicting pregnancy outcomes.
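
As a minimal sketch of what this training step could look like (our illustration only; the file, columns, binary target and model choice below are hypothetical placeholders for cleaned DHIS-2/IDSR/DHS extracts):

```python
# A minimal sketch under heavy assumptions: the file, columns and binary
# target are hypothetical placeholders for cleaned DHIS-2/IDSR/DHS extracts,
# and the model choice is illustrative only.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("district_indicators.csv")  # hypothetical cleaned extract
y = df.pop("high_risk_flag")                 # hypothetical target label
X = df.select_dtypes("number")               # keep numeric indicators only

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)
model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```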

In the Tanzanian context, the most important determinant of MMR is the timing of risk detection and how promptly risks are managed. A full understanding of this aspect will be vital to the fight to reduce MMR as well as neonatal and child deaths. Therefore, our central hypothesis is that computer-based decision procedures, under the broad umbrella of artificial intelligence (AI), can assist in reducing MMR and generally improve health care in resource-poor environments by identifying the predictors of early risk detection and management.

The uniqueness of our hypothesis is that it addresses the crux of the maternal, neonatal and child deaths problem: what causes untimely detection and management of risks? Understanding the predictors will help in redesigning health care practice, management and financing around this area. The decision support tools from this proposal will be applicable on a wider scale, from members of households to clinicians, researchers, policymakers and maternal, neonatal and child health activists.

Outcomes

The project will involve developing two AI systems: (1) an AI system to be used at the national level in Tanzania, and (2) an AI system to be used at the hospital level to predict individual cases. However, this phase (Phase One) will focus on the first objective: developing the AI system at the national level.

Data gathering will therefore be divided into two phases. Phase One will involve three national platforms: the District Health Information System (DHIS-2), the Integrated Diseases Surveillance and Response (IDSR) system, and the Demographic and Health Survey (DHS), which is managed by the National Bureau of Statistics (NBS).

DHIS-2 and IDSR were developed in silos, so they do not communicate and have different sets of indicators. DHIS-2 is under the custody of the Ministry of Health, Community Development, Gender, Elderly and Children (MoHCDGEC) and is an electronic tool for digitizing data at the district level, while IDSR is used for capturing weekly data on key conditions and diseases.

Secondary data concerning maternal, neonatal and child risks, filtered and cleaned from DHIS-2, IDSR and NBS, will be generated. These data will be used to generate a spectrum of factors and their weights for determining the timing of risk detection and management at the national level, as sketched below.
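
One way to obtain such factor weights (our illustrative choice, not a mandated method) is permutation importance on held-out data, reusing `model`, `X`, `X_te` and `y_te` from the sketch above:

```python
# Permutation importance as one illustrative way to turn the trained sketch
# model into a "spectrum of factors and their weights". Reuses model, X,
# X_te and y_te from the previous hypothetical sketch.
from sklearn.inspection import permutation_importance

result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])
for name, weight in ranked[:10]:
    print(f"{name:40s} {weight:.4f}")  # larger weight = stronger predictor
```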

Moreover, in Phase Two of this project, individual routine data collected from health facilities will be used to extract factors associated with maternal, neonatal and child health risks before and after pregnancy, in order to determine the likelihood of these risks for individual cases. This will assist hospital management to act and intervene at the individual level.

Long-term vision

Once the models have been selected, they will continue to be tested with incoming data. The second phase of this project will involve the development of an AI system for early detection of maternal, neonatal and child health risks at the hospital level, which will also be integrated with the prediction models and data sources developed at the national level.

If this pilot study shows positive results, a future project will involve testing and scaling up the developed tool for use as a control intervention scheme in other areas, and even extending it to other diseases.

Personnel

Dr. Gladness G. Mwanga holds a PhD in Information and Communication Science and Engineering, focusing on decision-support tools using machine learning. She is a mentor in data science at NOTTECH Lab. For the past four years she has been working on data science projects that gave her experience in building AI systems to solve various problems in society. In one of these, she developed machine learning models to predict decisions made by dairy farmers, identify factors that influence those decisions, and forecast farmers' demands for specific services in four East African countries (Ethiopia, Kenya, Tanzania and Uganda). Gladness also has four years' experience working as a research assistant at The Nelson Mandela African Institution of Science and Technology and as a consultant developing ICT-based platforms and visualization tools and overseeing activities to ensure successful projects. She is going to lead this project and assist in the development of the AI system.

Mr. Timothy Wikedzi is a Senior Software Engineer and a mentor at NOTTECH Lab with extensive experience in building and managing large-scale software solutions. Since 2018 he has been part of the core team that builds and supports services for ShowClix Inc., an event management and ticketing company based in Pittsburgh, USA. Prior to that, Mr. Wikedzi worked as a lead tech consultant on projects that built tools and services for various organizations in Tanzania, the UK, and the USA. Timothy's areas of interest are building scalable solutions, secure web applications, and fast and efficient systems, and forming and leading the teams behind software products. He therefore brings skills in system development.

Mr. Scott Businge is a Senior Software Engineer and a Python mentor at NOTTECH Lab, specializing in DevOps and software engineering with Python and Golang. He has diverse practical experience and abilities in both software development and operations. He is also committed to automation, systems optimisation, security, continuous software delivery practices and monitoring processes. He has previously worked with big tech companies in Africa such as Andela, which offers world-class software engineering solutions to clients around the world. He therefore brings skills in system development using advanced Python and in launching the system (DevOps).

Description

The goal of this project is to develop a computer-vision-based, non-intrusive, automatic data collection mechanism to collect images and give insights about ecological succession on coral reefs around Vamizi Island, allowing biologists to analyze data in real time and draw inferences about animals' life history, behaviour and populations in Mozambican waters.

Rationale

Coral reefs are among the world's most diverse ecosystems, with more than 800 species of corals providing habitat and shelter for approximately 25% of global marine life, although they cover less than 0.1% of the ocean floor. Coral reefs are also extremely valuable ecosystems, providing livelihoods for 1 billion people and generating 2.7 trillion US dollars each year worldwide from fisheries, coastal protection, tourism and recreation.

Nevertheless, coral reefs are rapidly declining due to various global and local factors such as overfishing, climate change, ocean acidification, pollution and unsustainable coastal development.

In this context, technological resources have been used to monitor and analyse the state of coral reefs and to let biologists obtain data in real time about animals' life history, behaviour, population and survivorship, collecting valuable data that informs sound decision-making and management/conservation efforts.

Different studies show various approaches for collecting data for marine biodiversity conservation purposes, such as using Remotely Operated Vehicles, Autonomous Underwater Vehicles, and fixed underwater video cameras equipped with Video Analytics Services Platforms.

Most of these studies developed deep learning tools for rapid and large-scale automatic collection and annotation of marine data. However, these studies suggested that, to improve current solutions, convolutional neural networks have to be optimised and backup power supplies strengthened.
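
As a hedged sketch of the kind of convolutional-network approach these studies describe, the snippet below fine-tunes a pretrained ResNet-18 head on labelled reef frames; the directory layout and classes are hypothetical, and a production system would add validation, detection rather than classification, and many more epochs:

```python
# A hedged sketch (not the cited systems' code) of fine-tuning a pretrained
# CNN to classify underwater frames (e.g. coral / fish / background).
# The "reef_frames/train" directory layout is a hypothetical placeholder.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
data = datasets.ImageFolder("reef_frames/train", transform=tfm)  # hypothetical path
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(data.classes))  # new classification head

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)  # fine-tune the head only
loss_fn = nn.CrossEntropyLoss()
model.train()
for images, labels in loader:  # one illustrative epoch
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```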

Moreover, some studies also consider applying infrared cameras, which would enable night-time video capture to create a complete picture of the coral ecosystem. In Africa, however, little or no research has focused on these approaches to apply advanced technology to research marine ecology conservation.

Outcomes

In the long term, resolving this question will help gain insight into the ecological processes around artificial reefs (particularly important in the context of the oil and gas developments occurring in Mozambique, which will warrant the implementation of reef restoration measures).

Further, this system will be helpful for developing many other research projects that require long periods of observation on remote reefs where permanent and nighttime access is limited. Additionally, this project will build capacity in the young Mozambican research community in applying Artificial Intelligence technologies to marine conservation issues.

Vision

This project is an opportunity to pioneer the development of new technologies that will ultimately support conservation effort through enhanced data collection and processing.

The vision is to improve data collection capacity by building on top of already existing systems, namely by developing a different power supply mechanism, such as floating solar panels, capable of sustaining such systems on coral reefs located more than a few kilometres from shore.

In the long-run, the project will be replicated for different coral reefs to allow biologists to obtain data in real-time and learn about animals’ life story, behaviour, and population dynamics. In addition, multiple units would be deployed at several locations to allow for more comprehensive research or monitoring reefs from various angles.

Personnel

Erwan Sola, PhD (Project Lead), Investigator in the Marine Ecology Department, Faculty of Natural Science, Lúrio University, Mozambique. Experience in project coordination. Coral biology specialist. Extensive fieldwork experience on coral reefs. He will contribute to concept development, project coordination and ecological data analysis.

Luís Pina, MSc, Computer Engineering Department, Faculty of Engineering, Lúrio University, Mozambique. Luís Pina holds a Master's degree in Information Technology, with experience in developing classification models. He will contribute to this project through data pre-processing and developing classification models. He will also be involved in developing the object detection model.

Tiago Azevedo, PhD Candidate, Department of Computer Science and Technology, University of Cambridge, United Kingdom. A 4th-year Computer Science PhD student with experience in developing deep learning and machine learning models in real-world settings. He will contribute to this project by supporting the coding of the object detection model.

Lourenço Matandire, BSc, Mechanical Engineering Department, Faculty of Engineering, Lúrio University, Mozambique. Lourenço Matandire is a mechatronics engineer who will be responsible for creating and assessing the Flexible Underwater Observatory (FUO) and managing its power supply.

Boaventura Manhique, BSc, Computer Engineering Department, Faculty of Engineering, Lúrio University, Mozambique. Boaventura Manhique is a computer engineer specializing in networking, with a deep understanding of electronics. He will be responsible for maintaining and managing all means of communication and information sharing between the FUO and the biologists.

What is the overall scope of the project?

The project spans natural language processing research, dataset creation, and policy development, and will deliver four main components:

  1. A fellowship for African AI researchers focused on African languages, building on previously funded work on language datasets. This work contributes to a roadmap for better integration of African languages on digital platforms, in aid of lowering the barrier to African participation in the digital economy;
  2. Improved representation of AI research carried out on African languages, by creating resources for a variety of NLP tasks in a variety of African languages that enable good, data-driven results in AI research;
  3. An African community of native speakers attracted as contributors of language resources and language technology tools, adopting and supporting Masakhane NLP, a platform for sharing, maintaining and making use of language resources and tools, establishing widely agreed benchmarks for NLP tasks, and stimulating competition between methods and systems;
  4. A model case to inform African evidence-based policymaking concerning Artificial Intelligence, to be included in UNESCO's AI Decision Maker's Essential to inform policymakers.

This project is led by 29 researchers and covers 9 African languages spoken across 22 countries, reaching 300 million speakers.

What are the results at the moment?

Result 1: African Language Datasets

Below, we detail the datasets created, by language. The project Cracking the Language Barrier for a Multilingual Africa [1] includes a Fellowship for Low Resource African Languages to develop datasets and to strengthen capacity and innovation by building specific datasets [2]:

  1. Ewe language [3] and Fongbe language [4] parallel text dataset for machine translation
  2. Yoruba language [5] machine translation dataset
  3. Chichewa language [6] document classification datasets
  4. Wolof language [7] text-to-speech dataset
  5. Kiswahili language [8] document classification datasets
  6. Tunisian Arabizi language [9] sentiment analysis dataset
  7. Swahili news classification dataset [10]
  8. Twi language [11] machine translation dataset
  9. Luganda language [12] machine translation and keyword spotting datasets

Result 2: African Language Dataset Challenges

These AI4D datasets have been turned into five NLP challenges hosted on Zindi as part of AI4D's ongoing African language NLP project, a continuation of the African language dataset challenges we hosted in 2020 [13]. Engagement with the challenges shows a total of 153,926 unique page views across 111 countries for the Tunisian Arabizi, Yorùbá, Ewe & Fongbe and Chichewa challenges; the Wolof challenge went live on 12 February 2021, so its statistics are not yet available. The current statistics of each challenge are the following:

  • AI4D iCompass Social Media Sentiment Analysis for Tunisian Arabizi [17] (20 November 2020—29 March 2021): 539 data scientists enrolled, 213 on the leaderboard, 3,630 submissions, accuracy score: 0.94
  • AI4D Malawi News Classification Challenge [18] (22 January—10 May 2021): 218 data scientists enrolled, 69 on the leaderboard, 686 submissions, accuracy score: 0.64
  • AI4D Takwimu Lab – Machine Translation Challenge [19] (18 December 2020—26 April 2021): 134 data scientists enrolled, 11 on the leaderboard, 142 submissions, BLEU score: 0.35
  • AI4D Yorùbá Machine Translation Challenge [20] (4 December 2020—12 April 2021): 314 data scientists enrolled, 33 on the leaderboard, 285 submissions, BLEU score: 0.43
  • AI4D Baamtu Datamation – Automatic Speech Recognition in WOLOF [21] (12 February—24 May 2021): statistics not yet available

Result 3: Text-to-Speech Platform for African Languages

The objective of this third part of the project is to build a Wolof text-to-speech system, to be extended into a general platform for all African languages as part of the Masakhane platform. The project will exploit a dataset of 40,000 Wolof phrases uttered by two actors; this open-source dataset is a deliverable of a previous project. The project will be conducted in four phases:

  1. Evaluation of the quality of the dataset
  2. Implementation of a machine learning model mapping Wolof texts into their corresponding utterances
  3. Quantitative and qualitative evaluation of the implemented model’s performances
  4. Development of an API exposing the implemented text-to-speech model (a sketch follows below)
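
As a minimal sketch of what Phase 4 could look like, assuming a trained Wolof model from Phase 2 is available; the `synthesize` function below is a hypothetical placeholder for that model's inference call, not an existing API:

```python
# A minimal sketch of exposing a trained text-to-speech model over HTTP.
# `synthesize` is a hypothetical stand-in for the Phase 2 model; it is
# assumed to return raw WAV bytes for a Wolof sentence.
import io

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI(title="Wolof TTS (sketch)")

def synthesize(text: str) -> bytes:
    """Hypothetical placeholder for the trained model's inference call."""
    raise NotImplementedError("plug in the trained Wolof TTS model here")

@app.get("/tts")
def tts(text: str):
    wav_bytes = synthesize(text)  # hypothetical model call
    return StreamingResponse(io.BytesIO(wav_bytes), media_type="audio/wav")

# Run with: uvicorn app:app --reload   then request GET /tts?text=...
```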

What are we specifically trying to achieve?

Advances in speech and language technologies now enable tools such as voice search, text-to-speech, speech recognition and machine translation. However, these are only available for high-resource languages like English or Chinese.

Without foundational digital resources for African languages, which are considered low-resource in the digital context, these advanced tools remain out of reach. This work details the AI4D – African Language Program, a 3-part project that will:

  1. Incentivise the crowd-sourcing, collection and curation of language datasets through an online quantitative and qualitative challenge
  2. Support research fellows for a period of 3-4 months to create datasets annotated for NLP tasks
  3. Host competitive Machine Learning challenges on the basis of these datasets.

 

Project range for Low Resource African Languages

How do Language Technologies relate to UNESCO?

Languages, with their complex implications for identity, cultural diversity, spirituality, communication, social integration, education and development, are of crucial importance for people, prosperity and the planet.

People not only embed in languages their history, traditions, memory, traditional knowledge, unique modes of thinking, meaning and expression, but more importantly they also construct their future through them.

In this context, Language Technologies (LT) greatly contribute to the promotion of linguistic diversity and multilingualism. These technologies are moving out of research laboratories into numerous applications in many different areas. UNESCO's International Conference Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide, organized in December 2019, highlighted technologies ranging from spelling and grammar checkers to speech and speaker recognition, machine translation for text and audio, speech synthesis, and spoken dialogue, among others, as important areas for enabling linguistic diversity and multilingualism.

In addition, the Los Pinos Declaration on the Decade of Indigenous Languages (2022-2032) calls for the design of, and access to, sustainable, accessible, workable and affordable language technologies, and places indigenous peoples at the centre of its recommendations under the slogan “Nothing for us without us.”

What is the African Languages Programme?

The AI4D – African Language Program was conceptualized as part of a roadmap to work towards better integration of African languages on digital platforms, in aid of lowering the barrier of entry for African participation in the digital economy. It was organised in 3 key phases.

Phase 1: Language Dataset Challenges

The AI4D Language Dataset Challenges were framed to focus on data collection, in response to the low availability of input data for African languages and the poor discoverability of the resources that do exist, which hinder the ability of researchers to do machine translation [14] and other NLP tasks.

We put together a panel of judges and performed both qualitative and quantitative evaluations, based on the datasheets [2] submitted with each dataset and on the datasets themselves, to identify outstanding submissions. The two rounds of this challenge yielded 52 dataset submissions, with 13 winning prizes.

Some of the key observations identified from this phase of the project include [3]:

  • Teams composed of individuals from relevant multi-disciplinary backgrounds, including computer scientists, professional translators and linguists, were able to create and annotate datasets that captured fundamental lexical and semantic nuances of languages.
  • The challenge framing allowed anyone to participate. While this was useful for gauging interest in such a challenge, the high-quality submissions came from teams who had been exposed to NLP research work.
  • Since the challenge was evaluated monthly, we often received disparate submissions from the same teams. Instead, one large dataset built over a couple of months would have been the ideal outcome.

These observations motivated the design of a subsequent phase of the project, the Fellowship.

Phase 2: Language Dataset Fellowships

From the top teams that participated in the challenges, we invited nine to take part in a subsequent phase of the program, a 3-4 month Fellowship Program. This provided research grants for teams to invest in resources, and allowed enough time for collaborative consultations to determine the sizes of the expected datasets and the downstream NLP tasks they would be annotated for, as well as mentorship and advisory requirements.

The datasets developed through this process cover a variety of languages and NLP tasks: Machine Translation datasets (Ewe, Fongbe, Yoruba, Luganda, Twi and the 11 official languages of South Africa), a Text-to-Speech dataset (Wolof), a Sentiment Analysis dataset (Tunisian Arabizi), a Keyword Spotting dataset (Luganda) and Document Classification datasets (Chichewa and Kiswahili).

The Fellowship also presented a platform to tackle some opportunities identified to support future work in African and low-resource language dataset creation, including research and analysis of the legal implications of obtaining textual, visual and audio data from a variety of online sources, and the development of copyright, intellectual property and data protection guidelines for NLP researchers.

These guidelines will be published in addition to research papers from the individual fellows on their particular dataset development work.

Phase 3: Machine Learning Competitions

This final phase, which is still in progress, involves the design and splitting of the datasets into train, development and test sets; the preparation of datasheets documenting the motivation, composition, collection process and recommended uses of each dataset; and the hosting of ML competitions on Zindi, an African data science competition platform, so as to engage the wider NLP community.
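
As a hedged sketch of the split step, the snippet below performs a deterministic 80/10/10 train/dev/test split of one fellowship dataset; the file name and its contents are hypothetical placeholders, and the actual splits will be designed per task:

```python
# A hedged sketch of a deterministic train/dev/test split (80/10/10).
# The CSV file name is a hypothetical placeholder for a fellowship dataset.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("yoruba_english_parallel.csv")  # hypothetical dataset file
train, rest = train_test_split(df, test_size=0.2, random_state=42)
dev, test = train_test_split(rest, test_size=0.5, random_state=42)

for name, split in [("train", train), ("dev", dev), ("test", test)]:
    split.to_csv(f"{name}.csv", index=False)  # fixed seed keeps splits reproducible
    print(name, len(split))
```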

What is our research ambition?

The objective of this work is to create good-quality African language datasets for a variety of Natural Language Processing and Speech Processing tasks. This work is in support of Masakhane, a research effort for NLP for African languages. Masakhane is open source, continent-wide, distributed and online: a community of researchers working on a wide variety of African languages, some of whom are funded to build datasets via this AI4D fellowship.

Some of the challenges for the development of NLP for African languages identified by researchers in Africa include (Martinus and Abbott 2019):

  • Low availability of resources (input data) for African languages, which hinders the ability of researchers to do machine translation.
  • Discoverability: The resources for African languages that do exist are hard to find. Often these resources are not available under open access licenses thus reducing the ability of research institutions to work together and share knowledge on language datasets to strengthen innovation.
  • Reproducibility: The data and code of existing research are rarely shared, which means researchers cannot reproduce the results properly.
  • Lack of benchmarks: Due to the low discoverability and the lack of research in the field, there are no publicly available benchmarks or leaderboards against which to compare new machine translation techniques.

What kind of impact do we want to achieve?

The project aims to address some of the challenges identified above through:

  • Development of datasets of African languages, whether used in a single country or across borders, that can be used to strengthen access to information and spur innovation based on NLP technologies
  • Enhancement of capacities among young researchers for the development of open language datasets and language technology applications, through the development of guidelines and training via open educational resources in collaboration with national institutions
  • Development of a multi-stakeholder network for strengthening research on language technology based on AI techniques for African languages
AI4D language profiles for Low Resource African Languages

How do we plan on implementing this project?

Supported by AI4D-Africa, the University of Pretoria's Data Science for Social Impact Research Group and the Knowledge 4 All Foundation, the first month will involve consultations with the teams to determine a number of factors:

  • Dataset
    • Language
    • Downstream task scoping
    • Expected sizes of datasets
    • Preparation and documentation process
  • Deliverables
    • Monthly targets and check-ins for guidance
    • Workshop/Conference paper for publication documenting the process

The teams will then have 3 months to further flesh out their datasets, after which they will be expected to prepare write-ups of their research for publication. Once the dataset creation phase is complete, the datasets will be used in ML challenges hosted on Zindi and evaluated on the downstream task that each dataset has been prepared for. UNESCO funding will be used to promote the work of Fellows 1-5, and GIZ funding will be used to promote Fellows 6-8.

Scale and range for AI4D project in Low Resource African Languages

Who are the Fellows?

  • Fellow 1: Amelia Taylor – Chichewa language
  • Fellow 2: Takwimu Lab – Fongbe and Ewe language
  • Fellow 3: David Adelani – Yoruba language
  • Fellow 4: Baamtu Datamation – Wolof language
  • Fellow 5: iCompass – Tunisian Arabizi language
  • Fellow 6: Davis David – Kiswahili language
  • Fellow 7: Ari Ramkilowan and Masabata Mokgesi-Selinga – the 11 national South African languages
  • Fellow 8: Makerere University – Luganda language
  • Fellow 9: GhanaNLP – Twi language

What are the partners involved?

This work has been sponsored through a partnership between several organisations, listed below in no particular order:

  • Knowledge 4 All Foundation
  • AI4D Africa Initiative
  • Zindi platform
  • Data Science for Social Impact Research Group, University of Pretoria (DFSI)
  • Centre for Intellectual Property and Information Technology (CIPIT), Strathmore University
  • United Nations Educational, Scientific and Cultural Organization (UNESCO)
  • GIZ – Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH
  • IDRC – International Development Research Centre
  • UNESCO Chair in Artificial Intelligence at University College London

References

[1] Cracking the Language Barrier for a Multilingual Africa https://www.k4all.org/project/language-dataset-fellowship/

[2] African Natural Language Processing https://zenodo.org/communities/africanlp/?page=1&size=20

[3] Ewe language https://www.k4all.org/project/ewe-database/

[4] Fongbe language https://www.k4all.org/project/database-fongbe/

[5] Yoruba language https://www.k4all.org/project/database-yoruba/

[6] Chichewa language https://www.k4all.org/project/database-chichewa/

[7] Wolof language https://www.k4all.org/project/database-wolof/

[8] Kiswahili language https://www.k4all.org/project/database-kiswahili/

[9] Tunisian Arabizi language https://www.k4all.org/project/database-tunisian-arabizi/

[10] Swahili language https://www.k4all.org/project/database-kiswahili/

[11] Twi language https://www.k4all.org/project/twi-language/

[12] Luganda language https://www.k4all.org/project/luganda-language/

[16] Zindi and AI4D build language datasets for African NLP https://zindi.medium.com/zindi-and-ai4d-build-language-datasets-for-african-nlp-34a4d0ea129

[17] AI4D iCompass Social Media Sentiment Analysis for Tunisian Arabizi https://zindi.africa/competitions/ai4d-icompass-social-media-sentiment-analysis-for-tunisian-arabizi

[18] AI4D Malawi News Classification Challenge https://zindi.africa/competitions/ai4d-malawi-news-classification-challenge

[19] AI4D Takwimu Lab – Machine Translation Challenge https://zindi.africa/competitions/ai4d-takwimu-lab-machine-translation-challenge

[20] AI4D Yorùbá Machine Translation Challenge https://zindi.africa/competitions/ai4d-yoruba-machine-translation-challenge

[21] AI4D Baamtu Datamation – Automatic Speech Recognition in WOLOF https://zindi.africa/competitions/ai4d-baamtu-datamation-automatic-speech-recognition-in-wolof

[1] L. Martinus, J. Webster, J. Moonsamy, M. S. Jnr, R. Moosa, and R. Fairon. Neural machine translation for South Africa's official languages. arXiv preprint arXiv:2005.06609, 2020.

[2] T. Gebru, J. Morgenstern, B. Vecchione, J. W. Vaughan, H. Wallach, H. Daumé III, and K. Crawford. Datasheets for datasets. arXiv preprint arXiv:1803.09010, 2018.

[3] K. Siminyu, S. Freshia, J. Abbott, and V. Marivate. AI4D – African Language Dataset Challenge. arXiv preprint arXiv:2007.11865, 2020.

Partners

Partners in Cracking the Language Barrier for a Multilingual Africa

Disclaimer

The designations employed and the presentation of material on these maps do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or any area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The final boundary between the Republic of Sudan and the Republic of South Sudan has not yet been determined. The final status of the Abyei area is not yet determined.

Language profile: Ewe

Language profile for Ewe

Overview

Ewe (Èʋe or Èʋegbe [èβeɡ͡be]) is a Niger–Congo language spoken in Togo and southeastern Ghana by approximately 4.5 million people as a first language and a million or so more as a second language.[1] Ewe is part of a cluster of related languages commonly called Gbe; the other major Gbe language is Fon of Benin. Like many African languages, Ewe is tonal.

Pertinence

In Togo and Ghana, where Ewe is heavily spoken, it is the main communication medium in major economic hubs, especially in Togo, where it is the most spoken language in the capital city Lomé and one of the two national languages of the country. In Ghana, Ewe is one of the 11 government-sponsored languages alongside the official language, English [2]. In 2020, the majority of children growing up in major cities in Togo still pick up a dialect of Ewe as their first language, depending on the region of the country they are from. Most speakers today can speak Ewe but cannot write it with the appropriate alphabet and orthography; written communication in Ewe usually happens using the English/French alphabet to approximate the sounds of the words. Some schools in the capital offer Ewe courses at secondary school level, but these are generally optional and focus only on the basics.

Nevertheless, in Togo, while communication in schools and formally registered companies takes place in French, Ewe remains the most used language in critical settings such as:

  • Market places
  • Medical centres
  • Apprenticeships for a wide array of occupations such as hairdressing, tailoring, engine repair, carpentry and agriculture, among other manual jobs that make up 90% of the jobs and 30% of the GDP of the country [3]
  • Police stations
  • Banking and telecommunications agencies
  • Shops and restaurants

Existing Work

Apart from sparse efforts by actors in the academic and literary fields and from some associations, there has not been any federated effort from the Togolese government. Wycliffe-Togo [4] is, however, one of the most prominent associations in the country organizing events and working to promote local languages. There exist a few Ewe-English/French dictionaries online, but the most popular ones remain the Glosbe dictionary on the web [5], the Kasahorow Evegbe English Dictionary [6] and the mobile Ewe Dictionary [7] on the Android Play Store. In the academic world, a lot of work has been done, especially on tone and syntax, but also on anthropological, lexicographical and phonological aspects, by both foreign and local (Ghanaian) researchers. [1]

Example of sentence in Ewe

Ewe : Ne ati aɖe le nya dim ɣesiaɣi le fíá wo ŋuti la, mumu ye le dzrom.

English : A tree which provokes axes wishes to be cut down.

Researcher Profile: Kevin Degila

Kevin is a Machine Learning Research Engineer at Konta, an AI startup based in Casablanca. He holds an engineering degree in Big Data and AI and is currently enrolled in a PhD program focused on business document understanding at Chouaib Doukkali University. In his day-to-day activities, Kevin trains, deploys and monitors machine learning models in production. With his friends, he leads TakwimuLab, an organisation working on training the next generation of young, French-speaking West African talent in AI and on solving real-life problems with those AI skills. In his spare time, Kevin also creates programming and AI educational content on YouTube and plays video games.

Researcher Profile: Momboladji Balogoun

Momboladji Balogoun is a Data Analyst at Gozem, a company providing ride-hailing and other services in West and Central Africa. He is a former Data Scientist at Rintio, an IT startup based in Benin that uses data and AI to create business solutions for other enterprises. Momboladji holds an M.Sc. degree in Applied Statistics from the ICMPA UNESCO Chair, Cotonou, and moved into the data science field after attending a regional Big Data bootcamp in his home country, Benin. He aims to pursue a Ph.D. program on speech-to-speech translation for low-resource languages. Bola created Takwimu LAB in August 2019 and currently leads it with 3 other friends to promote data science in their countries, as well as the creation and use of AI to solve real-life problems in their communities. His hobbies are reading, documentaries, and tourism.

Researcher Profile: Godson Kalipe

Godson started in the IT field with software engineering, specializing in mobile applications. After his bachelor's degree in 2015, he worked for a year as a web and mobile application developer before starting a master's in Big Data Analytics in India. His master's thesis consisted of a comparative analysis of the impact of international news on economic indicators of African countries, using news data, Google Cloud storage and visualization assets. After his master's, in 2019, he gained his first experience as a Data Engineer, creating data ingestion pipelines for real-time sensor data at Activa Inc, India. In parallel, he has been working with Takwimu Lab on various projects aimed at bringing AI-powered solutions to common African problems and making the field more popular in the West African francophone industry.

Researcher Profile: Jamiil Toure

Jamiil holds a design engineering degree in electrical engineering from Ecole Polytechnique d'Abomey-Calavi (EPAC), Benin (2015) and a master's degree in mathematical sciences from the African Institute for Mathematical Sciences (AIMS), Senegal (2018). Passionate about languages and Natural Language Processing (NLP), he contributes to the Masakhane project by working on the creation of a dataset for the Dendi language.

Meanwhile, he complements his education in NLP via online courses, events and conferences, with a view to a future research career in NLP. With his friends at Takwimu Lab, he works on creating active learning and working environments to foster the application and use of AI to tackle real-life problems. Currently, Jamiil is a consultant in Big Data at Cepei, a think tank based in Bogotá that promotes dialogue, debate, knowledge and multi-stakeholder participation in global agendas and sustainable development.


 

Language profile: Fongbe

Language profile for Fongbe
Language profile for Fongbe

Overview

Fon or fɔ̀ngbè is a low-resource language, part of the Eastern Gbe language cluster, and belongs to the Volta–Niger branch of the Niger–Congo languages. Fongbe is spoken in Nigeria, Togo and mainly in Benin by approximately 4.1 million speakers. Like the other Gbe languages, Fongbe is an analytic language with an SVO basic word order. It is also a tonal language and contains diacritics, which makes it difficult to study. [1]

The standardized Fongbe language is part of the Fongbe cluster of languages inside the Eastern Gbe languages. In that cluster, there are other languages like Goun, Maxi, Weme, Kpase which share a lot of vocabulary with the Fongbe language. Standard Fongbe is the primary target of language planning efforts in Benin, although separate efforts exist for Goun, Gen, and other languages of the country. To date, there are about 53 different dialects of the Fon language spoken throughout Benin.

Pertinence

Fongbe holds a special place in the socio-economic scene in Benin. It is the most used language in markets, health care centres, social gatherings, churches, banks, etc. Most of the ads and some programs on national television are in Fongbe. French used to be the only language of education in Benin, but in the second decade of the twenty-first century the government began experimenting with teaching some subjects in Benin's schools in the country's local languages, among them Fongbe.

Example of Fongbe Text:

Fongbe : Mǐ kplɔ́n bo xlɛ́ ɖɔ mǐ yí wǎn nú mɛ ɖevo lɛ

English : We have learned to show love to others [3]

Existing Work

Some previous work has been done on the language. There are doctoral theses, books, French-to-Fongbe and Fongbe-to-French dictionaries, blogs and more. These are sources of written Fongbe.

Researcher Profiles

The Fongbe dataset was created by the same Takwimu Lab team as the Ewe dataset: Kevin Degila, Momboladji Balogoun, Godson Kalipe and Jamiil Toure. Their profiles are given in the Ewe language profile above.


 

Language profile: Yoruba

Language profile for Yoruba

Overview

The Yorùbá language is the third most spoken language in Africa and is native to south-western Nigeria and the Republic of Benin in West Africa (as shown in Figure 1). It is one of the national languages in Nigeria, Benin and Togo, and it is also spoken in other countries like Ghana, Côte d'Ivoire, Sierra Leone, Cuba and Brazil, and by a significant Yorùbá diaspora population in the US and the United Kingdom, mostly of Nigerian ancestry. The language belongs to the Niger-Congo family and is spoken by over 40 million native speakers [1].

Yorùbá has several dialects, but the written language was standardized by the 1974 Joint Consultative Committee on Education [2]. It has 25 letters, excluding the Latin characters (c, q, v, x and z) and including the additional characters (ẹ, gb, ṣ and ọ). There are 18 consonants (b, d, f, g, gb, j, k, l, m, n, p, r, s, ṣ, t, w, y) and 7 oral vowels (a, e, ẹ, i, o, ọ, u). Yorùbá is a tonal language with three tones: low, middle and high.

These tones are represented by the grave (“\”), optional macron (“-”) and acute (“/”) accents respectively. The tones are applied to vowels and syllabic nasals, but the mid tone is usually ignored in writing. The tones are represented in written texts along with a modified Latin alphabet. A few letters have underdots (i.e. “ẹ”, “ọ” and “ṣ”); we refer to the tonal marks and underdots collectively as diacritics. It is important to note that tone information is needed for correct pronunciation and to determine the meaning of a word [2, 3].

As noted in [4], most of the Yorùbá texts found on websites or in public-domain repositories either use the correct Yorùbá orthography or replace diacriticized characters with un-diacriticized ones.

Oftentimes, articles written online, including news articles from outlets like BBC and VON, ignore diacritics. Ignoring diacritics makes it difficult to identify or pronounce words unless they appear in context. For example, owó (money), ọwọ̀ (broom), òwò (business), ọ̀wọ̀ (honour), ọwọ́ (hand), and ọ̀wọ́ (group) will all be mapped to owo without diacritics.
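
The following small Python illustration (ours, not from the cited papers) makes the collapse concrete: decomposing each word to Unicode NFD form and dropping the combining marks maps all six distinct words to the same undiacritized string:

```python
# Illustration: stripping diacritics collapses six distinct Yorùbá words
# to the single string "owo". NFD decomposition separates base letters
# from combining tone marks and underdots, which are then discarded.
import unicodedata

words = ["owó", "ọwọ̀", "òwò", "ọ̀wọ̀", "ọwọ́", "ọ̀wọ́"]

def strip_diacritics(word: str) -> str:
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print({w: strip_diacritics(w) for w in words})  # every value is 'owo'
```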

Existing work

The diacritics problem has greatly reduced the amount of available parallel text in the Yorùbá language that can be used for many NLP tasks like machine translation. This has led to research on automatically applying diacritics to Yorùbá texts [5, 6], but the problem has not been completely solved. We divide the existing work on the Yorùbá language into four categories:

Automatic Diacritics Application

The main idea of the automatic diacritic application (ADA) model is to predict the correct diacritics of a word based on the context in which it appears. A sequence-to-sequence deep learning model such as a Long Short-Term Memory network (LSTM) [7] can be used to achieve this task.

The task is similar to a machine translation task where we need to translate from a source language to a target language: ADA takes a source text that is non-diacriticized (e.g. “bi o tile je pe egbeegberun ti pada sile”) and outputs target text with diacritics (e.g. “bí ó tilẹ̀ jẹ́ pé ẹgbẹẹgbẹ̀rún ti padà síléé”). The first attempt at applying deep learning models to Yorùbá ADA was by Iroro Orife [5].

They proposed a soft-attention seq2seq model to automatically apply diacritics to Yorùbá texts; the model was trained on the Yorùbá Bible, the Lagos-NWU speech corpus and some language blogs. However, it did not generalize to other domains, such as dialog conversations and news, because the majority of the training texts came from the Bible. Orife et al. [6] recently addressed the issue of domain mismatch by gathering texts from various sources, such as conversation interviews, short stories and proverbs, books, and JW300 Yorùbá texts, and they evaluated the performance of the model on the news domain (i.e. Global Voices articles) to measure domain generalization.
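
To make the seq2seq idea concrete, below is a deliberately tiny character-level encoder-decoder sketch in PyTorch. It is our illustrative toy, not Orife's implementation: no attention, a single training pair taken from the example above, and teacher forcing only.

```python
# A tiny character-level encoder-decoder sketch of the ADA idea:
# read undiacritized characters, emit diacritized characters.
# Illustrative only; real systems use attention and large corpora.
import torch
import torch.nn as nn

pairs = [("bi o tile je pe", "bí ó tilẹ̀ jẹ́ pé")]  # toy pair from the example above
src_chars = sorted({c for s, _ in pairs for c in s})
tgt_chars = sorted({c for _, t in pairs for c in t}) + ["<s>", "</s>"]
s2i = {c: i for i, c in enumerate(src_chars)}
t2i = {c: i for i, c in enumerate(tgt_chars)}

class Seq2Seq(nn.Module):
    def __init__(self, hid=64):
        super().__init__()
        self.src_emb = nn.Embedding(len(src_chars), hid)
        self.tgt_emb = nn.Embedding(len(tgt_chars), hid)
        self.enc = nn.LSTM(hid, hid, batch_first=True)
        self.dec = nn.LSTM(hid, hid, batch_first=True)
        self.out = nn.Linear(hid, len(tgt_chars))

    def forward(self, src, tgt_in):
        _, state = self.enc(self.src_emb(src))               # encode source chars
        dec_out, _ = self.dec(self.tgt_emb(tgt_in), state)   # decode with teacher forcing
        return self.out(dec_out)

model = Seq2Seq()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
src, tgt = pairs[0]
src_ids = torch.tensor([[s2i[c] for c in src]])
tgt_ids = [t2i["<s>"]] + [t2i[c] for c in tgt] + [t2i["</s>"]]
tgt_in, tgt_out = torch.tensor([tgt_ids[:-1]]), torch.tensor([tgt_ids[1:]])
for _ in range(200):  # overfit the single toy pair to show the mechanics
    opt.zero_grad()
    logits = model(src_ids, tgt_in)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
    loss.backward()
    opt.step()
```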

Word Embeddings

Word embeddings are the primary features used for many downstream NLP tasks. Facebook released FastText [8] word embeddings for over 294 languages, but the quality of the embeddings is not very good. Recently, Alabi et al. [9] showed that Facebook's FastText embeddings for Yorùbá give lower performance on word similarity tasks, which indicates that they would not work well for many downstream NLP tasks. They released better-quality FastText embeddings and contextualized BERT [10] embeddings obtained by fine-tuning multilingual BERT embeddings.
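
As a hedged sketch of how such embeddings are probed for word similarity, the snippet below loads a fastText .vec file with gensim; the file path is a placeholder for whichever release (Facebook's or the improved embeddings of Alabi et al.) has been downloaded:

```python
# A hedged sketch of probing Yorùbá fastText vectors for word similarity.
# "yo.vec" is a placeholder path for a downloaded .vec embedding file.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("yo.vec", binary=False)  # placeholder path
print(vectors.most_similar("owó", topn=5))  # nearest neighbours, if the word is in vocab
```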

Datasets for Supervised Learning Tasks

Yorùbá, like many other low-resourced languages, does not have many supervised learning datasets, such as named entity recognition (NER), text classification and parallel sentences for machine translation. Alabi et al. [9] created a small NER dataset with 26K tokens. Through the support of AI4D and Zindi Africa, we have created a parallel English-Yorùbá dataset for machine translation and a news title classification dataset for Yorùbá from articles crawled from BBC Yorùbá. A summary of the AI4D dataset creation competition is given in [11].

Machine Translation

Commercial machine translation models like Google Translate exist for Yorùbá to and from other languages, but the quality is not very good because of the diacritics problem and the small amount of data available to train a good neural machine translation (NMT) model. JW300 [12], based on Jehovah's Witness publications, is another popular dataset for training NMT models for low-resource African languages; it has over 10 million tokens of Yorùbá text. However, NMT models trained on JW300 do not generalize to non-religious domains. There is a need to create more multi-domain parallel datasets for the Yorùbá language.

Researcher Profile: David Adelani

David Ifeoluwa Adelani is a doctoral student in computer science at Spoken Language Systems Group, Saarland Informatics Campus, Saarland University, Saarbrücken, Germany. His current research focuses on the security and privacy of users’ information in dialog systems and online social interactions.

He is also actively involved in the development of natural language processing datasets and tools for low-resource languages, with special focus on African languages. He has published a few papers in top Web technology, language and speech conferences including The Web Conference, LREC, and Interspeech.

During his graduate studies, he conducted research on social computing at the Max Planck Institute for Software Systems, Germany, and on fake review detection at the National Institute of Informatics, Tokyo, Japan. He holds an MSc in Computer Science from the African University of Science and Technology, Abuja, Nigeria, and a BSc in Computer Science from the University of Agriculture, Abeokuta, Nigeria.


 

Language profile: Chichewa

Language profile for Chichewa

What is Chichewa?

Chichewa is part of the Niger-Congo Bantu group and it is one of the most spoken indigenous languages of Africa. Chichewa is both an individual dialect and a language group as we shall discuss in this short article.

The language, Chichewa, also written as Cichewa, or, in Zambia, Cewa, is the native language of the Chewa. The word ‘chi’ or ‘ci’ is a Bantu prefix used for the tribal name, designating the language rather than the geographical region of the tribe. The word Chewa is the name of a group of people. Chichewa is called Chinyanja, for example in Zambia and Mozambique. Chinyanja was also the old name for the language in Malawi, before the country became a Republic. During that time, as a British Protectorate, Malawi was called Nyasaland.

Chichewa, with the code ‘ny’, is also one of the 13 African languages with Google automatic translation. The code ‘ny’ was most likely chosen because the language was first known as Chinyanja. This probably reflects the availability of written text in Chichewa compared to other African languages. However, as we will discuss in this article, there are several dialects of Chichewa which differ from each other in noticeable ways. I do not know whether this was taken into account in the text used for Google's machine translation models. But this is a whole new interesting topic in itself!

Who are the Chewa?

The Chewa are a Bantu-speaking people, traditionally described as the descendants of the Maravi, who in the 16th (some say the 14th) century migrated to present-day Malawi from the region now called Congo-Kinshasa. Most of what we know about the migrations of the Chewa comes from oral tradition. Samuel Nthara collected some of the oral traditions in his book Mbiri ya Achewa, published in 1944. The name Maravi first appeared in Portuguese documents in 1661.

Nowadays, some of the well known districts in Malawi where the Chewa live are: Mchinji, Lilongwe, Kasungu, Nkhotakota, Dowa and Dedza. The consensus is that the Chewa of the mainland kept their name as Chewa and lived mainly in the Central Region. The Manganja are the Chewa who settled in the Southern region. And some Chewa groups who settled at the lake or around the Shire River in the south are called Nyanja. Man’ganja (or Maganja) is southern Chichewa as opposed to the language spoken in the Central Region (which was also called Western Chichewa / Nyanja). There are phonetical, grammatical and vocabulary differences between these dialects.

Where is Chichewa spoken?

In Malawi, Chichewa is widely understood. It was declared the national language in 1968 and is viewed as a symbol of national unity by diverse groups. In Mozambique it is spoken especially in the provinces of Tete and Niassa, where it is referred to as Chinyanja. In Zambia, it is spoken in Lusaka and in the Eastern Province, where it is referred to as Nyanja. The language spoken in Lusaka is sometimes called town-Nyanja, as opposed to the Nyanja spoken in rural areas in other parts of Zambia, referred to as deep-Nyanja. Nyanja is the language of the police and the army. In Zimbabwe, according to some estimates, Chichewa is the third most widely used language after Shona and Ndebele. There is a sizable community of descendants of those who migrated to this area from Nyasaland during colonial times to work in the mines.

Chichewa is also spoken in South Africa, where there are a significant number of migrants from Malawi working in mining, as domestic workers or in other industries. There are radio services in Chichewa in Malawi, Zambia, South Africa and even in Ethiopia.

How many people speak the language?

According to sources quoted in Wikipedia, there are 12 million native speakers of Chichewa. A similar number is mentioned on the Joshua Project website and includes Chichewa speakers from 8 countries of the world. This number seems to refer to all the people who identify themselves as Chewa, Nyanja and Manganja, as these, according to the Malawi Population Census of 2018, make up about 40% of the population in Malawi. However, in Malawi, the large ethnic groups of Lomwe, Yao and Ngoni have over the course of time adopted Chichewa as their native language.

Indeed, the number of people who understand and use Chichewa is much higher than the 12 million native speakers. Like Swahili, Chichewa is considered by some a universal language, a common skill enabling people of varying tribes living in Malawi, Zambia and Mozambique to communicate without following the strict grammar of specific local languages. In Zambia, many of those whose mother tongue is now Chinyanja have come to consider themselves Ngoni; Nyanja is a lingua franca, spoken by the police and the administration.

The Need for Datasets in Chichewa

As discussed, seven important facts provide impetus to the initiative to develop datasets for Chichewa: (1) Chichewa is an important African language; (2) it is representative of the Niger-Congo Bantu group of languages; (3) it is widely spoken; (4) it has a considerable literature, more than other local African languages; (5) there are several methodological grammar and phonetics studies; (6) there are several translations from languages such as English; and (7) it is spoken by old and young alike.

There has been an interest in developing digital tools for language documentation and natural language processing. Such initiatives have come from researchers involved in linguistics, such as those belonging to linguistics departments at universities in Malawi and Zambia. For example, in Malawi, we found a Chichewa monolingual dictionary corpus containing about 13,000 nouns and a phonetically annotated short corpus.

The comparative online Bantu dictionary at Berkeley includes a dataset for Chichewa; however, the project seems to have stalled in 1997. More recently, there has been an interest in creating datasets for NLP tools and machine translation and, according to Professor Kishindo, there is a PhD candidate at the University of Malawi interested in working on machine translation for Chichewa.

From our investigation, we observe that these datasets and tools tend to be kept in the private domain, are not regularly maintained, are used only once, and are not well documented. However, their existence is important: it shows that there is a desire and need for such tools.

Conclusions

Chichewa is an important African language. There are differences between the main dialects of Chichewa and the language is undergoing continuous change. Improved methods for discovering online content and digitizing text can open new opportunities for organising Chichewa text into useful corpora. These can then be useful in linguistic work, in building tools for manipulating and comparing text, for finding and visualising connections between texts and for improving machine translation.

Chichewa continues to change as new terms are added to the vocabulary, arising for example from technological needs. Its use by the younger generation creates new idioms and meanings, and creative expression through poetry and literature finds venues online. Looking at language in new and novel ways using technology can also help engage the new generation in how they use, view and develop their language.

In this short article, we looked at the use of Chichewa and why we think it is important to build datasets for this language. We hope that this will be motivating and inspiring to others who are interested in this language or other African languages. This article was written as the author embarked on an AI4D Language Dataset Fellowship for putting together a Chichewa dataset. This is a small but important initiative aimed at engaging with the machine learning generation on the African continent. I am honoured to play a small part in the building of such datasets.

Researcher Profile: Amelia Taylor

Amelia graduated with a PhD in Mathematical Logic from Heriot-Watt University in 2006, where she was part of the ULTRA group. After that she worked as a research assistant on a project with Heriot-Watt University and the Royal Observatory in Edinburgh, aiming at developing an intelligent query language for astronomical data. From 2006 to 2013, Amelia also worked in finance in the City of London and Edinburgh, where she built risk models for asset allocation and liability-driven investments.

For the last 5 years, Amelia has been teaching programming and AI courses at the University of Malawi in the CIT and engineering departments. Amelia also teaches research methodology and supervises MSc and PhD students. While her first interest in AI as an undergraduate was in the field of Natural Language Processing and intelligent query systems, she is now interested in the use of technology and AI for solving real-world problems.


Language profile: Wolof

Language profile for Wolof

Overview

Wolof /ˈwoʊlɒf/[4] is a language of Senegal, the Gambia and Mauritania, and the native language of the Wolof people. Like the neighbouring languages Serer and Fula, it belongs to the Senegambian branch of the Niger–Congo language family. Unlike most other languages of the Niger-Congo family, Wolof is not a tonal language.[1]

Pertinence

Wolof is spoken by more than 10 million people and about 40 percent (approximately 5 million people) of Senegal’s population speak Wolof as their native language. Increased mobility, and especially the growth of the capital Dakar, created the need for a common language. Today, an additional 40 percent of the population speak Wolof as a second or acquired language. In the whole region from Dakar to Saint-Louis, and also west and southwest of Kaolack, Wolof is spoken by the vast majority of the people. Typically when various ethnic groups in Senegal come together in cities and towns, they speak Wolof. It is therefore spoken in almost every regional and departmental capital in Senegal.[1]

Nevertheless, in Senegal, while communication in schools and formally registered companies takes place in French, Wolof remains the most used language in critical settings such as:

  • Market places
  • Medical centers
  • Apprenticeships for a wide array of occupations such as hairdressing, tailoring, engine repairing, carpentry and agriculture, among other manual jobs
  • Police stations
  • Banking and telecommunications agencies
  • Shops and restaurants

Existing work

The Senegalese government has created a linguistics department for Wolof and other local languages to promote the use of Wolof in settings like schools, as well as the translation of various books into Wolof, but much work remains before Wolof is used in official documents and schools. There also exist some French-Wolof dictionaries. In the academic world, some work has been done to better understand Wolof phonemes [2] and parts of speech (POS) [3], on automatic translation of Wolof to French [4], and on automatic speech recognition by a startup called BAAMTU.

Researcher Profile: Thierno Diop

Thierno Ibrahima Diop is a computer science engineer. He is lead data scientist at Baamtu and is passionate about NLP and everything that revolves around machine learning. He has been mentoring data science students and apprentices for two years.

Before getting into data science, he did a lot of freelancing in the development of web and mobile applications for local and international clients. He is a co-founder of GalsenAI, an artificial intelligence community in Senegal; he is also a ZINDI ambassador in Senegal and a co-organizer of GDG Dakar.


Language profile: Kiswahili

Language profile for Kiswahili

Overview

Swahili (also known as Kiswahili) is one of the most spoken languages in Africa. It is spoken by 100–150 million people across East Africa, in countries such as Tanzania, Kenya, Uganda, Rwanda and Burundi, as well as parts of Malawi, Somalia, Zambia, Mozambique and the Democratic Republic of the Congo (DRC).

Pertinence

In Tanzania, Swahili is the official language and the main communication medium for economic, social and government activities across the country, and it is the official language of instruction in all schools [1].

Swahili is popularly used as a second language by people across the African continent and is taught in schools and universities, given its presence within and beyond the continent. Swahili has been influenced by Arabic and was even written in an Arabic script during its early years.

Swahili is also one of the working languages of the African Union and is officially recognized as a lingua franca of the East African Community. In 2018, South Africa legalized the teaching of Swahili in South African schools as an optional subject beginning in 2020. The Southern African Development Community (SADC) has also recognized Swahili as an official language.

Existing work

In Tanzania, Baraza la Kiswahili la Taifa (National Swahili Council, abbreviated BAKITA) is the institution responsible for regulating and promoting the Kiswahili language [2]. Key activities mandated for the organization include creating a healthy atmosphere for the development of Kiswahili, encouraging the use of the language in government and business functions, coordinating the activities of other organizations involved with Kiswahili, and standardizing the language.

BAKITA cooperates with organizations like TATAKI [3] in the creation, standardization and dissemination of specialized terminologies. Other institutions can propose new vocabulary to respond to emerging needs, but only BAKITA can approve usage. BAKITA also coordinates its activities with similar bodies in Kenya and Uganda to aid in the development of Kiswahili.

There exist different English-to-Swahili dictionaries online from the elimuyetu website [4], Swahili-to-English dictionaries online from the africanlanguages website [5], and the mobile Swahili Dictionary [6] on the Android Play Store.

Researcher profile: Davis David

Davis graduated with a Bachelor’s degree in Computer Science from the University of Dodoma in 2017, where he was a co-organizer of the Python community during his time at university. After that, he worked as a software developer at TYD Innovation Incubator, developing different innovative systems to solve educational and economic challenges in Tanzania. Davis also worked as a data scientist at ParrotAI, developing AI solutions focused on agriculture, health and finance.

He built computer vision models for classifying banana diseases from leaf images. For the last 4 years, Davis has been teaching machine learning and data science across different universities, tech communities and events, with a passion to build a community of data scientists in Tanzania to solve local problems.

He is also working with Zindi Africa as a Zindi ambassador and mentor in Tanzania. He organizes machine learning hackathons across different cities in Tanzania and has mentored students and junior data scientists across Africa.


Language profile: Tunisian Arabizi

Language profile for Tunisian Arabizi

Overview

On social media, users tend to express themselves in their own local dialect. To do so, Tunisians use Tunisian Arabizi, which consists of supplementing the Latin script with numerals rather than using the Arabic alphabet. [7] mentioned that 81% of the Tunisian comments on Facebook used the Romanized alphabet.

In [8], a study conducted on 1.2M Tunisian social media comments (16M words and 1M unique words) showed that 53% of the comments used the Romanized alphabet, while 34% used the Arabic alphabet and 13% used script-switching.

The study also mentioned that 87% of the comments based on the Romanized alphabet are TUNIZI, while the rest are French and English. TUNIZI, our dataset, consists of 100% Tunisian Arabizi sentences collected from people expressing themselves in their own local dialect using Latin characters and numerals. TUNIZI is a sentiment analysis Tunisian Arabizi dataset, collected, preprocessed and annotated.

Previous projects on Tunisian Dialect

In [1], a lexicon-based sentiment analysis system was used to classify the sentiment of Tunisian tweets. The author developed a Tunisian morphological analyzer to produce linguistic features and achieved an accuracy of 72.1% using the small-sized TAC dataset (800 Arabic-script tweets). [2] presented a supervised sentiment analysis system for Tunisian Arabic-script tweets.

With different bag-of-words schemes used as features, binary and multiclass classifications were conducted on a Tunisian Election dataset (TEC) of 3,043 positive/negative tweets combining MSA and Tunisian dialect.
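To make the setup concrete, here is a minimal scikit-learn sketch of a bag-of-words SVM classifier of the kind described in [2]. The toy comments and labels are invented placeholders, not samples from the TEC dataset.

    # Illustrative bag-of-words + linear SVM pipeline, in the spirit of [2].
    # The comments and labels below are invented toy examples.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    comments = ["el khedma behi barcha", "el match mouch behi",
                "3asslema, el service behi", "mouch normal el retard"]
    labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

    # Unigram and bigram counts feed a linear SVM, one common BOW scheme.
    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
    model.fit(comments, labels)
    print(model.predict(["el khedma behi"]))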

The support vector machine was found to give the best results for binary classification, with an accuracy of 71.09% and an F-measure of 63%. In [3], the doc2vec algorithm was used to produce document embeddings of Tunisian Arabic and Tunisian Romanized alphabet comments.

The generated embeddings were fed to train a Multi-Layer Perceptron (MLP) classifier, where both the achieved accuracy and F-measure values were 78% on the TSAC (Tunisian Sentiment Analysis Corpus) dataset.

This dataset combines 7,366 positive/negative Tunisian Arabic and Tunisian Romanized alphabet Facebook comments. The same dataset was used to evaluate Tunisian code-switching sentiment analysis in [5], using an LSTM-based RNN model and reaching an accuracy of 90%.
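The doc2vec-plus-MLP pipeline of [3] can be sketched with gensim and scikit-learn as follows; this is a toy reconstruction under our own assumptions, not the authors' code, and the three comments are invented.

    # Toy sketch of the doc2vec + MLP approach described in [3].
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.neural_network import MLPClassifier

    comments = ["3asslema chna7welek", "el service mouch behi", "el film behi barcha"]
    labels = [1, 0, 1]  # invented toy labels

    # Learn a document embedding for each (tokenized) comment.
    tagged = [TaggedDocument(c.split(), [i]) for i, c in enumerate(comments)]
    d2v = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)

    # Feed the learned document vectors to a multi-layer perceptron.
    X = [d2v.dv[i] for i in range(len(comments))]
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, labels)

    # New comments are embedded with infer_vector before classification.
    print(clf.predict([d2v.infer_vector("el match behi".split())]))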

In [4], the authors studied the impact on Tunisian sentiment classification performance of combining it with other Arabic-based preprocessing tasks (named entity tagging, stopword removal, common emoji recognition, etc.).

A lexicon-based approach and a support vector machine model were used to evaluate performance on the above-mentioned datasets (TEC and TSAC).

To avoid the labor-intensive task of hand-crafting features, [6] proposed syntax-ignorant n-gram embeddings, composed and learned using an unordered composition function and a shallow neural model. The proposed model, called Tw-StAR, was evaluated by predicting sentiment on five Arabic dialect datasets, including the TSAC dataset [3].

We observe that none of the existing Tunisian sentiment analysis studies focused on the Tunisian Romanized alphabet, which is the aim of this work.

Tunisian Arabizi vs Arabic Arabizi

The Tunisian dialect, also known as “Tounsi” or “Derja”, is different from Modern Standard Arabic. In fact, the Tunisian dialect features Arabic vocabulary spiced with words and phrases from Tamazight, French, Turkish, Italian and other languages [9]. Tunisia is recognized as a high-contact culture where online social networks play a key role in facilitating social communication [10].

To illustrate, some examples of Tunisian Arabizi words translated to MSA and English are presented in Table 1.

 

TUNIZI     | MSA translation | English translation
3asslema   | مرحبا | Hello
Chna7welek | كيف حالك | How are you
Sou2el     | سؤال | Question
5dhit      | أخذت | I took

Table 1: Examples of common TUNIZI words translated to MSA and English

Since some Arabic characters do not exist in the Latin alphabet, Tunisians use numerals and multigraphs instead of diacritics for certain letters when they write on social media. For instance, ”ch” is used to represent the character ش.

An example is the word شرير (wicked), represented as ”cherrir” in TUNIZI characters. After a few observations of the collected datasets, we noticed that the Arabizi used by Tunisians is slightly different from other informal Arabic dialects such as Egyptian Arabizi. This may be due to the linguistic situation specific to each country. In fact, Tunisians generally draw on a French background when writing in Arabizi, whereas Egyptians use English.

For example, the word مشيت would be written as ”misheet” in Egyptian Arabizi, the second language being English. However, because the Tunisians’ second language is French, the same word would be written as ”mchit”. In Table 2, numerals and multigraphs are used to transcribe TUNIZI characters that compensate for the absence of equivalent Latin characters for exclusively Arabic sounds.

They are presented with their corresponding Arabic characters and the Arabizi characters used in other countries. For instance, the number 5 is used to represent the character خ in the same way as the multigraph ”kh”.

For example, the word ”5dhit” is the representation of the word أخذت, as shown in Table 1. The numerals and multigraphs used to represent TUNIZI are different from those used in other countries’ Arabizi. As an example, the word غالية (expensive), written as ”ghalia” or ”8alia” in TUNIZI, corresponds to ”4’alia” in Arabizi.

 

Arabic | Arabizi | TUNIZI
ح | 7 | 7
خ | 5 or 7’ | 5 or kh
ذ | d’ or dh | dh
ش | $ or sh | ch
ث | t’ or th or 4 | th
غ | 4’ | gh or 8
ع | 3 | 3
ق | 8 | 9

Table 2: Special TUNIZI characters and their corresponding Arabic and Arabizi characters
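The mapping in Table 2 lends itself directly to a lookup table. The snippet below is a deliberately naive, greedy transliteration sketch built only from Table 2: numerals and multigraphs are mapped to Arabic characters, vowels and unmapped Latin letters pass through unchanged, and the many ambiguities of real Arabizi (for example, numerals that are genuine digits) are ignored.

    # Naive TUNIZI -> Arabic transliteration sketch based on Table 2 only.
    TUNIZI_TO_ARABIC = {
        "7": "ح", "5": "خ", "kh": "خ", "dh": "ذ", "ch": "ش",
        "th": "ث", "gh": "غ", "8": "غ", "3": "ع", "9": "ق",
    }

    def transliterate(word: str) -> str:
        out, i = [], 0
        while i < len(word):
            # Try two-character multigraphs ("kh", "dh", ...) before single symbols.
            two, one = word[i:i + 2], word[i]
            if two in TUNIZI_TO_ARABIC:
                out.append(TUNIZI_TO_ARABIC[two]); i += 2
            else:
                # Unmapped letters (mostly vowels) are kept as they are.
                out.append(TUNIZI_TO_ARABIC.get(one, one)); i += 1
        return "".join(out)

    # "5" and "dh" are mapped as in the أخذت example of Table 1;
    # the remaining Latin letters are left untouched.
    print(transliterate("5dhit"))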

Tunizi Uses

The TUNIZI dataset can be used for sentiment analysis projects dedicated to other underrepresented Maghrebi dialects, such as Libyan, Moroccan or Algerian, because of the similarities between these dialects. This dataset can also be used for other NLP projects, such as chatbots.

Tunizi in the industry

The TUNIZI dataset is used in all iCompass products that handle the Tunisian dialect. TUNIZI is used in a sentiment analysis project dedicated to e-reputation, and in all Tunisian chatbots that are able to understand Tunisian Arabizi and reply using it.

Researcher Profile: Chayma Fourati

Chayma Fourati is an AI R&D Engineer at iCompass. She is a graduate in Software Engineering (June 2020) from the Mediterranean Institute of Technology in Tunisia. She did her final-year project at iCompass, where she participated in most of the R&D projects. She was invited as a speaker at a webinar during the COVID-19 crisis in March 2020 to talk about African IT solutions for fighting COVID-19 through the latest AI technologies.

During her last academic years, in both internships and university classes, she developed her skills in the AI field and, at iCompass, in the NLP field. During her final-year internship at iCompass, she published a paper with two teammates at an ICLR 2020 workshop. Her current research interests include Natural Language Processing, Neural Networks and Deep Learning.

Researcher Profile: Hatem Haddad

Hatem Haddad is Co-Founder, CTO and R&D Director of iCompass. He received a doctorate in Computer Science (2002) from University Grenoble Alpes, France. He held assistant professor positions at Grenoble Alpes University (France), NTNU (Norway), UAEU (UAE), the University of Sousse (Tunisia), Mevlana University (Turkey) and ULB (Belgium). He worked in industrial R&D at the VTT Technical Research Centre of Finland and at the Image Processing and Applications Lab of the Institute for Infocomm Research, Singapore.

He was an invited researcher at the Leibniz-Fachhochschule School of Business (Germany) and the Polytechnic Institute of Coimbra (Portugal). His current research interests include Natural Language Processing, Machine Learning and Deep Learning. He is author or co-author of more than 50 papers published in peer-reviewed international journals and conferences, and a frequent reviewer for international journals, conferences and R&D projects.

Researcher Profile: Malek Naski

Malek Naski is currently a summer intern at iCompass. She will graduate in June 2021 as a software engineer from the National School of Engineering of Tunis (ENIT). Previously, she did her academic end-of-year project for 2019/2020 at iCompass, working on sentiment analysis and classification for the Tunisian dialect using state-of-the-art NLP methods and technology. She is now focusing on natural language processing and natural language understanding, and her current research interests include sentiment analysis and conversational agents.


Language profile: Twi

Building a database for Twi language in Africa

Overview

Twi is arguably the most recognizable Akan language, natively spoken in parts of southern and central Ghana, as well as parts of Côte d’Ivoire. By some estimates it has approximately 20 million native speakers [1]. It is a tonal language comprising at least four distinct dialects, namely Asante, Akuapem, Fante and Bono, of which Asante is arguably the most widely spoken.

Pertinence

In practice, knowing this language alone allows one to navigate most parts of Ghana; in every region you are likely to find someone who at the very least understands it.

Example Sentences

English | Twi
What is going on here? | Ɛdeɛn na ɛrekɔ so wɔ ha?
Wake up | Sɔre
She comes here every Friday | Ɔba ha Fiada biara
Learn to be wise | Sua nyansa

Prior Work

The website and app Kasahorow [2] has a rather limited set of translations. The JW300 dataset [3] has over half a million extremely noisy English to (Akuapem) Twi parallel sentence pairs. A noisy Wikipedia is available [4], but its volume and quality leave much to be desired. Some 700 sentence pairs are available in the TC Akan Corpus [9].

A recent study [5], which investigated the quality of these data sources in the context of FastText embeddings constructed for Twi, found them to be woefully insufficient. It is the only modern computing study of Twi that we are aware of. We have since replicated and slightly improved these FastText embeddings [6], trained and shared a variety of embeddings from the Transformers/BERT family through the HuggingFace model repo [7], and crowdsourced close to 1,000 manually curated translation pairs. We have also developed a fairly decent English-Twi translator (a transformer-based seq2seq model), which we hope to refine on the data that this collaboration yields. You can find more information on our official and GitHub pages [8].
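As an illustration of the kind of embedding training mentioned above, the sketch below trains FastText vectors with gensim on two of the example sentences from this article. It is a toy, assumption-laden stand-in for the real pipeline, which used the much larger corpora discussed in [5] and [6].

    # Toy sketch: FastText embeddings for Twi with gensim. A real run would
    # train on the JW300 / Wikipedia corpora discussed in [5, 6].
    from gensim.models import FastText

    sentences = [
        "ɛdeɛn na ɛrekɔ so wɔ ha".split(),
        "ɔba ha fiada biara".split(),
    ]
    model = FastText(sentences, vector_size=100, window=3, min_count=1, epochs=10)

    # FastText's subword n-grams let it produce vectors even for unseen words.
    print(model.wv.most_similar("ha", topn=2))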

Researcher Profile: Paul Azunre

Paul Azunre holds a PhD in Computer Science from MIT and has served as a Principal Investigator on several DARPA research programs. He founded Algorine, a Research Lab dedicated to advancing AI/ML and identifying scenarios where they can have a significant social impact. Paul also co-founded NLP Ghana, an open source initiative focused on using NLP and Transfer Learning with Ghanaian and other low-resource languages. He frequently contributes to peer-reviewed journals and has served as a program committee member at some ICML workshops in AutoML and NLP. He is the author of the “Transfer Learning for NLP” book recently published by Manning Publications.

Researcher Profile: Lawrence Adu-Gyamfi

Lawrence is a subsea installation engineer by profession, with a background in aerospace engineering. He currently devotes his off-work time to contributing to the activities of NLP Ghana, assisting with the collection of data, preprocessing it and making it ready for use in the models being tested internally. He serves as the NLP Ghana Director of Product, overseeing how the different teams of NLP Ghana work together.

Researcher Profile: Esther Appiah

Esther Appiah holds a BA in Modern Languages from the Kwame Nkrumah University of Science and Technology, with a Diploma in French Studies from the Université d’Abomey-Calavi, Centre Béninois des Langues Étrangères (CEBELAE) in Benin. She is currently pursuing an MPhil in Theoretical Linguistics at UiT, Norway. Her language specialties include French, English and Akan. She has vast experience spanning various sectors and industries in language use, with core tasks in writing, proofreading, translation and research. She works with Ghana NLP as a data researcher and ultimately hopes to specialise in Computational Linguistics to help streamline NLP processes for underrepresented African languages in the digital space.

Researcher Profile: Felix Akwerh

Felix is currently enrolled in a Master’s program in Computer Science at the Kwame Nkrumah University of Science and Technology. He augments his education with online classes and machine learning events. He is actively involved in natural language processing development with Ghana NLP. He co-authored a paper on Artificial Intelligence in Construction for submission. He holds a BSc in Mathematics from the Kwame Nkrumah University. He worked with UITS-KNUST, where he helped build a transport system and other software projects. His research interest lies in machine learning and NLP, specifically in neural conversational models.

Researcher Profile: Salomey Osei

Salomey holds a Master of Philosophy in Applied Mathematics and an MSc in both Industrial Mathematics and Machine Intelligence. She is a recipient of Google and Facebook scholarships and the MasterCard Foundation Scholarship, amongst others. She is the team lead for unsupervised methods at Ghana NLP and a co-organizer of the Women in Machine Learning and Data Science Accra chapter (WiMLDS). She is also passionate about mentoring students, especially females in STEM, and her long-term goal is to share her knowledge with others by lecturing.

Researcher Profile: Samuel Owusu

Samuel Owusu is currently working as a data scientist for the Ministry of Finance, Ghana. He holds a BSc in Information Technology from Ghana Technology University College. He was a member of the team that won first prize in Ghana’s maiden national hackathon, organised by the World Bank and the Ministry of Water Resources and Sanitation. His research interest lies in NLP, specifically automatic speech recognition for low-resourced languages. He is involved in developing open-source curricula in machine learning and computer science for young girls. Samuel is a life-long learner.

Researcher Profile: Cynthia Amoaba

Cynthia Amoaba is a graduate of Chemu Senior High School and a student at the University for Development Studies. She is an ambassador and founder of the first Women in STEM (WiSTEM) chapter in Ghana. She also founded the STEM club in her high school and looks forward to extending it to schools in deprived areas. Currently, she tutors high school students in her community in physics and maths and helps train school dropouts in beads- and soap-making. She is a science enthusiast and looks forward to learning more through her involvement in the development of NLP with Ghana NLP.

Researcher Profile: Salomey Afua Addo

Salomey Afua Addo is the founder of Lighted Hope, a non-governmental organization that seeks to promote literacy and coding skills among children living in slums in Ghana. She holds an MSc in Mathematical Sciences from the African Institute for Mathematical Sciences and a certificate in business management from the European School of Management and Technology, Berlin. She is a coding instructor for The Love Academy in the USA. Currently, she serves as a volunteer at Ghana NLP, where she plays a vital role in collecting and preprocessing data for the data team. Salomey Afua Addo lives a purpose-driven life.

Researcher Profile: Edwin Buabeng-Munkoh

Edwin Buabeng-Munkoh is currently working as a software engineer at Huawei Technologies Ghana Limited. He holds a BSc in Computer Engineering from the Kwame Nkrumah University of Science and Technology and is enrolled in the Data Science Mentorship program with Notitia AI. He is actively involved in natural language processing development with Ghana NLP, where he volunteers to help preprocess data for the data team. Alongside his daily work, he has enrolled in and completed multiple online courses on data science, AI and NLP. His research interest lies in machine learning, NLP and computer vision. He plans to help build a world where language is not a barrier to education and good healthcare.

Researcher Profile: Nana Boateng

Nana Boateng holds a PhD in Statistics from the University of Memphis. He has three master’s degrees, in Statistics, Mathematics and Economics. He has worked as a data scientist for companies such as Fiat Chrysler Automobiles, Nice Systems Inc. and Baptist Memorial Hospital. He is interested in applying principles from mathematics, statistics and economics to solving problems in healthcare, finance and several other industries. He has several peer-reviewed publications to his name. He is the founder of Rest Analytics, which advises companies on how to apply machine learning to increase efficiency and productivity. He contributes to Ghana NLP in the area of supervised learning.

Partners

Partners in Cracking the Language Barrier for a Multilingual Africa

References

[1] https://en.wikipedia.org/wiki/Twi

[2] https://www.kasahorow.org/

[3] Ž. Agić and I. Vulić, JW300: A Wide-Coverage Parallel Corpus for Low-Resource Languages, ACL Proceedings 2019

[4] https://ak.wikipedia.org/

[5] J. Alabi et al., Massive vs. Curated Word Embeddings for Low-Resourced Languages: The Case of Yorùbá and Twi, LREC Proceedings 2020

[6] https://medium.com/swlh/ghana-nlp-computational-mapping-of-ghanaian-languages-edf60c56bcce

[7] https://huggingface.co/Ghana-NLP

[8] https://ghananlp.github.io/

[9] https://www.researchgate.net/publication/323998547_TypeCraft_Akan_Corpus_Release_10


Language profile: Luganda

Building a database for Luganda language in Africa

Overview

The Ganda language, or Luganda, is a Bantu language spoken in the African Great Lakes region. It is one of the major languages in Uganda, spoken by more than eight million Baganda and other people, principally in central Uganda including the capital Kampala. It belongs to the Bantu branch of the Niger-Congo language family. Typologically, it is a highly agglutinating, tonal language with subject-verb-object word order and nominative-accusative morphosyntactic alignment [1].

With about six million first-language speakers in the Buganda region and a million others fluent elsewhere, it is the most widely spoken Ugandan language. As a second language, it follows English and precedes Swahili [1].

The language is used in all domains: education, media, telecommunication, trade, entertainment and religious centres [2]. It is used at lower institutions as pupils begin to learn English, and until the 1960s Luganda was also the official language of instruction in primary schools in Eastern Uganda [1]. It is also among the languages that have been tabled in the East African parliament for selection as the official language of the East African Community [2].

Existing Work

As the use of the Luganda language grows rapidly across different sectors, from formal to informal, work has been done on building a Luganda corpus and developing NLP models such as a Luganda text-to-speech engine [3], an English noun phrase to Luganda translator [4], and a smart Luganda language translator that, given a source text in English, translates it to Luganda automatically [5]. To broaden access to search, a Luganda interface was launched for Google web search [6]. However, some of these applications have been developed based on minimal data.

In terms of language resources, there exists the Luganda Bible [7], an online Bible from the Word Project [13], and other religious books from the Jehovah’s Witnesses [12]. There are also some good online Luganda dictionaries, such as the Glosbe dictionary [10], Learn Luganda [9], the Luganda phrasebook [9], the Learn Luganda concise dictionary [11] and the Luganda Dictionary [8]. However, most of the available dictionaries are copyrighted and contain just a few word extracts, which results in a small representation for a language like Luganda, where new words are created and spoken every year.

Quite recently, a drive has been made by a team of researchers from the Makerere AI Lab [15] to add Luganda to the Common Voice platform [14], and it is anticipated that through this project a large voice dataset for building voice recognition models for Luganda will be generated.
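As a sketch of how such a voice dataset could be consumed once released, the snippet below loads a Luganda split of Common Voice through the Hugging Face datasets library. The dataset name, the availability of an "lg" configuration and the field names are assumptions based on public Common Voice releases, not specifications of this project.

    # Hypothetical sketch: loading Luganda ("lg") speech data from Common Voice
    # with the Hugging Face `datasets` library. Dataset name, config and fields
    # are assumptions based on public Common Voice releases.
    from datasets import load_dataset

    cv_luganda = load_dataset("common_voice", "lg", split="train")
    print(cv_luganda[0]["sentence"])  # transcript of the first audio clip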

Example of Sentence in Luganda

Luganda: Aboomukyalo bafuna nnyo mu by’obulimi.

English: The people in rural areas benefit a lot from agriculture.

Conclusion

With the regional integration of the East African Community in place, the use of the Luganda language has stretched beyond Uganda into the East African Community, because most native speakers of the language actively participate in this cooperation. It has been used to support inter-ethnic communication [2], and its use, moreover, stretches beyond East Africa.

Therefore, there is a need to build a robust Luganda dataset that can be made publicly available, so that different researchers can use it to build downstream applications such as machine translators, speech recognition systems, chatbots, virtual assistants and sentiment analysis models, ensuring that information is accessible to all and addressing some of the local contextual problems within society.

Researcher Profile: Joyce Nakatumba-Nabende

Joyce Nakatumba-Nabende is a lecturer in the Department of Computer Science at Makerere University. She is also the head of the Makerere Artificial Intelligence and Data Science Lab in the College of Computing and Information Sciences. She obtained a PhD in Computer Science from Eindhoven University of Technology, The Netherlands. Her current research interests include Natural Language Processing, Machine Learning, Process Mining and Business Process Management. She is co-author of more than 20 papers published in peer-reviewed international journals and conferences. She has supervised several PhD and Master’s students in the fields of Computer Science and Information Systems. She is a member of several international AI bodies, including the Open for Good Alliance, the Feminist AI Network and the UN Expert Group Recommendation 3C Group on Artificial Intelligence.

Researcher Profile: Andrew Katumba

Andrew Katumba is a lecturer in the Department of Electrical and Computer Engineering as well as a senior researcher with netLabs!UG, a research Center of Excellence in Telecommunications and Networking, both in the College of Engineering, Design, Art & Technology (CEDAT), Makerere University. Andrew champions the research and applied Artificial Intelligence (AI) activities at netLabs!UG as the lead for the Marconi Society Machine Learning Lab. Andrew holds a PhD in Photonics and Machine Learning from Ghent University, Belgium. He has co-authored more than 50 publications in peer-reviewed international journals and conferences and holds 2 patents in neuromorphic computing.

Researcher Profile: Jonathan Mukiibi

Jonathan Mukiibi is a computer science practitioner with a background in software engineering, linguistics, machine learning, big data and natural language processing. Over the past years he has been involved in artificial intelligence projects such as satellite image analysis, radio mining, social media mining, ambulance tracking and traffic monitoring, which have been successfully implemented to solve real-world problems in developing communities. Currently, he is pursuing a Master’s in Computer Science at Makerere University, where he is also doing research work at the AI and Data Science Research Lab. He is actively working on different NLP tasks, but his main research is on end-to-end topic classification models for crop pest and disease surveillance from radio recordings.

Researcher Profile: Claire Babirye

Claire Babirye is a computer science professional with vast experience in different computing modules, from computer networks, computer security and network monitoring to machine learning, data science, natural language processing and deep learning technologies, and the use of technology for improved service delivery. She is a Research Assistant at the AI and Data Science Research Lab, Makerere University, where her role is to: tap into the data revolution to obtain more and better data to support development and humanitarian work; support data analytics and visualization to generate patterns and insights from the data; and develop machine learning models for classification. Within the domain of NLP, she has worked on tasks involving sentiment analysis of social media data and text classification to identify topics of interest in farmer agricultural data.

References

  1. https://en.wikipedia.org/wiki/Luganda
  2. https://www.open.edu/openlearn/languages/more-languages/linguistics/english-squeezing-out-local-languages-uganda
  3. Nandutu, I., & Mwebaze, E. (2020). Luganda Text-to-Speech Machine. arXiv preprint arXiv:2005.05447.
  4. https://www.researchgate.net/publication/338036914_Model_for_Translation_of_English_Language_Noun_Phrases_to_Luganda
  5. https://www.researchgate.net/publication/323682143_Smart_Luganda_Language_Translator
  6. https://africa.googleblog.com/2009/07/how-volunteer-translators-impact-local.html
  7. https://play.google.com/store/apps/details?id=com.LugandaBible&hl=en
  8. https://web.archive.org/web/20080120211744/http://www.cbold.ddl.ish-lyon.cnrs.fr/CBOLD_Lexicons/Ganda.Snoxall1967/Text/Ganda.Snoxall1967.txt
  9. https://learn-luganda.com/
  10. https://glosbe.com/lg/en/
  11. https://learnluganda.com/concise
  12. https://wol.jw.org/lg/wol/h/r138/lp-lu
  13. https://www.wordproject.org/bibles/lug/index.htm
  14. https://commonvoice.mozilla.org/lg
  15. http://www.air.ug/natural-language-processing/

 

Foreword

Gender Equality is the 5th United Nations Sustainable Development Goal (SDG). As with other SDGs, Artificial Intelligence can play a role in promoting good practices or, on the contrary, can reinforce existing biases and prejudices. A recent workshop at IJCAI, in Macao, made the case for a number of projects relating SDGs and Artificial Intelligence. In order to push forward the questions relating to AI and gender equality, the Knowledge for All Foundation, the Centre de Recherches Interdisciplinaires de Paris and the UNESCO Chair on OER at Université de Nantes jointly organized this one-day workshop. The workshop was built around sessions on the different aspects of the question. We were glad to give a special status to our keynote speaker, Bhavani Rao, from Amrita University, Director of the Ammachi Labs and holder of the UNESCO Chair on « Women’s Empowerment and Gender Equality ».

The questions identified with the help of our program committee were the following:

  • Bias issues: typically, AI will reproduce the bias present in the data. If the data contains a prejudice, decision making based on AI can reproduce (and sometimes enhance) that prejudice.
  • Gender issues in AI projects: is it a good idea to add a “gender officer” to an AI project, someone who can look out so that prejudice doesn’t creep in?
  • AI for education: how can educating women make special sense? What do we need to look out for?

But, as the following workshop notes show, the discussion allowed us to reflect upon many different aspects too.

Colin de la Higuera

UNESCO Chair on Technologies for training teachers with open educational resources

Université de Nantes



About the workshop

The workshop took place at the CRI (Centre de Recherches Interdisciplinaires) Learning Center extension from 10am to 5pm, and was advertised online through the websites of the different partners organizing the event (the UNESCO Chair at Université de Nantes, the Knowledge for All Foundation and the CRI).

This meeting built upon work done by a number of partners, concerning gender issues in teaching computing, fair representation of women by AI and more broadly the impact of AI on the United Nations Sustainable Development Goals (SDGs).

These questions correspond to the 5th SDG and it is already known that AI can both increase the effect of bias or correct it, depending on how it is deployed.

Colin de la Higuera, Université de Nantes, UNESCO Chair in Teacher Training Technologies with Open Educational Resources, set the scene and explained how this workshop was linked with the previous workshop organized by the Knowledge for All Foundation in Macau in July 2019. He acknowledged the help of the CRI, UNESCO, Université de Nantes, the Knowledge for All Foundation and the Société informatique de France in organizing the event.

Bhavani Rao, from Amrita University, Director of the Ammachi Labs (a human-computer interaction lab) and holder of the UNESCO Chair on « Women’s Empowerment and Gender Equality », presented the initiatives led in India and the spirit of the work done around these questions there, also involving human-computer interfaces and artificial intelligence. She explained how they have used artificial intelligence to map the various factors contributing to women’s vulnerability across India and to identify “hot spots” and “cold spots” for women’s empowerment. These identified locations take into account more than 250 quantitative data measurements in combination with qualitative data to represent a comprehensive understanding of the state of empowerment at each location. Bhavani Rao emphasised the need to track and monitor the progression of the women involved in Ammachi Labs’ (or any, for that matter) vocational training programmes and to evaluate the impact they have on their communities. Furthermore, she advocated in favour of a holistic approach and warned against initiatives aimed only at solving isolated issues, as there is often unintended fallout that negatively impacts both women and their communities.

John Shawe-Taylor, UNESCO Chair in Artificial Intelligence at University College London, presented the different interventions that have been implemented at UCL toward gender equality in a computer science department. These can be summarized as the 4 As: 1) Arrive: encouraging girls to study computer science; 2) Aspire: creating a supportive environment; 3) Achieve: ensuring they realise their full potential; 4) Advance: ensuring equal opportunities for career progression. The talk also highlighted a number of ways in which AI-enabled systems might further accelerate the effectiveness of these interventions.

Wendy MacKay, from INRIA (the French National Institute for Research in Computer Science and Automation) and the ExSitu team, talked about her own experience as a woman and a researcher. She also insisted on the importance of a user-oriented approach: keeping the user in the loop at the different stages of the development of an AI project could help humans develop and learn alongside AI.

Prateek Sibal, co-author of the UNESCO publication “Steering AI for Knowledge Societies”, highlighted that while a technological artefact may be neutral, the culture and practices associated with its use are biased against women. He discussed how different AI-based tools, including facial recognition and digital voice assistants, mirror biases in society. For instance, several labelled image datasets used for training facial recognition programmes have high error rates in recognising the gender of dark-skinned women as compared to men. He pointed out that deep fakes based on Generative Adversarial Networks (GANs) are overwhelmingly used to target female celebrities by creating pornographic videos. He raised concerns about ‘technological determinism’ and advocated for an approach to the development and use of AI that is human-centred and anchored in human rights and ethics. He showed how some uses of facial recognition technology violate human rights and can have life-threatening consequences for people with diverse sexual orientations and gender identities. Vigilance by researchers, civil society and governments is enabling the detection of bias in AI systems; this presents an opportunity to influence the culture of technology by developing artefacts that are gender-equal, that respect diversity, or that even obliterate gender differences, as was demonstrated with the example of gender-neutral digital voice assistants.

A discussion with the room followed. Some of the ideas expressed during the debate were:

  • A goal is to design interventions while avoiding undesired side effects: ideally one might need a simulator, or even better, a causal model. Can we consider randomized controlled trials?
  • What if we improve women’s lives in an otherwise unchanged world? This can turn out badly; this was the key point made during Bhavani Rao’s talk.

Michèle Sebag, from the CNRS (Centre national de la recherche scientifique) and Université Paris-Saclay, discussed some thoughts about biases and glass ceilings, and how to handle them. Even after a first glass ceiling has been overcome (e.g. for women in selective engineering schools), biases remain as to the choice of options, with an impact on careers, money, etc. Even more puzzling, the wording of an option makes a significant difference regarding women’s choices (e.g. “Energy for the XXIst Century” vs “Challenges of Sustainable Development”) despite the technical content being 95% the same: the bias is in the language (footnote: nothing unexpected, see Bourdieu). As both genders might be using two different languages, a debiasing action might be to build translators, and/or to display gendered versions conveying the same content. This would also be fun, which is an important part of effective actions. [Using AI to reinforce self-confidence is another great perspective; note however that undermining self-confidence feeds a multi-billion dollar industry.]

Frédérique Krupa, Human Machine Design Lab, presented her own trajectory in the field and how, for her PhD on Girl Games: Gender Design and Technology, she studied belief systems as the principal influence amongst the numerous factors encouraging boys and discouraging girls from being interested in technology and pursuing a career in ICT. The family factor still determines things far too strongly today: through early choices by the parents (or the family environment), little girls have been deprived of the exciting activities, getting access only to less interesting, less challenging, less time-consuming technological experiences. She followed up on her machine learning postdoc at 42, noticing the absence of interest in the quality, accuracy and representativity of data amongst homogeneous teams of coders, mostly male, white, straight and from upper social classes, who do not consider these questions in their quest for optimal performance and chances of publishing, because they are not likely to suffer from bias. The issue of data quality is about having contextual information available to determine what bias may be present in the data and/or its resulting model. She calls for the development of AI UX practices, built on quantitative social science methods.

A discussion with the room followed, with the following points:

  • Detecting known biases is a hot topic in AI (gender, race, wealth, sexual orientation…). But what are the unknown biases? Building experiments to provide evidence for biases defines a challenge to be tackled with psychologists, neuroscientists and machine learning researchers;
  • Another topic is ethical recommendation: to de-bias a recommendation one should have an idea of the targeted ultimate fair distribution. This is a normative issue: we need (on top of all the others) lawyers, politicians, citizens, …, sci-fi writers, …

Sophie Touzé, VetAgro Sup-Université de Lyon and past President of the Open Education Consortium, presented an original approach and offbeat vision of AI and the warning role it plays. AI forces us to look at the skills unique to humanity, our added value in relation to the intelligence of machines. By challenging us, it provokes change and the saving awareness of what we need to teach at school and university.

She insisted on which skills are essential: the 4 Cs are Collaboration, Communication, Creativity and Critical thinking, and the 3 Ss are self-awareness, self-motivation and self-regulation. Unfortunately, these skills are taught neither at school nor at university. An app could be developed to help individuals monitor and develop these critical skills throughout their lives.

Empowered by these skills, each citizen of the world could consciously participate in forging the future we want, no longer as individuals but as the human species. The narrative of humanity should not be left in the hands of a few people who present themselves to us as heroes. It is time for women to participate in writing humanity’s epic story together!

Sophie Touzé concluded with “We are the heroes we’ve been waiting for”.

Mitja Jermol, UNESCO Chair on Open Technologies for OER and Open Learning, used his experience in AI-based education projects to present what an education to AI could be. He made the point that there are three issues here: 1. developing AI, 2. using AI and 3. interacting with AI. Most discussions today relate to increasing the know-how for developing AI, which involves two very specific domains, namely mathematics and software programming. The fact is that AI has become mature enough to be taught to other domains as a tool to be used, which is why education should be concerned with the last two issues. Like Sophie Touzé, he insisted on the importance of soft skills. He also described some related projects in which he is involved, such as the X5-GON project. Opening education, with free and inclusive access to all through a global open education, could be a strong mechanism to empower not just women but any individual in the world. AI plays a major role in this by understanding the complex system of materials, competences, infrastructure and the needs of each particular individual.

Conclusion

As a conclusion, it was reflected that these questions should be further discussed. Colin de la Higuera believed that there were two different issues at the core of the day’s discussions. 1. The issue of gender equality, which is just as present in the field of AI as in other fields: female researchers are finding it difficult to emerge, and only those who are strong or, as Frédérique Krupa remarked, who don’t follow the rules, will make it. Yet everyone agrees that a more equal representation in the field is necessary. 2. The effects of AI itself on gender (in)equality, women’s vulnerability and women’s empowerment.

Actions to follow are to push the findings of this workshop forward in Unesco and elsewhere. Furthermore, the Knowledge for All Foundation will also build upon these discussions.

This report was delivered by K4A trustees for UNESCO. As artificial intelligence (AI) takes an increasingly important part in our lives, the question of educating towards AI becomes ever more relevant. We argue in this document that although it may be premature to teach AI itself, we recommend an education built around five pillars or core questions which should be of great use in the future:

  1. Data awareness, or the capacity to build, manipulate and visualise large amounts of data;
  2. Understanding randomness and accepting uncertainty, or the ability to live in a world where models cease to be deterministic;
  3. Coding and computational thinking, or the skills allowing each of us to create with code and to solve problems through algorithms;
  4. Critical thinking as adapted to the digital society; and finally
  5. A series of questions amounting to understanding our own humanity in view of the changes AI induces.



Introduction

Artificial intelligence has been described as the new electricity. As such, the belief that it will have a profound influence over many fields, including Education, is widely shared. For instance, in its 2018 report on artificial intelligence, the French committee chaired by Cédric Villani [25] presented Transforming Education as its first “focus”. In [12], hundreds of applications of AI have been scrutinised and mapped to the relevant technologies.

More recently, the JRC report The Impact of Artificial Intelligence on Learning, Teaching, and Education, by Ilkka Tuomi et al. [20], considered the different aspects of the questions relating artificial intelligence and learning. More generally, the question of transforming education with the help of technology is addressed by Sustainable Development Goal 4, adopted by the United Nations in September 2015, and also by the OECD [13].

In this report we study the different interactions between AI and Education, with an emphasis on the following question: if we accept that artificial intelligence is an important element of tomorrow’s landscape, what are the skills and competences which should appear in future curricula, and how can we help to train teachers so that they can play the required role?

This report is one of the first to address these questions: as such, it is built less as a synthesis of existing reports incrementing previous work than as an analysis based on the experience of teachers, researchers, academics and practitioners. A recent exception is the work by UNESCO itself, which has been exploring the links between AI and education [15].

What is AI? Why is the issue of general interest?

The history of artificial intelligence goes back to the history of computing. Alan Turing was interested very early in the topic of machine intelligence [21]: some of the ideas he introduced 70 years ago are still extremely relevant today; he argued in favour of randomness and discussed the implications of machine learning for society. Even if Turing didn't predict the importance of data, he did understand that the machine's capacity to learn would be key to machine intelligence.

Another of Turing's contributions to artificial intelligence is what became known as the Turing test: in this test, an external (human) examiner can interact with both a machine and a human, but since the interface is mechanical, he has to judge the content of the answers rather than their form.

The examiner's goal is to distinguish man from machine; the goal of the artificial intelligence is to confuse the examiner. This leads to the very general definition of artificial intelligence still in use today, where it is less about a machine being intelligent than about a machine being able to convince humans that it is intelligent.

The official birth of artificial intelligence is usually associated with the Dartmouth Summer Research Project on Artificial Intelligence: in 1956, researchers met at Dartmouth College to address the difficult questions to which computing had so far failed to contribute [11].

Today, because of the impact of machine learning, and most notably of deep learning, alternative definitions of artificial intelligence have been considered: a more business-oriented view is that AI corresponds to those deep learning techniques which have a strong impact on industry [12, 30].

Being able to pass the Turing test is no longer the shared goal of research and would not explain the impact of AI today. Today's successes of AI depend on several factors, including machines tailored to the needs of the algorithms and the massive increase in the quantity and quality of data. Machine learning techniques work much better today than 10 years ago.

They build better models, make fewer errors in prediction, can make good use of huge volumes of data, are able to generate new realistic data, and are being tuned and adapted to an increasing variety of tasks. As such, these algorithms no longer aim at tricking the human into believing that they are intelligent; they are actually replacing (in part) the human in one of her more intelligent tasks: that of building algorithms.

If computing is about algorithms and data, modern AI is a data science: it relies on being able to handle and make the most of data. Whereas the natural trend for computing was to build algorithms to handle data, we may argue that artificial intelligence is about the data building the algorithms.

Why understanding AI matters

Artificial intelligence is influencing all parts of society where data can be made available and where there is room for improvement, either by automating or by inventing new challenges and needs. In substance, this means that every human activity is being impacted or can be impacted.

For instance, all 17 United Nations Sustainable Development Goals (SDGs) are currently being scrutinised by AI experts [8]. The use of AI can lead to complex new situations, which can only be understood through an actual understanding of the technical and conceptual aspects underlying it. In many cases, our physical understanding of the world is insufficient to gauge the impact of, or even the opportunity for, AI.

When we read for i = 1 to 1000000, intuition is of little use: no human does anything 1,000,000 times in a lifetime! The mathematical world and its full abstraction don't give an adequate answer either. People may imagine artificial intelligence as a process by which a machine does things the same way as we (humans) do, only faster and with more memory, storage space or computation power. But AI doesn't always work that way. The algorithms will not follow the patterns of our physical world, and an understanding of what they do will not give us a realistic idea of why they work and why they don't.
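To make this scale gap concrete, here is a minimal Python sketch (ours, not from the report) that runs the million-step loop mentioned above and times it; on ordinary hardware it completes in a fraction of a second, something no physical-world intuition prepares us for.

    import time

    # A million repetitions: far beyond anything a human does in a lifetime,
    # yet a routine workload for a machine.
    start = time.perf_counter()
    total = 0
    for i in range(1, 1_000_001):
        total += i
    elapsed = time.perf_counter() - start

    print(f"Sum of the first million integers: {total}")
    print(f"Computed in {elapsed:.3f} seconds")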

When it comes to training teachers, that leaves us with two approaches. The first supposes that teachers should be able to actually build simple AI systems: they should know how to code and be able to assemble blocks in order to obtain more complex systems, run artificial intelligence algorithms, build models and use them. The second supposes that people do not learn how to design but only how to interpret and use. They will then necessarily interpret things through their own, very limited, physical-world frame of reference.

AI and Education

The links between AI and education are not new. They work in both directions, but one of these has received, up to today, much more attention than the other [15].

AI for Education

The first conference on Artificial Intelligence and Education dates back more than 20 years; the challenges have since been wide-ranging and are now addressed by strong multidisciplinary communities. Research projects have been funded by the European Union, foundations and individual countries. The goal is to make use of artificial intelligence to support education. An emerging industry has developed, covered by the name Edutech (which isn't strictly AI), and the question has been studied in a number of reports [23, 15, 25, 20].

Education for AI

Whereas the question of educating everyone about artificial intelligence is new, that of training experts for industry has been dealt with for some time. Artificial intelligence has been taught in universities around the world for more than 30 years. In most cases, these topics require strong foundational knowledge in computer science and in mathematics.

The increasing importance of data science, artificial intelligence and machine learning is currently leading to a modification of computing curricula [16]. Education in artificial intelligence prior to university is, in 2019, a whole new question. If it has not appeared before, this can be due to several reasons:

  • The need is novel: though AI has been around for some time, applications for everyday life have been limited. Today, through mobile phones or ambient companions, interaction with AI occurs in a routine way, at least in the more industrialised countries [12].
  • The issues surrounding education as a global question are huge, and AI may not be perceived as essential [14], even when the goal is that of studying education in a digital setting [22]. The efforts made in many countries to introduce an Information and Communication Technologies curriculum have not yet been able to produce results and, with AI, it seems we are asking for more today [24]!
  • It is difficult to introduce the topic without trained teachers. As an example, in 2018 the French government decided to add a new computing curriculum at high-school level (16-18-year-old students) with the ambition of introducing artificial intelligence. This proved impossible due to numerous sources of resistance.
  • It is still unclear what should be taught if we want a teaching that retains some form of validity over the years. What will artificial intelligence be in 20 years, or even in 10 or only 5?

Why should we train teachers in AI?

AI applications are going to be present in all areas [12]. As such, one could just add the usage of these applications to the individual skills required of a teacher. But this would probably limit in many ways the capacity of teachers to adapt to new applications. One can also state that children are going to be brought up in a world where artificial intelligence will be ambient, and they can therefore be called AI natives [26]. Should a teacher just know about the key ideas of AI? Or should she be more aware of other questions?

Is teaching AI an enhanced version of teaching computational thinking?

Teaching coding and computational thinking has been advocated strongly since 2012 [28, 7], and many countries have now introduced the topic. Learning to code isn't just about acquiring a technical skill: it is about being able to test one's own ideas instead of just being able to run those of someone else. And of course, there is a strong dose of ideas in artificial intelligence, which means that knowing how to code can help one both use AI in a creative way and understand the underlying questions and concepts. AI is an extension of computing: it comprises computing but also introduces new ideas and concepts.

Why should a teacher be AI aware?

Many of the reasons for training teachers towards some understanding of AI are very close to those advocated for preparing them in digital skills [24]. Whereas a first goal, not yet reached, is to make sure that teachers are digitally aware, how important and urgent is it to be AI aware? Why would that be necessary? Let us discuss some reasons.

The role towards the community. If artificial intelligence is to impact every aspect of society, as many are predicting, citizens and future citizens are going to require guidance and some help to understand and decrypt these technologies.

In order to train skilled learners. An aspect put forward by many analysts is the impact of artificial intelligence on jobs. The more optimistic analyses point out that where many jobs will be lost to robotisation, new jobs will emerge.

And even if these jobs seem to require soft skills, it seems reasonable that many will be linked, directly or not, with the technical aspects of AI. If in 2019 it seems neither relevant nor possible to train every child in AI, it would on the other hand be necessary that each child be given the principles and bases allowing her to adapt and learn at a later stage.

Because the learning environment is changing and will include AI. Intelligent tutoring systems, tools proposing individualised learning experiences, tailored companions… these are some of the projects under way which will necessarily lead to situations where the learner is helped. Understanding these tools will be an asset, if not a necessity.

Because AI is a valuable tool for teaching. AI is used today to help the teacher. For example, the X5-GON project recommends open educational resources adapted to the needs of a particular teacher [29]. In the same way as a teacher today is penalised by not being able to make use of the available digital tools, tomorrow's teacher may lose out if she cannot access AI tools in a simple way.
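As a purely illustrative sketch of the principle behind such recommendation (the actual X5-GON system is far more sophisticated and is not reproduced here), the following Python fragment matches a teacher's stated need against hypothetical resource descriptions using TF-IDF and cosine similarity; all texts are invented for this example.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical descriptions of open educational resources (not X5-GON data).
    resources = [
        "Introduction to probability and statistics for secondary school",
        "Hands-on coding activities with Scratch for beginners",
        "Data visualisation exercises using spreadsheets",
        "Ethics and society: discussing algorithmic decision making",
    ]

    # A teacher describes what she is looking for.
    teacher_need = "classroom activities to teach coding to beginners"

    vectorizer = TfidfVectorizer()
    resource_matrix = vectorizer.fit_transform(resources)
    need_vector = vectorizer.transform([teacher_need])
    scores = cosine_similarity(need_vector, resource_matrix)[0]

    # Rank resources by similarity to the teacher's need.
    for score, text in sorted(zip(scores, resources), reverse=True):
        print(f"{score:.2f}  {text}")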

Towards a curriculum: the proposed five pillars

Artificial intelligence has not reached maturity. The topic, as defined in 1956, studied for 40 years, and reaching spectacular results since 2012, is still difficult to understand. It is even more difficult to forecast how the technologies will evolve, even in the near future. If building a full curriculum is beyond the reach of this document, it is possible to put forward five pillars and propose to build upon these. We represent our proposal in Figure 1: we believe five pillars or core questions should be added to the training of teachers (lower part of the figure) and that on these, in due time, AI itself could be taught (top of the figure).

Figure 1: The AI competences

Uncertainty and Randomness

Data is inconsistent and does not exhibit strictly causal behaviour: with data, the same cause can lead to different effects. Dealing with this legitimate non-determinism in the modelled world, which is going to be used for AI-based decision making, requires the acquisition of alternative skills. Probabilistic reasoning and statistics will need to be taught but, before that, activities allowing children to understand the stochastic nature of most modelled processes, and activities encouraging them to make the best use of imperfect data, are necessary.
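A minimal classroom-style simulation in Python, assuming a made-up pass-probability model of ours, can make this non-determinism tangible: the same cause (hours studied) produces different effects across runs, and only the probability changes.

    import random

    def exam_result(hours_studied):
        # Toy model: studying raises the probability of passing,
        # but never guarantees the outcome.
        p_pass = min(0.95, 0.2 + 0.1 * hours_studied)
        return "pass" if random.random() < p_pass else "fail"

    # The same cause, ten times: the effects differ from run to run.
    outcomes = [exam_result(hours_studied=4) for _ in range(10)]
    print(outcomes)
    print("estimated pass rate:", outcomes.count("pass") / len(outcomes))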

Yet AI also brings a new form of determinism which deserves our attention: when predictive systems are taken (too) seriously and, through a misuse of data, we are told that our one-year-old child will develop into a scientist or have a complicated social life, not understanding how these predictions work can cause a lot of damage.

An understanding that the forecasts proposed by AI are not ground truths but estimations, and of how these are to be interpreted, is greatly needed. Teaching this may be complicated for didactic reasons: accepting uncertainty also means teaching without implying that the teacher is omniscient and makes no mistakes.

Coding and Computational Thinking

Coding and computational thinking are today in the curricula of many countries, following the recommendations of experts [10, 18]. In many cases, AI code is about using libraries for programming languages which allow us to manipulate large amounts of data with very few instructions. But a proper usage of these techniques does involve some coding skills [19, 28, 6].
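For illustration, the following sketch uses NumPy (one such library, chosen by us as an example) to summarise a million simulated values in three instructions, with the underlying loop entirely hidden by the library:

    import numpy as np

    # One million simulated measurements, manipulated without writing a loop.
    data = np.random.normal(loc=37.0, scale=0.8, size=1_000_000)

    print(data.mean())           # the average of a million values
    print(data.std())            # their spread
    print((data > 38.5).sum())   # how many exceed a threshold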

Furthermore, it has been argued that expert users of AI (for example doctors) will need to understand the algorithms in order to know when not to trust a machine learning decision. Efforts have taken place in different countries to address this question and the related question of training teachers [10]. In France, the Class'Code project [5, 4] relies upon Open Educational Resources to allow teachers and educators to learn. Computers and robots are obvious artefacts, but an alternative approach is that of Computer Science Unplugged [2].

Data Awareness

AI is going to rely on data. Whereas the algorithm is at the centre in computing, this is much less the case with AI, where, often, most of the effort concerns the collection, preparation and organisation of the data [16]. An education in data (science) will rely on activities where data is collected, visualised, manipulated and analysed [15]. As a side effect, large amounts of data justify teaching algorithms with more care, as testing becomes much more complex.
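A sketch of such an activity, using pandas on a tiny invented dataset standing in for data a class might collect, illustrates the collect-manipulate-visualise cycle (the values and file name below are ours, purely for illustration):

    import pandas as pd

    # Invented measurements standing in for data collected by a class.
    df = pd.DataFrame({
        "day": ["Mon", "Tue", "Wed", "Thu", "Fri"],
        "temperature": [21.0, 23.5, 22.1, 25.0, 24.2],
        "rainfall_mm": [0.0, 2.5, 0.0, 12.1, 1.0],
    })

    # Analyse: summary statistics and a simple question about the data.
    print(df.describe())
    print(df.loc[df["rainfall_mm"] > 1.0, "day"].tolist())  # rainy days

    # Visualise: a line plot saved to disk (requires matplotlib).
    ax = df.plot(x="day", y="temperature", marker="o")
    ax.figure.savefig("temperatures.png")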

Critical Thinking

Social sciences can and should contribute to many of the ethical questions AI raises. Critical thinking is one important aspect, but it is essential that it relies on a real understanding of the way the technology works.

A typical example: when attempting to detect fake news and information (a truly important question), it is often suggested that the answer consists in finding the primary source or in relying on common sense. This is a great 20th-century answer, but it is of less use in many situations on the web, from AI-generated texts to images or videos. Today, the answer comes through a combination of understanding digital literacy concepts and being able to make use of algorithms to build one's convictions. In other words, the human alone is going to find it difficult to argue with a machine without the help of another machine.

Yet it would be just as much a mistake to teach only the technology without giving the means to understand the impacts, evaluate the risks and have a historical perception of the media. Most reports on AI agree that an analysis of the ethical implications should be taken into account before deployment. Researchers from media sciences have worked on the question for some time [27] and should be encouraged to work with AI specialists.

Post AI Humanism

The previous four pillars can be matched to existing competences, skills and teaching profiles. The one we introduce now may be more difficult to fit in. The key idea is that the progress of AI is making us, as human beings, reconsider some ground truths. It is already known that our interaction with technology has an impact on non-technological attitudes: for example, teachers agree that children's use of the smartphone, and the specific type of multitasking it introduces, has an effect on their capacity to study, at least in the formal settings proposed by schools. With artificial intelligence, these changes may be even more formidable. We introduce this idea through four examples.

Truth

In 2017, the Libratus system was built by researchers at Carnegie Mellon University [3]. This system beat some of the best poker players in the world. To do so, it used reinforcement learning to build its optimal policy, and it learnt to bluff, a necessary skill in poker, without being explicitly trained to bluff. Libratus learnt that the better strategy to win was to lie from time to time. In other words, the system was trained to win, and if this included bluffing (a socially acceptable form of lying), it did just that.

Experience

In 2016, the AlphaGo system beat Go champion Lee Sedol by four victories to one [17]. The system, like most until then, made ample use of libraries containing thousands of games played by experts: the machine built its victories on top of human history and knowledge. A few months later, the new version, called AlphaGo Zero, was launched, beat AlphaGo by 100-0, and was then adapted to chess. The main difference was that AlphaGo Zero discarded all the human knowledge, using only the rules of the game and the machine's capacity to learn by playing against itself. The question this raises is: do we need to build society upon its experience?

Creativity

The question of machine creativity is regularly posed; it matters both legally and intellectually. Today, artificial intelligence can compose music, write scripts, paint pictures and modify our photographs. Through artificial generation, new artefacts can be created. It should be noted that in such cases, where artificial intelligence is used for artistic creation, it is most often reported that a human artist was part of the project. Whilst this may be in part true, it may only be that we need to be reassured.

Yet, again in the area of games, it is interesting to see specialists comment online on the games played by the latest artificial intelligence programs. Whereas some years ago the "brute force" of the machine was put forward, this is much less the case today: the creative nature of the moves is admired by the human grandmasters. A question raised here is: can a machine create without feeling or consciousness? One answer is to say that the result is what matters: if we believe there is creation, do we need the feeling? [9]

Intelligence

Intelligence itself is being impacted by the progress of AI. Each time progress is made and a machine beats man at something which up to now was considered an activity requiring intelligence, experts invariably announce that the given activity didn't really need intelligence. More and more, the goal seems to be to define intelligence in such a way as to make it unachievable by a machine.

Extending the model

The pillars described in the previous section must be understood as being able to support a larger framework of competences that teachers and learners will need to master in order to use and create AI systems (see Fig. 1). But they can also be seen as self-contained 21st-century skills which would allow teachers and learners to make better use of the technologies introduced by and with artificial intelligence.

Linking with the UNESCO ICT-CFT

AI will have to be taught by teachers. Teacher training differs widely between countries, for example in the basic ICT skills we can rely on to install an AI teaching agenda. As a uniform starting point and framework, we take the approach promoted by UNESCO, namely the UNESCO ICT Competency Framework for Teachers (ICT-CFT), reflected in a series of documents that have evolved over the past 10 years [24].

In [24], the six aspects of a teacher's work are scrutinised with respect to the goal of making use of ICT for better teaching:

  • A1 Understand ICT in Education: how ICT can help teachers better understand the education policies and align their classroom practices accordingly;
  • A2 Curriculum and assessment: how ICT can allow teachers to better understand these questions but also intervene and propose new modalities;
  • A3 Pedagogy: how the actual teaching itself can be positively impacted through the informed use of ICT;
  • A4 Application of digital skills: how to make use of the new skills acquired by the learners to support higher-order thinking and problem-solving skills;
  • A5 Organisation and administration: how to participate in the development of technology strategies at different levels;
  • A6 Teacher professional learning: how to use technology to interact with communities of teachers and share best practices.

Each aspect is then analysed with respect to ICT, following three stages: technology literacy (.1), knowledge deepening (.2) and knowledge creation (.3). This raises two questions:

  1. To what extent does the ICT-CFT offer a good framework for teachers to be trained in AI?
  2. To what extent would the ICT-CFT benefit from the impact of AI?
Figure 2: The alignment between pillars and aspects. Full lines indicate an impact of an aspect on a pillar; dashed lines indicate a contribution of a pillar to an aspect; double lines work both ways.

How the ICT-CFT allows teachers to move on to AI

The ICT-CFT aims at enabling teachers to use ICT in an informed way and to develop new ideas, learning materials and curricula through it. As the different AI pillars presented in this paper all rely on an understanding of how computers, algorithms and data work, the ICT-CFT will be an important stepping stone towards AI. Teachers who master the different tools, strategies and skills proposed in the ICT-CFT will be better equipped to engage with the questions raised by the proposed pillars. We represent, in Fig. 2, with full lines, the main contributions of the ICT-CFT aspects to the five pillars.

The impact of AI on the ICT-CFT

The arrival of new AI tools for education (like [29]) will probably help better motivate teachers to use ICT: the advantages will become clearer, and we can hope that the usage will be simplified. Typically, OER are today difficult to construct, to offer and to find. AI and related technologies should make them much more accessible, which would ensure their wider adoption.

On the other hand, a better understanding of the key questions raised by the proposed pillars will have a positive effect on teachers' motivation to progress through the levels of the different aspects proposed in the ICT-CFT.

For example, a better understanding of the social and ethical implications (critical thinking and post-AI humanism) would have a positive impact on the way teachers react and on their motivation in training.

The ambitions carried by the five pillars can also positively impact the ICT-CFT by requiring extra ambition: coding and computational thinking are mentioned there but not recommended, whereas for AI the proposal is that these are necessary skills. Learning to code would, on its own, make the ICT-CFT much more achievable. We represent, in Fig. 2, with dashed lines, the main contributions of the five pillars to the ICT-CFT.

Artificial Intelligence and Open Educational Resources

ICT teachers are very much the forerunners of sharing open educational resources (OER): as they naturally use the computer both as an object of learning and as a learning artefact, they have been very active in producing and sharing OER. This is also the case for AI, and one can predict a great benefit for all.

Artificial intelligence is also working today at providing better tools to publish, share and access OER [29]. Therefore, the education of AI or towards AI proposed in this document should make ample use of OER.

Conclusion

We have presented in this preliminary report five competences or pillars which should take on increasing importance given the penetration of AI in society. Further work should follow to better understand at what age and in what way the relevant concepts should be introduced, studied and mastered, and to explain how AI should be taught, both to teachers and to learners.

Acknowledgements

We would like to thank the following persons for their help and expertise: Neil Butcher, Victor Connes, Jaco Dutoit, Maria Fasli, Françoise Soulié Fogelman, Marko Grobelnik, James Hodson, Francois Jacquenet, Stefan Knerr, Bastien Masse, Luisa Mico, Jose Oncina, John Shawe-Taylor, Zeynep Varoglu, Emine Yilmaz.

References

[1] Fathers of the deep learning revolution receive ACM A.M. Turing award. News, 2019. https://www.acm.org/media-center/2019/march/turing-award-2018.

[2] T. Bell, I. H. Witten, M. Fellows, R. Adams, J. McKenzie, M. Powell, and S. Jarman. CS Unplugged. csunplugged.org, 2015.

[3] Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 359(6374):418–424, 2017.

[4] Class'Code. Class'code, ses principes et leviers. Position paper, 2015. https://project.inria.fr/classcode/classcode-ses-valeurs/.

[5] Class’Code. Maaison : Maîtriser et accompagner l’apprentissage de l’informatique pour notre société numérique. Position paper, 2015. https://drive.google.com/drive/folders/0B42D-mwhUovqQ1RyOW01WUtyR1k.

[6] Informatics Europe. Informatics education: Europe cannot afford to miss the boat. Technical report, 2015. http://www.informatics-europe.org/images/documents/informatics-education-acm-ie.pdf.

[7] A. Fluck, M. Webb, M. Cox, C. Angeli, J. Malyn-Smith, J. Voogt, and J. Zagami. Arguing for computer science in the school curriculum. Educational Technology & Society, 19(3):38–46, 2016.

[8] Knowledge for All Foundation. IJCAI workshop on Artificial Intelligence and United Nations sustainable development goals. Workshop, 2019. https://www.k4all.org/event/ijcai19/.

[9] Yuval Noah Harari. Homo Deus: A Brief History of Tomorrow. Harvill Secker, 2016.

[10] L’Académie des Sciences. L’enseignement de l’informatique en France – il est urgent de ne plus attendre. Position document, 2013. http://www.academie-sciences.fr/fr/activite/rapport/rads_0513.pdf.

[11] John McCarthy, Marvin L. Minsky, Nathaniel Rochester, and Claude E. Shannon. A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Magazine, 27(4):12–12, 2006.

[12] McKinsey. Notes from the AI frontier: insights from hundreds of use cases. Discussion paper, 2018. https://www.mckinsey.com/featured-insights/artificial-intelligence/notes-from-the-ai-frontier-applications-and-value-of-deep-learning.

[13] L. Nedelkoska and G. Quintini. Automation, skills use and training. Position document, 2018. Documents de travail de l’OCDE sur les questions sociales, l’emploi et les migrations, n 202, Éditions OCDE, Paris, https://doi.org/10.1787/2e2f4eea-en. https://www.oecd-ilibrary.org/employment/automation-skills-use-and-training_2e2f4eea-en.

[14] High Level Committee on Programmes. Towards a UN system-wide strategic approach for achieving inclusive, equitable and innovative education and learning for all. Report, 2019.

[15] UNESCO Education Sector. Artificial intelligence in education: Challenges and opportunities in sustainable development. Report, 2019.

[16] R. Benjamin Shapiro, Rebecca Fiebrink, and Peter Norvig. How machine learning impacts the undergraduate computing curriculum. Commun. ACM, 61(11):27–29, 2018.

[17] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of go with deep neural networks and tree search. Nature, 529:484–489, 2016.

[18] The Royal Society. After the reboot: computing education in UK schools. Position paper, 2017. https://royalsociety.org/-/media/policy/projects/computing-education/computing-education-report.pdf.

[19] P. Tchounikine. Initier les élèves à la pensée informatique et à la programmation avec scratch. Research paper, 2016. http://lig-membres.imag.fr/tchounikine/PenseeInformatiqueEcole.html.

[20] Ilkka Tuomi. The impact of artificial intelligence on learning, teaching, and education. Submitted file, 2018. http://publications.jrc.ec.europa.eu/repository/bitstream/JRC113226/jrc113226_jrcb4_the_impact_of_artificial_intelligence_on_learning_final_2.pdf.

[21] Alan Turing. Computing machinery and intelligence. Mind, 59(236):433–460, 1950.

[22] The UNESCO/Netexplo Advisory Board (UNAB). Human learning in the digital era. Report, 2019. https://unesdoc.unesco.org/ark:/48223/pf0000367761.locale=en.

[23] UNESCO. UNESCO and sustainable development goals. Policy document, 2015. http://en.unesco.org/sdgs.

[24] UNESCO. Unesco ICT competency framework for teachers. Report, 2018. https://unesdoc.unesco.org/ark:/48223/pf0000265721.

[25] Cédric Villani. Donner un sens à l'intelligence artificielle. Position document, 2018. https://www.aiforhumanity.fr/pdfs/9782111457089_Rapport_Villani_accessible.pdf.

[26] Randi Williams, Hae Won Park, and Cynthia Breazeal. A is for artificial intelligence: The impact of artificial intelligence activities on young children's perceptions of robots. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland, UK. ACM, New York, NY, USA, 2019. https://doi.org/10.1145/3290605.3300677.

[27] Carolyn Wilson, Alton Grizzle, Ramon Tuazon, Kwame Akyempong, and Chi Kim Cheung. Media and information literacy curriculum for teachers. Report, 2014. https://unesdoc.unesco.org/ark:/48223/pf0000192971.locale=en.

[28] Jeannette M. Wing. Computational thinking. Communications of the ACM, 49(3):33–35, 2006.

[29] x5gon. Cross modal, cross cultural, cross lingual, cross domain, and cross site global oer network artificial intelligence and open educational resources. European project h2020, 2017. https://x5gon.org.

[30] Les échos. AI for business. Comment faire des entreprises françaises des championnes de l’IA ? Livre blanc, 2019. https://www.lesechos-events.fr/think-tank/ai-business/#fndtn-recommendations.

 

On 29 May 2018, the Government of Slovenia proposed to the Director-General the establishment of a Category 2 Centre on Artificial Intelligence (AI) under the auspices of UNESCO at Jožef Stefan Institute (JSI), Ljubljana, Slovenia.

The Knowledge 4 All Foundation was highly instrumental in achieving this success by supporting both sides.

Based thereon, UNESCO undertook the required feasibility study and assessed the proposed Centre’s scope, objectives, strategies and networking with other institutions from UNESCO’s vantage point.

The feasibility study also examined the available and promised human, material and financial resources for the Centre’s operations and sustainability. Further, it ascertained the commitments of both the Government of Slovenia and the Jožef Stefan Institute towards the proposed Category 2 Centre.

In order to allow full functional autonomy, and as per the provisions set out in the revised integrated strategy on Category 2 Centres and institutes (37/C-18/PART_I), an International Research Centre on Artificial Intelligence (IRCAI) will be created by transferring the operations of the Centre for Knowledge Transfer in Information Technologies (CT3) within JSI.

JSI currently has more than 950 researchers, of which 180 cover areas within Artificial Intelligence, such as Machine Learning, Data-Mining, Text-Mining, Web-Mining, Multimedia Mining, Semantic Technologies, Social Network Analysis, Language Technologies, Natural Language Processing, Cross-lingual Technologies, Real-time Data Analysis, Data Visualization, Knowledge Management, Knowledge Reasoning, Inductive Logic Programming, Evolutionary Computation, Multistrategy learning and principles of multiple knowledge, among others.

IRCAI's overall objectives cover research, advocacy, capacity building and dissemination of information about artificial intelligence and other advanced ICTs. The objectives of the Centre shall be to:

  • Conduct theoretical and applied research in the field of artificial intelligence and advanced ICTs;
  • Develop open solutions to help achieve sustainable development goals with specific focus on SDGs 4, 5, 8, 9, 10, 13, 16 and 17;
  • Provide policy support to help member states address the technical, legal, social and ethical challenges at the intersection of technology and policy;
  • Provide training for upstream and downstream capacity enhancement for artificial intelligence;
  • Encourage multi-stakeholder participation and decision making in addressing the challenges raised by artificial intelligence;
  • Disseminate information and encourage literacy about artificial intelligence;
  • Promote measures for removal of gender bias in the development and deployment of artificial intelligence;
  • Facilitate north-north and north-south cooperation in the development of artificial intelligence with special emphasis on supporting the development of a vibrant artificial intelligence ecosystem in Africa.

IRCAI's four objectives of research, capacity enhancement, advocacy and dissemination of knowledge of AI for the SDGs align with UNESCO's mandate of leveraging ICTs for sustainable development, with a special emphasis on SDGs 4, 5, 8, 9, 10, 13, 16 and 17.

Knowledge 4 All Foundation played a pivotal role in the creation of the new UNESCO Recommendation on OER, which can be implemented via our technology and used to empower all other stakeholders.

Preamble

The General Conference of the United Nations Educational, Scientific and Cultural Organization (UNESCO), meeting in Paris in 2019, at its 40th session,

Recalling that the Preamble of UNESCO’s Constitution affirms, “that the wide diffusion of culture, and the education of humanity for justice and liberty and peace are indispensable to the dignity of man and constitute a sacred duty which all the nations must fulfil in a spirit of mutual assistance and concern”,

Recognizing the important role of UNESCO in the field of information and communication technologies (ICT) and in the implementation of the relevant decisions in this area adopted by the General Conference of that Organization,

Further recalling Article I of UNESCO’s Constitution, which assigns to UNESCO among other purposes that of recommending “such international agreements as may be necessary to promote the free flow of ideas by word and image”,

Affirming the principles embodied in the Universal Declaration of Human Rights, that states all people have rights and fundamental freedoms that include the right to seek, receive and impart information and ideas through any media and regardless of frontiers (Article 19), as well as the right to education (Article 26), and the right to freely participate in the cultural life of the community, to enjoy the arts, and to share in scientific advancement and its benefits; and the right to the protection of the moral and material interests resulting from any scientific, literary, or artistic production of which one is the author (Article 27),

Also affirming the 2007 UN Declaration on the Rights of Indigenous Peoples, which recognizes the rights of indigenous peoples in formulating national legislation and implementing national policy,

Noting the 2006 Convention on the Rights of People with Disabilities (Article 24), which recognizes the rights of persons with disabilities to education and the principles contained in the Convention against Discrimination in Education (1960),

Referring to the resolutions of the General Conference of UNESCO with regard to the promotion of multilingualism and universal access to information in cyberspace,

Also referring to the 1997 UNESCO Recommendation concerning the Status of Higher-Education Teaching Personnel as well as the 1966 ILO/UNESCO Recommendation concerning the Status of Teachers which stresses that as part of academic and professional freedom teachers “should be given the essential role in the choice and adaptation of teaching material, the selection of textbooks, and the application of teaching methods”,

Reaffirming the importance of the United Nations 2030 Agenda for Sustainable Development, which underlines that the “spread of information and communications technology and global interconnectedness has great potential to accelerate human progress, to bridge the digital divide and to develop knowledge societies…” and of Goal 4 (SDG 4), which calls for the International community to “ensure inclusive and equitable quality education and promote lifelong opportunities for all”,

Acknowledging the 2003 World Summit on the Information Society, Declaration of Principles, committing “to build a people-centered, inclusive and development-oriented Information Society where everyone can create, access, utilize and share information and knowledge”,

Referring to the Education 2030 Framework for Action, which lists a set of strategic approaches for the implementation of SDG 4 and underlines that increasing access must be accompanied by measures to improve the quality and relevance of education and learning, and in particular that "Education institutions and programmes should be adequately and equitably resourced… and books, other learning materials, open educational resources and technology that are non-discriminatory, learning conducive, learner friendly, context specific, cost effective and available to all learners – children, youth and adults.",

Recognizing that the development of ICTs provides opportunities to improve the free flow of ideas by word, sound and image but also presents challenges for ensuring the participation of all in Knowledge Societies,

Recognizing that quality basic education, as well as media and information literacy are prerequisites to access and benefit from ICTs,

Also recognizing that, in building inclusive Knowledge Societies, Open Educational Resources (OER) can support quality education that is equitable, inclusive, open and participatory as well as enhance academic freedom and professional autonomy of teachers by widening the scope of materials available for teaching and learning,

Considering the 2007 Cape Town Open Education Declaration, the 2009 Dakar Declaration on Open Educational Resources, the 2012 Paris OER Declaration, the Millennium Declaration, the 2000 Dakar Framework for Action, and the International Covenant on Economic, Social and Cultural Rights (Article 13.1), which all recognize “the right of everyone to education”,

Building on the Ljubljana OER Action Plan 2017 to mainstream OER to help all Member States to create inclusive Knowledge Societies and achieve the 2030 Sustainable Development Agenda, namely SDG 4 (quality education), SDG 5 (Gender equality), SDG 9 (Infrastructure), SDG 10 (Reduced inequalities within and across countries), SDG 16 (Peace, justice and strong institutions) and SDG 17 (Partnerships for the goals), thereto:

  1. Adopts the present Recommendation on Open Educational Resources (OER);
  2. Recommends that Member States apply the following provisions by taking appropriate steps, including whatever legislative or other measures may be required, in conformity with the constitutional practice and governing structures of each State, to give effect within their territories to the principles of the Recommendation;
  3. Also recommends that Member States bring the Recommendation to the attention of the authorities and bodies responsible for learning and education and consult relevant stakeholders concerned with learning and education;
  4. Further recommends that Member States report to UNESCO, at such dates and in such manner as shall be determined, on the action taken in pursuance of this Recommendation.

Definition and Scope

  1. Open Educational Resources (OER) are teaching, learning and research materials in any medium that may be composed of copyrightable materials released under an open license, materials not protected by copyright, materials for which copyright protection has expired, or a combination of the foregoing.
  2. Open license refers to a copyright license that respects the intellectual property rights of the copyright owner and provides limited permissions granting the public the rights to access, use, adapt, and redistribute educational materials.
  3. Information and communication technologies (ICT) provide great potential for effective, equitable and inclusive access to OER and their use, adaptation and redistribution. They can open possibilities for OER to be accessible anytime and anywhere for everyone, including individuals with disabilities and individuals coming from marginalized or disadvantaged groups. They can help meet the needs of individual learners and effectively promote gender equality and incentivize innovative pedagogical, didactical and methodological approaches.
  4. Stakeholders in this Recommendation include: educators, learners, governmental bodies, parents, educational institutions, education support personnel, teacher trainers, educational policy makers, cultural institutions (such as libraries, archives and museums) and their users, technical infrastructure providers, researchers, research institutions, civil society organizations (including professional and student associations), publishers, the public and private sectors, intergovernmental organizations, copyright holders and authors, media and broadcasting groups and funding bodies.

Aims and Objectives

  1. One key prerequisite to achieve SDG 4 is sustained investment and educational actions by governments and other key education stakeholders, as appropriate, in the creation, curation, regular updating, ensuring inclusive and equitable access, and effective use of high quality materials and programmes of study.
  2. As is articulated in the 2007 Cape Town Open Education Declaration and the 2012 Paris OER Declaration, the application of open licenses to educational materials introduces significant opportunities for more cost-effective creation, access, use, adaptation, redistribution, curation, and quality assurance of those materials, including, but not limited to translation, adaptation to different learning and cultural contexts, development of gender-sensitive materials, and the creation of alternative and accessible formats of materials for learners with special educational needs.
  3. In addition, the judicious application of OER in combination with appropriate pedagogical methodologies can support a broad range of innovative pedagogical options to engage both educators and learners to become more active participants in educational processes and creators of content as members of diverse and inclusive Knowledge Societies.
  4. Furthermore, regional and global collaboration and advocacy in the creation, access, use, adaptation, redistribution and evaluation of OER can enable governments to optimise their own investments in educational content creation, as well as IT infrastructure and curation, in ways that will enable them to meet their defined national educational policy priorities more cost-effectively and sustainably.
  5. Noting these potential benefits, the objectives and areas of action of this OER Recommendation are as follows:

(i) Capacity building: developing the capacity of all key education stakeholders to create, access, use, adapt, and redistribute OER, as well as to use and apply open licenses in a manner consistent with national copyright legislation and international obligations;

(ii) Developing supportive policy: encouraging governments, and education authorities and institutions to adopt regulatory frameworks to support open licensing of publicly funded educational materials, develop strategies to enable use and adaptation of OER in support of high quality, inclusive education and lifelong learning for all, and adopt integrated mechanisms to recognize the learning outcomes of OER-based programmes of study;

(iii) Effective, inclusive and equitable access to quality OER: supporting the adoption of strategies and programmes, including through relevant technology solutions, that ensure OER in any medium are shared in open formats and standards to maximize equitable access, co-creation, curation and searchability, including for those from vulnerable groups and persons with disabilities;

(iv) Nurturing the creation of sustainability models for OER: supporting and encouraging the creation of sustainability models for OER at national and institutional levels, and the planning and pilot testing of new sustainable forms of education and learning;

(v) Facilitating international cooperation: supporting international cooperation between stakeholders to minimize unnecessary duplication in OER development investments and to develop a global pool of culturally diverse, locally relevant, gender-sensitive, accessible, educational materials in multiple languages.

Areas of Action

  1. This Recommendation addresses five objectives: (i) Building capacity of stakeholders to create, access, use, adapt and redistribute OER; (ii) Developing supportive policy; (iii) Encouraging inclusive and equitable quality OER; (iv) Nurturing the creation of sustainability models for OER; and (v) Facilitating international cooperation.

Building Capacity of Stakeholders to create, access, use, adapt and redistribute OER

  1. Member States are recommended to strategically plan and support OER capacity building and awareness raising at the institutional and national levels, targeting all education sectors and levels. Member States are encouraged to consider the following:

(a) building awareness among relevant stakeholder communities on how OER can increase access to educational resources, improve learning outcomes, maximise the impact of public funding, and empower educators and learners to become co-creators of knowledge;

(b) providing systematic and continuous capacity building (in-service and pre-service) on how to create, access, make available, use, adapt, and redistribute OER as an integral part of training programmes at all levels of education. This should include improving capacity of public authorities, policy makers, quality development and assurance professionals to understand OER and support their integration into teaching and learning;

(c) raising awareness concerning exceptions and limitations for the use of copyrighted works for educational and research purposes. This should be enacted to facilitate the integration of a wide range of works in OER, recognizing that the fulfilment of educational goals as well as the development of OER requires engagement with existing copyright protected works;

(d) leveraging open licensed tools, platforms and standards to help ensure OER can be easily found, accessed, used, adapted and redistributed in a safe, secure and privacy protected mode. This could include free and open source authoring tools, libraries and other repositories and search engines, systems for long term preservation and frontier technologies for automatic OER processing such as artificial intelligence methods and tools; and

(e) making available easily accessible resources that provide information and assistance to all OER stakeholders on OER related topics including copyright and open licensing of educational material.

Developing supportive policy

  1. Member States, according to their specific conditions, governing structures and constitutional provisions, should develop or encourage policy environments, including those at the institutional level, that are supportive of effective OER practices. Through a transparent participatory process that includes dialogue with stakeholders, Member States are encouraged to consider the following:

(a) developing and implementing policies and/or regulatory frameworks which encourage that educational resources developed with public funds be openly licensed or dedicated to the public domain as appropriate, and financial and human resources coordinated for the implementation of the policies;

(b) encouraging and supporting institutions to develop or update legal or policy frameworks to stimulate the creation, access, use, adaptation and redistribution of quality OER by educators and learners; and to develop and integrate quality assurance mechanisms for OER into the existing quality assurance strategies for teaching and learning materials;

(c) developing mechanisms to support and incentivize all stakeholders to publish source files and accessible OER using standard open file formats in public repositories;

(d) aligning OER policies with other open policies and guiding principles such as those for Open Access, Open Data, Open Pedagogy, Open Source Software and Open Science; and

(e) adjusting or reforming the curriculum and assessment in accordance with the needs of the use of OER and to motivate the active use, creation, and sharing of OER by teachers and students; and recognizing the learning outcomes of OER-based programmes of study.

Encourage inclusive and equitable OER

  1. Member States are encouraged to support the creation, access, use, adaptation and redistribution of inclusive and equitable quality OER for all stakeholders. These would include those learners in formal and non-formal education contexts irrespective of age, gender, physical ability, socio-economic status, as well as those who live in remote areas (including nomadic populations), internally and forcibly displaced persons, migrants and refugees. In all instances, gender equality should be ensured, and particular attention paid to equity and inclusion for learners who are especially disadvantaged due to multiple and intersecting forms of discrimination. Member States are recommended to consider the following:

(a) ensuring access to OER that most suitably meet both the needs and material circumstances of target learners and the educational objectives of the courses or subjects for which they are being provided. This would include offline (including printed) modalities for accessing resources where appropriate;

(b) supporting OER stakeholders to develop gender-sensitive, culturally and linguistically relevant OER, and to create local language OER, particularly in indigenous languages which are less used, under-resourced and endangered;

(c) ensuring that the principle of gender equality, non-discrimination, accessibility and inclusiveness is reflected in strategies and programmes for creating, accessing, using, adapting, and redistributing OER;

(d) incentivising public and private investments in IT infrastructure and broadband, to provide increased access to OER, particularly for low-income, rural and remote communities; and

(e) developing and adapting existing evidence-based standards, benchmarks and related criteria for the quality assurance of OER, as appropriate, which emphasize reviewing educational resources (both openly licensed and not openly licensed) under regular quality assurance mechanisms.

Nurturing the creation of sustainability models for OER

  1. Member States, according to their specific conditions, governing structures and constitutional provisions, are recommended to support and encourage the development of comprehensive, inclusive and integrated OER sustainability models. Member States are encouraged to consider the following:

(a) reviewing current provisions, procurement policies and regulations to expand and simplify the process of procuring quality goods and services to prioritize the creation, ownership, translation, adaptation, curation, and sharing of OER, where appropriate, as well as to develop the capacity of all OER stakeholders to participate in these activities;

(b) catalysing sustainability models through traditional funding sources but also non-traditional reciprocity based revenue generation such as donations, memberships, pay what you want, and crowdfunding that may provide revenues and sustainability to OER provision;

(c) promoting and raising awareness of other value added models using OER across institutions and countries where the focus is on participation, co-creation, generating value collectively, community partnerships, spurring innovation, and bringing people together for a common cause; and

(d) enacting regulatory frameworks that support the development of OER products and services that align with national standards as well as the interest and values of the OER stakeholders.

Facilitating international cooperation

  1. To promote the development and use of OER, Member States should facilitate international cooperation among all relevant stakeholders, whether on a bilateral or multilateral basis. Member States are encouraged to consider the following:

(a) promoting and stimulating cross-border collaboration and alliances on OER projects and programmes, leveraging existing transnational, regional and global collaboration mechanisms and organizations. This should include joining efforts on collaborative development and use of OER as well as capacity building, communities of practice, joint research on OER, and mutual cooperative assistance between all countries regardless of their state of development;

(b) exploring methods to establish regional and international funding mechanisms for implementing and strengthening OER as well as to understand mechanisms that can support international, national, and regional efforts;

(c) supporting the creation and maintenance of effective peer networks that share OER, based on areas such as subject matter, language, institutions, regions, and level of education, at local, regional and global levels; and

(d) incorporating, where appropriate, specific clauses relating to OER in international agreements concerned with cooperation in the fields of education.

Monitoring

  1. Member States should, according to their specific conditions, governing structures and constitutional provisions, evaluate policies and programmes related to OER using a combination of quantitative and qualitative approaches, as appropriate. Member States are encouraged to consider the following:

(a) deploying appropriate research programmes, tools and indicators to measure the effectiveness and efficiency of OER policies and incentives against defined objectives, including specific targets for disadvantaged and vulnerable groups;

(b) collecting, presenting, and disseminating progress, good practices, innovations and research reports on OER and its implications with the support of UNESCO and international open education communities; and

(c) developing strategies to monitor and evaluate the educational effectiveness and long-term financial efficiency of OER, which include participation of all relevant stakeholders. Such strategies could focus on improving learning processes and strengthening the connections between findings, decision-making, transparency, and accountability to ensure the best educational outcomes. This would include support to the development of an evidence base on the impact of OER on education and learning.

 

Abstract

To create an automatic data annotation tool and ground truth dataset for malaria diagnosis using deep learning. The ground truth dataset and the tool will streamline the development of AI tools for pathology diagnosis.

Introduction

Technology is transforming how health care is delivered in Africa, giving more people, especially in limited-resource settings, access to better care. Likewise, easier access to data supports both doctors and policymakers in making better-informed decisions about how to continue to improve the health care system. However, the existing traditional methods, especially for disease diagnosis, have limitations: expensive equipment, the need for experts, and the time consumed by a single diagnosis. This becomes impractical in areas with a high disease burden, such as sub-Saharan regions. In this project we focus on the improvement of malaria diagnosis. We choose malaria because it is a life-threatening disease dominant in developing countries. According to the WHO, in 2017 nearly half of the world's population was at risk of malaria, more than 90 countries reported malaria cases, and Africa was home to 435,000 deaths. The WHO also reports that malaria kills a child every two minutes. Nevertheless, prompt diagnosis and treatment can reduce such deaths.

In the area of Artificial Intelligence (AI), several techniques have been adopted to create malaria diagnosis tools that are fast, accurate and require fewer experts. Deep convolutional networks, one such AI technique, have been used for the detection of malaria parasites (Sanchez Sanchez, 2015). Given the sensitivity of health applications, AI tools dealing with health issues such as diagnosis usually require large amounts of data in order to achieve the high accuracy needed for practical use. However, in the context of developing countries, there is a shortage of such data for researching and developing these tools. Hence, there is a need to create datasets for the research and development of pathology diagnosis tools, such as for malaria.

Rationale

One of the major problems hindering the development and applicability of AI in developing countries is the lack of data. This is evident in the limited access to available data from both government and non-governmental organizations. In addition, data may be available yet lack the quality, in terms of image resolution and labels, required for developing AI tools. Lastly, in some domains such as agriculture and health, there is no data at all to be used for training, testing, and validating AI tools. For these reasons, creating a comprehensive dataset is difficult and time consuming. These dataset problems, particularly in the health sector, significantly set back the development of AI tools, a technology with great potential for solving problems in the health sector. Therefore, there is a need for a tool that improves the entire process of acquiring a dataset.

Main Objective

The aim of this project is to create an AI tool that can be used to efficiently build a ground truth dataset for malaria diagnosis using deep learning.

Specific Objectives:

  • To capture microscopic images of malaria-parasitized and uninfected stained blood smear samples using a smartphone.
  • To develop an automatic annotation tool for the captured images by integrating an open-source annotation tool with an object detection model (a minimal sketch follows this list).
  • To verify the effectiveness of the automatic annotation tool.
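
To make the second objective concrete, the sketch below shows one plausible shape of the automatic annotation step: a detection model proposes bounding boxes over smartphone micrographs, and the proposals are exported as draft annotations for expert review in an open-source annotation tool. The detector checkpoint, class names, and file layout are hypothetical placeholders, not the project's finalized design.

```python
# A minimal sketch of automatic pre-annotation: a detection model proposes
# bounding boxes over smartphone micrographs, and the proposals are saved
# as draft annotations for expert review. The checkpoint path, class names,
# and file layout are hypothetical placeholders.
import json
from pathlib import Path

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

CLASSES = ["background", "parasite"]  # hypothetical label set

# Hypothetical detector fine-tuned on parasitized blood-smear images.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, num_classes=len(CLASSES)
)
model.load_state_dict(torch.load("parasite_detector.pt"))
model.eval()

annotations = []
for image_path in sorted(Path("smear_images").glob("*.jpg")):
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = model([image])[0]
    keep = output["scores"] > 0.5  # keep only confident detections
    annotations.append({
        "file": image_path.name,
        "boxes": output["boxes"][keep].tolist(),  # [x_min, y_min, x_max, y_max]
        "labels": [CLASSES[int(i)] for i in output["labels"][keep]],
        "scores": output["scores"][keep].tolist(),
    })

# Draft annotations can then be imported into an open-source annotation tool
# (e.g. CVAT or LabelImg) for verification and correction by a microscopist.
with open("draft_annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)
```

Keeping a microscopist in the loop to accept or correct these drafts is what turns model proposals into a trustworthy ground truth dataset; it also suggests one way to measure the third objective, for example as the fraction of proposals accepted without correction.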

Abstract

To initiate a research roadmap for the preservation of indigenous languages by collecting, categorizing, and archiving content in those languages, and by applying machine translation and voice synthesis to move information automatically between official and indigenous languages.

Objectives

Build, curate, and explore a massive dataset of public content in indigenous languages. The objective is to identify and enumerate data sources for retrieving content in indigenous languages, creating an open archive that can be leveraged in a variety of activities, including training translation models to promote national languages, or building voice synthesizers to help distribute news content to illiterate citizens.
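
As an illustration of what the collection step might look like, the sketch below fetches public pages, extracts their text, and stores each document with language metadata in a single archive directory. The seed URLs and ISO 639-3 language tags are hypothetical placeholders; the real source list would come out of the enumeration work described above.

```python
# A minimal sketch of corpus collection: fetch public pages, extract text,
# and archive each document with language metadata. The seed URLs and
# language tags below are hypothetical placeholders.
import hashlib
import json
from pathlib import Path

import requests
from bs4 import BeautifulSoup

SEED_SOURCES = {
    "https://example.org/news-in-wolof": "wol",
    "https://example.org/news-in-bambara": "bam",
}

archive = Path("indigenous_corpus")
archive.mkdir(exist_ok=True)

for url, lang in SEED_SOURCES.items():
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    text = BeautifulSoup(response.text, "html.parser").get_text(separator="\n")
    record = {"source": url, "language": lang, "text": text}
    name = hashlib.sha1(url.encode("utf-8")).hexdigest()[:10]
    (archive / f"{lang}_{name}.json").write_text(
        json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

Storing every record in one uniform, machine-readable layout is what makes the archive reusable downstream, whether for training translation models or for building voice synthesizers.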

Initiate a research roadmap on translation and voice synthesis to promote indigenous languages through content sharing. Preserving indigenous languages is a challenging endeavor that first requires closing the information gap between official (mainly colonial) languages and indigenous languages. For example, news content is abundant in official languages, while rural areas receive only brief summaries in indigenous languages. Artificial intelligence can help close this gap through automatic translation of texts and voice synthesis (to account for illiteracy). The project will initiate a state-of-the-art survey of the components that are available, and those still missing, for realizing this endeavor in our context.
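
To give a sense of the pipeline the roadmap targets, the sketch below translates an official-language news snippet and renders the result as audio. It uses an off-the-shelf English-to-Swahili model purely as a stand-in; for most indigenous languages such pretrained models do not yet exist, which is exactly the gap the survey would document.

```python
# A minimal sketch of the translation + voice-synthesis pipeline, using an
# off-the-shelf English->Swahili model as a stand-in for the indigenous
# languages the project targets.
from transformers import pipeline
from gtts import gTTS

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-sw")

news_snippet = "Health officials recommend sleeping under treated mosquito nets."
translation = translator(news_snippet)[0]["translation_text"]
print(translation)

# Voice synthesis so the translated update can also reach listeners who
# cannot read it.
gTTS(text=translation, lang="sw").save("news_update_sw.mp3")
```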

Long-term vision

The long-term vision of the preservation project is to ensure that indigenous languages, and hence indigenous cultures, are sustained. To that end, the project investigates:

  • the means to systematize the collection and archiving of content, ensuring that all data are made openly available in readily processable formats at a single repository endpoint;
  • the opportunity to perform automatic translation, enabling a back-and-forth exchange of viewpoints between official and indigenous languages; and
  • the democratization of information, from the elite to the rural citizens who speak only indigenous languages.

This last point is the ultimate goal of the effort to preserve indigenous languages: by ensuring that the information gap is closed, it realizes one objective of open data, which is to increase democratic participation through access to information.