The report Artificial Intelligence Capacity in Africa, commissioned by the Knowledge 4 All Foundation as part of the AI4D grant initiative, provides a comprehensive analysis of the AI landscape in Sub-Saharan Africa (SSA). It highlights significant gaps and opportunities in AI education, research, and policy across the region. The study identifies key stakeholders, including higher education institutions, governments, and the broader AI community, emphasizing their roles in fostering a robust and responsible AI ecosystem. It underscores the importance of capacity building, from enhancing formal education in AI to supporting short-term training programs, while addressing gender and diversity challenges that hinder inclusive AI development. The findings reveal that despite growing interest, many institutions face constraints such as limited funding, inadequate infrastructure, and a lack of AI-focused policies.

Artificial Intelligence Capacity in Sub-Saharan Africa

A major finding of the report is the lack of systematic integration of AI into higher education curricula and research across SSA. While several universities offer AI-related modules within broader science or engineering programs, dedicated AI degrees remain rare. The report points out the critical need for both foundational skills in STEM and the inclusion of humanities and social sciences to ensure ethical and socially relevant AI solutions. It also reveals significant disparities in gender representation, with males dominating AI-related education and professional spaces. This calls for targeted initiatives to promote diversity, such as scholarships and mentorship programs for women and underrepresented groups in AI.

The report also addresses the challenges of establishing a supportive ecosystem for AI development. Limited government engagement in AI policy and strategy formation, coupled with a lack of alignment between academic institutions and industry needs, stifles innovation. Moreover, issues such as unreliable internet connectivity, inadequate access to data, and limited funding for AI startups further hinder growth. The study highlights the need for public-private partnerships to fund research and infrastructure and suggests creating national AI strategies that align with global ethical standards and regional development priorities.

In conclusion, the report offers actionable recommendations to enhance AI capacity in SSA. It calls for governments to develop regulatory frameworks and invest in AI research, while academic institutions are encouraged to integrate AI into their curricula and foster interdisciplinary research. The AI community is urged to champion diversity and inclusion, provide technical expertise, and collaborate with policymakers. Through the coordinated efforts of all stakeholders, SSA has the potential to harness AI as a transformative force for socio-economic development while ensuring equitable and ethical applications.

The report Responsible Artificial Intelligence in Sub-Saharan Africa: A General State of Play and Landscape examines the status, challenges, and opportunities for adopting responsible AI in the region. Commissioned by the Knowledge 4 All Foundation as part of the AI4D grant initiative, the report identifies significant gaps in AI readiness, infrastructure, and policy across Sub-Saharan Africa. It underscores the potential of AI to drive progress in achieving sustainable development goals (SDGs), such as poverty reduction, improved healthcare, and better education. However, the report warns that without targeted investments and ethical frameworks, AI may exacerbate existing inequalities. The study highlights the uneven distribution of AI advancements, with certain countries like South Africa, Kenya, and Ghana leading the way due to relatively stronger technological infrastructure and policy initiatives.

Responsible Artificial Intelligence in Sub-Saharan Africa: Landscape and General State of Play

A key finding of the report is the critical role of innovation ecosystems, capacity building, and policy frameworks in fostering responsible AI. The report identifies a growing number of grassroots machine-learning communities, academic partnerships, and emerging start-ups as the foundation for AI development in the region. However, it stresses that many of these initiatives are underfunded and lack robust local leadership. Furthermore, the reliance on imported technologies and frameworks often overlooks the unique socio-economic and cultural contexts of the region, limiting their effectiveness and sustainability. This points to the need for AI solutions tailored to African realities, particularly in sectors like agriculture and public health.

The report also examines the ethical implications of AI deployment in Sub-Saharan Africa, particularly concerning data privacy and algorithmic bias. It highlights how a lack of inclusive data and contextual algorithms can reinforce existing societal inequalities, particularly those affecting marginalized groups and women. Furthermore, the report warns against the unchecked adoption of AI technologies developed in regions with different socio-economic contexts, cautioning that such practices could lead to digital colonialism. It recommends proactive engagement with local stakeholders to ensure AI technologies are culturally sensitive and aligned with the values of the communities they aim to serve.

In conclusion, the report emphasizes the importance of collaborative efforts between governments, academic institutions, and private entities to build a robust and inclusive AI ecosystem in Sub-Saharan Africa. It advocates for increased investment in capacity-building initiatives, improved infrastructure, and the establishment of ethical governance frameworks to support the responsible development of AI. Through strategic interventions and leveraging initiatives like the AI4D grant, Sub-Saharan Africa can position itself as a leader in responsible AI innovation that aligns with global best practices while addressing regional challenges.

The International Research Center in AI – IRCAI (Slovenia), the Data-Pop Alliance (USA), the ELLIS Unit Alicante Foundation (Spain), the Knowledge 4 All Foundation (UK), the UNESCO Chair in Analytics and Data Science at University of Essex, and the Regional Center for Studies on the Development of the Information Society – CETIC (Brazil) under the auspices of UNESCO have established NAIXUS – a global network of AI centres of excellence for sustainable development.

The founding partners and all members are dedicated to connecting the best researchers and projects and helping them build a community dedicated to solving sustainability challenges by facilitating international research collaboration.

The initial partners come from Slovenia, Australia, Andorra, Brazil, Chile, Finland, France, Ghana, Hungary, Iceland, Italy, Kenya, Mexico, Netherlands, Nigeria, Pakistan, Senegal, South Africa, Spain, Sweden, Tanzania, UK and USA.

Scientific partners include:

  1. International Research Centre in Artificial Intelligence under the auspices of UNESCO (IRCAI)
  2. Aboitiz Data Innovation
  3. Advanced International Center for Smart Decision Science Applications based on Blockchain And Artificial Intelligence (BAIA)
  4. African Institute for Mathematical Sciences (AIMS)
  5. AI Laboratory at the University of the Witwatersrand (RAIL)
  6. Andorra Research + Innovation
  7. Artificial Intelligence Policies Association
  8. Bio-Robotics Laboratory, National Autonomous University of Mexico
  9. Chung-Hua Institution for Economic Research
  10. Data Scientists Network Foundation
  11. Data-Pop Alliance
  12. ELLIS Unit Alicante Foundation
  13. Eötvös Loránd University
  14. Finnish Center for Artificial Intelligence (FCAI)
  15. Icelandic Institute for Intelligence Machines
  16. International Computer Science Institute (ICSI)
  17. Kabarak University
  18. Knowledge 4 All Foundation
  19. Masakhane Foundation
  20. National Cheng Kung University
  21. Northeastern University, Northeastern Civic A.I.
  22. Queensland University of Technology
  23. Regional Center for Studies on the Development of the Information Society (CETIC)
  24. Tanzania AI Lab & Community
  25. The Alan Turing Institute
  26. The Hague University of Applied Sciences
  27. TU Delft, Digital Ethics Centre
  28. Research ICT Africa
  29. UNICEF headquarters
  30. University College London, Centre for Artificial Intelligence
  31. University Islamabad of the Islamic Republic of Pakistan (COMSATS)
  32. University of Cape Coast
  33. University of Edinburgh
  34. University of Essex
  35. University of Gothenburg
  36. University of Leeds
  37. University of Pretoria
  38. University of Tuscia
  39. University of Ljubljana
  40. University of Gabes
  41. University of Monastir

Abstract

The monitoring of work towards the SDGs is essential to assess progress and obstacles to realise our shared agenda.

A large amount of SDG documents created by governments, universities, as well as private and public entities are often assessed by the UN to measure progress, usually requiring expert labelling. However, annual SDG progress reports are becoming more common beyond the UN (for example in academia, to evaluate the contribution of research/teaching to this agenda), aiming to identify challenges and achievements.

In this project we propose to create an automatic tool for SDG labelling based on Artificial Intelligence (AI), which can save time in expert querying, facilitating this labelling. Additionally, we propose to leverage the power of cutting-edge AI-based language models. These models are usually trained on the whole internet before being fine tuned on a task (such as SDG tagging). As such, they bring an enormous level of expertise that could reduce the bias in expert labels, as well as represent the interconnectedness of our SDGs.

Our final objective is to build an online tool (web app and API) for querying the model, which has a wide range of use cases in research and education.

Personnel

Dr Perez-Ortiz is an Assistant Professor at the Centre for Artificial Intelligence at UCL. She is program co-founder and Deputy Director of a new MSc program on AI for Sustainable Development, which engages the new generations of engineers in developing responsible and innovative AI technologies for people and the planet. She teaches two modules related to AI and the intersection of the UN’s SDG agenda, as well as how to build responsible and ethical AI systems. Her research is fully interdisciplinary, actively collaborating with psychologists, medical doctors, social scientists, educators, agronomists and climate scientists alike. Every summer, Perez-Ortiz leads a group of MSc students to complete their dissertation in the technology for sustainable development domain, creating new technologies for identifying illegal deforestation/fishing, enabling the energy transition, designing tools to understand the impact of policies, etc. Perez-Ortiz has more than 12 years of experience doing theoretical and applied AI research (h-index 21), with a focus on environmental AI and educational recommender systems. Perez-Ortiz has collaborated in fruitful research with the European Space Agency, the HumaneAI network, the Knowledge 4 All Foundation, Apple, Google’s DeepMind, Spotify and multiple European and American universities.

Sahan Bulathwela is a Research Assistant contributing to multiple large projects on the topic of “AI for Education”. His contributions to the area, published in esteemed research venues, span multiple topics connected to this grant, namely text-tagging, recommender systems and natural language processing. Before joining UCL, he worked in several research roles in the industry where he gained experience in creating data products in a big data landscape. He has experience managing engineering teams to build API and web services.

John Shawe-Taylor is Professor of AI at UCL, Director of the UCL MSc on AI for Sustainable Development, Director of the International Research Center on Artificial Intelligence under the auspices of UNESCO and UNESCO Chair in AI. His foundational work in AI has attracted around 85,000 citations, making him one of the most featured and prolific researchers in the field.

Dr Wayne Holmes is a learning sciences and innovation researcher at the UCL Institute of Education, as well as a consultant researcher on AI and education for UNESCO. Wayne brings a critical studies perspective to the connections between AI and education, and their ethical, human and social justice implications.

 

 

International studies identify a lack of preparation and training in OE usage. However, the problem is to build the capacity to use OE as a tool to solve social problems. The OE4BW educational program allows its mentees/project developers to develop an advanced understanding, while addressing specific challenges in the areas of capacity and community building in OE.

The OE4BW mentoring programme is at the forefront of combining OER and SDGs and helping create a more personal approach towards building OER that can inform, educate and present value in new ways. The OERs have to address at least one of the 17 Sustainable Development Goals (SDGs), from ending poverty to a range of social needs including education, health, equality and job opportunities, while tackling climate change and preserving our environment.

It is a half-year programme organised in a sustainable way: it takes place fully online and is open to students from all backgrounds, regions and continents who have the potential and desire to employ Open Educational Resources to solve large-scale, relevant problems in today’s global landscape.

New project developers and new communities will require technical and media knowledge, educational content, pedagogical and didactical principles, social and psychological aspects, new organization and value-added models, strategies, and the potential paths for the organizational change, relevant policies, and legal aspects.

In addition, OE projects occur in a social context, requiring a social justice component. Many formal programs are inaccessible for students from the global South, underdeveloped countries, and underrepresented communities. Furthermore, leaders and their projects may not be properly connected to others. A critical mass of leaders in open education is fundamentally important to start making global changes.

The OE4BW addresses educational pathways, network development, and improved outcomes for open education in meaningful ways. From new participants to next-generation leadership, the program accelerates personal, professional and educational development.  Together, it creates new networks of first-time participants, mentors, coordinators, advisors. In particular, by creating networks of new participants, OE4BW strives to build inclusivity in the OE movement.

Outcome 1: Improve the communications infrastructure of OE4BW and staff capabilities

Enhance communication and collaboration among the developers, mentors, hub coordinators, and alumni of the OE4BW mentorship program through the customized MiTeam platform.

Outcome 2: Strengthened networks and new topical hubs

Support participants to physically join the OE4BW yearly final event EDUSCOPE in 2022, projected as a live event.

Outcome 3: Research the assumptions, practices, and results of OE4BW

Investigate the respective impacts and results of OE4BW. Provide research results as a basis for further improvement. Use qualitative and quantitative analysis through surveys/questionnaires of the OE4BW participants to determine the program’s current impacts on society and connection to SDGs.

 

Description

Namibia is home to 2.5 million people with a rich cultural and colonial history spanning over 100 years.

The stories of the Namibian people, including their cultural practices, knowledge, and history, have not been told from the perspectives of the Namibian people themselves. As Göring said at the Nuremberg trials, “The victor will always be the judge, and the vanquished the accused.”

As such, this project aims to capture this knowledge in its historical and cultural context for one of the most critically endangered languages, Khoekhoegowab, and for Namibia’s most widely spoken language, Oshiwambo, and in doing so provide data for NLP tasks.

This project builds on prior efforts to create cultural and historical texts in the Khoekhoegowab language by crowdsourcing a speech dataset from two groups: 300 war veterans, mostly Oshiwambo speaking, out of a potential 10,000 Namibian war veterans, and a community of Khoekhoegowab elders, whose traditional methods are still used in wildlife conservation for monitoring and tracking.

The project will consider various data gathering methods such as interviews, focus groups and web apps to capture the data. The speech data will be annotated and translated into English.

The objective of this project is to build a Wolof text-to-speech system. Three people will be involved: Thierno Ibrahima DIOP, senior data scientist at Baamtu SARL; El Hadj Mamadou Nguer, Assistant Professor at Universite Virtuelle du Senegal; and Sileye BA, senior machine learning researcher at L’Oreal Innovation Center in Paris. Thierno Ibrahima DIOP and El Hadj Mamadou Nguer will be the project’s principal investigators.

The project will exploit a dataset of 40,000 Wolof phrases uttered by two actors. This open-source dataset is a deliverable of a previous project.

The project will be conducted in four phases:
1. Evaluation of the quality of the dataset
2. Implementation of a machine learning model mapping Wolof texts into their corresponding utterances
3. Quantitative and qualitative evaluation of the implemented model’s performance
4. Development of an API exposing the implemented text-to-speech model

Dataset quality will be assessed on a randomly sampled portion of about a thousand uttered phrases. Fluent Wolof speakers will validate these phrases qualitatively for comprehensibility.
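As an illustration of this sampling step, the following minimal sketch draws a reproducible random subset of phrases for manual review and summarises the reviewers' verdicts. The function names, the fixed seed, and the use of integer phrase IDs are assumptions made for the example, not details of the project's actual tooling.

```python
import random


def sample_for_review(phrase_ids, sample_size=1000, seed=0):
    """Draw a reproducible random subset of phrase IDs for manual review."""
    rng = random.Random(seed)  # fixed seed so the same subset can be redrawn
    k = min(sample_size, len(phrase_ids))  # never request more than exist
    return rng.sample(phrase_ids, k)


def acceptance_rate(judgements):
    """Share of reviewed phrases that speakers marked as comprehensible.

    `judgements` maps a phrase ID to True (accepted) or False (rejected).
    """
    if not judgements:
        raise ValueError("no judgements supplied")
    accepted = sum(1 for ok in judgements.values() if ok)
    return accepted / len(judgements)


# Hypothetical IDs standing in for the 40,000 recorded phrases.
phrase_ids = list(range(40_000))
review_set = sample_for_review(phrase_ids)
```

Fixing the seed means a second reviewer can be given exactly the same subset, which makes disagreements between reviewers easier to trace.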

A state-of-the-art neural network speech synthesizer will be implemented and evaluated using the dataset. Neural network models have been selected because they can be trained end to end without the phoneme-level word segmentation required by competing statistical models. We will investigate text-to-spectrogram models such as Tacotron, Glow-TTS, and Speedy-Speech, as well as vocoder models such as MelGAN.

The trained model will be evaluated quantitatively and qualitatively. The quantitative evaluation will be done using metrics provided in standard text to speech evaluation libraries. The qualitative evaluation will be based on fluent Wolof speakers’ comprehension of synthesized Wolof utterances.

The model will be exposed via an API that takes a language token and input text and returns the synthesized speech as an audio file. This API will be plugged into a web platform based on the Masakhane MT web platform.
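The shape of such an interface might look like the sketch below. It is only an illustration: the language token "wo", the sample rate, the function names, and the placeholder model (which emits silence instead of speech) are all assumptions, and the real service would call the trained synthesizer instead.

```python
import io
import wave

SUPPORTED_LANGUAGES = {"wo"}  # assumed language token for Wolof
SAMPLE_RATE = 22050           # assumed output sample rate in Hz


def placeholder_model(text):
    """Stand-in for the trained model: silence, about 50 ms per character."""
    n_samples = (SAMPLE_RATE // 20) * len(text)
    return b"\x00\x00" * n_samples  # 16-bit mono PCM samples


def synthesize(language, text, model=placeholder_model):
    """Validate the request and return the audio as WAV file bytes."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language token: {language!r}")
    if not text.strip():
        raise ValueError("input text is empty")
    pcm = model(text)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes per sample (16-bit)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buffer.getvalue()


audio = synthesize("wo", "placeholder text")
```

Keeping validation and WAV packaging separate from the model call means the placeholder can later be swapped for the trained synthesizer without changing the request handling.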

For deployment, a Kubernetes cluster will be used to provide horizontal scaling: initially a single instance will run, and the number of instances will then be adjusted automatically depending on the load. An instance (8 cores, 32 GB of RAM) will cost about $83.95 per month on a yearly reservation basis.
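To give a feel for how load-based scaling translates into cost, the sketch below applies the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × current metric / target metric)) and multiplies the result by the instance price quoted above. The load figures, the target utilisation, and the cap of five instances are assumptions chosen for illustration, not project parameters.

```python
import math

INSTANCE_COST_PER_MONTH = 83.95  # USD per instance, as quoted above


def desired_replicas(current_replicas, current_load, target_load,
                     min_replicas=1, max_replicas=5):
    """Scaling rule: ceil(current * currentLoad / targetLoad), then clamp.

    Loads are utilisation percentages; min/max bounds are assumed values.
    """
    raw = math.ceil(current_replicas * current_load / target_load)
    return max(min_replicas, min(max_replicas, raw))


def monthly_cost(replicas):
    """Approximate monthly cost in USD if `replicas` instances run all month."""
    return round(replicas * INSTANCE_COST_PER_MONTH, 2)


# One instance at 90% utilisation with a 60% target scales out to two.
replicas = desired_replicas(current_replicas=1, current_load=90, target_load=60)
cost = monthly_cost(replicas)
```

In practice the instance count varies over the month, so this figure is an upper bound for the chosen replica count rather than a forecast.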

An objective of this project is to publish the work done on the dataset and the developed speech synthesis model at a natural language processing venue such as the African NLP Workshop or the Deep Learning Indaba. This will give more visibility to this work and, at the same time, advance machine-learning-based African language processing activities.

 

Introduction

Wildlife tourism is a significant and growing contributor to the economic and social development in the African region through revenue generation, infrastructure development and job creation. According to a recent press release by the World Travel and Tourism Council [1], travel and tourism contributed $194.2 billion (8.5% of GDP) to the African region in 2018 and supported 24.3 million jobs (6.7% of total employment). Globally, travel and tourism is a $7.6 trillion industry, and is responsible for an estimated 292 million jobs [2]. Tourism is also one of the few sectors in which female labor participation is already above parity, with women accounting for up to 70% of the workforce [2].

However, the wildlife tourism industry in Africa is increasingly threatened by rising human population and wildlife crime. As poaching becomes more organised and livestock incursions become frequent occurrences, shortages in ranger workforce and shortcomings in technological developments in this space have put thousands of species at risk of endangerment, and threaten to collapse the wildlife tourism industry and ecosystem.

Tourism in Kenya contributed revenue of $1.5 billion in 2018 [3]. The National Wildlife Conservation Status Report, 2015 – 2017 [4], presented by the Ministry of Tourism and Wildlife of Kenya, reported that wildlife conservancies in Kenya supported over 700,000 community livelihoods. A recession in the wildlife tourism industry could therefore have major adverse economic and social impacts on the country. It is thus critical that sustainable solutions are found to save the wildlife tourism industry, and that further research in this area is encouraged.

Problem definition

According to The National Wildlife Conservation Status Report, 2015 – 2017 [4] presented by the Ministry of Tourism and Wildlife of Kenya, Kenyan national parks and reserves are currently short of 1,038 rangers against the 2,484 required, a deficit of over 40%. To address shortages in ranger workforce, carry out monitoring activities more effectively, and detect criminal or endangering activities with greater precision, we propose the deployment of Unmanned Ground Vehicles (UGVs) for intelligent patrol and wildlife monitoring across the national parks and reserves in Kenya.

The UGVs would be fitted with a suite of cameras and sensors that would enable them to navigate autonomously within the parks and to run multiple deep learning and computer vision algorithms for monitoring activities such as detection of poaching, livestock incursions, human-wildlife conflict, and distressed wildlife, as well as species identification.

The UGVs could be monitored from a central surveillance system, where alerts can be generated on detection of any alarming activity, and rangers dispatched to respond. Ethical considerations can be made to facilitate the deployment of these UGVs in a manner that aids the ranger workforce in their routine surveillance tasks throughout the national parks and reserves that often span thousands of square kilometers, rather than replace them. Sustainable and ethical automation could help create more jobs in the automotive and technology sectors without replacing current jobs.

The deployment of a project of this scale, however, would require significant investments in building the UGV, and require feasibility studies from the government and international wildlife conservation bodies. Furthermore, without reasonable computer vision and autonomous navigation accuracies, investments towards building the unmanned vehicle would be futile. It is thus crucial that efforts are first made towards solving the computer vision and autonomous navigation challenges posed by the rough terrains prevalent in national parks and reserves.

This project therefore serves as a stepping-stone towards adopting autonomous vehicle technology in Africa and pioneering further research in the field and its applications to broader areas beyond just transportation. Additionally, its adaptation in national park environments would allow it to be tested in unstructured environments lacking road infrastructure and free of traffic and pedestrians, thus allowing the systems to be tested safely and get quicker policy approvals. The scope of this research is hence limited to developing an end-to-end deep learning model that can autonomously navigate a vehicle over dirt roads and challenging terrain that is present in national parks and reserves.

The model will be trained on trail video as well as driving data such as steering wheel angle, speed, acceleration, and Inertial Measurement Unit (IMU) data. The accuracy of the model will be measured by calculating the error rate between the model’s prediction and the driver’s actual inputs over a given distance. We also look to publish the dataset of annotated driving data from national parks and reserves, the first of its kind, to encourage further research in this space. Additionally, we shall collect metadata such as number of patrol vehicles per square kilometer, average distance travelled per vehicle per day, distance of traversable road in the park per square kilometer, that can be used to give a preliminary analysis on the feasibility of the project results towards automated wildlife patrol.
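The comparison between the model's predictions and the driver's inputs can be expressed with standard regression error measures. The sketch below computes mean absolute error and root mean squared error for steering angle; the choice of these two metrics and the sample values are assumptions for illustration, since the text does not fix a specific error measure.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and driver inputs."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def root_mean_squared_error(predicted, actual):
    """Like MAE, but penalises large deviations more heavily."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Made-up steering angles (degrees) sampled along one stretch of trail.
predicted_angles = [0.0, 2.0, -1.0, 4.0]
driver_angles = [0.0, 1.0, -1.0, 2.0]

mae = mean_absolute_error(predicted_angles, driver_angles)
rmse = root_mean_squared_error(predicted_angles, driver_angles)
```

Reporting both is often useful: a low MAE with a much higher RMSE suggests the model is usually close but occasionally makes large steering errors, which matters on rough terrain.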

References

[1] “African tourism sector booming – second-fastest growth rate in the world”, WTTC press release, Mar. 13, 2019. Accessed on Jul. 11, 2019. [Online]. Available:
https://www.wttc.org/about/media-centre/press-releases/press-releases/2019/african-tourism-sector-booming-second-fastest-growth-rate-in-the-world/
[2] “Supporting Sustainable Livelihoods through Wildlife Tourism”, World Bank Group, 2018.
[3] “Tourism Sector Performance Report – 2018”, Hon. Najib Balala, 2018.
[4] “The National Wildlife Conservation Status Report, 2015 – 2017”, pp. 74, 75, 131, Ministry of Tourism and Wildlife, Kenya, 2017.

 

Abstract

According to the Open Data Barometer by the World Wide Web Foundation, countries in sub-Saharan Africa are ranked poorly, with an average score of about 20 out of a maximum of 100 on open data initiatives based on readiness, implementation, and impact [1]. To make the process of creating, introducing, and passing parliamentary bills a force for public accountability, the information needs to be easier for the average citizen to analyze and process.

This is not the case for most of the bills introduced and passed by parliaments in Sub-Saharan Africa. In this work, we present a method to overcome this implementation barrier. For the Nigerian parliament, we used a pre-trained optical character recognition (OCR) tool, natural language processing techniques, and machine learning algorithms to categorize parliamentary bills. We propose to improve the work on the Nigerian parliamentary bills by using text detection models to build a custom OCR tool. We also propose to extend our method to three other African countries: South Africa, Kenya, and Ghana.

Introduction

Given the challenges and precariousness facing developing and underdeveloped countries, the quality of policymaking and legislation is of enormous importance. This legislation can affect the success of some of the United Nations Sustainable Development Goals (SDGs), such as poverty alleviation, a good public health system, quality education, economic growth, and sustainability. Targets 16.6 and 16.7 of the UN SDGs are to “develop effective, accountable, and transparent institutions at all levels” and to “ensure responsive, inclusive, participatory and representative decision making at all levels” [2]. For countries in Sub-Saharan Africa to meet these targets, an open data revolution needs to happen at all levels of government and, more importantly, at the parliamentary level.

Objectives and Expectations

To achieve the goal of meeting the UN SDG targets 16.6 and 16.7, making effective use of data is key. However, does such data currently exist? If so, how should it be organized in a framework that is amenable to the decision-making process? Here, we propose expanding our work on categorizing parliamentary bills in Nigeria using Optical Character Recognition (OCR), document embedding and recurrent neural networks to three other countries in Africa: Kenya, Ghana, and South Africa.

We also plan to improve our text extraction process by training a custom OCR tool using AI. The objective of this project is to generate semantic and structured data from the bills and, in turn, categorize them under socio-economic labels. We plan to recruit three interns to work on this project for five months: two machine learning interns and one software engineering intern.

Conclusion and Long Term Vision

Our initial experimental results show that our model is effective for categorizing the bills, which will aid our large-scale digitization efforts. However, we identified a key remaining challenge based on our results. The output from the pre-trained OCR tool is generally not a very accurate representation of the text in the bills, especially for the low-quality PDFs. A fascinating possibility is to solve this by training the custom OCR tool we have proposed. The rapid acceleration of text detection research with novel deep learning methods can help us in this area.

Methods such as region-based or single-shot detectors can be employed. In addition, we plan to use image augmentation to alter the size, background noise or color of the bills. A large-scale annotation effort on the texts can provide the labels we need to train our custom OCR tool for text identification and named entity recognition. We are also extending our methodology to other countries in Sub-Saharan Africa. Results that lead to accurate categorization of parliamentary bills are well positioned to have a substantial impact on governmental policies and on the quest for governments in low-resource countries to meet the open data charter principles and the United Nations’ Sustainable Development Goals on open government.

Accurate categorization can also empower policymakers, stakeholders and governmental institutions to identify and monitor bills introduced to the National Assembly for research purposes, and can improve the efficiency of bill creation and open data initiatives. We plan to design an intercontinental tool that combines information from all bills and categories and makes it easily accessible to everyone. For our long-term vision, we plan to analyze documents on parliamentary votes and proceedings to give us more insight into legislative debates and patterns.

Description

Algorithms for text classification still face some open problems, for example dealing with long texts and with texts in under-resourced languages.

This challenge gives participants the opportunity to improve on text classification techniques and algorithms for text in Chichewa. The texts are of varying length; some are quite long and will pose challenges in chunking and classification. The texts are made up of news articles.
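One common way to handle long articles, shown here only as a sketch, is to split each text into overlapping word windows, classify every window, and take a majority vote. The chunk size, the overlap, and the `classify_chunk` callback are assumptions for the example; any trained classifier could be plugged in, and other aggregation schemes (such as averaging class probabilities) are equally valid.

```python
from collections import Counter


def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into overlapping windows of at most `chunk_size` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def classify_long_text(text, classify_chunk, chunk_size=200, overlap=50):
    """Label each chunk with `classify_chunk` and return the majority label.

    Ties are resolved in favour of the label that was seen first.
    """
    chunks = chunk_words(text, chunk_size, overlap)
    if not chunks:
        raise ValueError("text contains no words")
    votes = Counter(classify_chunk(chunk) for chunk in chunks)
    return votes.most_common(1)[0][0]


# Placeholder tokens stand in for a real article.
article = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(article, chunk_size=4, overlap=2)
```

The overlap keeps sentences that straddle a window boundary visible to at least one chunk, at the cost of classifying some words twice.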

The objective of this challenge is to classify these news articles by topic.

We hope that your solutions will illustrate some challenges and offer solutions.


Chichewa is a Bantu language spoken in much of Southern, Southeast and East Africa, namely the countries of Malawi and Zambia, where it is an official language, and Mozambique and Zimbabwe where it is a recognised minority language.

tNyasa Ltd Data Science Lab

We are a company based in Malawi offering intelligent technological solutions for the travel, technology, trade, cultural and education sectors in Malawi. As part of the Data Science Lab, we work on language tools for Chichewa, such as the construction and curation of datasets, speech-to-text and information processing.

AI4D-Africa is a network of excellence in AI in sub-Saharan Africa. It is aimed at strengthening and developing community, scientific and technological excellence in a range of AI-related areas. It is composed of African Artificial Intelligence researchers, practitioners and policymakers.

Datasets

The data was collected from news publications in Malawi. tNyasa Ltd Data Science Lab used three main sources: the Nation Online newspaper, Radio Maria and the Malawi Broadcasting Corporation. The articles presented in the dataset are full articles and span many different genres, from social issues, family and relationships to political or economic issues.

The articles were cleaned by removing special characters and html tags.

Your task is to classify the news articles into one of the 20 classes listed below. The classes are mutually exclusive.

List of classes: [‘SOCIAL ISSUES’, ‘EDUCATION’, ‘RELATIONSHIPS’, ‘ECONOMY’, ‘RELIGION’, ‘POLITICS’, ‘LAW/ORDER’, ‘SOCIAL’, ‘HEALTH’, ‘ARTS AND CRAFTS’, ‘FARMING’, ‘CULTURE’, ‘FLOODING’, ‘WITCHCRAFT’, ‘MUSIC’, ‘TRANSPORT’, ‘WILDLIFE/ENVIRONMENT’, ‘LOCALCHIEFS’, ‘SPORTS’, ‘OPINION/ESSAY’]

Files available for download:

  • Train.csv – contains the target. This is the dataset that you will use to train your model.
  • Test.csv – resembles Train.csv but without the target-related columns. This is the dataset on which you will apply your model.
  • SampleSubmission.csv – shows the submission format for this competition, with the ‘ID’ column mirroring that of Test.csv. The order of the rows does not matter, but the names of the IDs must be correct.
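Because a submission is matched to Test.csv by ID rather than by row order, it can be worth checking the IDs before uploading. The sketch below does this on small in-memory CSV strings. Only the ‘ID’ column name comes from the description above; the ‘Text’ and ‘Label’ columns, the sample IDs, and the function names are hypothetical.

```python
import csv
import io
from collections import Counter


def read_ids(csv_text, id_column="ID"):
    """Return the values of the ID column, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[id_column] for row in reader]


def check_submission_ids(test_csv, submission_csv):
    """Compare submission IDs with test IDs, ignoring row order.

    Returns three sorted lists: IDs missing from the submission, IDs that
    do not exist in the test set, and IDs that appear more than once.
    """
    test_ids = read_ids(test_csv)
    submission_ids = read_ids(submission_csv)
    missing = sorted(set(test_ids) - set(submission_ids))
    unexpected = sorted(set(submission_ids) - set(test_ids))
    counts = Counter(submission_ids)
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    return missing, unexpected, duplicated


# Hypothetical file contents; real files would be read from disk instead.
test_csv = "ID,Text\nID_A,first article\nID_B,second article\n"
good_submission = "ID,Label\nID_B,POLITICS\nID_A,SPORTS\n"
bad_submission = "ID,Label\nID_A,SPORTS\nID_A,HEALTH\nID_X,MUSIC\n"

good_report = check_submission_ids(test_csv, good_submission)
bad_report = check_submission_ids(test_csv, bad_submission)
```

A submission is ready when all three lists are empty, as in the first example; the second shows a missing ID, an unknown ID, and a duplicate.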

Partners

AI4D-Africa; Artificial Intelligence for Development-Africa Network