Motivation

Poultry diseases are economically significant because of their effects on poultry production and performance, and some are zoonotic. Poultry farming in Tanzania operates at small to medium scale, managed largely by youths and women in peri-urban and rural areas; farms are either backyard or semi-intensive.

The poultry sector in Tanzania, valued at USD 210 million in 2015, comprises 36 million chickens kept by 4.6 million households (27 million people). The sector is challenged by low productivity caused by diseases including Salmonella, Newcastle disease and Coccidiosis, whose economic effects include high mortality rates.

Vaccination coverage is limited to some farming systems and specific areas. Therefore, robust diagnostics are required for these diseases, especially in endemic areas such as Tanzania. With the help of deep learning, farmers will have the potential to better diagnose poultry diseases and improve livestock health.

The ongoing efforts to develop an automated poultry disease diagnostics tool using deep learning will produce a dataset of 2,000 high-quality labeled images of poultry fecal matter, 500 images per class1. The classes are healthy, Salmonella, Newcastle Disease and Coccidiosis.

The initial baseline models for image classification, based on the VGG16 and ResNet50 architectures, overfit on the dataset of 271 to 500 images per class2,3. We aim to supplement the dataset with 2,000 images per class in order to train a sufficiently large model for diagnostics.
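
As a point of reference, one common way to reduce overfitting on a few hundred images per class is to fine-tune a pretrained backbone with heavy data augmentation. The sketch below illustrates this with ResNet50 in PyTorch; the folder layout, hyperparameters and five-epoch loop are illustrative assumptions, not our final training recipe.

```python
# Minimal transfer-learning sketch for a small 4-class image dataset.
# Assumes images arranged as data/train/<class_name>/*.jpg (hypothetical layout).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),       # augmentation to reduce overfitting
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.2, 0.2, 0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

train_ds = datasets.ImageFolder("data/train", transform=train_tfms)
loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

model = models.resnet50(weights="IMAGENET1K_V2")
for p in model.parameters():                 # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 4)  # 4 classes: healthy + 3 diseases

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):                       # short illustrative schedule
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```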

Outcomes

The main objective of the proposed project is to develop a dataset for poultry disease diagnostics using machine learning. The specific objectives are:

  • Generation of a training/testing dataset for poultry disease diagnostics
  • Dataset curation
  • Diagnostics

The expected outcome of the proposed project is an annotated dataset for poultry disease diagnostics serving small to medium scale poultry farmers.

The dataset will be shared under open access with the ML community on Kaggle Datasets and used for teaching in different machine learning initiatives in Africa and globally. The algorithms used in the project will be shared on GitHub to ensure the work is reproducible.

Long term vision

The proposed project will contribute to the efforts of modernizing the poultry sector in Tanzania4 by providing a data-driven solution to support the targeted delivery of veterinary and extension services to smallholder poultry farmers. The dataset will be open access, enabling the solution to be reproduced in other countries.

Personnel

  1. Hope Emmanuel Mbelwa, Project Lead, Nelson Mandela African Institution of Science and Technology (NM-AIST)
  2. Ezinne Nwankwo, Duke University
  3. Dr. Dina Machuve, Nelson Mandela African Institution of Science and Technology (NM-AIST)
  4. Dr. Neema Mduma, Nelson Mandela African Institution of Science and Technology (NM-AIST)
  5. Dr. Evarest Maguo, Elang’ata Agrovet Services

Description

The goal of this project is to find a drug that can be repurposed to effectively treat Leishmania. Specifically, we aim to find a viable parasite protein-ligand pair.

We assume that it is possible to identify multiple strong candidate protein-ligand pairs by computationally analyzing the proteome of Leishmania, identifying the most promising targets, and finding drugs that bind to these targets.

Outcomes

The goal of this project is to apply existing techniques for protein-ligand interaction prediction to proteins characteristic of tropical diseases, starting with Leishmania.

These techniques can include leveraging the PyRosetta library to test protein-ligand pairs for proteins whose 3D structure has been mapped, while also investigating techniques that predict ligand interaction directly from sequences. Furthermore, a host of new protein representation techniques has been developed by applying language modelling to millions of protein sequences.

These embeddings have been found to capture many biochemical properties. This project can explore the ability of these embeddings to predict interactions with ligands.
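
As a minimal sketch of this direction, the snippet below scores hypothetical protein-ligand pairs by concatenating a mean-pooled ESM-2 sequence embedding with a Morgan fingerprint of the ligand and fitting a simple classifier. The example sequences, SMILES strings and labels are placeholders, and the small ESM-2 checkpoint is chosen only to keep the sketch light.

```python
# Hedged sketch: scoring protein-ligand pairs from sequence embeddings
# (fair-esm) and ligand fingerprints (RDKit). Training data is hypothetical.
import numpy as np
import torch
import esm
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.linear_model import LogisticRegression

model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()  # small model for the sketch
batch_converter = alphabet.get_batch_converter()
model.eval()

def protein_embedding(seq: str) -> np.ndarray:
    """Mean-pooled per-residue representation from the last layer."""
    _, _, tokens = batch_converter([("query", seq)])
    with torch.no_grad():
        out = model(tokens, repr_layers=[6])
    reps = out["representations"][6][0, 1:len(seq) + 1]  # drop the BOS token
    return reps.mean(dim=0).numpy()

def ligand_fingerprint(smiles: str) -> np.ndarray:
    """2048-bit Morgan fingerprint of the ligand."""
    mol = Chem.MolFromSmiles(smiles)
    return np.array(AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048))

# Hypothetical labelled pairs: (protein sequence, ligand SMILES, binds?)
pairs = [("MKTLLLTLVVVTIVCLDLGYT", "CC(=O)Oc1ccccc1C(=O)O", 1),
         ("MKTLLLTLVVVTIVCLDLGYT", "c1ccccc1", 0)]
X = np.stack([np.concatenate([protein_embedding(s), ligand_fingerprint(m)])
              for s, m, _ in pairs])
y = np.array([label for _, _, label in pairs])
clf = LogisticRegression(max_iter=1000).fit(X, y)
```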

This direction can be expanded to proteins tied to other neglected tropical diseases. A second step can investigate which protein and ligand features or neural representations are suitable for accelerating the process of matching proteins and ligands.

As part of the Indaba Grand Challenge, all data produced from this research will be made available in the public domain. We will also release our source code under the MIT license to facilitate further development of treatment options for rare diseases, especially Leishmania.

Depending on the findings, initial results can be published at a top-tier AI conference, many of which host a “Machine Learning for the Developing World” or “AI for Social Good” workshop. After initial feedback, we can target top-tier bioinformatics venues such as Bioinformatics or PLOS Computational Biology.

Ambition

Out of the 13,000 diseases known in the medical literature, roughly 5,000 have available treatments, with the remaining 8,000 belonging to the rare disease category. By definition, rare diseases affect a smaller proportion of the global population and are therefore not the major targets of pharmaceutical R&D programs.

Many tropical diseases fall into this category, where the market size may not justify the costs incurred by de novo drug development. For this category, drug repurposing seems the best avenue for finding treatments, leveraging information about existing approved drugs (around 1,500) and the approximately 400 known drug targets.

Depending on the approach, finding a cure for a disease may involve identifying molecules that bind to proteins involved in a pathway responsible for a syndrome, to proteins characteristic of the parasite or virus carrying the disease, or to proteins affecting the vector of the disease.

Many of these steps are built on the backbone of predicting protein-ligand interactions: either matching a protein with the ligands that can bind to it, or matching a ligand to the proteins it can bind to. Understanding representations suitable for tackling this problem will give a framework widely applicable beyond the specific case of Leishmania.

Motivation

Globally, approximately 7% of the population suffers from pneumonia (Ruuskanen et al. [2011]). Diagnosing pneumonia requires the review of chest radiographs by specially trained radiologists. Improving the efficiency and reach of diagnostic services is of great importance in developing countries.

Deep neural networks (DNNs) have made rapid, tremendous progress due to the availability of large datasets. DNNs such as CheXNet (Rajpurkar et al. [2017]) and TieNet (Wang et al. [2018]) have been proposed to aid the early diagnosis and detection of pneumonia, with great success. These networks, however, have deep architectures that result in large memory and computation requirements, which poses a great challenge for portable medical devices and embedded systems.

Model compression is a technique for reducing such large computation and memory requirements. Various methods have been proposed, including filter pruning (Li et al. [2016]) and channel pruning (He et al. [2017]). The majority of these pruning techniques have been evaluated on large DNN models for a variety of tasks; to our knowledge, little has been done for models trained for medical image detection.

Hence, the goal of this work is to build a model compression algorithm for CheXNet. The CheXNet network is chosen as the base model because it is the current state-of-the-art technique for detecting pneumonia on chest X-rays and, as such, a reasonable choice.

Methodology

Most model compression processes involve training a model, pruning it, and fine-tuning it. In this work, we have the additional condition of targeting an embedded device that simulates a low-power medical diagnostic device. As such, our proposed method is a unified model compression algorithm.

We propose two components to the model compression: one is weight pruning and the other is structured pruning. The objective of weight pruning is to minimize the loss function while satisfying the pruning criteria.

The pruning criteria are constraints, one of which is that the absolute value of each weight meets a particular threshold. Because we target a low-powered device, weight pruning alone may reduce computation while the memory footprint remains large. Hence, the other component is structured pruning, where we aim to compress the convolutional layers of the deep network, since that is where the computational overhead lies.
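
A minimal sketch of the two components, using PyTorch's built-in pruning utilities on DenseNet-121 (the backbone CheXNet is built on), is given below; the pruning ratios are illustrative assumptions rather than tuned values.

```python
# Hedged sketch of the two pruning components with torch.nn.utils.prune;
# DenseNet-121 stands in for the CheXNet backbone.
import torch.nn as nn
import torch.nn.utils.prune as prune
from torchvision import models

model = models.densenet121(weights="IMAGENET1K_V1")

for module in model.modules():
    if isinstance(module, nn.Conv2d):
        # Component 1: magnitude-based weight pruning -- zero out the 50% of
        # weights with the smallest absolute value (threshold criterion).
        prune.l1_unstructured(module, name="weight", amount=0.5)
        # Component 2: structured pruning -- remove 30% of whole output
        # channels (by L2 norm), shrinking memory as well as computation.
        prune.ln_structured(module, name="weight", amount=0.3, n=2, dim=0)
        prune.remove(module, "weight")  # make the pruning permanent

# The pruned model would then be fine-tuned on chest X-ray data.
```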

Long-term vision

In most African countries, diagnostic services are not easily accessible to low-income earners. Hence this project has the long-term vision of developing low-powered embedded diagnostic devices. Since these devices would rely on machine learning models, our methodology can easily be extended to various models to reduce memory storage and computational resources.

Importantly, our model compression algorithm would be used for multiple models deployed to edge devices. Diagnosing skin diseases on mobile phones is part of the suite of projects the team is working on. Various medical imaging projects are ongoing in the team, and this model compression work is therefore an essential component of other projects in the pipeline.

 

Motivation

The internet is an important source of information for many people, and today’s social media platforms continue to shape how people access and act on health information.

Social media platforms serve as channels for top-down communication from health officials to the public, for peer sharing of health information across users, and for finding communities with shared health goals and challenges.

Nonetheless, along with these useful interactions comes false information that can cause people to take harmful measures, reject factual updates from authorities, and upend the work of local health institutions.

Events such as the 2014 Ebola epidemic and today’s COVID-19 pandemic bring to light the need for social media platforms to facilitate access to accurate and reliable information.

In this project, we aim to use a mixed-methods approach to study the use of social media as an information channel during the ongoing COVID-19 pandemic in Nigeria: which accounts shared news that was later found to be false, how false news spread within the network before being corrected, why people share certain kinds of information, and what strategies we can learn for navigating the spread of health misinformation online in developing and under-developed countries.

Outcomes

The overall objective of this project is to study the dynamics of the spread of factual and false information in online social networks in Nigeria during a pandemic.

Our study will combine approaches in social network feature engineering and analysis, machine learning (ML), and natural language processing (NLP) with qualitative insights from social network users.
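
As one hedged illustration of the network-analysis side, the sketch below builds a small retweet graph and ranks accounts by how widely their content spreads; the accounts and edges are hypothetical placeholders for the data we will collect.

```python
# Hypothetical retweet edges: (retweeter, original poster).
import networkx as nx

retweets = [("userA", "newsBot"), ("userB", "newsBot"),
            ("userC", "userA"), ("userD", "userA")]

G = nx.DiGraph()
G.add_edges_from(retweets)

# With edges pointing from retweeter to source, PageRank scores accounts
# whose posts are widely (and transitively) retweeted.
influence = nx.pagerank(G)
for account, score in sorted(influence.items(), key=lambda kv: -kv[1]):
    print(account, round(score, 3))
```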

These findings will help online platforms, journalists, the general public, and health institutions in Nigeria identify ways that health misinformation is spread online and rethink what strategies can be employed to mitigate the danger it poses.

Our results will include code (in Python and/or R), social media data analyses, anonymized survey data, visualizations, a blog post, and a research publication.

We will release new code or point to existing open-source resources that will be used for our analyses. These will be hosted on Github to allow independent reruns. We will refer to the privacy policies of specific online communities regarding sharing identifiable data. We will release blog posts and visualizations with simple readable information for a wider audience.

We also aim to publish our findings at conference venues interested in the interaction between technology and society and how the two influence each other, e.g. CHI (Human Factors in Computing Systems), CSCW (Computer Supported Cooperative Work), The Web Conference, and WSDM (Web Search and Data Mining).

Long term vision

We hope that this research will support on-the-ground healthcare work by helping to inform how workers interact with the public and how to address the public's constantly changing perception of what is true.

We hope to contribute to the joint effort of journalists and government officials to stop the spread of the virus in Nigeria and other developing countries. Our approach will be useful for studying other forms of misinformation in future health crises and/or political events (i.e. elections).

User perceptions of these events are very much shaped by social media, yet this influence is currently understudied in many African countries. Additionally, social media echo chambers and political polarization are widely studied in America but not in the African context.

Previous research in this space has focused on diseases like Ebola1 without specifically focusing on Africa2, or on HIV without including a qualitative analysis3.

Description

An artificial intelligence system for identifying predictors of early detection of maternal, neonatal and child health risks and their timely management.

Rationale

The idea we propose is to build an artificial intelligence (AI) system for informing on predictors of early detection of maternal, neonatal and child health risks and their timely management. Tanzania is among the countries with the highest maternal mortality rates (MMR) in the world. The estimated MMR according to the 2015-2016 Tanzania Demographic and Health Survey (DHS) was 556 per 100,000.

In fact, according to the Partnership for Maternal, Newborn and Child Health (PMNCH), maternal mortality in Tanzania has changed only slightly over the years, in contrast to child mortality rates, which fell from 99 deaths per 1,000 in 1999 to 68 per 1,000 by 2005. Much therefore remains to be done in preventing maternal mortality.

The Ministry of Health, Community Development, Gender, Elderly and Children (MoHCDGEC) maintains the District Health Information System (DHIS-2), which digitizes health data at the district level, as well as the Integrated Diseases Surveillance and Response (IDSR) system for capturing weekly data on key conditions and diseases. In addition, the National Bureau of Statistics (NBS) conducts and supplies data from the Demographic and Health Survey (DHS). In this proposal we intend to make use of the data present in local, national and international databases, together with artificial intelligence (AI) tools, to build decision-making support systems.

This will involve training an AI system, i.e. using machine learning algorithms, to identify predictors of early risk detection and early risk management. Prior research integrating AI into health care systems for addressing MMR has demonstrated that AI can bring a paradigm shift in reducing MMR by predicting pregnancy outcomes.

In the Tanzanian context, the most important determinant of MMR is the timing of risk detection and the timeliness of risk management. A full understanding of this aspect will be vital to the fight to reduce MMR as well as neonatal and child deaths. Therefore, our central hypothesis is that computer-based decision procedures, under the broad umbrella of artificial intelligence (AI), can assist in reducing MMR and generally improving health care in resource-poor environments through the detection of predictors of early risk detection and management.

The uniqueness of our hypothesis is that it addresses the crux of the maternal, neonatal and child deaths problem, namely: what causes untimely detection and management of risks? Understanding the predictors will help in redesigning health care practice, management and financing in this area. The decision support tools from this proposal will be applicable at a wider scale, from members of households to clinicians, researchers, policymakers and maternal, neonatal and child health activists.

Outcomes

The project will involve developing two AI systems: (1) an AI system to be used at the national level in Tanzania, and (2) an AI system to be used at the hospital level to predict individual cases. However, in this phase (Phase One) we will focus on the first objective of developing the AI system at the national level.

Therefore, the gathering of data will be divided into two phases. This phase (Phase One) will involve three national platforms: the District Health Information System (DHIS), the Integrated Diseases Surveillance and Response (IDSR) system, and the Demographic and Health Survey (DHS), which is under the National Bureau of Statistics (NBS).

DHIS and IDSR were developed in silos, so they do not communicate and have different sets of indicators. DHIS, under the custodianship of the Ministry of Health, Community Development, Gender, Elderly and Children (MoHCDGEC), is an electronic tool for digitizing data at the district level, while IDSR is used for capturing weekly data on key conditions and diseases.

Secondary data concerning maternal, neonatal and child risks, filtered and cleaned from DHIS, IDSR and NBS, will be generated. These data will be used to derive a spectrum of factors and their weights for determining the timing of risk detection and management at the national level.
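
As a hedged illustration of how such factor weights might be derived once the cleaned secondary data exist, the sketch below fits a regularized logistic regression on hypothetical district-level indicators; every column name and file in it is a placeholder, not a real extract.

```python
# Hedged sketch: extracting signed factor weights from cleaned tabular
# indicator data. The CSV file and all column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("district_indicators.csv")           # hypothetical extract
features = ["anc_visits", "skilled_birth_attendance",
            "facility_distance_km", "anaemia_rate"]   # hypothetical factors
X = StandardScaler().fit_transform(df[features])
y = df["late_risk_detection"]                         # hypothetical binary label

clf = LogisticRegression(max_iter=1000).fit(X, y)
weights = pd.Series(clf.coef_[0], index=features).sort_values()
print(weights)  # signed weights rank each factor's association with late detection
```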

Moreover, in Phase Two of this project, individual routine data collected from health facilities will be used to extract factors associated with MMR threats, in order to determine the likelihood of maternal, neonatal and child health risks before and after pregnancy. This will assist hospital management to act and intervene at the individual level.

Long-term vision

Once the models have been selected, they will continue to be tested with incoming data. The second phase of this project will involve the development of an AI system for early detection of maternal, neonatal and child health risks at the hospital level, which will also be integrated with the prediction models and data sources developed at the national level.

If this pilot study shows positive results, a future project will involve testing and scaling up the developed tool for use in control intervention schemes in other areas, and even scaling it up for other diseases.

Personnel

Dr. Gladness G. Mwanga holds a PhD in Information and Communication Science and Engineering, focusing on decision-support tools using machine learning. She is a mentor in data science at NOTTECH Lab. For the past four years she has been working on data science projects that gave her experience in building AI systems to solve various problems in society. In one of them she developed machine learning models to predict decisions made by dairy farmers, identify factors that influence those decisions, and forecast farmers' demands for specific services in four Eastern African countries (Ethiopia, Kenya, Tanzania and Uganda). Gladness also has four years' experience working as a research assistant at The Nelson Mandela African Institution of Science and Technology and as a consultant developing ICT-based platforms and visualization tools, overseeing all activities to ensure successful projects. She will lead this project and assist in the development of the AI system.

Mr. Timothy Wikedzi is a Senior Software Engineer and a mentor at NOTTECH Lab, with extensive experience in building and managing large-scale software solutions. Since 2018 he has been part of the core team that builds and supports services for ShowClix Inc., an event management and ticketing company based in Pittsburgh, USA. Prior to that, Mr. Wikedzi worked as a lead tech consultant on projects that built tools and services for various organizations in Tanzania, the UK, and the USA. His areas of interest are building scalable solutions, secure web applications, and fast, efficient systems, and forming and leading the teams behind software products. He therefore brings skills in system development.

Mr. Scott Businge is a Senior Software Engineer and a Python mentor at NOTTECH Lab, specializing in DevOps and software engineering with Python and Golang. He has diverse practical experience and abilities in both software development and operations, and is committed to automation, systems optimisation, security, rapid software delivery practices and monitoring processes. He has previously worked with big tech companies in Africa such as Andela, which offers world-class software engineering solutions to clients around the world. He therefore brings skills in system development using advanced Python and in launching the system (DevOps).

Description

The goal of this project is to develop a computer-vision-based, non-intrusive automatic data collection mechanism to collect images and provide insights about ecological succession on coral reefs of Vamizi Island, allowing biologists to analyze data in real time and learn about animals' life history, behaviour and populations in Mozambican waters.

Rationale

Coral reefs are among the world's most diverse ecosystems: more than 800 species of corals provide habitat and shelter for approximately 25% of global marine life, although reefs cover less than 0.1% of the ocean floor. Coral reefs are also extremely valuable ecosystems, providing livelihoods for 1 billion people and generating USD 2.7 trillion each year worldwide from fisheries, coastal protection, tourism and recreation.

Nevertheless, coral reefs are rapidly declining due to various global and local factors such as overfishing, climate change, ocean acidification, pollution and unsustainable coastal development.

In this context, technological resources have been used to monitor and analyse the state of coral reefs and to allow biologists to obtain real-time data on animals' life history, behaviour, population, and survivorship, collecting valuable data that informs sound decision-making and management/conservation efforts.

Different studies show various approaches for collecting data for marine biodiversity conservation purposes, such as using Remotely Operated Vehicles, Autonomous Underwater Vehicles, and fixed underwater video cameras equipped with Video Analytics Services Platforms.

Most of these studies developed deep learning tools for rapid and large-scale automatic collection and annotation of marine data. However, these studies suggested that to improve current solutions, convolutional neural networks have to be optimised and backup power supplies must be improved.

Moreover, some studies also consider applying infrared cameras, which would enable night-time video capture to create a complete picture of the coral ecosystem. In Africa, however, little or no research has applied these advanced technological approaches to marine ecology conservation research.

Outcomes

In the long term, resolving this question will help gain insight into the ecological processes around artificial reefs (particularly important in the context of the oil and gas developments occurring in Mozambique, which will warrant the implementation of reef restoration measures).

Further, this system will be helpful for developing many other research projects that require long periods of observation on remote reefs where permanent and nighttime access is limited. Additionally, this project will create capacity in the young Mozambican research community regarding the application of artificial intelligence technologies to tackle marine conservation issues.

Vision

This project is an opportunity to pioneer the development of new technologies that will ultimately support conservation effort through enhanced data collection and processing.

The vision is to improve data collection capacity by building on top of already existing systems, namely by developing a different mechanism to provide power supply capable of maintaining such systems in coral reefs located more than a few kilometres from shore by using floating solar panels instead.

In the long-run, the project will be replicated for different coral reefs to allow biologists to obtain data in real-time and learn about animals’ life story, behaviour, and population dynamics. In addition, multiple units would be deployed at several locations to allow for more comprehensive research or monitoring reefs from various angles.

Personnel

Erwan Sola, PhD (Project Lead), Investigator in the Marine Ecology Department, Faculty of Natural Science, Lúrio University, Mozambique. Experience in project coordination. Coral biology specialist. Extensive fieldwork experience on coral reefs. He will contribute to concept development, project coordination and ecological data analysis.

Luís Pina, MSc, Computer Engineering Department, Faculty of Engineering, Lúrio University, Mozambique. Luís Pina holds a Master's degree in Information Technology, with experience in developing classification models. He will contribute to this project through data pre-processing and the development of classification models, and will also be involved in developing the object detection model.

Tiago Azevedo, PhD Candidate, Department of Computer Science and Technology, University of Cambridge, United Kingdom. A 4th-year Computer Science PhD student with experience in developing deep learning and machine learning models in real-world settings. He will contribute to this project by supporting the coding of the object detection model.

Lourenço Matandire, BSc, Mechanical Engineering Department, Faculty of Engineering, Lúrio University, Mozambique. Lourenço Matandire is a mechatronics engineer who will be responsible for building and assessing the Flexible Underwater Observatory (FUO) and managing its power supply.

Boaventura Manhique, BSc, Computer Engineering Department, Faculty of Engineering, Lúrio University, Mozambique. Boaventura Manhique is a computer engineer specializing in networking, with a deep understanding of electronics. He will be responsible for maintaining and managing all means of communication and information sharing between the FUO and the biologists.

Abstract

According to the Open Data Barometer by the World Wide Web Foundation, countries in sub-Saharan Africa rank poorly on open data initiatives, with an average score of about 20 out of a maximum of 100 based on readiness, implementation, and impact [1]. To make the process of creating, introducing, and passing parliamentary bills a force for public accountability, the information needs to be easier for the average citizen to analyze and process.

This is not the case for most of the bills introduced and passed by parliaments in Sub-Saharan Africa. In this work, we present a method to overcome the implementation barrier. For the Nigerian parliament, we used a pre-trained optical character recognition (OCR) tool, natural language processing techniques, and machine learning algorithms to categorize parliamentary bills. We propose to improve this work on Nigerian parliamentary bills by using text detection models to build a custom OCR tool. We also propose to extend our method to three other African countries: South Africa, Kenya, and Ghana.

Introduction

Given the challenges and precariousness facing developing and underdeveloped countries, the quality of policymaking and legislation is of enormous importance. Legislation can affect the success of several of the United Nations Sustainable Development Goals (SDGs), such as poverty alleviation, good public health systems, quality education, economic growth, and sustainability. Targets 16.6 and 16.7 of the UN SDGs are to “develop effective, accountable, and transparent institutions at all levels” and to “ensure responsive, inclusive, participatory and representative decision making at all levels” [2]. For countries in Sub-Saharan Africa to meet these targets, an open data revolution needs to happen at all levels of government and, more importantly, at the parliamentary level.

Objectives and Expectations

To achieve the goal of meeting UN SDG targets 16.6 and 16.7, making effective use of data is key. However, does such data currently exist? If so, how should it be organized in a framework amenable to the decision-making process? Here, we propose expanding our work on categorizing parliamentary bills in Nigeria, which uses optical character recognition (OCR), document embeddings and recurrent neural networks, to three other African countries: Kenya, Ghana, and South Africa.
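
A minimal sketch of that pipeline is shown below; TF-IDF with logistic regression stands in for the document-embedding/recurrent-network classifier, and the file paths and labels are hypothetical.

```python
# Hedged sketch of the extraction-and-categorization pipeline: OCR a
# scanned bill PDF, then categorize the extracted text.
import pytesseract
from pdf2image import convert_from_path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def bill_text(pdf_path: str) -> str:
    """OCR every page of a scanned bill PDF into one string."""
    pages = convert_from_path(pdf_path, dpi=300)
    return "\n".join(pytesseract.image_to_string(p) for p in pages)

# Hypothetical labelled corpus of previously categorized bills.
texts = [bill_text("bills/health_bill.pdf"), bill_text("bills/education_bill.pdf")]
labels = ["health", "education"]

clf = make_pipeline(TfidfVectorizer(stop_words="english", max_features=20000),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict([bill_text("bills/new_bill.pdf")]))
```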

We also plan to improve our text extraction process by training a custom OCR model. The objective of this project is to generate semantic, structured data from the bills and, in turn, categorize them into socio-economically driven labels. We plan to recruit three interns to work on this project for five months: two machine learning interns and one software engineering intern.

Conclusion and Long Term Vision

Our initial experimental results show that our model is effective at categorizing the bills, which will aid our large-scale digitization efforts. However, our results reveal a key remaining challenge: the output of the pre-trained OCR tool is generally not an accurate representation of the text in the bills, especially for low-quality PDFs. A fascinating possibility is to solve this by training the custom OCR we propose. The rapid acceleration of text detection research driven by novel deep learning methods can help us in this area.

Methods such as region-based or single-shot detectors can be employed. In addition, we plan to use image augmentation to alter the size, background noise or color of the bills (see the sketch below). A large-scale annotation effort on the texts can provide the labels needed to train our custom OCR for text identification and named entity recognition. We are also extending our methodology to other countries in Sub-Saharan Africa. Results that lead to accurate categorization of parliamentary bills are well positioned to have a substantial impact on governmental policies and on the quest of governments in low-resource countries to meet the open data charter principles and the United Nations' Sustainable Development Goals on open government.
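
A hedged sketch of that augmentation step follows; the crop size, jitter strengths and noise level are illustrative assumptions, and the input file is a hypothetical scan.

```python
# Hedged sketch of image augmentation for synthetic OCR training data:
# random rescaling, colour jitter, and additive background noise.
import torch
from torchvision import transforms
from PIL import Image

augment = transforms.Compose([
    transforms.RandomResizedCrop(640, scale=(0.7, 1.0)),   # vary size/crop
    transforms.ColorJitter(brightness=0.3, contrast=0.3),  # vary paper colour
    transforms.ToTensor(),
])

def add_noise(img: torch.Tensor, std: float = 0.05) -> torch.Tensor:
    """Simulate scanner/background noise with additive Gaussian noise."""
    return (img + std * torch.randn_like(img)).clamp(0.0, 1.0)

page = Image.open("bills/page_001.png").convert("RGB")  # hypothetical scan
sample = add_noise(augment(page))
```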

It can also empower policymakers, stakeholders and governmental institutions to identify and monitor bills introduced to the National Assembly for research purposes, and facilitate the efficiency of bill creation and open data initiatives. We plan to design an intercontinental tool that combines information from all bills and categories and makes them easily accessible to everyone. As our long-term vision, we plan to analyze documents on parliamentary votes and proceedings to gain more insight into legislative debates and patterns.

Abstract

To create an automatic data annotation tool and ground truth dataset for malaria diagnosis using deep learning. The ground truth dataset and the tool will streamline the development of AI tools for pathology diagnosis.

Introduction

Technology is transforming how health care is delivered in Africa, giving more people, especially in limited-resource settings and around the world, access to better care. Likewise, easier access to data supports both doctors and policymakers in making better-informed decisions about how to continue improving the health care system. However, existing traditional methods, especially for disease diagnosis, have limitations such as expensive equipment, the need for experts, and the time consumed by a single diagnosis. This becomes impractical in areas with a high disease burden, such as sub-Saharan regions. In this project we focus on the improvement of malaria diagnosis. We choose malaria because it is a life-threatening disease dominant in developing countries. According to the WHO, in 2017 nearly half of the world's population was at risk of malaria, more than 90 countries reported malaria cases, and Africa was home to 435,000 deaths. The WHO also reports that malaria kills a child every 2 minutes. Nevertheless, prompt diagnosis and treatment can reduce such deaths.

In the area of artificial intelligence (AI), several techniques have been adopted to create malaria diagnosis tools that are fast, accurate and require fewer experts. Deep convolutional networks, as one such AI technique, have been used for the detection of malaria parasites (Sanchez Sanchez, 2015). Given the sensitivity of health applications, AI tools for tasks such as diagnosis usually require large amounts of data to achieve the accuracy needed for practical use. However, in the context of developing countries there is a shortage of such data for research and for developing such tools. Hence, there is a need to create datasets for the research and development of pathology diagnosis tools, such as for malaria.

Rationale

One of the major problems that hinders the development of AI and its applicability in developing countries is the lack of data. This is evident in the limited access to available data from both government and non-governmental organizations. In addition, data may be available but lack the necessary quality, in terms of pixels and labels, required for the development of AI tools. Lastly, in some domains such as agriculture and health there is no data for training, testing and validating AI tools. For these reasons, it is difficult and time-consuming to create a comprehensive dataset. These dataset problems, particularly in the health sector, cause a significant setback to the development of AI tools, a technology with great potential for solving problems in our health sector. Therefore, there is a need to come up with a tool for improving the entire process of acquiring datasets.

Main Objective

The aim of this project is to create an AI tool for effectively building a ground truth dataset for malaria diagnosis using deep learning. Specific objectives:

  • To capture microscopic images of malaria-parasitized and uninfected stained blood smear samples using a smartphone.
  • To develop an automatic annotation tool for the captured images by integrating an open-source annotation tool and an object detection model (a sketch of this pre-annotation step follows this list).
  • To verify the effectiveness of the automatic annotation tool.
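
A minimal sketch of the pre-annotation step is given below, under the assumption that a detector proposes draft bounding boxes which an open-source annotation tool then loads for human correction. A generic pretrained Faster R-CNN stands in for a parasite-trained model, and the file names and JSON layout are hypothetical.

```python
# Hedged sketch: detector proposes boxes, exported as draft annotations.
import json
import torch
import torchvision
from torchvision.io import read_image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = read_image("smears/field_001.jpg").float() / 255.0
with torch.no_grad():
    pred = model([img])[0]

annotations = [
    {"bbox": [round(v) for v in box.tolist()], "score": float(score)}
    for box, score in zip(pred["boxes"], pred["scores"])
    if score > 0.5  # keep confident detections as draft annotations
]
with open("smears/field_001.pre_annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)
```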

Abstract

To initiate a research roadmap for the preservation of indigenous languages through the collection, categorization and archiving of content, and through translation and voice synthesis for automatic translation between official and indigenous languages.

Objectives

Build, curate and explore a massive dataset of public content in indigenous languages. The objective is to identify and enumerate data sources for retrieving content in an indigenous language, creating an open archive that can be leveraged in a variety of activities, including training translation models to promote national languages, or building vocal synthesizers to help distribute news content to illiterate citizens.

Initiate a research roadmap on translation and voice synthesis to promote indigenous languages through content sharing. Preserving indigenous languages is a challenging endeavor which first requires closing the information gap that may exist between official (mainly colonial) languages and indigenous languages. For example, news content is abundant in official languages, while rural areas are provided with brittle summaries in indigenous languages. Artificial intelligence can help close the gap through automatic translation of texts and voice synthesis (to account for illiteracy). The project will initiate a state-of-the-art survey of the available and missing components towards realizing this endeavor.
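
As one hedged illustration of the translation building block, the snippet below runs a pretrained OPUS-MT model through the Hugging Face pipeline API. The English-to-Swahili checkpoint named here is an assumption; any official-to-indigenous language pair with a pretrained or fine-tuned model would slot in the same way.

```python
# Hedged sketch of the text-translation building block; the checkpoint
# name is an assumed example, not a project deliverable.
from transformers import pipeline

translator = pipeline("translation",
                      model="Helsinki-NLP/opus-mt-en-sw")  # assumed checkpoint
news_item = "The vaccination campaign starts on Monday."
print(translator(news_item)[0]["translation_text"])
```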

Long-term vision

The long-term vision of the preservation project is to ensure that indigenous languages, hence the indigenous cultures, are sustained. To that end, the project investigates:

  • the means to systematize the collection and archiving of content, ensuring that all data are made openly available in readily processable formats at a single repository endpoint
  • the opportunity to perform automatic translation to ensure a back-and-forth exchange of viewpoints in official and indigenous languages
  • the democratization of information from the elite to rural citizens who speak only indigenous languages.

This last point is the ultimate goal towards preserving indigenous languages by ensuring that the information gap is closed, thus realizing one objective of open data, which is to increase democratic participation via information.

Abstract

To initiate the collection and construction of a medicinal plant database on top of which a search engine and AI-based image recognition for plants will enable scalable search of preserved knowledge.

Objectives

Medicinal plant database construction: collect an image dataset, and enumerate, curate and associate labeling metadata about pharmaceutical virtues. Building a comprehensive database requires that teams spread out to investigate various sources of information (e.g., existing literature) as well as well-known traditional healers, in order to collect information and precisely label it. The collected data is then merged and curated.

Develop and operationalize a detection and search engine for medicinal plants. Leverage the built database to implement artificial intelligence technology for recognizing plants from leaf photographs. Build an “Information Retrieval”-based search engine on top of the natural language descriptions in the database to enable scalable search of preserved knowledge.
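
A minimal sketch of the retrieval component is given below: TF-IDF vectors over the plants' natural-language descriptions, queried by cosine similarity. The two entries are hypothetical examples, not curated data.

```python
# Hedged sketch of TF-IDF search over plant descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

descriptions = {
    "Moringa oleifera": "leaves and seeds used for inflammation and nutrition",
    "Senna occidentalis": "leaf infusion traditionally taken against fever",
}
names = list(descriptions)
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(descriptions.values())

def search(query: str, top_k: int = 3):
    """Rank plants by cosine similarity between query and descriptions."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    ranked = sorted(zip(names, scores), key=lambda kv: -kv[1])
    return ranked[:top_k]

print(search("plant used against fever"))
```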

Long-term vision

As Amadou Hampâté Bâ said while addressing UNESCO members in 1960, “in Africa, when an old man dies, it's a library burning”. This is particularly true today when we debate the virtues of plants for disease therapy. A substantial amount of knowledge is being lost due to the lack of proper preservation in digital, searchable and reusable databases. With this project, we aim to make the preservation of ethnopharmacological knowledge in the Sahel an ultimate target. To that end, we propose to initiate the construction of a medicinal plant database on top of which a search engine and AI-based image recognition for plants could serve a large panel of users.