AI4D Posts

AI4D blog series: Improving Pharmacovigilance Systems using Natural Language Processing on Electronic Medical Records

This research focuses on enhancing pharmacovigilance systems using Natural Language Processing (NLP) on Electronic Medical Records (EMR). Our major task was to develop an NLP model for extracting Adverse Drug Reaction (ADR) cases from EMR. The team was required to collect data from two hospitals in Dodoma that use EMR systems: University of Dodoma (UDOM) Hospital and Benjamin Mkapa (BM) Hospital. During data collection and analysis, we worked with health professionals from both hospitals. We also used the public MIMIC-III database. The datasets came in different formats: CSV for UDOM Hospital and MIMIC-III, and PDF for BM Hospital, as shown in the attached file.

Team during an interview with Pharmacologist in BM hospital

Pharmacovigilance practices mostly depend on analyzing clinical trials, biomedical literature, observational studies, Electronic Health Records (EHRs), Spontaneous Reporting (SR) and social media (Harpaz et al., 2014). In our context, we considered EMR to be more informative than the other sources, as suggested by Luo et al. (2017). We studied the EMR schemas of the two hospitals. We collected inpatient data only, since outpatient data would have given an incomplete patient history. Also, our health information systems are not integrated, which makes it difficult to track a patient’s full history unless the patient was admitted to a particular hospital for a while. Across all data sources we looked for the same pattern of information: clinical history, prior patient history, symptoms developed, allergies/ADRs discovered during medication, and the patient’s discharge summary.

While working on the UDOM and BM hospital data, we encountered several challenges that led the team to focus on the MIMIC-III dataset while searching for an alternative approach to our local data. The challenges were:

  • The reports had no clear identification of ADR cases.
  • In most cases, the doctor did not state the reason for changing a patient’s medicine, which made it hard to tell whether the medication had not worked well for that patient or whether there was another reason, such as an adverse reaction.
  • The justification for ADR cases was vague.
  • Information from patients and doctors did not always match.
  • Patients describe their condition in ways that doctors cannot always understand.
  • There is a considerable gap between health workers and regulatory authorities (health workers do not know whether they have to report ADR cases).
  • ADRs are complex, since there is a lot to take into account, such as drug-to-drug, drug-to-food and drug-to-herbal interactions.
  • There was no common or consistent reporting style among doctors.
  • The language used in reports is hard for a non-specialist to understand.
  • Some fields were left completely empty, which led to incomplete medical histories.
  • The annotation process took longer than planned, since we had only one pharmacologist for the work.

After noticing all these challenges, the team carefully studied the MIMIC-III database to assess the availability of data with ADR cases, which would help us build a baseline model for the problem. We discovered that the NoteEvent table has enough information about patient history, with clear indications of both ADR and non-ADR cases (see the text).

To start with, we queried 100,000 records from the database. The records have many attributes, but we used only the text column of the NoteEvent table, which holds the patient’s entire history (prior history, medication, dosage, examination, changes noted during medication, symptoms, etc.). We started annotating the first group by filtering the records to keep only the rows of interest. We searched for the following keywords: adverse, reaction, adverse events, adverse reaction and reactions. Only 3,446 rows contained words that could guide the team in the labelling process. These records were then annotated with the labels 1 and 0 for ADR and non-ADR cases respectively, as indicated in the filtration notebook.
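The keyword filtering step described above can be sketched as follows. This is only an illustration using Python's standard library: the keyword list comes from the post, but the record layout (`row_id`, `text`), the helper name `filter_notes` and the three toy notes are assumptions, not the contents of the team's filtration notebook.

```python
import re

# Keywords quoted in the post.
KEYWORDS = ["adverse events", "adverse reaction", "adverse", "reactions", "reaction"]

# One case-insensitive pattern that matches any keyword as a whole word or phrase.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def filter_notes(notes):
    """Keep only the notes whose text mentions at least one keyword."""
    return [note for note in notes if PATTERN.search(note["text"])]

# Hypothetical toy records standing in for rows of the NoteEvent table.
notes = [
    {"row_id": 1, "text": "Patient developed a rash; possible adverse reaction to penicillin."},
    {"row_id": 2, "text": "Routine follow-up, no complaints."},
    {"row_id": 3, "text": "No ADVERSE events noted during admission."},
]

candidates = filter_notes(notes)
print([note["row_id"] for note in candidates])
```

Note that a keyword match only selects rows for manual review; the third toy note mentions "adverse events" but would be labelled 0 (non-ADR) by an annotator.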

In analysing the data, we found far more non-ADR cases than ADR cases: 3,199 non-ADR cases, 228 ADR cases and 19 rows left unannotated. Because of this high imbalance, we reduced the non-ADR cases to 1,000 and upsampled the ADR cases to 800, bringing the classes closer to balance and minimizing bias during modelling.
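A minimal sketch of this rebalancing, assuming random downsampling without replacement for the non-ADR class and random upsampling with replacement for the ADR class. The post does not say which sampling technique was used, so the function `rebalance`, its parameters and the synthetic rows are illustrative only.

```python
import random

def rebalance(rows, n_negative=1000, n_positive=800, seed=0):
    """Downsample label-0 rows and upsample label-1 rows to the target sizes."""
    rng = random.Random(seed)
    positives = [r for r in rows if r["label"] == 1]
    negatives = [r for r in rows if r["label"] == 0]

    # Downsample the majority class without replacement.
    kept_negatives = rng.sample(negatives, n_negative)

    # Upsample the minority class: keep every original row, then draw the
    # missing ones with replacement (these are duplicates of existing rows).
    extra = rng.choices(positives, k=max(0, n_positive - len(positives)))

    balanced = kept_negatives + positives + extra
    rng.shuffle(balanced)
    return balanced

# Synthetic rows with the class counts reported in the post.
rows = [{"id": i, "label": 0} for i in range(3199)]
rows += [{"id": 3199 + i, "label": 1} for i in range(228)]

balanced = rebalance(rows)
print(len(balanced))  # 1000 non-ADR rows + 800 ADR rows
```

If upsampling is done before the train/validation split, copies of the same ADR note can end up on both sides, so it is usually safer to split first and resample only the training portion.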

After annotation and simple analysis, we used NLTK to apply basic preprocessing techniques to the text corpus, as follows:

  1. Converting the raw sentences in the corpus to lower case, which helps other processing techniques such as parsing.
  2. Sentence tokenization: because the text is in paragraphs, we applied sentence boundary detection to segment it into sentences by identifying where each sentence starts and ends.

We then used regular expressions to extract information of interest from the documents, removing unnecessary characters and replacing others with easily understandable statements or characters, as per professional guidelines.

We removed affixes from tokens (stemming), which puts words into their root form. We also removed common words (stopwords) and applied lemmatization, which uses a word’s part of speech to reduce it to its dictionary form. After data preprocessing, we used Term Frequency-Inverse Document Frequency (TF-IDF) from scikit-learn to vectorize the corpus; the TF-IDF weights also highlight the most distinctive keywords in the corpus.
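To make the preprocessing and vectorization steps concrete, here is a dependency-free sketch. It is not the team's NLTK and scikit-learn pipeline: the regex sentence splitter, the tiny stopword list and the hand-written TF-IDF (raw term count times a smoothed inverse document frequency) are simplified stand-ins chosen for illustration, and the sample note is invented.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (NLTK ships a much longer one).
STOPWORDS = {"the", "a", "an", "to", "of", "and", "was", "after"}

def split_sentences(text):
    """Naive sentence boundary detection: split after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS]

def tfidf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

note = "Patient given Aspirin. Rash developed after the dose. No fever."
sentences = split_sentences(note)
vectors = tfidf(sentences)
for sentence, vector in zip(sentences, vectors):
    print(sentence, sorted(vector))
```

The word "no" is intentionally left out of the stopword list here, because negation often changes the meaning of a clinical sentence.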

To create a baseline model, we worked with classification algorithms using scikit-learn. We trained six different models (Support Vector Machines, eXtreme Gradient Boosting, Adaptive Gradient Boosting, Decision Trees, Multilayer Perceptron and Random Forest) and then selected the three that performed best on validation (Support Vector Machine, Multilayer Perceptron and Random Forest) for further evaluation. In the next phase of the project we will also use a deep learning approach, to produce more promising results before the model is deployed and put into practice. Here is the link to colab for data pre-processing and modelling.

From the UDOM database, we collected a total of 41,847 patient records: 16,185, 18,400 and 7,262 records for 2017, 2018 and 2019 respectively. The dataset has the following attributes: Date, Admission number, Patient Age, Sex, Height(Kg), Allergy status, Examination, Registration ID, Patient History, Diagnosis, and Medication. We downsized it to 12,708 records by removing empty columns and uninformative rows. We used regular expressions to extract information of interest from the documents, as per professional guidelines. The data cleaning, format conversion, analysis and preparation for machine learning are elaborated in this Colab link.
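The downsizing step can be pictured with a small sketch. The post does not define what counted as an uninformative row, so this example assumes, purely for illustration, that a row is dropped when its Patient History or Medication field is blank; the helper names and toy records are hypothetical.

```python
def drop_empty_columns(records):
    """Remove fields that are blank in every record."""
    keep = [
        key for key in records[0]
        if any(str(r.get(key, "")).strip() for r in records)
    ]
    return [{key: r.get(key, "") for key in keep} for r in records]

def drop_uninformative_rows(records, required):
    """Keep rows in which every required field is non-blank."""
    return [
        r for r in records
        if all(str(r.get(key, "")).strip() for key in required)
    ]

# Hypothetical toy records using a few of the attribute names from the post.
records = [
    {"Sex": "F", "Allergy status": "", "Patient History": "cough for 3 days", "Medication": "amoxicillin"},
    {"Sex": "M", "Allergy status": "", "Patient History": "", "Medication": "paracetamol"},
    {"Sex": "F", "Allergy status": "", "Patient History": "fever", "Medication": ""},
]

cleaned = drop_uninformative_rows(
    drop_empty_columns(records),
    required=["Patient History", "Medication"],
)
print(cleaned)

# The yearly chunks reported in the post add up to the stated total.
print(16185 + 18400 + 7262)
```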

At BM hospital, the PDF files extracted from the EMS contain patient records with the following information:

  1. Discharge reports
  2. Medical notes
  3. Patients history
  4. Lab notes

Health professionals at the respective hospitals manually annotated the labels for each document, and this task took most of our time in this phase of the project. We’re still collecting and interpreting more data from these hospitals.

The team organized and extracted information from the BM hospital PDF files by converting data formats, then analyzing and preparing the data for machine learning. We experimented with OCR to extract data from the PDF files, but the results were not promising because much of the information appeared to be missing. We therefore had to programmatically extract content from individual files and align it with the corresponding labels provided by the professionals.

The big lesson we have learned so far is that most of the data stored in our local systems is not informative. Policymakers must set standards to guide system developers during development and health practitioners when using the systems.

Last but not least, we want to thank our stakeholders, mentors and funders for their involvement in our research activities. It is because of this partnership that we are able to pursue our main goal.

Reposted within the project “Network of Excellence in Artificial Intelligence for Development (AI4D) in sub-Saharan Africa” #UnitedNations #artificialintelligence #SDG #UNESCO #videolectures #AI4DNetwork #AI4Dev #AI4D

AI4D blog series: Creating a ground truth dataset for malaria diagnosis in Tanzania

So why have we decided to collect malaria datasets to assist in developing a solution for its diagnosis? First, malaria remains one of the significant threats to public health and economic development in Africa. Globally, it is estimated that 216 million cases of malaria occurred in 2017, with Africa bearing the brunt of this burden [5*]. In Tanzania, malaria is the leading cause of morbidity and mortality, especially in children under 5 years and pregnant women. Malaria kills one child every 30 seconds, about 3000 children every day [4*]. Malaria is also the leading cause of outpatient visits, inpatient stays, and admissions of children less than five years of age at health facilities [5*].

Second, the most common methods to test for malaria are microscopy and Rapid Diagnostic Tests (RDT) [1, 2]. RDTs are widely used, but their chief drawback is that they cannot count the number of parasites. The gold standard for the diagnosis of malaria is, therefore, microscopy. Evaluation of Giemsa-stained thick blood smears, when performed by expert microscopists, provides an accurate diagnosis of malaria [3].

Nonetheless, there are challenges with this method: a single diagnosis takes a lot of time, it requires experienced technologists, who are very few in developing countries, and manually examining a sample through a microscope is a tedious and eye-straining process. We learned that although microscopy is the gold standard for malaria diagnosis, it is still not used in most private and public health centers. We also realized that some of the lab technologists in health care are not competent in preparing the staining reagents used in the diagnosis process. We had to prepare our own reagents and supply them to the technologists for the purposes of this research.

Artificial intelligence is transforming how health care is delivered across the world. This has been evident in pathology detection, surgery assistance and early detection of diseases such as breast cancer. However, these technologies often require significant amounts of quality data and in many developing countries, there is a shortage of this.

To address this deficiency, my team, composed of 6 computer scientists and 3 lab technologists, collected and annotated 10,000 images of stained blood smears and developed an open-source annotation tool for the creation of a malaria dataset. We strongly believe that the availability of more datasets and of the annotation tool (which automates the labeling of parasites in an image of a stained blood smear) will improve the existing algorithms for malaria diagnosis and create a new benchmark.

To collect this dataset, we first sought and were granted ethical clearance from the University of Dodoma and Benjamin Mkapa Hospital’s research center. We collected 50 blood smear samples from patients confirmed to have malaria and 50 samples from confirmed negative cases. Each sample was stained by a lab technologist, and 100 images of it were taken using an iPhone 6S attached to a microscope. This gave a total of 5,000 images for the confirmed positive patients and 5,000 images for the confirmed negative patients.

Through this work, we have had several opportunities including attending academic conferences and forming connections with other researchers such as Dr. Tom Neumark, a postdoctoral social anthropologist at the University of Oslo. Through our work, we also met Prof Delmiro Fernandes-Reyes, a professor of biomedical engineering. In a joint venture with Prof Delmiro Fernandes-Reyes, we submitted a proposal for the DIDA Stage 1 African Digital Pathology Artificial Intelligence Innovation Network (AfroDiPAI) at the end of November 2019.

We are also disseminating the results of our research. We submitted an abstract (on the ongoing project) to two workshops (Practical Machine Learning in Developing Countries and Artificial Intelligence for Affordable Health) at the 2020 ICLR conference in Ethiopia, and it has been accepted for presentation as a poster. We were also delighted to get very constructive feedback from the conference reviewers and look forward to incorporating it as we continue with the project and the final publication.

The next stage will be to start using our data and train deep learning models in the development of the open-source annotation tool. At the same time, together with the AI4D team, we are looking for the best approach to follow when releasing our open-source dataset in the medical field.

Our overall aim, however, is to develop a mobile application that will assist lab technologists in Tanzania and beyond in the onerous work of diagnosing malaria. We have already met many technologists who are not only excited about and eagerly awaiting this tool, but who have also generously helped us as we developed it.

Links

[1] B.B. Andrade, A. Reis-Filho, A.M. Barros, S.M. Souza-Neto, L.L. Nogueira, K.F. Fukutani, E.P. Camargo, L.M.A. Camargo, A. Barral, A. Duarte, and M. Barral-Netto. Towards a precise test for malaria diagnosis in the Brazilian Amazon: comparison among field microscopy, a rapid diagnostic test, nested PCR, and a computational expert system based on artificial neural networks. Malaria Journal, 9:117, 2010.

[2] Maysa Mohamed Kamel, Samar Sayed Attia, Gomaa Desoky Emam, and Naglaa Abd El Khalek Al Sherbiny, “The Validity of Rapid Malaria Test and Microscopy in Detecting Malaria in a Preelimination Region of Egypt,” Scientifica, vol. 2016, Article ID 4048032, 5 pages, 2016. https://doi.org/10.1155/2016/4048032.

[3] Philip J. Rosenthal, “How Do We Best Diagnose Malaria in Africa?”: https://doi.org/10.4269/ajtmh.2012.11-0619

[12] UNICEF 2018 Report. The urgent need to end newborn deaths. The reality of Malaria Summary https://www.unicef.org/health/files/health_africamalaria.pdf

[13] WHO malaria 2018 report. Retrieved on 1st March 2019 from https://apps.who.int/iris/bitstream/handle/10665/275867/9789241565653-eng.pdf?ua=1


AI4D blog series: A Computer Vision Tomato Pest Assessment and Prediction Tool

A high-yielding crop such as tomato, with high economic returns, can greatly increase smallholder farmers’ income when well managed. However, it is constrained by the recent invasion of the tomato pest Tuta absoluta, which is devastating tomato yields. Look at the situation in tomato fields in the highly affected Arusha [Arusha - mp4 video] and Morogoro regions.

Denis Pastory, team selfie – researcher and field assistant in the field.

To tackle this challenge, our work focuses on early detection and control initiatives that use computer vision techniques to strengthen phytosanitary capacity and systems against Tuta absoluta devastation. It should be noted that Tuta absoluta control still relies on slow, inefficient manual identification and on the support of a limited number of agricultural extension officers.

Our initial work involved field work and in-house experiments to collect data in the areas most affected by Tuta absoluta. We collected image data in the Arusha and Morogoro regions of Tanzania.

Fig: The P.I. at one of the in-house experiment sites in Arusha.

As for any computer vision task, getting the right images for the task at hand is sometimes challenging. Regarding our use case, we had to generate our own image data. To accumulate enough data for model training, we have been collecting data since June 2018 and have had four (4) in-house experiments in the target areas. The whole data collection process is shown in this link.

The data collection process involved taking images of tomato plants inoculated with Tuta absoluta larvae during the first two (2) weeks of growth after the transplanting date. Images were taken of each plant on a daily basis. These images are RGB (Red, Green, Blue) photos of high and low resolution. For the high-resolution images we used a Canon EOS KISS X7 camera with a resolution of 5184 x 3456 pixels, and for the low-resolution images we used a mobile phone camera (set to low resolution).

In our first in-house experiment, we encountered a challenge with the data collection process. The inoculated tomatoes were tagged with a red ribbon. Tagging species or target organisms is a common practice in fields such as entomology. We came to realize that these tagged images could not be included in the dataset for training our models, and we therefore had to exclude them.

To meet our objectives, we worked on a Convolutional Neural Network (CNN) based model for binary classification, able to distinguish tomatoes affected by Tuta absoluta from unaffected ones, using state-of-the-art CNN architectures (VGG16, VGG19, ResNet50, InceptionV3). The results of this task were promising. Primary preprocessing was limited to selecting suitable images for training the CNN model.

We are certain that the images we collected represent real conditions in small-scale farmers’ fields. The collection contained more images of healthy tomato leaves than of leaves inoculated with Tuta absoluta, which implies data imbalance. To reduce the bias our CNN model might develop towards images with no Tuta absoluta, the number of samples per class was chosen to create balanced classes during model training.
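One simple way to obtain balanced classes is to undersample every class to the size of the smallest one. The post does not specify how the per-class sample counts were chosen, so the sketch below is an assumption for illustration; the function name, class names and file names are hypothetical.

```python
import random

def balance_by_undersampling(paths_by_class, seed=0):
    """Randomly keep the same number of image paths for every class."""
    rng = random.Random(seed)
    # The smallest class sets the number of samples kept per class.
    n_per_class = min(len(paths) for paths in paths_by_class.values())
    return {
        label: sorted(rng.sample(paths, n_per_class))
        for label, paths in paths_by_class.items()
    }

# Hypothetical file names: more healthy images than Tuta absoluta images.
paths_by_class = {
    "healthy": [f"healthy_{i:03d}.jpg" for i in range(12)],
    "tuta": [f"tuta_{i:03d}.jpg" for i in range(5)],
}

balanced = balance_by_undersampling(paths_by_class)
print({label: len(paths) for label, paths in balanced.items()})
```

Undersampling discards some healthy-leaf images; class weighting or augmentation of the minority class are alternatives when every image is valuable.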

The image data collection was expected to cover the main tomato-growing regions in Tanzania most affected by Tuta absoluta, though we ended up obtaining data from only two main areas. Our team is certain that the collected data can be representative of the situation in Tanzania. We also had to adapt to the local agronomic practices of the two areas.

For instance, we collected data on the commonly grown tomato varieties. The in-house experiments were also carried out following the cropping calendar of the two respective regions. To cover the two main growing seasons in Arusha, we carried out three experiments there, plus one experiment in Morogoro.

During CNN model training, following the typical approach for early detection of pests or diseases, we focused on distinguishing affected from unaffected plants. We successfully developed this type of binary classification model to identify tomatoes affected and not affected by Tuta absoluta.

We further developed a multiclass classification model to classify tomatoes at three levels of damage: low, high and no damage. This approach gave us a much better sense of the original idea we had. The model results showed us that, for an early detection system that determines damage at an early stage, a quantification-based model is much better suited than a binary classification model.

For instance, the results of the multiclass model showed that highly damaged tomatoes are identified more easily than tomatoes with low damage. Yet it is best to identify tomato damage at an early stage, i.e. at the low damage level, in order to enable early control measures for Tuta absoluta.

At this point, we need to redefine the model classification approach. The objective is early identification, and if a simple classification model cannot perform that task, the objective is at risk. With that in mind, we are now working on models that can identify Tuta absoluta mine density, a quantification method based on instance segmentation.


New podcast by Wale Akinfaderin on AI for parliamentary documents in Nigeria

Adewale Akinfaderin

Wale Akinfaderin, a K4A grantee within the AI4D programme working on Predicting and Analyzing Law-Making in Kenya, has recorded an episode of the “I Am Change” podcast series with Korede Azeez. The interview can be found on SoundCloud and Apple Podcasts.

The project proposes expanding a framework on categorizing parliamentary bills in Nigeria using Optical Character Recognition (OCR), document embedding and recurrent neural networks to three other countries in Africa: Kenya, Ghana, and South Africa.

His work was accepted at the 4th Widening NLP Workshop, Annual Meeting of the Association for Computational Linguistics, ACL 2020.

AI4D blog series: Preservation of Indigenous Languages

Context

In most African countries, perhaps more so than elsewhere, the majority of the population does not speak the official languages; instead, people speak traditional languages. In some countries, this proportion is as high as 80%. Because of this language barrier, this large part of the population is practically excluded from the march of society: they have no access to information or education and cannot really participate in the debates on the socio-economic development of their country.

From another point of view, our values, cultures, knowledge of all kinds and history are conveyed orally in these languages and thus remain inaccessible to the rest of the world.

Objectives

The main objective of the Preservation of Indigenous Languages project is to contribute to the preservation of local languages and the enhancement of local language content through (1) archiving, (2) promotion and (3) popularization of local language content. Archiving will make it possible to preserve content and knowledge in local languages. We will collect and use existing data in local languages for this purpose. The promotion will be done by exploiting the richness of this local language content. And popularisation will be made possible by making this content accessible in the official languages. In order to achieve these objectives, our project is divided into three parts, all of which have an important upstream data collection and pre-processing stage:

  • Transcription from local languages to text in local languages
  • Translation from local languages to official languages (French) and vice versa
  • Voice synthesis of texts in local languages into audios in local languages.

Team

To successfully carry out the project, we have set up a dedicated team of 10 people:

  • A research mentor with a background in AI,
  • Two practice mentors with a background in local languages. The first is a specialist in education in local languages, and the second has produced various translations from French into Mooré, the main local language of Burkina Faso.
  • A research assistant with a background in linguistics. In this case, the assistant was a student whose responsibility was to help with the collection of content in local languages and the pre-processing of data,
  • Three computer programmers. In this case, the programmers were computer science students (master’s and PhD students). Each of them is in charge of one of the three parts of the project plus some pre-processing tasks.

Implementation

For this project, we limited ourselves to one local language, Mooré. This language is the main language of Burkina Faso and is spoken by more than half of the population. There are also many sources of data in this language and important work has already been done on translations from French into this language, especially in the educational and religious fields.

(0) Data Collection: As announced, data collection is an important and necessary step for the different parts of the project. It is also one of the most difficult steps. The opening of data is not yet compulsory in our countries.

With the invaluable help of practice mentors, meetings were organised with the main institutions, both public and private, to explore existing data and the extent to which these data could be exploited.

Among the institutions that were contacted, the main ones are the following:

  • Fondation pour le Développement Communautaire/ Burkina Faso(FDC-BF);
  • The Biblical Alliance of Burkina Faso;
  • Fonds pour l’alphabétisation et l’éducation non formelle (FONAENF);
  • The Directorate of Research in Non-Formal Education (DRENF);
  • The DPDMT;
  • Ecole et langue nationale en Afrique (ELAN);
  • Savane Media.

We were thus able to access a certain amount of data but not always in digital format or not always complete. This required an enormous amount of pre-processing work either to put the data in digital format or to complete it either with translations or transcriptions.

One of the first sources of data we had access to was the Mooré Bible in text and audio. After pre-processing (cutting the audio sentence by sentence or verse by verse, and aligning the Mooré and French texts), this source was also used for the first tests of the different parts of the project.
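Because Bible verses carry the same reference in every translation, the text alignment mentioned above can be done by matching references. The sketch below is a minimal illustration under that assumption; the dictionary layout, the `align_verses` helper and the placeholder strings are hypothetical, and real verse text would replace the placeholders.

```python
def align_verses(moore, french):
    """Pair verses that share the same (book, chapter, verse) reference."""
    shared = sorted(set(moore) & set(french))
    return [(ref, moore[ref], french[ref]) for ref in shared]

# Placeholder strings stand in for the actual verse text in each language.
moore = {
    ("GEN", 1, 1): "<Moore text of Genesis 1:1>",
    ("GEN", 1, 2): "<Moore text of Genesis 1:2>",
    ("GEN", 1, 3): "<Moore text of Genesis 1:3>",
}
french = {
    ("GEN", 1, 1): "<French text of Genesis 1:1>",
    ("GEN", 1, 2): "<French text of Genesis 1:2>",
}

pairs = align_verses(moore, french)
for ref, source, target in pairs:
    print(ref, source, target)
```

Verses missing from either side (Genesis 1:3 in this toy example) are dropped, which keeps the parallel corpus strictly one-to-one.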

The collection and pre-processing work is still in progress to enrich our data sources and improve our models.

(1) Transcription: Since writing is not yet very popular in our local languages, we have a large amount of data in local languages in audio format. In addition, people who cannot write will always use oral communication to express themselves. The step of transcribing the audio content into local languages is an essential step to not only collect existing information but also to gather what people have to say.

After reviewing the state of the art and testing existing transcription tools, the student in charge of this part implemented his transcription model based on the DeepSpeech tool. He used data from the Bible for these tests. On top of the pre-processing workload and working conditions made more difficult by the Covid-19 pandemic, we unfortunately had problems with computing capacity, and we are working with one of the partners to increase the capacity of the leased Virtual Machines (VMs).

(2)  Translation: Translation is at the heart of this project. It aims to make official language information accessible to people in rural areas but also to provide access to the wealth of local language content.

The student in charge of this component, after reviewing the state of the art in translation approaches, applied classical neural machine translation techniques to the Bible data using OpenNMT. As one could expect given the lack of training data, the results were not very good. He is therefore now implementing meta-learning using the Meta-NMT tool. Meta-learning has been described in the literature as performing better than the classical approach when little data is available.

Here, too, in addition to the need for more data, we face a need for computing capacity that should also be resolved with the provision of VMs.

(3) Voice synthesis: After translation from the official languages into local languages, voice synthesis will make the content available in audio format to people who cannot read. The student in charge of this part also reviewed the state of the art of existing tools in this field. He is currently testing different tools and studying different models. He unfortunately started with a slight delay, but he will continue his work so that he can adapt a model and test it on the collected data, with the goal of synthesizing Mooré audio from Mooré text.

Results

At this stage, while we just crossed the mid-term of the project execution, we can report that a number of milestones have been achieved:

  • Data collection has been done and is still ongoing.
  • Pre-processing of audio and text content, mapping of audio to text in Mooré, and alignment of Mooré text with its French counterpart have been performed.
  • A transcription model for Mooré to French based on DeepSpeech has been implemented.
  • The classical translation has been implemented and tested on the Bible dataset.

Main challenges

Access to Data

After approaching about ten organizations, we were confronted with the limited availability of resources. Indeed, apart from the Bible, some training materials and translated official documents, very few documents were available in both Mooré and French.

The organizations that produce Mooré content most often do so for training or awareness-raising among the illiterate population. As a result, they do not produce the same content in French. As for radio and television channels, they broadcast directly in Mooré, without written notes, even for the presentation of the television news.

However, we found a lot of printed material, without digital versions and only in Moore. For this phase of the project, we collected and carried out the alignment for the already existing data in both languages in digital format. This allowed us to test the model, and although it did not lead to conclusive results, we did identify the problem of data availability. For further work, we plan to translate the existing documents into Moore so that we have both versions to continue the work. We are aware that this is a long term work, but it is the indispensable condition to have enough data to make the results of the algorithms interesting.

Copyright

A second problem we encountered was copyright. Indeed, we do not always have direct access to the authors, and the holders of the documents are reluctant to share them without the authors’ agreement. In other cases, the documents had been commissioned by international organizations. Our contacts here therefore needed the agreement of these institutions before giving us access to the data. This takes time and has delayed access to the working data.

In the long term, we plan to bring together a group of authors to raise their awareness of the project so that they can facilitate advocacy for the project.

Computing capacity

We unfortunately do not have a laboratory equipped with servers powerful enough to run our models. Our partnership with Anptic was supposed to allow us to use VMs with greater capacity to go faster in testing, but the administrative burden also delayed the availability of VMs.


New call for AI4D innovation grants open now

Deep Learning Indaba 2019, Nairobi, Kenya

Knowledge 4 All Foundation partnered with the Deep Learning Indaba to fund research projects across Africa that are collaborative at heart and have a strong development focus.

This Call for Proposals invites individuals, grassroots organizations, initiatives, academic, and civil society institutions to apply for funding for mini-projects.

A mini-project could also be early-stage research around our Grand Challenge of curing leishmaniasis.

AI4D blog series: Building a Medicinal Plant Database for Facilitating the Exploitation of Local Ethnopharmacological Knowledge

Context

In many African countries such as Burkina Faso, people still rely quite often on traditional medicine for both common and uncommon diseases. This is particularly true in rural areas where 71% of the Burkinabe people live. While the research literature acknowledges the pharmacological virtues of some plants, the relevant knowledge is neither sufficiently organized nor widely shared.

Objectives

The ultimate goal of this project is to build an open and searchable database on medicinal plants. To that end, the project focuses on (1) collecting a variety of information on such plants from diverse sources, (2) implementing a platform to expose the constructed knowledge, and (3) developing context-specific tools to accelerate the accurate identification of plants in the wild.

Team

To successfully carry out the project, we have set up a dedicated team of 10 people:

  • A research mentor with a background in AI,
  • A practice mentor with a background in traditional medicine. In this case, the mentor happened to be the director of the promotion of traditional medicine at the Ministry of Health,
  • A research assistant with a background in Sociology. In this case, the assistant was a student whose responsibility was to help with the collection of ethnobotanical data,
  • Three computer programmers. In this case, the programmers were computer science students who were tasked with devising and implementing the database, the search engine and the plant identification tool,
  • Four investigators to collect data on the virtues of plants.

Implementation

(1) Data collection: Work sessions with the practice mentor allowed us to devise an adapted methodology and identify data sources.

The adopted methodology consists of drawing up a list of plants based on the relevant research literature and on online databases. The team can then conduct an ethnobotanical study with traditional medicine practitioners to gather information on the therapeutic uses of plants. For each plant, we agreed to focus on the following information: scientific name, species, family, name in three local languages (Moore, Dioula, Fulfulde), spatial location, status (endangered or not), and medicinal use (virtues).

The data collection is mainly performed in the two largest cities in the country, namely Ouagadougou and Bobo-Dioulasso. During the implementation of the activities, we were surprised by the amount of research that has already been done on medicinal plants, although the data is not sufficiently structured or shared. In addition, we discovered that both traditional practitioners and the state are organizing actions to promote traditional medicine. Our project therefore reinforces the existing mechanism. As the activities continue, we plan to create, in addition to the plant database, a database of traditional practitioners so that they can be referenced more easily in the research work that is carried out.

(2) Platform development: For the platform, we use the Elasticsearch engine to build the backend database and search engine.
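As a rough illustration of how such a backend might be organized, the sketch below builds an index mapping, one sample document and a search request body as plain Python dictionaries, in the shape accepted by Elasticsearch's REST API. The field names (`scientific_name`, `local_names`, `medicinal_uses` and so on), the field boost and the helper `build_search_body` are assumptions derived from the list of attributes collected for each plant; they are not the project's actual schema, and the sample record contains placeholder values only.

```python
import json

# Hypothetical index mapping: field names mirror the attributes listed in the
# data-collection step and are NOT the project's actual schema.
PLANT_INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "scientific_name": {"type": "text"},
            "species": {"type": "keyword"},
            "family": {"type": "keyword"},
            "local_names": {
                "properties": {
                    "moore": {"type": "text"},
                    "dioula": {"type": "text"},
                    "fulfulde": {"type": "text"},
                }
            },
            "location": {"type": "geo_point"},
            "endangered": {"type": "boolean"},
            "medicinal_uses": {"type": "text"},
        }
    }
}

# Placeholder record showing the document shape; every value is illustrative.
SAMPLE_PLANT = {
    "scientific_name": "<scientific name>",
    "species": "<species>",
    "family": "<family>",
    "local_names": {
        "moore": "<name in Moore>",
        "dioula": "<name in Dioula>",
        "fulfulde": "<name in Fulfulde>",
    },
    "location": {"lat": 0.0, "lon": 0.0},
    "endangered": False,
    "medicinal_uses": "<free-text description of therapeutic uses>",
}

# Fields searched by a free-text query; "^2" boosts matches on medicinal uses.
SEARCH_FIELDS = [
    "scientific_name",
    "local_names.moore",
    "local_names.dioula",
    "local_names.fulfulde",
    "medicinal_uses^2",
]


def build_search_body(text, size=10):
    """Return a multi_match request body searching names and uses."""
    return {
        "size": size,
        "query": {"multi_match": {"query": text, "fields": SEARCH_FIELDS}},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("fever"), indent=2))
```

Keeping the local names in separate sub-fields would let a user search in Moore, Dioula or Fulfulde with the same query, which fits the goal of exposing the knowledge to local users.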

(3) Plant detector implementation: We also devised a deep learning system that classifies plant leaf images for fast identification in the wild. This work required contextualization, as we assumed that users would carry mobile phones with little computing power and potentially no data network connectivity. We therefore implemented a neural network model compression algorithm that yielded a classifier with reasonable prediction accuracy that could still run on low-resource devices.
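The text does not say which compression algorithm was used. To give a sense of what model compression involves, the sketch below shows one common technique, symmetric 8-bit post-training quantization of a list of weights, in plain Python. It is an illustrative assumption rather than the team's method, and the function names `quantize_weights` and `dequantize_weights` are hypothetical.

```python
def quantize_weights(weights, num_bits=8):
    """Map float weights to signed integers using a single shared scale.

    Illustrative sketch of symmetric linear quantization; this is an assumed
    technique, not necessarily the compression used in the project.
    """
    if not weights:
        return [], 1.0
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    # Avoid division by zero when every weight is 0.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_weights(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    original = [0.5, -1.0, 0.25, 0.0]
    q, s = quantize_weights(original)
    restored = dequantize_weights(q, s)
    for w, r in zip(original, restored):
        print(f"{w:+.4f} -> {r:+.4f}")
```

Storing each weight as an 8-bit integer instead of a 32-bit float cuts the storage for the weights to roughly a quarter, at the cost of a rounding error of at most half the scale per weight.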

Results

At this stage, having just passed the midpoint of the project, we can report that a number of milestones have been achieved:

  • the plant detector has been implemented
  • the first batch of the medicinal plant dataset has been collected
  • the platform backend architecture has been finalized


Mini-documentary on Artificial Intelligence 4 Development

Mini-documentary on AI4D Artificial Intelligence 4 Development

We produced a mini-documentary describing the ideas, aspirations, and research potential of our African colleagues in the field of Artificial Intelligence.

The footage was taken at the kick-off of the workshop “Toward a Network of Excellence in Artificial Intelligence for Development (AI4D) in sub-Saharan Africa”, organized by K4A, IDRC and SIDA in Nairobi, Kenya, April 2019. @IDRC_CRDI #UnitedNations #artificialintelligence #SDG #UNESCO #videolectures #AI4DNetwork #AI4Dev #AI4D

The emerging network of machine learning and AI practitioners and researchers is developing a collaborative roadmap for AI for Development in Africa. The three-day workshop zoomed in on three critical areas: 1) policy and regulations, 2) skills and capacity building and 3) the application of AI in Africa.

 

ICLR presentations of AI4D mini-grants

As part of the AfricaNLP – Unlocking Local Languages workshop, we hosted a number of projects working in Artificial Intelligence in Africa for Development, funded via IDRC grants.

K4A grant to solve access to Nigeria’s legislative bills with AI

AI4D mini-grants presentations, Nairobi 2019

K4A grant recipients Adewale Akinfaderin, Olamilekan Wahab and Olubayo Adekanmb are successfully using Artificial Intelligence to digitize parliamentary bills in Sub-Saharan Africa, specifically in Nigeria. Read their recent interview in the Techpoint.Africa article.