Visual Question Answering on Medical Images: Our system takes as input a medical image and a clinically relevant question and outputs an answer based on the visual content.


With the increasing interest in artificial intelligence (AI) to support clinical decision making and improve patient engagement, opportunities to generate and leverage algorithms for automated medical image interpretation are becoming increasingly important.

Since patients in Africa may now access structured and unstructured data related to their health via patient portals, access to medical AI assistants will likely help them better understand their condition from that data.

The need for medical AI models is more profound when hospitals lack medical specialists, which can result in inaccurate diagnoses. For instance, in Cameroon, not all hospitals have a radiologist, a gynecologist or, even worse, a cardiologist. During hospital visits, patients often meet general practitioners who have no specialist training.

At best, these practitioners refer patients to specialized doctors. Even then, scheduling an appointment with a specialized doctor is sometimes impossible, as they tend to cater to a large number of patients. This inevitably decreases the chances of a correct diagnosis, which can have fatal consequences.

Further, a clinician’s confidence in interpreting complex medical images can be significantly enhanced by a “second opinion” provided by an automated system. In addition, patients may be interested in the morphology, physiology and disease status of anatomical structures around a lesion that has been well characterized by their healthcare providers. They may not be willing to seek out a specialist they are not sure to see, or to pay significant amounts for a separate office or hospital visit, just to address such questions.

Although some patients in Africa turn to search engines (e.g. Google) to disambiguate complex terms or obtain answers about confusing aspects of a medical image, the results may be nonspecific, erroneous and misleading, or overwhelming in volume and rife with misinformation.

In this project, we aim to build a medical AI assistant that has the potential to complement clinicians’ diagnoses. We focus on radiology images and tackle four main categories:

  1. Modality, used in radiology to refer to the form of imaging, e.g. CT scan, mammography;
  2. Plane, a radiographic positioning term used routinely to describe the position of the patient when taking various radiographs, e.g. longitudinal, coronal;
  3. Organ system, which refers to the different body organs, e.g. the lungs, which are relevant to COVID-19;
  4. Abnormality, e.g. ectopic pregnancy, fat embolism.

These categories are designed with different degrees of difficulty, leveraging both classification and text generation approaches.


The goal of our project is to build a Visual Question Answering (VQA) model on medical images. Our system takes as input a medical image and a clinically relevant question and outputs an answer based on the visual content.
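
As a rough illustration of this input/output contract, the sketch below shows one way a sample and its routing could be represented. It is not the project's implementation: the names `Category`, `VQASample` and `answering_strategy` are hypothetical, and the split between classification and text generation is an assumption (closed-vocabulary categories go to a classifier, abnormality questions go to a text generator).

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The four categories tackled in this project."""
    MODALITY = "modality"
    PLANE = "plane"
    ORGAN = "organ"
    ABNORMALITY = "abnormality"


# Assumption (not stated in the proposal): categories with a small, closed
# set of answers are treated as classification; abnormality is open-ended
# and treated as text generation.
CLASSIFICATION_CATEGORIES = {Category.MODALITY, Category.PLANE, Category.ORGAN}


@dataclass
class VQASample:
    image_path: str      # hypothetical path to a radiology image
    question: str        # clinically relevant question about the image
    category: Category   # which of the four categories the question targets


def answering_strategy(sample: VQASample) -> str:
    """Return the kind of answer head a sample would be routed to."""
    if sample.category in CLASSIFICATION_CATEGORIES:
        return "classification"
    return "generation"
```

For example, a question such as "What imaging modality was used?" would carry `Category.MODALITY` and be routed to the classification branch under this assumption, while "What abnormality is seen in the image?" would be routed to generation.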

This project addresses some of the Sustainable Development Goals:

  1. Reduce inequality within and among African countries (number 10): it will no longer matter that a patient lives in a place with no specialist, because he or she could still get a solution to his or her problem;
  2. Ensure healthy lives and promote well-being for all ages (number 3): young people could use the system themselves, and for elderly patients, even general practitioners could find a solution to the more specific problems those patients may be facing;
  3. Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation (number 9): since the code of this project will be open source, it will enhance scientific research in the domain of medicine in Africa.

Long-term vision

We plan to open source this project at the end of development, so that researchers in Africa and beyond can use it as a baseline. This could also encourage and open the path for more specialized data collection and more in-depth research in the field of health. We believe our work could help improve the difficult health situation Africa is facing now.

The code will be well structured and made available to everybody on GitHub at the end of the project (6 months after it begins). It will be easy to run (with a single line of code) and to evaluate. We are committed to maintaining our GitHub repository and addressing any issues raised by users.

This project will be presented at the Deep Learning IndabaX to be held in Cameroon. The team lead will be organising this event in 2021. We will also submit a paper on this project to the Information Technology, Data science and Digital Health Summit and Expo conference and to other workshops, so that the work becomes known to researchers in Africa and beyond.

By 2022, this system will be deployed for use by medical experts such as radiologists, to help them make reliable predictions for their patients, and by patients who have received their medical results and want to be more certain of their health situation.


  1. Volviane Saphir MFOGO, project lead, is a student at the African Master's in Machine Intelligence (AMMI). She is a computer vision enthusiast with a background in computer science and mathematics, and graduated from the African Institute for Mathematical Sciences (AIMS) in Cameroon.
  2. Dr. Georgia Gkioxari, coordinator, was a lecturer at AMMI and is a research scientist at FAIR. She received her PhD from UC Berkeley, working mainly on computer vision.
  3. Dr. Xinlei Chen, mentor, is a research scientist at Facebook AI Research (FAIR). He was a PhD student at the Language Technologies Institute, Carnegie Mellon University, working mainly on computer vision, computational linguistics and the combination of both.
  4. Jeremiah Fadugba, core team member, holds an M.Sc. in Mathematical Sciences from the African Institute for Mathematical Sciences (AIMS), Rwanda, and has 3 years of experience as a Machine Learning Engineer.