The goal of this project is to find a drug that can be repurposed to treat leishmaniasis effectively. Specifically, we aim to find a viable parasite protein-ligand pair.

We assume that multiple strong candidate protein-ligand pairs can be identified by computationally analyzing the proteome of Leishmania, selecting the most promising targets, and finding drugs that bind to these targets.


The project will begin by applying existing protein-ligand interaction prediction techniques to proteins that are characteristic of tropical diseases, starting with leishmaniasis.

These techniques can include using the PyRosetta library to test protein-ligand pairs for proteins whose 3D structure has been resolved, as well as methods that predict ligand interactions from sequence alone. Furthermore, a host of new protein representation techniques have been developed by applying language modelling to millions of protein sequences.

These embeddings have been found to capture many biochemical properties. This project can explore the ability of these embeddings to predict interactions with ligands.
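One way to picture this exploration is the minimal sketch below. It is an illustration under stated assumptions, not the project's method: the amino-acid frequency vector is only a stand-in for a learned language-model embedding, the two-dimensional ligand descriptors and all names (`prot_a`, `lig_1`, and so on) are hypothetical, and the labels are invented toy values rather than binding measurements. Protein and ligand vectors are combined through an outer product so that a simple logistic model can learn protein-specific ligand preferences.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_embedding(sequence):
    """Stand-in embedding: 20-dim amino-acid frequency vector.

    Assumption: a real pipeline would use a learned language-model embedding here.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def pair_features(protein_vec, ligand_vec):
    """Flattened outer product: each feature couples one protein
    dimension with one ligand dimension."""
    return [p * l for p in protein_vec for l in ligand_vec]


def predict(weights, bias, features):
    """Logistic-regression probability that the pair interacts."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, labels, lr=0.5, epochs=500):
    """Plain stochastic gradient descent on the logistic loss."""
    weights, bias = [0.0] * len(examples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Toy data, invented for illustration only (not real binding measurements).
proteins = {"prot_a": "AAAA", "prot_k": "KKKK"}
ligands = {"lig_1": [1.0, 0.0], "lig_2": [0.0, 1.0]}
pairs = [("prot_a", "lig_1", 1), ("prot_a", "lig_2", 0),
         ("prot_k", "lig_1", 0), ("prot_k", "lig_2", 1)]

X = [pair_features(composition_embedding(proteins[p]), ligands[l])
     for p, l, _ in pairs]
y = [label for _, _, label in pairs]
weights, bias = train(X, y)
scores = [predict(weights, bias, x) for x in X]
```

The outer product is used instead of plain concatenation because a linear model over concatenated vectors cannot express that a ligand binds one protein but not another. Any real evaluation would need held-out pairs rather than scoring the training set as done here.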

This direction can be extended to proteins associated with other neglected tropical diseases. A second step can be to investigate which protein and ligand representations, whether hand-crafted features or neural embeddings, are suitable for accelerating the process of matching proteins and ligands.

As part of the Indaba Grand Challenge, all data produced by this research will be released into the public domain. We will also release our source code under the MIT license to facilitate further development of treatment options for rare diseases, especially leishmaniasis.

Depending on the findings, the initial results can be published at a top-tier AI conference, which usually hosts a “Machine Learning for the Developing World” or “AI for Social Good” workshop. After initial feedback, we can target top-tier bioinformatics journals such as Bioinformatics or PLOS Computational Biology.


Of the 13000 diseases known in the medical literature, roughly 5000 have available treatments, with the remaining 8000 belonging to the rare disease category. By definition, rare diseases affect a small proportion of the global population and are therefore not major targets of pharmaceutical R&D programs.

Many tropical diseases fall into this category, where the market size may not justify the cost of de novo drug development. For this category, drug repurposing seems to be the best avenue for finding treatments, as it leverages information about existing approved drugs (around 1500) and the 400 known drug targets.

Depending on the approach, finding a cure for a disease may involve identifying molecules that bind to proteins involved in a pathway responsible for a syndrome, that bind to proteins characteristic of the parasite or virus carrying the disease, or that affect the vector of the disease.

Many of these steps rest on predicting protein-ligand interactions, either to match a protein with the ligands that can bind to it or to match a ligand with the proteins it can bind to. Understanding which representations are suited to this problem will yield a framework that is widely applicable beyond the specific case of Leishmania.
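The two matching directions can be sketched as a pair of rankings over a shared scoring function. This is a minimal illustration, assuming some interaction predictor is already available as a function `score(protein, ligand)`; the lookup table, the function names, and the identifiers `target_1`, `drug_a`, and so on are hypothetical placeholders, and the scores are invented.

```python
def rank_ligands(protein, ligands, score):
    """Ligands ordered from highest to lowest predicted score for one protein."""
    return sorted(ligands, key=lambda ligand: score(protein, ligand), reverse=True)


def rank_proteins(ligand, proteins, score):
    """Proteins ordered from highest to lowest predicted score for one ligand."""
    return sorted(proteins, key=lambda protein: score(protein, ligand), reverse=True)


# Hypothetical scores standing in for the output of an interaction predictor.
TOY_SCORES = {
    ("target_1", "drug_a"): 0.9,
    ("target_1", "drug_b"): 0.2,
    ("target_2", "drug_a"): 0.4,
    ("target_2", "drug_b"): 0.7,
}


def toy_score(protein, ligand):
    """Look up the invented score for a (protein, ligand) pair."""
    return TOY_SCORES[(protein, ligand)]


ligands_for_target_1 = rank_ligands("target_1", ["drug_a", "drug_b"], toy_score)
proteins_for_drug_b = rank_proteins("drug_b", ["target_1", "target_2"], toy_score)
```

Keeping the scoring function as a parameter means the same ranking code could sit on top of a docking-based score or an embedding-based predictor without modification.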