According to the Open Data Barometer by the World Wide Web Foundation, countries in Sub-Saharan Africa rank poorly, with an average score of about 20 out of a maximum of 100 on open data initiatives, based on readiness, implementation, and impact. To make the creation, introduction, and passage of parliamentary bills a force for public accountability, the information needs to be easier for the average citizen to analyze and process.
This is not the case for most of the bills introduced and passed by parliaments in Sub-Saharan Africa. In this work, we present a method to overcome this implementation barrier. For the Nigerian parliament, we used a pre-trained optical character recognition (OCR) tool, natural language processing techniques, and machine learning algorithms to categorize parliamentary bills. We propose to improve the work on the Nigerian parliamentary bills by using text detection models to build a custom OCR tool. We also propose to extend our method to three other African countries: South Africa, Kenya, and Ghana.
Given the challenges and precariousness facing developing and underdeveloped countries, the quality of policymaking and legislation is of enormous importance. This legislation can affect the success of several of the United Nations Sustainable Development Goals (SDGs), such as poverty alleviation, a good public health system, quality education, economic growth, and sustainability. Targets 16.6 and 16.7 of the UN SDGs are to “develop effective, accountable, and transparent institutions at all levels” and to “ensure responsive, inclusive, participatory and representative decision making at all levels”. For countries in Sub-Saharan Africa to meet these targets, an open data revolution needs to happen at all levels of government and, more importantly, at the parliamentary level.
Objectives and Expectations
To meet UN SDG targets 16.6 and 16.7, making effective use of data is key. However, does such data currently exist? If so, how should it be organized in a framework that is amenable to the decision-making process? Here, we propose expanding our work on categorizing parliamentary bills in Nigeria using optical character recognition (OCR), document embedding, and recurrent neural networks to three other countries in Africa: Kenya, Ghana, and South Africa.
We also plan to improve our text extraction process by training a custom OCR tool. The objective of this project is to generate semantic and structured data from the bills and, in turn, categorize them under socio-economic labels. We plan to recruit three interns to work on this project for five months: two machine learning interns and one software engineering intern.
Conclusion and Long Term Vision
Our initial experimental results show that our model is effective for categorizing the bills, which will aid our large-scale digitization efforts. However, our results also point to a key remaining challenge. The output from the pre-trained OCR tool is often not an accurate representation of the text in the bills, especially for low-quality PDFs. One promising way to address this is to train the custom OCR tool we propose. The rapid progress of text detection research with novel deep learning methods can help us in this area.
Methods such as region-based or single-shot detectors can be employed. In addition, we plan to use image augmentation to alter the size, background noise, or color of the bills. A large-scale annotation effort on the texts can provide the labels we need to train our custom OCR tool for text identification and named entity recognition. We are also extending our methodology to other countries in Sub-Saharan Africa. Results that lead to accurate categorization of parliamentary bills are well positioned to have a substantial impact on governmental policies and on the quest for governments in low-resource countries to meet the open data charter principles and the United Nations Sustainable Development Goals on open government.
Such results can also empower policymakers, stakeholders, and governmental institutions to identify and monitor bills introduced to the National Assembly for research purposes, and can make bill creation and open data initiatives more efficient. We plan to design an intercontinental tool that combines information from all bills and categories and makes it easily accessible to everyone. For our long-term vision, we plan to analyze documents on parliamentary votes and proceedings to gain more insight into legislative debates and patterns.
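As an illustration of the image augmentation step mentioned above (altering the size, background noise, or color of scanned bills), the following is a minimal sketch and not the project's implementation. It assumes a page is a grayscale grid of pixel values in 0..255, uses only the Python standard library, and approximates a color change with a simple intensity shift. All function names and parameter ranges are hypothetical choices made for this example.

```python
import random

Image = list[list[int]]  # grayscale page: rows of pixel values in 0..255


def resize_nearest(img: Image, factor: float) -> Image:
    """Rescale a page by `factor` using nearest-neighbour sampling."""
    src_h, src_w = len(img), len(img[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    return [
        [img[min(src_h - 1, int(r / factor))][min(src_w - 1, int(c / factor))]
         for c in range(dst_w)]
        for r in range(dst_h)
    ]


def add_salt_and_pepper(img: Image, prob: float, rng: random.Random) -> Image:
    """Replace each pixel with black or white with probability `prob`."""
    return [
        [rng.choice((0, 255)) if rng.random() < prob else px for px in row]
        for row in img
    ]


def shift_intensity(img: Image, delta: int) -> Image:
    """Lighten (delta > 0) or darken (delta < 0) the page, clamping to 0..255."""
    return [[max(0, min(255, px + delta)) for px in row] for row in img]


def augment(img: Image, rng: random.Random) -> Image:
    """Apply one random size, noise, and intensity change to a page."""
    out = resize_nearest(img, rng.uniform(0.75, 1.25))  # assumed scale range
    out = add_salt_and_pepper(out, prob=0.02, rng=rng)  # assumed noise level
    return shift_intensity(out, rng.randint(-30, 30))   # assumed shift range


# A tiny 4x4 "page": white background with a dark 2x2 glyph in the middle.
page = [
    [255, 255, 255, 255],
    [255, 0, 0, 255],
    [255, 0, 0, 255],
    [255, 255, 255, 255],
]
augmented = augment(page, random.Random(0))
```

Passing a seeded `random.Random` keeps each augmented copy reproducible. If the annotations include bounding boxes around text regions, any resize applied to a page would also need to be applied to the box coordinates so that labels stay aligned with the image.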