Knowledge 4 All Foundation Concludes Successful Collaboration with European AI Excellence Network ELISE

Knowledge 4 All Foundation is pleased to announce the successful completion of its participation in the European Learning and Intelligent Systems Excellence (ELISE) project, a prominent European Network of Artificial Intelligence Excellence Centres. ELISE, part of the EU Horizon 2020 ICT-48 portfolio, originated from the European Laboratory for Learning and Intelligent Systems (ELLIS) and concluded in August 2024.

The European Learning and Intelligent Systems Excellence (ELISE) project, funded under the EU’s Horizon 2020 programme, aimed to position Europe at the forefront of artificial intelligence (AI) and machine learning research.

Throughout the project, Knowledge 4 All Foundation collaborated with leading AI research hubs and associated fellows to advance high-level research and disseminate knowledge across academia, industry, and society. The Foundation contributed to various initiatives, including mobility programs, research workshops, and policy development, aligning with ELISE’s mission to promote explainable and trustworthy AI outcomes.

The Foundation’s involvement in ELISE has reinforced its commitment to fostering innovation and excellence in artificial intelligence research. By engaging in this collaborative network, Knowledge 4 All Foundation has played a role in positioning Europe at the forefront of AI advancements, ensuring that AI research continues to thrive within open societies.

Knowledge 4 All Foundation Completes Successful Engagement in European AI Excellence Network HumaneAI-Net

Knowledge 4 All Foundation (K4A) is pleased to announce the successful completion of its engagement in the HumanE AI Network, a prominent European Network of Artificial Intelligence (AI) Excellence Centres. This initiative has been instrumental in advancing human-centric AI research and fostering collaboration across Europe.

Both HumaneAI-Net and ELISE were part of the H2020 ICT-48-2020 call, fostering AI research excellence in Europe.

The HumanE AI Network, comprising leading European research centres, universities, and industrial enterprises, has focused on developing AI technologies that align with European ethical values and societal norms. K4A’s participation in this network has contributed to shaping AI research directions, methods, and results, ensuring that AI advancements are beneficial to individuals and society as a whole.

K4A remains committed to advancing AI research and development, building upon the foundations established through these collaborations. The foundation looks forward to future opportunities to contribute to the global AI community and to promote the responsible and ethical development of AI technologies.

Knowledge 4 All Foundation Acknowledged by Masakhane Research Foundation in Groundbreaking NLP Publications

The Knowledge 4 All Foundation is proud to have been acknowledged by the Masakhane Research Foundation in their recent influential publications advancing Natural Language Processing (NLP) for African languages. These publications highlight the Foundation’s pivotal contributions to developing datasets and fostering AI innovation across the continent.

  1. “A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation” (NAACL 2022)
    This paper explores how a small number of translations can significantly enhance pre-trained models for African news translation, addressing the scarcity of African-language datasets.
  2. “MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition” (EMNLP 2022)
    This work presents MasakhaNER 2.0, applying Africa-centric transfer learning techniques to Named Entity Recognition (NER) in African languages and providing a vital resource for African NLP tasks.
  3. “MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages” (ACL 2023)
    This research introduces MasakhaPOS, which addresses the challenges of Part-of-Speech (POS) tagging in the diverse and underrepresented African linguistic landscape.

These projects, made possible through collaborative efforts and the contributions of Knowledge 4 All Foundation, have significantly advanced NLP for African languages, paving the way for inclusive and representative AI solutions.

The Foundation expresses its gratitude to Masakhane Research Foundation and remains committed to supporting initiatives that promote linguistic diversity, inclusivity, and technological progress for African communities. Together, these partnerships exemplify the power of global collaboration in driving impactful AI research and development.

AI4D blog series: The First Tunisian Arabizi Sentiment Analysis Dataset

Motivation

On social media, Arabic speakers tend to express themselves in their own local dialect. To do so, Tunisians use “Tunisian Arabizi”, which writes the dialect in the Latin script supplemented with numerals rather than in the Arabic alphabet (for example, the numeral 3 stands in for the letter ع).

On the African continent, analytical studies based on Deep Learning are data hungry, yet, to the best of our knowledge, no annotated Tunisian Arabizi dataset exists.

Twitter, Facebook and other micro-blogging systems are becoming a rich source of feedback information in several vital sectors, such as politics, economics, sports and other matters of general interest. Our dataset is taken from people expressing themselves in their own Tunisian Dialect using Tunisian Arabizi.

TUNIZI is composed of text comments collected from Social Media, each annotated as positive, negative or neutral. The data does not include any confidential information; however, negative comments may include offensive or insulting content.
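To make the format concrete, the snippet below sketches what TUNIZI-style instances might look like. The field names ("text", "label") and the example comments are illustrative assumptions, not the released dataset’s actual schema.

```python
# A sketch of TUNIZI-style annotated instances: an Arabizi comment plus
# a sentiment label. Field names and comments are illustrative only.
samples = [
    {"text": "3ajbetni barcha el fekra", "label": "positive"},
    {"text": "ma7abitch el service", "label": "negative"},
    {"text": "chnowa ra2ykom", "label": "neutral"},
]

# Count instances per sentiment class.
counts = {}
for s in samples:
    counts[s["label"]] = counts.get(s["label"], 0) + 1
print(counts)  # {'positive': 1, 'negative': 1, 'neutral': 1}
```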

The TUNIZI dataset is used in all iCompass products that handle the Tunisian Dialect: a Sentiment Analysis project dedicated to e-reputation monitoring, as well as Tunisian chatbots that understand Tunisian Arabizi and reply in it.

Team

The TUNIZI dataset was collected, preprocessed and annotated by the team at iCompass, a Tunisian startup specialized in NLP/NLU. The team, composed of academics and engineers specialized in information technology, mathematics and linguistics, was fully dedicated to ensuring the success of the project. iCompass can be contacted by email or through the website: www.icompass.tn

Implementation

  1. Data Collection: TUNIZI was collected from comments on Social Media platforms. All data was directly observable and required no inference from other data. Our dataset is taken from people expressing themselves in their own Tunisian Dialect using Arabizi, and relates directly to Tunisians of different regions, ages and genders. The data was collected anonymously and contains no information about users’ identities.
  2. Data Preprocessing & Annotation: TUNIZI was preprocessed by removing links, emoji symbols and punctuation. Annotation was then performed by five Tunisian native speakers, three males and two females at a higher education level (Master/PhD).
  3. Distribution and Maintenance: The TUNIZI dataset is made public on GitHub for all upcoming research and development activities. TUNIZI is maintained by the iCompass team, which can be contacted by email or through the GitHub repository, where updates will be made available.
  4. Conclusion: As interest in Natural Language Processing, particularly for African languages, is growing, a natural future step would involve building Arabizi datasets for other underrepresented North African dialects such as Algerian and Moroccan.
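The preprocessing step in point 2 above can be sketched as a small cleaning function. This is a minimal illustration of removing links, emoji symbols and punctuation, not the exact iCompass pipeline.

```python
import re

def clean_comment(text: str) -> str:
    """Strip links, emoji symbols and punctuation from an Arabizi comment.
    A sketch of the preprocessing described above; digits are kept because
    Arabizi uses numerals (3, 7, 9, ...) as letters."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove links
    text = re.sub(r"[^\w\s]", " ", text)                # punctuation and emoji
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace

print(clean_comment("ya3tik e5er http://t.co/x !!"))  # -> "ya3tik e5er"
```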

AI4D blog series: Building a Data Pipeline for a Real World Machine Learning Application

We set out with a novel idea: to develop an application that would (i) collect an individual’s Blood Pressure (BP) and activity data, and (ii) make future BP predictions for the individual from this data.

Key requirements for this study therefore were:

  1. The ability to get the BP data from an individual.
  2. The ability to get a corresponding record of their activities for the BP readings.
  3. The identification of a suitable Machine Learning (ML) Algorithm for predicting future BP.

Pre-test the idea – Pre-testing the idea was a critical first step in our process before we could proceed to collect the actual data. The data collection process would require the procurement of suitable smart watches and the development of a mobile application, both of which are time-consuming and costly activities. At this point we learnt our first lessons: (i) there was no precedent for what we were attempting, and consequently (ii) there were no publicly available BP datasets for pre-testing our ideas.

Simulate the test data – The implication was that we had to simulate data based on the variables identified for our study: the Systolic and Diastolic BP readings, Activity and a timestamp. This was done using a spreadsheet, and the data was saved as a comma-separated values (CSV) file, a common file format for storing data in ML.
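The simulation step can be sketched in code as well. Only the four variables (systolic, diastolic, activity, timestamp) come from the study; the value ranges and activity labels below are illustrative assumptions.

```python
import csv
import random
from datetime import datetime, timedelta

# Sketch of simulating BP/activity test data. Variable names follow the
# study description; values and activities here are illustrative only.
activities = ["sleeping", "sitting", "walking", "running"]
start = datetime(2021, 1, 1, 8, 0)
random.seed(0)  # reproducible simulation

with open("simulated_bp.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "systolic", "diastolic", "activity"])
    for i in range(100):
        ts = start + timedelta(hours=i)
        writer.writerow([
            ts.isoformat(),
            random.randint(100, 140),  # systolic, mmHg
            random.randint(60, 90),    # diastolic, mmHg
            random.choice(activities),
        ])
```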

Identify a suitable ML model – Both the simulated data and the data in the final study would be time series data. The need to predict both Systolic and Diastolic BP using previous readings, activities and timestamps meant that we were handling multivariate time series data. We therefore tested and settled on an LSTM model for multivariate time series forecasting, following a guide by Jason Brownlee (https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/)
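The core data preparation behind such an LSTM, splitting a multivariate series into fixed-length input windows and next-step targets, can be sketched as follows. Encoding activity as a numeric code is an assumption for illustration.

```python
import numpy as np

def split_sequences(data: np.ndarray, n_steps: int):
    """Turn a multivariate series into LSTM samples: each sample is
    n_steps past rows of [systolic, diastolic, activity_code], and the
    target is the next row's systolic/diastolic pair (a sketch of the
    windowing approach in the guide referenced above)."""
    X, y = [], []
    for i in range(len(data) - n_steps):
        X.append(data[i : i + n_steps])  # past readings as input window
        y.append(data[i + n_steps, :2])  # next systolic & diastolic as target
    return np.array(X), np.array(y)

# Toy series: [systolic, diastolic, activity_code] per time step.
series = np.array([
    [120, 80, 0],
    [118, 78, 1],
    [125, 82, 2],
    [122, 79, 1],
    [130, 85, 3],
], dtype=float)

X, y = split_sequences(series, n_steps=3)
print(X.shape, y.shape)  # (2, 3, 3) (2, 2)
```

The resulting X has shape (samples, timesteps, features), which is exactly the input layout an LSTM layer expects.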

Develop the data collection infrastructure – There being no pre-existing data for the development implied that we had to collect our data. The unique nature of our study, collecting BP and activity data from individuals called for an innovative approach to the process.

  • BP data collection – for this aspect of the study we established that the best approach would be smart watches with BP data collection and transmission capabilities. Besides BP data collection, another key consideration for device selection was affordability. This was dictated both by the study’s limited resources and, more importantly, by the context of use of a probable final solution: the watch would have to be affordable to allow for wide adoption of the solution.

The watch identified was the F1 Wristband Heart and Heart Rate Monitor.

  • Activity data collection – for this aspect of the study a mobile application was identified as the method of choice. The application was developed to be able to receive BP readings from the smart watch and to also collect activity data from the user.

Test the data collection – The smart watch – mobile app data collection was tested and a number of key observations were made.

  • Smart watch challenges – Although the watch identified is affordable, it does not work well for dark-skinned persons. This is a major challenge given that the majority of people in Kenya, the location of the study and of eventual system use, are dark-skinned. As a result, we are examining other options that may work universally.
  • Mobile app connectivity challenges – The app initially would not connect to the smart watch but this was resolved and the data collection is now possible.

Next Steps

  • Pilot the data collection – We are now working on piloting the solution with at least 10 people over a period of 2 to 3 weeks. This will give us an idea of how the final study will be carried out with respect to:
  1. How the respondents use the solution,
  2. The kind of data we will be able to actually get from the respondents
  3. The suitability of the data for the machine learning exercise.
  • Develop and Deploy the LSTM Model – We shall then develop the LSTM model and deploy it on the mobile device to examine the practicality of our proposed approach to BP prediction.

Reposted within the project “Network of Excellence in Artificial Intelligence for Development (AI4D) in sub-Saharan Africa” #UnitedNations #artificialintelligence #SDG #UNESCO #videolectures #AI4DNetwork #AI4Dev #AI4D