Knowledge 4 All Foundation Concludes Successful Collaboration with European AI Excellence Network ELISE

Knowledge 4 All Foundation is pleased to announce the successful completion of its participation in the European Learning and Intelligent Systems Excellence (ELISE) project, a prominent European Network of Artificial Intelligence Excellence Centres. ELISE, part of the EU Horizon 2020 ICT-48 portfolio, originated from the European Laboratory for Learning and Intelligent Systems (ELLIS) and concluded in August 2024.

The European Learning and Intelligent Systems Excellence (ELISE) project, funded under the EU's Horizon 2020 programme, aimed to position Europe at the forefront of artificial intelligence (AI) and machine learning research
The European Learning and Intelligent Systems Excellence (ELISE) project, funded under the EU’s Horizon 2020 programme, aimed to position Europe at the forefront of artificial intelligence (AI) and machine learning research

Throughout the project, Knowledge 4 All Foundation collaborated with leading AI research hubs and associated fellows to advance high-level research and disseminate knowledge across academia, industry, and society. The Foundation contributed to various initiatives, including mobility programs, research workshops, and policy development, aligning with ELISE’s mission to promote explainable and trustworthy AI outcomes.

The Foundation’s involvement in ELISE has reinforced its commitment to fostering innovation and excellence in artificial intelligence research. By engaging in this collaborative network, Knowledge 4 All Foundation has played a role in positioning Europe at the forefront of AI advancements, ensuring that AI research continues to thrive within open societies

Knowledge 4 All Foundation Acknowledged by Masakhane Research Foundation in Groundbreaking NLP Publications

The Knowledge 4 All Foundation is proud to have been acknowledged by the Masakhane Research Foundation in their recent influential publications advancing Natural Language Processing (NLP) for African languages. These publications highlight the Foundation’s pivotal contributions to developing datasets and fostering AI innovation across the continent.

  1. “A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation” (NAACL 2022)
    This paper explores how a small number of translations can significantly enhance pre-trained models for African news translation, addressing the scarcity of African-language datasets. Read the full paper here.
  2. “MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition” (EMNLP 2022)
    This work presents MasakhaNER 2.0, a model leveraging Africa-centric transfer learning techniques for Named Entity Recognition (NER) in African languages, providing a vital resource for African NLP tasks. Read the full paper here.
  3. “MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages” (ACL 2023)
    This research introduces MasakhaPOS, which addresses the challenges of Part-of-Speech (POS) tagging in the diverse and underrepresented African linguistic landscape. Read the full paper here.

These projects, made possible through collaborative efforts and the contributions of Knowledge 4 All Foundation, have significantly advanced NLP for African languages, paving the way for inclusive and representative AI solutions.

The Foundation expresses its gratitude to Masakhane Research Foundation and remains committed to supporting initiatives that promote linguistic diversity, inclusivity, and technological progress for African communities. Together, these partnerships exemplify the power of global collaboration in driving impactful AI research and development.

AI4D blog series: The First Tunisian Arabizi Sentiment Analysis Dataset

Motivation

On social media, Arabic speakers tend to express themselves in their own local dialect. To do so, Tunisians use “Tunisian Arabizi”, which consists in supplementing numerals to the Latin script rather than the Arabic alphabet.

In the African continent, analytical studies based on Deep Learning are data hungry. To the best of our knowledge, no annotated Tunisian Arabizi dataset exists.

Twitter, Facebook and other micro-blogging systems are becoming a rich source of feedback information in several vital sectors, such as politics, economics, sports and other matters of general interest. Our dataset is taken from people expressing themselves in their own Tunisian Dialect using Tunisian Arabizi.

TUNIZI is composed of one instance presented as text comments collected from Social Media, annotated as positive, negative or neutral. This data does not include any confidential information. However, negative comments may include offensive or insulting content.

TUNIZI dataset is used in all iCompass products that are using the Tunisian Dialect. TUNIZI is used in a Sentiment Analysis project dedicated for the e-reputation and also for all Tunisian chatbots that are able to understand the Tunisian Arabizi and reply using it.

Team

 TUNIZI Dataset is collected, preprocessed and annotated by iCompass team, the Tunisian Startup speciallized in NLP/NLU. The team composed of academics and engineers specialized in Information technology, mathematics and linguistics were all dedicated to ensure the success of the project. iCompass can be contacted through emails or through the website: www.icompass.tn

Implementation

  1. Data Collection: TUNIZI is collected from comments on Social Media platforms. All data was directly observable and did not require other data to be inferred from. Our dataset is taken from people expressing themselves in their own Tunisian Dialect using Arabizi. This dataset relates directly to Tunisians from different regions, different ages and different genders. Our dataset is collected anonymously and contains no information about users identity.
  2. Data Preprocessing & Annotation: TUNIZI was preprocessed by removing links, emoji symbols and punctuation. Annotation was then performed by five Tunisian native speakers, three males and two females at a higher education level (Master/PhD).
  3. Distribution and Maintenance: TUNIZI dataset is made public for all upcoming research and development activitieson Github. TUNIZI is maintained by iCompass team that can be contacted through emails or through the Github repository. Updates will be available on the same Github link.
  4. Conclusion: As the interest in Natural Language Processing, particularly for African languages is growing, a natural future step would involve building Arabizi datasets for other underrepresented north African dialects such as Algerian and Moroccan.