Paper Accepted at COLING 2025

Excited to share that our paper AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs has been accepted at COLING 2025 in Abu Dhabi.

Arabic isn’t just one language. It’s a family of dialects that vary dramatically from region to region. Yet most LLM evaluations treat Arabic as a monolith, using only Modern Standard Arabic (MSA). This paper addresses that gap by introducing AraDiCE, a suite of benchmarks that evaluates LLMs on both dialectal and cultural dimensions. We evaluated several LLMs on these benchmarks and found that Arabic-specific models like Fanar, Jais, and AceGPT outperform general multilingual models on dialectal tasks. But significant challenges remain, particularly in dialect identification, generation, and translation.

This work highlights why evaluating models in MSA alone gives an incomplete picture. If we want LLMs that truly work for Arabic speakers, we need benchmarks that reflect the real diversity of the language and its cultural contexts.

Congratulations to all the co-authors on this work!
