Fanar - An Arabic-Centric Multimodal Generative AI Platform • Fahim Dalvi

After more than a year of combined effort by a large team at QCRI, I’m pleased to announce that Fanar is finally out for the world to try! This milestone is part of our effort to ensure that the Arabic language and the culture and norms of this region are represented in key technologies such as generative AI. At a glance, here’s what the platform includes:

Two Arabic LLMs:

Fanar Star (7B parameters) — trained from scratch on ~1 trillion clean Arabic, English, and code tokens
Fanar Prime (9B parameters) — built by continual pre-training of the Gemma-2 9B base model on the same data

Both models are deployed side by side behind a custom orchestrator that transparently routes each prompt to the appropriate model.
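To make the idea of prompt routing concrete, here is a minimal sketch of an orchestrator in Python. Everything in it is an assumption for illustration: the post does not describe how Fanar's orchestrator chooses a model, so the routing rule (share of Arabic-script letters in the prompt), the threshold, and the names `fanar_star`, `fanar_prime`, `arabic_ratio`, `route`, and `answer` are all hypothetical stand-ins.

```python
from typing import Callable, Dict


def fanar_star(prompt: str) -> str:
    """Hypothetical stand-in for a call to the Fanar Star backend."""
    return f"[star] {prompt}"


def fanar_prime(prompt: str) -> str:
    """Hypothetical stand-in for a call to the Fanar Prime backend."""
    return f"[prime] {prompt}"


# Registry of backends the orchestrator can dispatch to.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "star": fanar_star,
    "prime": fanar_prime,
}


def arabic_ratio(text: str) -> float:
    """Fraction of alphabetic characters that fall in the basic Arabic Unicode block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)


def route(prompt: str, threshold: float = 0.5) -> str:
    """Pick a backend name using an invented placeholder rule (not Fanar's actual logic)."""
    return "star" if arabic_ratio(prompt) >= threshold else "prime"


def answer(prompt: str) -> str:
    """Route the prompt and call the chosen backend; the caller never names a model."""
    return BACKENDS[route(prompt)](prompt)
```

The caller only ever invokes `answer`, which is what makes the routing transparent: the choice of model is an internal detail that could be swapped for a learned classifier without changing the interface.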

Beyond text:

Islamic RAG — a retrieval-augmented system specifically for handling religious prompts with verified sources
Recency RAG — summarizes information about events that occurred after the model’s pre-training cutoff
Bilingual speech recognition — supports multiple Arabic dialects
Voice and image generation — fine-tuned to reflect regional characteristics
Attribution service — verifies the authenticity of fact-based generated content
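As a rough illustration of the retrieval-augmented pattern behind a recency-focused service, the sketch below keeps only documents published after a model's training cutoff, ranks them by word overlap with the query, and assembles them into a prompt. This is not Fanar's implementation: the `Doc` type, the `retrieve_recent` and `build_prompt` helpers, and the word-overlap scoring are assumptions chosen for brevity, and a real system would more likely use a search index or embedding-based retrieval.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Doc:
    """Hypothetical document record: publication date plus raw text."""
    published: date
    text: str


def retrieve_recent(query: str, docs: List[Doc], cutoff: date, k: int = 2) -> List[Doc]:
    """Return up to k docs published after the cutoff, ranked by word overlap with the query."""
    query_words = set(query.lower().split())
    recent = [d for d in docs if d.published > cutoff]
    scored = [(len(query_words & set(d.text.lower().split())), d) for d in recent]
    # Drop documents that share no words with the query.
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query: str, context: List[Doc]) -> str:
    """Assemble retrieved documents and the user question into a single prompt string."""
    lines = [f"- ({d.published.isoformat()}) {d.text}" for d in context]
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {query}"
```

The prompt returned by `build_prompt` would then be passed to the language model, which can summarize events it never saw during pre-training because the relevant text is supplied in the context.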

The Fanar models are among the best-performing in their size class on established benchmarks for Arabic. You can explore the models on Hugging Face and check out the technical report for more details.

Congratulations to the entire Fanar team on this momentous occasion!
