Fanar - An Arabic-Centric Multimodal Generative AI Platform
After more than a year of combined effort by a large team at QCRI, I’m pleased to announce that Fanar is finally out for the world to try! This milestone is part of our journey to make sure the Arabic language and the culture and norms of this region are represented in key technologies such as generative AI. At a glance, here’s what the platform includes:
Two Arabic LLMs:
- Fanar Star (7B parameters) — trained from scratch on ~1 trillion clean Arabic, English, and Code tokens
- Fanar Prime (9B parameters) — continually trained on the Gemma-2 9B base model using the same data
Both models are deployed concurrently behind a custom orchestrator that transparently routes each prompt to the right model.
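To make the routing idea concrete, here is a minimal sketch of an orchestrator that sends each prompt to one of two backends. Everything in it is an assumption for illustration: the `Orchestrator` class, the stub functions `fanar_star` and `fanar_prime`, and especially the `toy_classifier` rule (Arabic script goes to one model, everything else to the other). The announcement does not describe how Fanar's orchestrator actually decides.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical stand-ins for the two deployed models.
# Real backends would be calls to inference endpoints.
def fanar_star(prompt: str) -> str:
    return f"[star] {prompt}"


def fanar_prime(prompt: str) -> str:
    return f"[prime] {prompt}"


def toy_classifier(prompt: str) -> str:
    # ASSUMPTION: placeholder rule, not Fanar's real routing logic.
    # Prompts containing characters from the Arabic Unicode block
    # go to "star"; all other prompts go to "prime".
    has_arabic = any("\u0600" <= ch <= "\u06FF" for ch in prompt)
    return "star" if has_arabic else "prime"


@dataclass
class Orchestrator:
    backends: Dict[str, Callable[[str], str]]
    classify: Callable[[str], str]

    def generate(self, prompt: str) -> str:
        # The caller never picks a model; the choice is made here.
        name = self.classify(prompt)
        return self.backends[name](prompt)


orchestrator = Orchestrator(
    backends={"star": fanar_star, "prime": fanar_prime},
    classify=toy_classifier,
)

print(orchestrator.generate("hello"))
print(orchestrator.generate("مرحبا"))
```

The point of the design is that the caller only ever sees `generate`, so the routing rule can be swapped (for example, for a learned classifier) without changing client code.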
Beyond text:
- Islamic RAG — a retrieval-augmented system specifically for handling religious prompts with verified sources
- Recency RAG — summarizes information about events that occurred after the model’s pre-training cutoff
- Bilingual speech recognition — supports multiple Arabic dialects
- Voice and image generation — fine-tuned to reflect regional characteristics
- Attribution service — verifies the authenticity of fact-based generated content
The Fanar models are among the best-performing in their size class on established benchmarks for Arabic. You can explore the models on Hugging Face and check out the technical report for more details.
Congratulations to the entire Fanar team on this momentous occasion!