<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Speech on Fahim Dalvi</title>
    <link>https://fdalvi.github.io/tags/speech/</link>
    <description>Recent content in Speech on Fahim Dalvi</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Fri, 15 Aug 2025 13:00:00 +0300</lastBuildDate>
    <atom:link href="https://fdalvi.github.io/tags/speech/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Paper Accepted at Interspeech 2025!</title>
      <link>https://fdalvi.github.io/blog/2025-08-15-from-words-to-waves-interspeech-2025/</link>
      <pubDate>Fri, 15 Aug 2025 13:00:00 +0300</pubDate>
      <guid>https://fdalvi.github.io/blog/2025-08-15-from-words-to-waves-interspeech-2025/</guid>
      <description>&lt;p&gt;Our paper &lt;a href=&#34;https://doi.org/10.21437/Interspeech.2025-2180&#34;&gt;From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models&lt;/a&gt; has been accepted at &lt;a href=&#34;https://interspeech2025.org/&#34;&gt;Interspeech 2025&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;LLMs have shown that text-only training can give models remarkable reasoning abilities and abstract semantic understanding. This raises a fascinating question: &lt;strong&gt;do speech models develop similar conceptual structures when trained only on audio?&lt;/strong&gt; And when models are trained on both speech and text together, do they build a richer understanding?&lt;/p&gt;&#xA;&lt;p&gt;We used &lt;strong&gt;Latent Concept Analysis&lt;/strong&gt; from our prior work on interpretability to examine how semantic abstractions form across modalities, and find lots of interesting differences on how speech and text modalities differ in their internal representations. We released our code and a curated audio version of the SST-2 dataset on &lt;a href=&#34;https://github.com/shammur/MultimodalXplain&#34;&gt;GitHub&lt;/a&gt; and &lt;a href=&#34;https://huggingface.co/collections/QCRI/multimodalxplain&#34;&gt;Hugging Face&lt;/a&gt; to support reproducibility.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
