
Paper Accepted at Interspeech 2025!

Our paper From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models has been accepted at Interspeech 2025.

LLMs have shown that text-only training can give models remarkable reasoning abilities and abstract semantic understanding. This raises a fascinating question: do speech models develop similar conceptual structures when trained only on audio? And when models are trained on both speech and text together, do they build a richer understanding?

We used Latent Concept Analysis from our prior work on interpretability to examine how semantic abstractions form across modalities, and we find notable differences in how the speech and text modalities organize their internal representations. To support reproducibility, we released our code and a curated audio version of the SST-2 dataset on GitHub and Hugging Face.
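At a high level, Latent Concept Analysis groups a model's hidden representations into clusters and treats each cluster as a candidate "concept." The sketch below is a minimal, hypothetical illustration of that idea using agglomerative clustering on toy vectors; the function name and all parameters are illustrative and not the paper's actual implementation.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def latent_concepts(hidden_states, n_clusters=5):
    """Group representation vectors into candidate concept clusters.

    hidden_states: array of shape (n_items, dim), e.g. token or
    audio-frame activations extracted from some layer of a model.
    Returns a dict mapping cluster id -> indices of its members.
    """
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(hidden_states)
    return {c: np.where(labels == c)[0] for c in range(n_clusters)}

# Toy stand-in for real model activations: 100 random 16-dim vectors.
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 16))
clusters = latent_concepts(states, n_clusters=5)
print({c: len(idx) for c, idx in clusters.items()})  # cluster sizes
```

In the actual analysis, the clusters from a speech model and a text model would then be compared (e.g. by how well they align with labeled semantic categories) to see which abstractions each modality forms.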

Congratulations to all the authors on this work!
