StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

Update: 2025-10-01
Description

🤗 Upvotes: 58 | cs.CL



Authors:

Yuhan Song, Linhao Zhang, Chuhan Wu, Aiwei Liu, Wei Jia, Houfeng Wang, Xiao Zhou



Title:

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs



Arxiv:

http://arxiv.org/abs/2509.22220v1



Abstract:

Prevalent semantic speech tokenizers, designed to capture linguistic content, are surprisingly fragile. We find they are not robust to meaning-irrelevant acoustic perturbations; even at high Signal-to-Noise Ratios (SNRs) where speech is perfectly intelligible, their output token sequences can change drastically, increasing the learning burden for downstream LLMs. This instability stems from two flaws: a brittle single-path quantization architecture and a distant training signal indifferent to intermediate token stability. To address this, we introduce StableToken, a tokenizer that achieves stability through a consensus-driven mechanism. Its multi-branch architecture processes audio in parallel, and these representations are merged via a powerful bit-wise voting mechanism to form a single, stable token sequence. StableToken sets a new state-of-the-art in token stability, drastically reducing Unit Edit Distance (UED) under diverse noise conditions. This foundational stability translates directly to downstream benefits, significantly improving the robustness of SpeechLLMs on a variety of tasks.
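The abstract names two ideas that are easy to make concrete: merging parallel branches by bit-wise voting, and measuring stability with Unit Edit Distance (UED). The sketch below is illustrative only and is not the authors' implementation. It assumes each branch emits one integer token per frame, that tokens are compared bit by bit with a simple majority, and that UED is the Levenshtein distance between the clean and noisy token sequences divided by the clean sequence length. The function names, the bit width, and the branch count are all hypothetical.

```python
from typing import List, Sequence


def bitwise_majority_vote(branch_tokens: Sequence[int], num_bits: int) -> int:
    """Merge one token per branch into a single token by majority vote on each bit.

    Assumption: an odd number of branches, so no bit can end in a tie.
    """
    if len(branch_tokens) % 2 == 0:
        raise ValueError("use an odd number of branches to avoid ties")
    merged = 0
    for bit in range(num_bits):
        ones = sum((tok >> bit) & 1 for tok in branch_tokens)
        if ones * 2 > len(branch_tokens):  # strict majority of branches set this bit
            merged |= 1 << bit
    return merged


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two token sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]


def unit_edit_distance(clean: Sequence[int], noisy: Sequence[int]) -> float:
    """UED under the assumed normalization: edit distance / length of the clean sequence."""
    if not clean:
        raise ValueError("clean sequence must be non-empty")
    return edit_distance(clean, noisy) / len(clean)


# Three hypothetical branches disagree on single bits; the vote recovers 0b1011.
frame_token = bitwise_majority_vote([0b1011, 0b1001, 0b0011], num_bits=4)

# One of four tokens changes under noise, so UED is 0.25.
clean_tokens: List[int] = [5, 7, 7, 2]
noisy_tokens: List[int] = [5, 7, 3, 2]
ued = unit_edit_distance(clean_tokens, noisy_tokens)
```

In this toy setting a single perturbed branch cannot flip a bit of the merged token, which is the intuition behind a consensus mechanism; a lower UED between clean and noisy tokenizations indicates a more stable tokenizer.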

Jingwen Liang, Gengyu Wang