Paper from our lab accepted to Interspeech 2026

    ■ Bibliographic information
    Haoyu Zhang, Jiaxian Guo, Dong Yang, Yusuke Iwasawa, Yutaka Matsuo: AQA-TTRL: Self-Adaptation in Audio Question Answering with Test-Time Reinforcement Learning, Annual Conference of the International Speech Communication Association (Interspeech 2026), September 2026
    ■ Abstract
    Large Audio Language Models (LALMs) exhibit strong capabilities in general audio understanding but remain static after deployment, limiting their adaptability to real-world data. Since supervised fine-tuning is costly, we propose AQA-TTRL, a novel framework for audio understanding that enables on-the-fly evolution via test-time reinforcement learning using only unlabeled test data. It generates pseudo-labels via majority voting and optimizes the model through reinforcement learning. To address the noise in self-generated labels, we introduce confidence weighting to adjust training signals. Furthermore, multiple-attempt sampling mitigates advantage collapse and stabilizes training. Across MMAU, MMAR, and MMSU, AQA-TTRL achieves significant average improvements of 4.42% for Qwen2.5-Omni 7B and 11.04% for the 3B model. Notably, the adapted 3B model outperforms direct inference of the unadapted 7B model, highlighting the effectiveness of test-time adaptation in audio understanding.
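
The pseudo-labeling and confidence-weighting steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the reward definition, the confidence measure, or the advantage formula, so the sketch assumes a binary reward for agreeing with the majority answer, uses the vote share as the confidence, and applies a group-normalized advantage. The function names `majority_vote` and `weighted_advantages` are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def majority_vote(answers):
    """Return (pseudo_label, confidence) for a list of sampled answers.

    Assumption: confidence is the share of samples that agree with the
    most frequent answer. The paper may define it differently.
    """
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)


def weighted_advantages(answers, eps=1e-6):
    """Compute confidence-weighted, group-normalized advantages.

    Assumption: reward is 1.0 when a sample matches the pseudo-label and
    0.0 otherwise; advantages are standardized within the group and then
    scaled by the vote-share confidence.
    """
    label, confidence = majority_vote(answers)
    rewards = [1.0 if a == label else 0.0 for a in answers]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # When every sample agrees, sigma is 0 and every advantage is 0,
    # so the group contributes no training signal.
    advantages = [confidence * (r - mu) / (sigma + eps) for r in rewards]
    return label, confidence, advantages


if __name__ == "__main__":
    # Four sampled answers to one multiple-choice audio question.
    label, confidence, advantages = weighted_advantages(["B", "B", "A", "B"])
    print(label, confidence, advantages)
```

In this sketch, a group whose samples all agree produces zero advantages for every sample. That is one plausible reading of the "advantage collapse" the abstract says multiple-attempt sampling mitigates; the sampling scheme itself is not specified in the abstract and is not modeled here.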