Biography
I received my M.S. degree from Zhejiang University, where I was advised by Prof. Zhou Zhao, and my B.S. degree from Shandong University. My research focuses on generative AI for audio and multimodal understanding. My recent work includes ThinkSound, a pioneering framework that introduces Chain-of-Thought reasoning to interactive audio generation; FlashAudio, which leverages Rectified Flows for ultra-fast text-to-audio synthesis; and OmniAudio, a novel system for generating spatial audio from 360-degree videos.
I am always open to collaboration and discussion. Please feel free to reach out if you are interested in my work!
🔥 News
- [Sep. 2025] Our work ThinkSound was accepted by NeurIPS 2025.
- [Jul. 2025] Our work MEDIC was accepted by ACM MM 2025.
- [May 2025] Our work OmniAudio was accepted by ICML 2025.
- [May 2025] Our work FlashAudio was accepted by ACL 2025 as an Oral (SAC Highlight, 1/47).
- [Jul. 2024] Our work AudioLCM was accepted by ACM MM 2024.
- [Mar. 2024] Awarded Outstanding Graduate of Zhejiang Province and Zhejiang University.
📝 Publications
(* denotes co-first author)
- ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
  Huadai Liu, K. Luo, J. Wang, et al.
  Conference on Neural Information Processing Systems (NeurIPS), 2025.
  [Paper] [Code]
- OmniAudio: Generating Spatial Audio from 360-Degree Video
  Huadai Liu, T. Luo, K. Luo, et al.
  International Conference on Machine Learning (ICML), 2025.
  [Paper] [Code]
- FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
  Huadai Liu, J. Wang, R. Huang, et al.
  Annual Meeting of the Association for Computational Linguistics (ACL), 2025 (Oral, SAC Highlight Award).
  [Paper] [Code]
- MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
  Huadai Liu, J. Wang, X. Li, et al.
  ACM International Conference on Multimedia (ACM MM), 2025.
  [Paper]
- AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps
  Huadai Liu, R. Huang, Y. Liu, et al.
  ACM International Conference on Multimedia (ACM MM), 2024.
  [Paper] [Code]
- Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
  Huadai Liu, R. Huang, J. He, et al.
  Annual Meeting of the Association for Computational Linguistics (ACL), 2024.
  [Paper]
- TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation
  Huadai Liu*, R. Huang, X. Lin, et al.
  International Conference on Learning Representations (ICLR), 2023.
  [Paper]
- ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
  Huadai Liu, R. Huang, X. Lin, et al.
  Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
  [Paper]
- AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
  Huadai Liu*, R. Huang, X. Lin, et al.
  Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
  [Paper]
- ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech
  Huadai Liu*, Y. Ren, R. Huang, et al.
  ACM International Conference on Multimedia (ACM MM), 2022.
  [Paper] [Code]
🎖 Honors and Awards
- 2024 Outstanding Graduate of Zhejiang Province
- 2024 Outstanding Graduate of Zhejiang University
- 2023 National Scholarship (Top 1%)
- 2021–2023 Outstanding Graduate Student Leader, Zhejiang University
💬 Academic Service
- Conference Reviewer: ICML, NeurIPS, ICLR, ACL, EMNLP, ACM MM