Biography

I received my M.S. degree from Zhejiang University, where I was advised by Prof. Zhou Zhao, and my B.S. degree from Shandong University. My research focuses on generative AI for audio and multimodal understanding. My recent work includes ThinkSound, a framework that introduces Chain-of-Thought reasoning to interactive audio generation; FlashAudio, which leverages Rectified Flows for fast text-to-audio synthesis; and OmniAudio, a system for generating spatial audio from 360-degree videos.

I am always open to collaboration and discussion. Please feel free to reach out if you are interested in my work!


🔥 News

  • [Sep. 2025] Our work ThinkSound was accepted by NeurIPS 2025.
  • [Jul. 2025] Our work MEDIC was accepted by ACM MM 2025.
  • [May 2025] Our work OmniAudio was accepted by ICML 2025.
  • [May 2025] Our work FlashAudio was accepted by ACL 2025 as an Oral (SAC Highlight, 1/47).
  • [Jul. 2024] Our work AudioLCM was accepted by ACM MM 2024.
  • [Mar. 2024] Awarded Outstanding Graduate of Zhejiang Province and Zhejiang University.

📝 Publications

(* denotes co-first author)

  1. ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
    Huadai Liu, K. Luo, J. Wang, et al.
    Conference on Neural Information Processing Systems (NeurIPS), 2025.
    [Paper] [Code]
  2. OmniAudio: Generating Spatial Audio from 360-Degree Video
    Huadai Liu, T. Luo, K. Luo, et al.
    International Conference on Machine Learning (ICML), 2025.
    [Paper] [Code]
  3. FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
    Huadai Liu, J. Wang, R. Huang, et al.
    Annual Meeting of the Association for Computational Linguistics (ACL), 2025 (Oral, SAC Highlight).
    [Paper] [Code]
  4. MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
    Huadai Liu, J. Wang, X. Li, et al.
    ACM International Conference on Multimedia (ACM MM), 2025.
    [Paper]
  5. AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps
    Huadai Liu, R. Huang, Y. Liu, et al.
    ACM International Conference on Multimedia (ACM MM), 2024.
    [Paper] [Code]
  6. Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
    Huadai Liu, R. Huang, J. He, et al.
    Annual Meeting of the Association for Computational Linguistics (ACL), 2024.
    [Paper]
  7. TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation
    Huadai Liu*, R. Huang, X. Lin, et al.
    International Conference on Learning Representations (ICLR), 2023.
    [Paper]
  8. ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
    Huadai Liu, R. Huang, X. Lin, et al.
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
    [Paper]
  9. AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
    Huadai Liu*, R. Huang, X. Lin, et al.
    Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
    [Paper]
  10. ProDiff: Progressive Fast Diffusion Model for High-Quality Text-to-Speech
    Huadai Liu*, Y. Ren, R. Huang, et al.
    ACM International Conference on Multimedia (ACM MM), 2022.
    [Paper] [Code]

🎖 Honors and Awards

  • 2024 Outstanding Graduate of Zhejiang Province
  • 2024 Outstanding Graduate of Zhejiang University
  • 2023 National Scholarship (Top 1%)
  • 2021–2023 Outstanding Graduate Student Leader, Zhejiang University

💬 Academic Service

  • Conference Reviewer:
    • ICML, NeurIPS, ICLR, ACL, EMNLP, ACM MM