Kai Yu, Shanghai Jiao Tong University


Kai Yu

Ph.D. (Cantab)

Distinguished Professor
Cross-media Language Intelligence (X-LANCE) Lab (Former SpeechLab)
Department of Computer Science and Engineering
Shanghai Jiao Tong University

Email: kai.yu [AT] sjtu [DOT] edu [DOT] cn
Address: Computer Science Department, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China

Biography

I am currently a distinguished professor and the director of the Machine Intelligence Institute of the School of Computer Science at Shanghai Jiao Tong University (SJTU), as well as the co-founder and chief scientist of AISpeech. I am a fellow of ISCA, a senior member of the IEEE, and a distinguished member of the CCF (China Computer Federation).

My academic journey began at the Department of Automation at Tsinghua University, where I completed my bachelor's and master's degrees in 1999 and 2002, respectively. I obtained my PhD at the Machine Intelligence Lab of the Engineering Department, Cambridge University, U.K., in 2006 and then worked there as a senior research associate. I joined SJTU in 2012 and founded SpeechLab, which was later expanded and renamed the Cross-media Language Intelligence (X-LANCE) Lab. I am a senior member of the IEEE and have served as a member of the IEEE Speech and Language Processing Technical Committee (2017-2019) and as an associate editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019-2024). I am currently a board member of the IEEE Signal Processing Society Conferences Board and Membership Board. I am also a member of the CCF (China Computer Federation) council and serve as the director of the Speech, Dialogue and Auditory Processing Technical Committee of CCF.

My research interests lie primarily in conversational AI, including many aspects of speech and language processing as well as multi-modal linguistic computing. The goal of my research is to build cognitive conversational agents that can operate in complex real-world environments, deal with uncertainty, deliver information in a human-like way, and evolve by interacting with their environment. I have published over 200 peer-reviewed journal and conference papers and won numerous paper awards. I have served as program chair for Interspeech, ICMI and SigDial, as general chair for the National Conference on Man-Machine Communication (the largest domestic speech conference in China), and as area chair for speech processing or dialogue systems at Interspeech, ACL, EMNLP, and other conferences.

The outcomes of my research have been both recognized in academia and successfully industrialized. I founded AISpeech to commercialize state-of-the-art speech and language processing technology. AISpeech was included in the "AI Key Players" list in the Equity Research Report of AI by Goldman Sachs in 2016 and was named one of the Cool Vendors for AI (East Asia) by Gartner in 2017. On behalf of AISpeech, I am also leading the National AI Open Innovation Platform on Language Computing, granted by the Ministry of Science and Technology of China in 2022.

SJTU X-LANCE Lab

We are looking for self-motivated Ph.D./master/undergraduate students and postdocs interested in speech and language processing. Please send your CV to me if you want to join us.

Research Interests

Speech and Audio Processing: neural speech signal processing, robust speech and speaker recognition, high-fidelity speech synthesis, audio analysis and auditory cognition, multi-modal speech processing and universal audio model
Natural Language Processing: structured language understanding, KBQA and machine reading comprehension, statistical dialogue systems, multi-lingual language processing, foundation language model, large language model agent
Multi-modal Interaction: digital avatar, GUI understanding and manipulation, AGI for science

Selected Publications

Speech and Audio Processing

[ASR] TDT-KWS: Fast and Accurate Keyword Spotting Using Token-and-duration Transducer
Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu and Kai Yu
ICASSP 2024

[Signal] Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking
Wenbin Jiang and Kai Yu
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1758-1770, 2023

[TTS] Text-To-Speech With Latent Diffusion
Zhijun Liu, Yiwei Guo and Kai Yu
ICASSP 2023

[TTS] VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
Chenpeng Du, Yiwei Guo, Xie Chen and Kai Yu
Interspeech 2022

[RAA] Towards Duration Robust Weakly Supervised Sound Event Detection
Heinrich Dinkel, Mengyue Wu and Kai Yu
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 887-900, 2021

Natural Language Processing

[LLM] SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen and Kai Yu
AAAI 2024

[LLM] Large Language Models Are Semi-Parametric Reinforcement Learning Agents
Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao and Kai Yu
NeurIPS 2023

[NLP] A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL
Ruisheng Cao, Lu Chen, Jieyu Li, Hanchong Zhang, Hongshen Xu, Wangyou Zhang and Kai Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 45, no. 11, pp. 13796-13813, 2023

[NLP] OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue
Zhi Chen, Yuncong Liu, Lu Chen, Su Zhu, Mengyue Wu and Kai Yu
Transactions of the Association for Computational Linguistics (TACL), vol. 11, pp. 68-84, 2022

[NLP] LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations
Ruisheng Cao, Lu Chen, Zhi Chen, Yanbin Zhao, Su Zhu and Kai Yu
ACL 2021

Multi-modal Interaction

[Avatar] DIFFDUB: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder
Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen and Kai Yu
ICASSP 2024

[Avatar] DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder
Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao and Jiang Bian
ACM-MM 2023

[GUI] Towards Multi-modal Conversational Agents on Mobile GUI
Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu and Kai Yu
EMNLP 2022

[GUI] TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages
Zihan Zhao, Lu Chen, Ruisheng Cao, Hongshen Xu, Xingyu Chen and Kai Yu
NAACL 2022

Professional Qualification and Service

Institute of Electrical and Electronics Engineers (IEEE)

Senior member of IEEE
Board Member of IEEE Signal Processing Society Conferences Board
Board Member of IEEE Signal Processing Society Membership Board
Member of IEEE Speech and Language Processing Technical Committee (2017-2019)
Associate Editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing (2019-2024)

China Computer Federation (CCF)

Distinguished Member of CCF
Member of the 13th Council of CCF
Director of the Speech, Dialogue and Auditory Processing Technical Committee of CCF
Associate Director of the Corporation Development Forum (Suzhou) of CCF
Standing Committee Member of the Large Model Forum of CCF

Chinese Information Processing Society of China (CIPSC)

Member of the 9th Council of CIPSC
Associate Director of the Speech Information Processing Technical Committee of CIPSC

Industry Service

Director of the National AI Open Innovation Platform on Language Computing, Ministry of Science and Technology of China (MOST)
Member of the AI Key Technology and Application Evaluation Academic Committee of the Key Laboratory of the Ministry of Industry and Information Technology of China
Member of the Information System User Interfaces Branch (TC28/SC35) of the National Information Technology Standardization Technical Committee
Member of the 4th National Computer Science and Technology Terminology Approval Committee
Director of the Academic and Intellectual Property Working Group of the China Artificial Intelligence Industry Alliance (AIIA)
Associate Director of the Technical Committee of the Alliance of Intelligent Speech Technology Industry of China

Other Service

Vice President of the Shanghai Overseas Returned Scholar Association (SORSA)
Chairman of the AI Branch of SORSA
Member of the Young Scientists Committee of the World Laureates Forum

Academic Conference Service

ICASSP

IEEE SLTC Member

Interspeech

Program Chair, Area Chair (Speech Recognition/Dialogue Systems)

EUSIPCO

Area Chair (Speech Processing)

ACL

(Senior) Area Chair/Meta-reviewer/Action Editor of ARR (Dialogue Systems/Spoken Language Technology)

NAACL

Area Chair/Meta-reviewer/Action Editor of ARR (Dialogue Systems)

EMNLP

Area Chair/Meta-reviewer/Action Editor of ARR (Dialogue Systems)

NeurIPS

Area Chair

SigDial

Program Chair

ICMI

Program Chair

NCMMSC

General Chair, Program Chair

Reviewer Service

Journal

IEEE/ACM Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE Signal Processing Letters
IEEE Signal Processing Magazine
Speech Communication
Computer Speech and Language
Journal of Computer Science (Chinese)
Journal of Automation (Chinese)

Conference

ICASSP, Interspeech, IEEE ASRU, IEEE SLT, APSIPA, ISCSLP, NCMMSC
ACL/NAACL/EACL, EMNLP, SigDial
AAAI, NeurIPS

Proposal and Award

EPSRC, U.K.
Science and Engineering Research Council, Agency for Science and Technology Research, Singapore
Israel Science Foundation (ISF), Israel
Foundation for Polish Science
Research Grants Council (RGC) of Hong Kong
National Natural Science Foundation of China
Ministry of Science and Technology of China
Ministry of Industry and Information Technology of China
Ministry of Education of China
Chinese Academy of Sciences

Awards

Best Paper Award

EURASIP Speech Communication Best Paper Award
International Symposium on Chinese Spoken Language Processing Best Paper Award
ISCA Computer Speech and Language Best Paper Award
Interspeech Best Paper Award
IEEE SLT Best Paper Award
NCMMSC Best Paper Award

National and Provincial Award

Leading Talents in Scientific and Technological Innovation by the Ministry of Science and Technology of China
Excellent Young Researcher Fund by the National Natural Science Foundation of China (NSFC)
Chinese Patent Excellence Award by the China National Intellectual Property Administration
Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning by the Shanghai Municipal Education Commission

Professional Society Academic Award

Bamboo Award by the China Computer Federation (CCF)
Distinguished Lecturer of Advanced Disciplines Lectures by the China Computer Federation (CCF)
Second Prize for Scientific and Technological Progress, WuWenJun AI Science and Technology Award by the Chinese Association for Artificial Intelligence (CAAI)
First Prize for Natural Science, WuWenJun AI Science and Technology Award by the Chinese Association for Artificial Intelligence (CAAI)

Other Award

Scientific Chinese (2016) Person of the Year by Scientific Chinese Magazine

Last updated on 2025-06-18.