Kai Yu
Ph.D. (Cantab)
Distinguished Professor
Cross-media Language Intelligence (X-LANCE) Lab (Former SpeechLab)
Department of Computer Science and Engineering
Shanghai Jiao Tong University
Email: kai.yu [AT] sjtu [DOT] edu [DOT] cn
Address: Computer Science Department, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
I am currently a distinguished professor in the Department of Computer Science and Engineering at Shanghai Jiao Tong University (SJTU), as well as the co-founder and chief scientist of AISpeech. I am now leading the Institute of Intelligent Human-Computer Interaction of the Department of Computer Science, as well as the Center for Intelligent Speech and Natural Language Processing of the AI Institute of SJTU.
My academic journey began at the Department of Automation at Tsinghua University, where I completed my bachelor's and master's degrees in 1999 and 2002 respectively. I obtained my PhD at the Machine Intelligence Lab of the Engineering Department, Cambridge University, U.K. in 2006 and then worked there as a senior research associate. I joined SJTU in 2012 and founded SpeechLab. SpeechLab was later expanded and renamed the Cross-media Language Intelligence (X-LANCE) Lab, the name it carries today.
My research interests lie primarily in conversational AI, covering many aspects of speech and language processing as well as multi-modal linguistic computing. The goal of my research is to build cognitive conversational agents that can operate in complex real-world environments, deal with uncertainty, deliver information in a humanized way and evolve by interacting with their environment. I have published over 200 peer-reviewed journal and conference papers and won numerous paper awards. I have served as program chair for Interspeech, ICMI and SigDial, and as area chair for speech processing or dialogue systems at Interspeech, ACL, EMNLP, etc.
The outcomes of my research have been both recognized in academia and successfully industrialized. I founded AISpeech to commercialize state-of-the-art speech and language processing technology. AISpeech was included in the “AI Key Players” list in the Goldman Sachs Equity Research Report on AI in 2016 and was named one of the Cool Vendors for AI (East Asia) by Gartner in 2017. On behalf of AISpeech, I am also leading the National AI Open Innovation Platform on Language Computing, granted by the Ministry of Science and Technology of China in 2022.
ASR TDT-KWS: Fast and Accurate Keyword Spotting Using Token-and-duration Transducer
Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu and Kai Yu
ICASSP 2024
Signal Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking
Wenbin Jiang and Kai Yu
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1758-1770, 2023
TTS Text-To-Speech With Latent Diffusion
Zhijun Liu, Yiwei Guo and Kai Yu
ICASSP 2023
TTS VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
Chenpeng Du, Yiwei Guo, Xie Chen and Kai Yu
Interspeech 2022
RAA Towards Duration Robust Weakly Supervised Sound Event Detection
Heinrich Dinkel, Mengyue Wu and Kai Yu
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 887-900, 2021
LLM SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen and Kai Yu
AAAI 2024
LLM Large Language Models Are Semi-Parametric Reinforcement Learning Agents
Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao and Kai Yu
NeurIPS 2023
NLP A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL
Ruisheng Cao, Lu Chen, Jieyu Li, Hanchong Zhang, Hongshen Xu, Wangyou Zhang and Kai Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 45, no. 11, pp. 13796-13813, 2023
NLP OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue
Zhi Chen, Yuncong Liu, Lu Chen, Su Zhu, Mengyue Wu and Kai Yu
Transactions of the Association for Computational Linguistics (TACL), vol. 11, pp. 68-84, 2022
NLP LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations
Ruisheng Cao, Lu Chen, Zhi Chen, Yanbin Zhao, Su Zhu and Kai Yu
ACL 2021
Avatar DIFFDUB: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder
Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen and Kai Yu
ICASSP 2024
Avatar DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder
Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao and Jiang Bian
ACM-MM 2023
GUI Towards Multi-modal Conversational Agents on Mobile GUI
Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu and Kai Yu
EMNLP 2022
GUI TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages
Zihan Zhao, Lu Chen, Ruisheng Cao, Hongshen Xu, Xingyu Chen and Kai Yu
NAACL 2022