📃Papers

Publications are listed in reverse chronological order.

2024

  1. Advanced Long-Content Speech Recognition With Factorized Neural Transducer
    Xun Gong , Yu Wu , Jinyu Li , Shujie Liu , Rui Zhao , Xie Chen, and Yanmin Qian
    IEEE ACM Trans. Audio Speech Lang. Process., 2024
  2. EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
    Wenxi Chen , Yuzhe Liang , Ziyang Ma , Zhisheng Zheng , and Xie Chen
    CoRR, 2024
  3. ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
    Yakun Song , Zhuo Chen , Xiaofei Wang , Ziyang Ma , and Xie Chen
    CoRR, 2024
  4. BAT: Learning to Reason about Spatial Sounds with Large Language Models
    Zhisheng Zheng , Puyuan Peng , Ziyang Ma , Xie Chen, Eunsol Choi , and David Harwath
    CoRR, 2024
  5. An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
    Ziyang Ma , Guanrou Yang , Yifan Yang , Zhifu Gao , Jiaming Wang , Zhihao Du , Fan Yu , Qian Chen , Siqi Zheng , Shiliang Zhang , and Xie Chen
    CoRR, 2024
  6. Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
    Xuenan Xu , Zeyu Xie , Mengyue Wu, and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., 2024
  7. Towards Weakly Supervised Text-to-Audio Grounding
    Xuenan Xu , Ziyang Ma , Mengyue Wu, and Kai Yu
    CoRR, 2024
  8. VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
    Chenpeng Du , Yiwei Guo , Hankun Wang , Yifan Yang , Zhikang Niu , Shuai Wang , Hui Zhang , Xie Chen, and Kai Yu
    CoRR, 2024
  9. ChemDFM: Dialogue Foundation Model for Chemistry
    Zihan Zhao , Da Ma , Lu Chen, Liangtai Sun , Zihao Li , Hongshen Xu , Zichen Zhu , Su Zhu , Shuai Fan , Guodong Shen , Xin Chen , and Kai Yu
    CoRR, 2024
  10. MULTI: Multimodal Understanding Leaderboard with Text and Images
    Zichen Zhu, Yang Xu , Lu Chen, Jingkai Yang , Yichuan Ma , Yiming Sun , Hailin Wen , Jiaqi Liu , Jinyu Cai , Yingzi Ma , Situo Zhang , Zihan Zhao , Liangtai Sun , and Kai Yu
    CoRR, 2024

2023

  1. A Unified Framework From Face Image Restoration to Data Augmentation Using Generative Prior
    Jiawei You , Ganyu Huang , Tianyuan Han , Haoze Yang , and Liping Shen
    IEEE Access, 2023
  2. Human Pose Estimation with Combined Feature Maps and Joint Embeddings
    Tianyuan Han , Ganyu Huang , Chunhui Li , and Liping Shen
    In Proceedings of the 2023 International Conference on Advances in Artificial Intelligence and Applications, AAIA 2023, Wuhan, China, November 18-20, 2023 , 2023
  3. Assessing and Enhancing LLMs: A Physics and History Dataset and One-More-Check Pipeline Method
    Chaofan He , Chunhui Li , Tianyuan Han , and Liping Shen
    In Neural Information Processing - 30th International Conference, ICONIP 2023, Changsha, China, November 20-23, 2023, Proceedings, Part XIII , 2023
  4. GAN Latent Space Manipulation Based Augmentation for Unbalanced Emotion Datasets
    Yuhan Xiong , Jiawei You , and Liping Shen
    In International Joint Conference on Neural Networks, IJCNN 2023, Gold Coast, Australia, June 18-23, 2023 , 2023
  5. LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer
    Xun Gong , Yu Wu , Jinyu Li , Shujie Liu , Rui Zhao , Xie Chen, and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  6. Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR
    Xun Gong , Wei Wang , Hang Shao , Xie Chen, and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  7. Exploring Binary Classification Loss for Speaker Verification
    Bing Han , Zhengyang Chen , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  8. Improving Dino-Based Self-Supervised Speaker Verification with Progressive Cluster-Aware Training
    Bing Han , Wen Huang , Zhengyang Chen , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2023 - Workshops, Rhodes Island, Greece, June 4-10, 2023 , 2023
  9. Robust Audio-Visual ASR with Unified Cross-Modal Attention
    Jiahong Li , Chenda Li , Yifei Wu , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  10. Target Sound Extraction with Variable Cross-Modality Clues
    Chenda Li , Yao Qian , Zhuo Chen , Dongmei Wang , Takuya Yoshioka , Shujie Liu , Yanmin Qian , and Michael Zeng
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  11. Predictive Skim: Contrastive Predictive Coding for Low-Latency Online Speech Separation
    Chenda Li , Yifei Wu , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  12. Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge
    Tao Liu , Zhengyang Chen , Yanmin Qian , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  13. Joint Discriminator and Transfer Based Fast Domain Adaptation For End-To-End Speech Recognition
    Hang Shao , Tian Tan , Wei Wang , Xun Gong , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  14. Lowbit Neural Network Quantization for Speaker Verification
    Haoyu Wang , Bei Liu , Yifei Wu , Zhengyang Chen , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2023 - Workshops, Rhodes Island, Greece, June 4-10, 2023 , 2023
  15. Wespeaker: A Research and Production Oriented Speaker Embedding Learning Toolkit
    Hongji Wang , Chengdong Liang , Shuai Wang , Zhengyang Chen , Binbin Zhang , Xu Xiang , Yanlei Deng , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  16. HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit Bert for Robust Speech Recognition
    Wei Wang , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  17. Light-Weight Visualvoice: Neural Network Quantization On Audio Visual Speech Separation
    Yifei Wu , Chenda Li , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2023 - Workshops, Rhodes Island, Greece, June 4-10, 2023 , 2023
  18. Code-Switching Text Generation and Injection in Mandarin-English ASR
    Haibin Yu , Yuxuan Hu , Yao Qian , Ma Jin , Linquan Liu , Shujie Liu , Yu Shi , Yanmin Qian , Edward Lin , and Michael Zeng
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  19. Adaptive Large Margin Fine-Tuning For Robust Speaker Verification
    Leying Zhang , Zhengyang Chen , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  20. ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
    Chenyang Le , Yao Qian , Long Zhou , Shujie Liu , Yanmin Qian , Michael Zeng , and Xuedong Huang
    In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023 , 2023
  21. Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation
    Yoshiki Masuyama , Xuankai Chang , Wangyou Zhang , Samuele Cornell , Zhong-Qiu Wang , Nobutaka Ono , Yanmin Qian , and Shinji Watanabe
    In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2023, New Paltz, NY, USA, October 22-25, 2023 , 2023
  22. Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310) (Version 1)
    Yen-Ju Lu , Xuankai Chang , Chenda Li , Wangyou Zhang , Samuele Cornell , Zhaoheng Ni , Yoshiki Masuyama , Brian Yan , Robin Scheibler , Zhong-Qiu Wang , Yu Tsao , Yanmin Qian , and Shinji Watanabe
    Oct 2023
  23. Self-Supervised Learning with Cluster-Aware-DINO for High-Performance Robust Speaker Verification
    Bing Han , Zhengyang Chen , and Yanmin Qian
    CoRR, Oct 2023
  24. Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor
    Zhengyang Chen , Bing Han , Shuai Wang , and Yanmin Qian
    CoRR, Oct 2023
  25. Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR
    Hang Shao , Wei Wang , Bei Liu , Xun Gong , Haoyu Wang , and Yanmin Qian
    CoRR, Oct 2023
  26. Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
    Wangyou Zhang , and Yanmin Qian
    CoRR, Oct 2023
  27. Adapting Multi-Lingual ASR Models for Handling Multiple Talkers
    Chenda Li , Yao Qian , Zhuo Chen , Naoyuki Kanda , Dongmei Wang , Takuya Yoshioka , Yanmin Qian , and Michael Zeng
    CoRR, Oct 2023
  28. InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models
    Bing Han , Junyu Dai , Xuchen Song , Weituo Hao , Xinyan He , Dong Guo , Jitong Chen , Yuxuan Wang , and Yanmin Qian
    CoRR, Oct 2023
  29. Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer
    Zhengyang Chen , Bing Han , Shuai Wang , and Yanmin Qian
    CoRR, Oct 2023
  30. USED: Universal Speaker Extraction and Diarization
    Junyi Ao , Mehmet Sinan Yildirim , Meng Ge , Shuai Wang , Ruijie Tao , Yanmin Qian , Liqun Deng , Longshuai Xiao , and Haizhou Li
    CoRR, Oct 2023
  31. Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition
    Shuai Wang , Qibing Bai , Qi Liu , Jianwei Yu , Zhengyang Chen , Bing Han , Yanmin Qian , and Haizhou Li
    CoRR, Oct 2023
  32. The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR
    Yuhao Liang , Mohan Shi , Fan Yu , Yangze Li , Shiliang Zhang , Zhihao Du , Qian Chen , Lei Xie , Yanmin Qian , Jian Wu , Zhuo Chen , Kong Aik Lee , Zhijie Yan , and Hui Bu
    CoRR, Oct 2023
  33. Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction
    Leying Zhang , Yao Qian , Linfeng Yu , Heming Wang , Xinkai Wang , Hemin Yang , Long Zhou , Shujie Liu , Yanmin Qian , and Michael Zeng
    CoRR, Oct 2023
  34. Toward Universal Speech Enhancement for Diverse Input Conditions
    Wangyou Zhang , Kohei Saijo , Zhong-Qiu Wang , Shinji Watanabe , and Yanmin Qian
    CoRR, Oct 2023
  35. One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models
    Hang Shao , Bei Liu , and Yanmin Qian
    CoRR, Oct 2023
  36. FAT-HuBERT: Front-end Adaptive Training of Hidden-unit BERT for Distortion-Invariant Robust Speech Recognition
    Dongning Yang , Wei Wang , and Yanmin Qian
    CoRR, Oct 2023
  37. Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature
    Chenpeng Du , Yiwei Guo , Xie Chen, and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2023
  38. Fast-Hubert: an Efficient Training Framework for Self-Supervised Speech Representation Learning
    Guanrou Yang , Ziyang Ma , Zhisheng Zheng , Yakun Song , Zhikang Niu , and Xie Chen
    In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023, Taipei, Taiwan, December 16-20, 2023 , 2023
  39. Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation
    Qi Chen , Ziyang Ma , Tao Liu , Xu Tan , Qu Lu , Kai Yu , and Xie Chen
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  40. Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition
    Xie Chen, Ziyang Ma , Changli Tang , Yujin Wang , and Zhisheng Zheng
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  41. Emodiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance
    Yiwei Guo , Chenpeng Du , Xie Chen, and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  42. DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder
    Chenpeng Du , Qi Chen , Tianyu He , Xu Tan , Xie Chen, Kai Yu, Sheng Zhao , and Jiang Bian
    In Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October 2023- 3 November 2023 , Oct 2023
  43. Blank-regularized CTC for Frame Skipping in Neural Transducer
    Yifan Yang , Xiaoyu Yang , Liyong Guo , Zengwei Yao , Wei Kang , Fangjun Kuang , Long Lin , Xie Chen, and Daniel Povey
    CoRR, Oct 2023
  44. UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
    Chenpeng Du , Yiwei Guo , Feiyu Shen , Zhijun Liu , Zheng Liang , Xie Chen, Shuai Wang , Hui Zhang , and Kai Yu
    CoRR, Oct 2023
  45. Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation
    Zheng Liang , Zheshu Song , Ziyang Ma , Chenpeng Du , Kai Yu , and Xie Chen
    CoRR, Oct 2023
  46. Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation
    Ziyang Ma , Zhisheng Zheng , Guanrou Yang , Yu Wang , Chao Zhang , and Xie Chen
    CoRR, Oct 2023
  47. Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
    Mingyu Cui , Jiawen Kang , Jiajun Deng , Xi Yin , Yutao Xie , Xie Chen, and Xunying Liu
    CoRR, Oct 2023
  48. DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
    Sen Liu , Yiwei Guo , Chenpeng Du , Xie Chen, and Kai Yu
    CoRR, Oct 2023
  49. Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
    Zhisheng Zheng , Ziyang Ma , Yu Wang , and Xie Chen
    CoRR, Oct 2023
  50. VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
    Yiwei Guo , Chenpeng Du , Ziyang Ma , Xie Chen, and Kai Yu
    CoRR, Oct 2023
  51. Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
    Yifan Yang , Feiyu Shen , Chenpeng Du , Ziyang Ma , Kai Yu, Daniel Povey , and Xie Chen
    CoRR, Oct 2023
  52. Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
    Peng Wang , Yifan Yang , Zheng Liang , Tian Tan , Shiliang Zhang , and Xie Chen
    CoRR, Oct 2023
  53. Improved Factorized Neural Transducer Model For text-only Domain Adaptation
    Junzhe Liu , Jianwei Yu , and Xie Chen
    CoRR, Oct 2023
  54. Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
    Ziyang Ma , Wen Wu , Zhisheng Zheng , Yiwei Guo , Qian Chen , Shiliang Zhang , and Xie Chen
    CoRR, Oct 2023
  55. Acoustic BPE for Speech Generation with Discrete Tokens
    Feiyu Shen , Yiwei Guo , Chenpeng Du , Xie Chen, and Kai Yu
    CoRR, Oct 2023
  56. Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations
    Hanglei Zhang , Yiwei Guo , Sen Liu , Xie Chen, and Kai Yu
    CoRR, Oct 2023
  57. emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
    Ziyang Ma , Zhisheng Zheng , Jiaxin Ye , Jinchao Li , Zhifu Gao , Shiliang Zhang , and Xie Chen
    CoRR, Oct 2023
  58. OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue
    Zhi Chen , Yuncong Liu , Lu Chen , Su Zhu , Mengyue Wu, and Kai Yu
    Trans. Assoc. Comput. Linguistics, Oct 2023
  59. Transcribing Vocal Communications of Domestic Shiba Inu Dogs
    Jieyi Huang , Chunhao Zhang , Mengyue Wu , and Kenny Q. Zhu
    In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023 , 2023
  60. Detection of Multiple Mental Disorders from Social Media with Two-Stream Psychiatric Experts
    Siyuan Chen , Zhiling Zhang , Mengyue Wu , and Kenny Q. Zhu
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023 , 2023
  61. Semantic Space Grounded Weighted Decoding for Multi-Attribute Controllable Dialogue Generation
    Zhiling Zhang , Mengyue Wu , and Kenny Q. Zhu
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023 , 2023
  62. Diverse and Vivid Sound Generation from Text Descriptions
    Guangwei Li , Xuenan Xu , Lingfeng Dai , Mengyue Wu, and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  63. Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning
    Xuenan Xu , Mengyue Wu, and Kai Yu
    In IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2023 - Workshops, Rhodes Island, Greece, June 4-10, 2023 , 2023
  64. BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
    Xuenan Xu , Zhiling Zhang , Zelin Zhou , Pingyue Zhang , Zeyu Xie , Mengyue Wu , and Kenny Q. Zhu
    In Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October 2023- 3 November 2023 , Oct 2023
  65. LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and Evaluation
    Siyuan Chen , Mengyue Wu , Kenny Q. Zhu , Kunyao Lan , Zhiling Zhang , and Lyuchun Cui
    CoRR, Oct 2023
  66. Enhance Temporal Relations in Audio Captioning with Sound Event Detection
    Zeyu Xie , Xuenan Xu , Mengyue Wu, and Kai Yu
    CoRR, Oct 2023
  67. Improving Audio Caption Fluency with Automatic Error Correction
    Hanxue Zhang , Zeyu Xie , Xuenan Xu , Mengyue Wu, and Kai Yu
    CoRR, Oct 2023
  68. A Large-scale Dataset for Audio-Language Representation Learning
    Luoyi Sun , Xuenan Xu , Mengyue Wu, and Weidi Xie
    CoRR, Oct 2023
  69. Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners
    Jieyi Huang , Chunhao Zhang , Yufei Wang , Mengyue Wu , and Kenny Q. Zhu
    CoRR, Oct 2023
  70. Towards Lexical Analysis of Dog Vocalizations via Online Videos
    Yufei Wang , Chunhao Zhang , Jieyi Huang , Mengyue Wu , and Kenny Q. Zhu
    CoRR, Oct 2023
  71. PsyEval: A Comprehensive Large Language Model Evaluation Benchmark for Mental Health
    Haoan Jin , Siyuan Chen , Mengyue Wu , and Kenny Q. Zhu
    CoRR, Oct 2023
  72. A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL
    Ruisheng Cao , Lu Chen, Jieyu Li , Hanchong Zhang , Hongshen Xu , Wangyou Zhang , and Kai Yu
    IEEE Trans. Pattern Anal. Mach. Intell., Oct 2023
  73. Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking
    Wenbin Jiang , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2023
  74. SPM: A Split-Parsing Method for Joint Multi-Intent Detection and Slot Filling
    Sheng Jiang , Su Zhu , Ruisheng Cao , Qingliang Miao , and Kai Yu
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Industry Track, ACL 2023, Toronto, Canada, July 9-14, 2023 , 2023
  75. Exploring Schema Generalizability of Text-to-SQL
    Jieyu Li , Lu Chen, Ruisheng Cao , Su Zhu , Hongshen Xu , Zhi Chen , Hanchong Zhang , and Kai Yu
    In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023 , 2023
  76. TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation
    Yiming Ai , Zhiwei He , Kai Yu, and Rui Wang
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL 2023, Toronto, Canada, July 9-14, 2023 , 2023
  77. CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset
    Hanchong Zhang , Jieyu Li , Lu Chen, Ruisheng Cao , Yunyan Zhang , Yu Huang , Yefeng Zheng , and Kai Yu
    In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023 , 2023
  78. ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought
    Hanchong Zhang , Ruisheng Cao , Lu Chen, Hongshen Xu , and Kai Yu
    In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023 , 2023
  79. Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge
    Chenpeng Du , Yiwei Guo , Feiyu Shen , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  80. DiffVoice: Text-to-Speech with Latent Diffusion
    Zhijun Liu , Yiwei Guo , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP 2023, Rhodes Island, Greece, June 4-10, 2023 , 2023
  81. Large Language Models Are Semi-Parametric Reinforcement Learning Agents
    Danyang Zhang , Lu Chen, Situo Zhang , Hongshen Xu , Zihan Zhao , and Kai Yu
    In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023 , 2023
  82. Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction
    Danyang Zhang , Lu Chen, and Kai Yu
    CoRR, Oct 2023
  83. SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
    Liangtai Sun , Yang Han , Zihan Zhao , Da Ma , Zhennan Shen , Baocai Chen , Lu Chen, and Kai Yu
    CoRR, Oct 2023
  84. ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL
    Ruisheng Cao , Hanchong Zhang , Hongshen Xu , Jieyu Li , Da Ma , Lu Chen, and Kai Yu
    CoRR, Oct 2023
  85. DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder
    Tao Liu , Chenpeng Du , Shuai Fan , Feilong Chen , and Kai Yu
    CoRR, Oct 2023
  86. SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
    Junjie Li , Yiwei Guo , Xie Chen, and Kai Yu
    CoRR, Oct 2023

2022

  1. Heterogeneous Graph Representation for Knowledge Tracing
    Jisen Chen , Jian Shen , Ting Long , Liping Shen, Weinan Zhang , and Yong Yu
    In Neural Information Processing - 29th International Conference, ICONIP 2022, Virtual Event, November 22-26, 2022, Proceedings, Part I , 2022
  2. A simple but practical method: How to improve the usage of entities in the Chinese question generation
    Haoze Yang , Kunyao Lan , Jiawei You , and Liping Shen
    In International Joint Conference on Neural Networks, IJCNN 2022, Padua, Italy, July 18-23, 2022 , 2022
  3. From Uniform Models To Generic Representations: Stock Return Prediction With Pre-training
    Jiawei You , Tianyuan Han , and Liping Shen
    In International Joint Conference on Neural Networks, IJCNN 2022, Padua, Italy, July 18-23, 2022 , 2022
  4. WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
    Sanyuan Chen , Chengyi Wang , Zhengyang Chen , Yu Wu , Shujie Liu , Zhuo Chen , Jinyu Li , Naoyuki Kanda , Takuya Yoshioka , Xiong Xiao , Jian Wu , Long Zhou , Shuo Ren , Yanmin Qian , Yao Qian , Jian Wu , Michael Zeng , Xiangzhan Yu , and Furu Wei
    IEEE J. Sel. Top. Signal Process., Oct 2022
  5. Optimizing Data Usage for Low-Resource Speech Recognition
    Yanmin Qian , and Zhikai Zhou
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
  6. Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation
    Chenda Li , Zhuo Chen , and Yanmin Qian
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
  7. Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition
    Yanmin Qian , Xun Gong , and Houjun Huang
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
  8. End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party
    Wangyou Zhang , Xuankai Chang , Christoph Böddeker , Tomohiro Nakatani , Shinji Watanabe , and Yanmin Qian
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
  9. Time-Domain Audio-Visual Speech Separation on Low Quality Videos
    Yifei Wu , Chenda Li , Jinfeng Bai , Zhongqin Wu , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , 2022
  10. Skim: Skipping Memory Lstm for Low-Latency Real-Time Continuous Speech Separation
    Chenda Li , Lei Yang , Weiqin Wang , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , 2022
  11. Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification
    Zhengyang Chen , Sanyuan Chen , Yu Wu , Yao Qian , Chengyi Wang , Shujie Liu , Yanmin Qian , and Michael Zeng
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , 2022
  12. Local Information Modeling with Self-Attention for Speaker Verification
    Bing Han , Zhengyang Chen , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , 2022
  13. Punctuation Prediction for Streaming On-Device Speech Recognition
    Zhikai Zhou , Tian Tan , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , 2022
  14. MLP-SVNET: A Multi-Layer Perceptrons Based Network for Speaker Verification
    Bing Han , Zhengyang Chen , Bei Liu , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , 2022
  15. Self-Knowledge Distillation via Feature Enhancement for Speaker Verification
    Bei Liu , Haoyu Wang , Zhengyang Chen , Shuai Wang , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , 2022
  16. Optimizing Alignment of Speech and Language Latent Spaces for End-To-End Speech Recognition and Understanding
    Wei Wang , Shuo Ren , Yao Qian , Shujie Liu , Yu Shi , Yanmin Qian , and Michael Zeng
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , 2022
  17. Exploring Effective Data Utilization for Low-Resource Speech Recognition
    Zhikai Zhou , Wei Wang , Wangyou Zhang , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , 2022
  18. Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge
    Fan Yu , Shiliang Zhang , Pengcheng Guo , Yihui Fu , Zhihao Du , Siqi Zheng , Weilong Huang , Lei Xie , Zheng-Hua Tan , DeLiang Wang , Yanmin Qian , Kong Aik Lee , Zhijie Yan , Bin Ma , Xin Xu , and Hui Bu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , 2022
  19. The Sjtu System For Multimodal Information Based Speech Processing Challenge 2021
    Wei Wang , Xun Gong , Yifei Wu , Zhikai Zhou , Chenda Li , Wangyou Zhang , Bing Han , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , 2022
  20. Attentive Feature Fusion for Robust Speaker Verification
    Bei Liu , Zhengyang Chen , and Yanmin Qian
    In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , 2022
  21. Dual Path Embedding Learning for Speaker Verification with Triplet Attention
    Bei Liu , Zhengyang Chen , and Yanmin Qian
    In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , 2022
  22. DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design
    Bei Liu , Zhengyang Chen , Shuai Wang , Haoyu Wang , Bing Han , and Yanmin Qian
    In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , 2022
  23. Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification
    Leying Zhang , Zhengyang Chen , and Yanmin Qian
    In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , 2022
  24. MSDWild: Multi-modal Speaker Diarization Dataset in the Wild
    Tao Liu , Shuai Fan , Xu Xiang , Hongbo Song , Shaoxiong Lin , Jiaqi Sun , Tianyuan Han , Siyuan Chen , Binwei Yao , Sen Liu , Yifei Wu , Yanmin Qian , and Kai Yu
    In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , 2022
  25. Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition
    Xun Gong , Zhikai Zhou , and Yanmin Qian
    In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , 2022
  26. Self-Supervised Speaker Verification Using Dynamic Loss-Gate and Label Correction
    Bing Han , Zhengyang Chen , and Yanmin Qian
    In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , 2022
  27. Separating Long-Form Speech with Group-wise Permutation Invariant Training
    Wangyou Zhang , Zhuo Chen , Naoyuki Kanda , Shujie Liu , Jinyu Li , Sefik Emre Eskimez , Takuya Yoshioka , Xiong Xiao , Zhong Meng , Yanmin Qian , and Furu Wei
    In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , 2022
  28. ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
    Yen-Ju Lu , Xuankai Chang , Chenda Li , Wangyou Zhang , Samuele Cornell , Zhaoheng Ni , Yoshiki Masuyama , Brian Yan , Robin Scheibler , Zhong-Qiu Wang , Yu Tsao , Yanmin Qian , and Shinji Watanabe
    In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , 2022
  29. Improving Speech Separation with Knowledge Distilled from Self-supervised Pre-trained Models
    Bowen Qu , Chenda Li , Jinfeng Bai , and Yanmin Qian
    In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022 , 2022
  30. Text-Informed Knowledge Distillation for Robust Speech Enhancement and Recognition
    Wei Wang , Wangyou Zhang , Shaoxiong Lin , and Yanmin Qian
    In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022 , 2022
  31. Medical Difficult Airway Detection using Speech Technology
    Zhikai Zhou , Shuang Cao , Zhengyang Chen , Bei Liu , Ming Xia , Hong Jiang , and Yanmin Qian
    In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022 , Oct 2022
  32. Speaking style compensation on synthetic audio for robust keyword spotting
    Houjun Huang , and Yanmin Qian
    In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022 , Oct 2022
  33. The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines
    Gaofeng Cheng , Yifan Chen , Runyan Yang , Qingxuan Li , Zehui Yang , Lingxuan Ye , Pengyuan Zhang , Qingqing Zhang , Lei Xie , Yanmin Qian , Kong Aik Lee , and Yonghong Yan
    In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022 , Oct 2022
  34. The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022
    Tao Liu , Xu Xiang , Zhengyang Chen , Bing Han , Kai Yu, and Yanmin Qian
    In 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, December 11-14, 2022 , Oct 2022
  35. End-to-End Multi-Speaker ASR with Independent Vector Analysis
    Robin Scheibler , Wangyou Zhang , Xuankai Chang , Shinji Watanabe , and Yanmin Qian
    In IEEE Spoken Language Technology Workshop, SLT 2022, Doha, Qatar, January 9-12, 2023 , Oct 2022
  36. A Comprehensive Study on Self-Supervised Distillation for Speaker Representation Learning
    Zhengyang Chen , Yao Qian , Bing Han , Yanmin Qian , and Michael Zeng
    In IEEE Spoken Language Technology Workshop, SLT 2022, Doha, Qatar, January 9-12, 2023 , Oct 2022
  37. The SJTU X-LANCE Lab System for CNSRC 2022
    Zhengyang Chen , Bei Liu , Bing Han , Leying Zhang , and Yanmin Qian
    CoRR, Oct 2022
  38. SJTU-AISPEECH System for VoxCeleb Speaker Recognition Challenge 2022
    Zhengyang Chen , Bing Han , Xu Xiang , Houjun Huang , Bei Liu , and Yanmin Qian
    CoRR, Oct 2022
  39. Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022
    Zhengyang Chen , Bing Han , Xu Xiang , Houjun Huang , Bei Liu , and Yanmin Qian
    CoRR, Oct 2022
  40. Factorized Neural Transducer for Efficient Language Model Adaptation
    Xie Chen, Zhong Meng , Sarangarajan Parthasarathy , and Jinyu Li
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
  41. VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
    Chenpeng Du , Yiwei Guo , Xie Chen, and Kai Yu
    In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
  42. Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition
    Zhong Meng , Yashesh Gaur , Naoyuki Kanda , Jinyu Li , Xie Chen , Yu Wu , and Yifan Gong
    In Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022 , Oct 2022
  43. Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
    Yujin Wang , Changli Tang , Ziyang Ma , Zhisheng Zheng , Xie Chen, and Wei-Qiang Zhang
    CoRR, Oct 2022
  44. MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
    Ziyang Ma , Zhisheng Zheng , Changli Tang , Yujin Wang , and Xie Chen
    CoRR, Oct 2022
  45. EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance
    Yiwei Guo , Chenpeng Du , Xie Chen, and Kai Yu
    CoRR, Oct 2022
  46. Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models
    Changli Tang , Yujin Wang , Xie Chen, and Wei-Qiang Zhang
    CoRR, Oct 2022
  47. D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat
    Binwei Yao , Chao Shi , Likai Zou , Lingfeng Dai , Mengyue Wu, Lu Chen, Zhen Wang , and Kai Yu
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022 , Oct 2022
  48. Symptom Identification for Interpretable Detection of Multiple Mental Disorders on Social Media
    Zhiling Zhang , Siyuan Chen , Mengyue Wu , and Kenny Q. Zhu
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022 , Oct 2022
  49. Category-Adapted Sound Event Enhancement with Weakly Labeled Data
    Guangwei Li , Xuenan Xu , Heinrich Dinkel , Mengyue Wu, and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
  50. Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition
    Xuenan Xu , Mengyue Wu, and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
  51. Can Audio Captions Be Evaluated With Image Caption Metrics?
    Zelin Zhou , Zhiling Zhang , Xuenan Xu , Zeyu Xie , Mengyue Wu , and Kenny Q. Zhu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
  52. Navigating Audio-Visual Event Detection Across Mismatched Modalities
    Guangwei Li , Xuenan Xu , Mengyue Wu, and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
  53. Audio-Text Retrieval in Context
    Siyu Lou , Xuenan Xu , Mengyue Wu, and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
  54. Climate and Weather: Inspecting Depression Detection via Emotion Recognition
    Wen Wu , Mengyue Wu, and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
  55. Psychiatric Scale Guided Risky Post Screening for Early Detection of Depression
    Zhiling Zhang , Siyuan Chen , Mengyue Wu , and Kenny Q. Zhu
    In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022 , Oct 2022
  56. A Comprehensive Survey of Automated Audio Captioning
    Xuenan Xu , Mengyue Wu, and Kai Yu
    CoRR, Oct 2022
  57. DialogZoo: Large-Scale Dialog-Oriented Task Learning
    Zhi Chen , Jijia Bao , Lu Chen, Yuncong Liu , Da Ma , Bei Chen , Mengyue Wu , Su Zhu , Jian-Guang Lou , and Kai Yu
    CoRR, Oct 2022
  58. Data augmentation based non-parallel voice conversion with frame-level speaker disentangler
    Bo Chen , Zhihang Xu , and Kai Yu
    Speech Commun., Oct 2022
  59. Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis
    Chenpeng Du , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
  60. Neural Fusion for Voice Cloning
    Bo Chen , Chenpeng Du , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2022
  61. META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
    Liangtai Sun , Xingyu Chen , Lu Chen, Tianle Dai , Zichen Zhu, and Kai Yu
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022 , Oct 2022
  62. AdapterShare: Task Correlation Modeling with Adapter Differentiation
    Zhi Chen , Bei Chen , Lu Chen, Kai Yu, and Jian-Guang Lou
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022 , Oct 2022
  63. LatticeBART: Lattice-to-Lattice Pre-Training for Speech Recognition
    Lingfeng Dai , Lu Chen, Zhikai Zhou , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
  64. Text Adaptive Detection for Customizable Keyword Spotting
    Yu Xi , Tian Tan , Wangyou Zhang , Baochen Yang , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
  65. Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis
    Yiwei Guo , Chenpeng Du , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022 , Oct 2022
  66. The AISP-SJTU Simultaneous Translation System for IWSLT 2022
    Qinpei Zhu , Renshou Wu , Guangfeng Liu , Xinyu Zhu , Xingyu Chen , Yang Zhou , Qingliang Miao , Rui Wang , and Kai Yu
    In Proceedings of the 19th International Conference on Spoken Language Translation, IWSLT@ACL 2022, Dublin, Ireland (in-person and online), May 26-27, 2022 , Oct 2022
  67. TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages
    Zihan Zhao , Lu Chen, Ruisheng Cao , Hongshen Xu , Xingyu Chen , and Kai Yu
    In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States, July 10-15, 2022 , Oct 2022
  68. UniDU: Towards A Unified Generative Dialogue Understanding Framework
    Zhi Chen , Lu Chen , Bei Chen , Libo Qin , Yuncong Liu , Su Zhu , Jian-Guang Lou , and Kai Yu
    In Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL 2022, Edinburgh, UK, 07-09 September 2022 , Oct 2022
  69. The AISP-SJTU Translation System for WMT 2022
    Guangfeng Liu , Qinpei Zhu , Xingyu Chen , Renjie Feng , Jianxin Ren , Renshou Wu , Qingliang Miao , Rui Wang , and Kai Yu
    In Proceedings of the Seventh Conference on Machine Translation, WMT 2022, Abu Dhabi, United Arab Emirates (Hybrid), December 7-8, 2022 , Oct 2022

2021

  1. Modified Magnitude-Phase Spectrum Information for Spoofing Detection
    Jichen Yang , Hongji Wang , Rohan Kumar Das , and Yanmin Qian
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2021
  2. Audio-Visual Deep Neural Network for Robust Person Verification
    Yanmin Qian , Zhengyang Chen , and Shuai Wang
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2021
  3. Dual-Path Modeling for Long Recording Speech Separation in Meetings
    Chenda Li , Zhuo Chen , Yi Luo , Cong Han , Tianyan Zhou , Keisuke Kinoshita , Marc Delcroix , Shinji Watanabe , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
  4. Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification
    Zhengyang Chen , Shuai Wang , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
  5. SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification
    Chenpeng Du , Bing Han , Shuai Wang , Yanmin Qian , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
  6. Unit Selection Synthesis Based Data Augmentation for Fixed Phrase Speaker Verification
    Houjun Huang , Xu Xiang , Fei Zhao , Shuai Wang , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
  7. AISpeech-SJTU Accent Identification System for the Accented English Speech Recognition Challenge
    Houjun Huang , Xu Xiang , Yexin Yang , Rao Ma , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
  8. AISpeech-SJTU ASR System for the Accented English Speech Recognition Challenge
    Tian Tan , Yizhou Lu , Rao Ma , Sen Zhu , Jiaqi Guo , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
  9. Towards Data Selection on TTS Data for Children’s Speech Recognition
    Wei Wang , Zhikai Zhou , Yizhou Lu , Hongji Wang , Chenpeng Du , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
  10. End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend
    Wangyou Zhang , Christoph Böddeker , Shinji Watanabe , Tomohiro Nakatani , Marc Delcroix , Keisuke Kinoshita , Tsubasa Ochiai , Naoyuki Kamo , Reinhold Haeb-Umbach , and Yanmin Qian
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
  11. The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods
    Xian Shi , Fan Yu , Yizhou Lu , Yuhao Liang , Qiangze Feng , Daliang Wang , Yanmin Qian , and Lei Xie
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
  12. Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation
    Christoph Böddeker , Wangyou Zhang , Tomohiro Nakatani , Keisuke Kinoshita , Tsubasa Ochiai , Marc Delcroix , Naoyuki Kamo , Yanmin Qian , and Reinhold Haeb-Umbach
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
  13. Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition
    Xun Gong , Yizhou Lu , Zhikai Zhou , and Yanmin Qian
    In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
  14. Knowledge Distillation from Multi-Modality to Single-Modality for Person Verification
    Leying Zhang , Zhengyang Chen , and Yanmin Qian
    In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
  15. Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition
    Zhengxi Liu , and Yanmin Qian
    In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
  16. The SJTU System for Short-Duration Speaker Verification Challenge 2021
    Bing Han , Zhengyang Chen , Zhikai Zhou , and Yanmin Qian
    In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
  17. Audio-Visual Multi-Talker Speech Recognition in a Cocktail Party
    Yifei Wu , Chenda Li , Song Yang , Zhongqin Wu , and Yanmin Qian
    In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
  18. Speaker Embedding Augmentation with Noise Distribution Matching
    Xun Gong , Zhengyang Chen , Yexin Yang , Shuai Wang , Lan Wang , and Yanmin Qian
    In 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021, Hong Kong, January 24-27, 2021 , Oct 2021
  19. Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning
    Shuai Wang , Yexin Yang , Yanmin Qian , and Kai Yu
    In 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021, Hong Kong, January 24-27, 2021 , Oct 2021
  20. Data Augmentation for end-to-end Code-Switching Speech Recognition
    Chenpeng Du , Hao Li , Yizhou Lu , Lan Wang , and Yanmin Qian
    In IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021 , Oct 2021
  21. Dual-Path RNN for Long Recording Speech Separation
    Chenda Li , Yi Luo , Cong Han , Jinyu Li , Takuya Yoshioka , Tianyan Zhou , Marc Delcroix , Keisuke Kinoshita , Christoph Böddeker , Yanmin Qian , Shinji Watanabe , and Zhuo Chen
    In IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021 , Oct 2021
  22. Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions
    Wangyou Zhang , Jing Shi , Chenda Li , Shinji Watanabe , and Yanmin Qian
    In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021, New Paltz, NY, USA, October 17-20, 2021 , Oct 2021
  23. Towards Duration Robust Weakly Supervised Sound Event Detection
    Heinrich Dinkel , Mengyue Wu, and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2021
  24. Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training
    Heinrich Dinkel , Shuai Wang , Xuenan Xu , Mengyue Wu, and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2021
  25. Building Interpretable Interaction Trees for Deep NLP Models
    Die Zhang , Hao Zhang , Huilin Zhou , Xiaoyi Bao , Da Huo , Ruizhao Chen , Xu Cheng , Mengyue Wu, and Quanshi Zhang
    In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021 , Oct 2021
  26. Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL
    Zhi Chen , Lu Chen, Hanqi Li , Ruisheng Cao , Da Ma , Mengyue Wu, and Kai Yu
    In Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021 , Oct 2021
  27. Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging
    Zhiling Zhang , Zelin Zhou , Haifeng Tang , Guangwei Li , Mengyue Wu , and Kenny Q. Zhu
    In CIKM ’21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021 , Oct 2021
  28. Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events
    Xuenan Xu , Heinrich Dinkel , Mengyue Wu, and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
  29. Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning
    Xuenan Xu , Heinrich Dinkel , Mengyue Wu, Zeyu Xie , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021 , Oct 2021
  30. A Lightweight Framework for Online Voice Activity Detection in the Wild
    Xuenan Xu , Heinrich Dinkel , Mengyue Wu, and Kai Yu
    In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
  31. Audio Caption in a Car Setting with a Sentence-Level Loss
    Xuenan Xu , Heinrich Dinkel , Mengyue Wu, and Kai Yu
    In 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021, Hong Kong, January 24-27, 2021 , Oct 2021
  32. DEPA: Self-Supervised Audio Embedding for Depression Detection
    Pingyue Zhang , Mengyue Wu, Heinrich Dinkel , and Kai Yu
    In MM ’21: ACM Multimedia Conference, Virtual Event, China, October 20 - 24, 2021 , Oct 2021
  33. LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching
    Boer Lyu , Lu Chen , Su Zhu , and Kai Yu
    In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021 , Oct 2021
  34. LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations
    Ruisheng Cao , Lu Chen , Zhi Chen , Yanbin Zhao , Su Zhu , and Kai Yu
    In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021 , Oct 2021
  35. WebSRC: A Dataset for Web-Based Structural Reading Comprehension
    Xingyu Chen , Zihan Zhao , Lu Chen, Jiabao Ji , Danyang Zhang , Ao Luo , Yuxuan Xiong , and Kai Yu
    In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021 , Oct 2021
  36. Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe Prediction
    Boer Lyu , Lu Chen, and Kai Yu
    In Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021 , Oct 2021
  37. Class-Based Neural Network Language Model for Second-Pass Rescoring in ASR
    Lingfeng Dai , Qi Liu , and Kai Yu
    In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
  38. Rich Prosody Diversity Modelling with Phone-Level Mixture Density Network
    Chenpeng Du , and Kai Yu
    In Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021 , Oct 2021
  39. ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser
    Zhi Chen , Lu Chen, Yanbin Zhao , Ruisheng Cao , Zihan Xu , Su Zhu , and Kai Yu
    In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021 , Oct 2021
  40. Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRF
    Su Zhu , Lu Chen, Ruisheng Cao , Zhi Chen , Qingliang Miao , and Kai Yu
    In Natural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Qingdao, China, October 13-17, 2021, Proceedings, Part I , Oct 2021
  41. Relation-Aware Multi-hop Reasoning for Visual Dialog
    Yao Zhao , Lu Chen, and Kai Yu
    In Natural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Qingdao, China, October 13-17, 2021, Proceedings, Part I , Oct 2021
  42. Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis
    Chenpeng Du , and Kai Yu
    CoRR, Oct 2021
  43. Diverse and Controllable Speech Synthesis with GMM-Based Phone-Level Prosody Modelling
    Chenpeng Du , and Kai Yu
    CoRR, Oct 2021

2020

  1. Improving End-to-End Single-Channel Multi-Talker Speech Recognition
    Wangyou Zhang , Xuankai Chang , Yanmin Qian , and Shinji Watanabe
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
  2. Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition
    Shuai Wang , Yexin Yang , Zhanghao Wu , Yanmin Qian , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
  3. End-To-End Multi-Speaker Speech Recognition With Transformer
    Xuankai Chang , Wangyou Zhang , Yanmin Qian , Jonathan Le Roux , and Shinji Watanabe
    In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
  4. Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings
    Yexin Yang , Shuai Wang , Xun Gong , Yanmin Qian , and Kai Yu
    In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
  5. Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training
    Zhengyang Chen , Shuai Wang , Yanmin Qian , and Kai Yu
    In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
  6. Deep Audio-Visual Speech Separation with Attention Mechanism
    Chenda Li , and Yanmin Qian
    In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
  7. Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition
    Wangyou Zhang , and Yanmin Qian
    In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
  8. End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming
    Wangyou Zhang , Aswin Shanmugam Subramanian , Xuankai Chang , Shinji Watanabe , and Yanmin Qian
    In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
  9. Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection
    Hongji Wang , Heinrich Dinkel , Shuai Wang , Yanmin Qian , and Kai Yu
    In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
  10. Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation
    Chenda Li , and Yanmin Qian
    In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
  11. Multi-Modality Matters: A Performance Leap on VoxCeleb
    Zhengyang Chen , Shuai Wang , and Yanmin Qian
    In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
  12. Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network
    Zhengyang Chen , Shuai Wang , and Yanmin Qian
    In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
  13. Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of Experts
    Yizhou Lu , Mingkun Huang , Hao Li , Jiaqi Guo , and Yanmin Qian
    In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
  14. End-to-End Speaker-Dependent Voice Activity Detection
    Yefei Chen , Shuai Wang , Yanmin Qian , and Kai Yu
    CoRR, Oct 2020
  15. A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning
    Xuenan Xu , Heinrich Dinkel , Mengyue Wu, and Kai Yu
    In Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), Tokyo, Japan (full virtual), November 2-4, 2020 , Oct 2020
  16. Multiple Sound Sources Localization from Coarse to Fine
    Rui Qian , Di Hu , Heinrich Dinkel , Mengyue Wu, Ning Xu , and Weiyao Lin
    In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XX , Oct 2020
  17. Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection
    Yefei Chen , Heinrich Dinkel , Mengyue Wu, and Kai Yu
    In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
  18. GPVAD: Towards noise robust voice activity detection via weakly supervised sound event detection
    Heinrich Dinkel , Yefei Chen , Mengyue Wu, and Kai Yu
    CoRR, Oct 2020
  19. Interpreting Hierarchical Linguistic Interactions in DNNs
    Die Zhang , Huilin Zhou , Xiaoyi Bao , Da Huo , Ruizhao Chen , Xu Cheng , Hao Zhang , Mengyue Wu, and Quanshi Zhang
    CoRR, Oct 2020
  20. Towards a new generation of artificial intelligence in China
    Fei Wu , Cewu Lu , Mingjie Zhu , Hao Chen , Jun Zhu , Kai Yu, Lei Li , Ming Li , Qianfeng Chen , Xi Li , Xudong Cao , Zhongyuan Wang , Zhengjun Zha , Yueting Zhuang , and Yunhe Pan
    Nat. Mach. Intell., Oct 2020
  21. Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding
    Su Zhu , Zijian Zhao , Rao Ma , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
  22. Dual Learning for Semi-Supervised Natural Language Understanding
    Su Zhu , Ruisheng Cao , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
  23. Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model
    Qi Liu , Zhehuai Chen , Hao Li , Mingkun Huang , Yizhou Lu , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
  24. Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management
    Zhi Chen , Lu Chen, Xiaoyuan Liu , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
  25. Neural Network Language Model Compression With Product Quantization and Soft Binarization
    Kai Yu, Rao Ma , Kaiyu Shi , and Qi Liu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2020
  26. Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural Networks
    Lu Chen, Boer Lv , Chi Wang , Su Zhu , Bowen Tan , and Kai Yu
    In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020 , Oct 2020
  27. Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders
    Yanbin Zhao , Lu Chen , Zhi Chen , and Kai Yu
    In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020 , Oct 2020
  28. Line Graph Enhanced AMR-to-Text Generation with Mix-Order Graph Attention Networks
    Yanbin Zhao , Lu Chen , Zhi Chen , Ruisheng Cao , Su Zhu , and Kai Yu
    In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 , Oct 2020
  29. Neural Graph Matching Networks for Chinese Short Text Matching
    Lu Chen, Yanbin Zhao , Boer Lyu , Lesheng Jin , Zhi Chen , Su Zhu , and Kai Yu
    In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 , Oct 2020
  30. Unsupervised Dual Paraphrasing for Two-stage Semantic Parsing
    Ruisheng Cao , Su Zhu , Chenyu Yang , Chen Liu , Rao Ma , Yanbin Zhao , Lu Chen, and Kai Yu
    In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020 , Oct 2020
  31. Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking
    Su Zhu , Jieyu Li , Lu Chen, and Kai Yu
    In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 , Oct 2020
  32. Duration Robust Weakly Supervised Sound Event Detection
    Heinrich Dinkel , and Kai Yu
    In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
  33. Investigation of SpecAugment for Deep Speaker Embedding Learning
    Shuai Wang , Johan Rohdin , Oldřich Plchot , Lukáš Burget , Kai Yu, and Jan Černocký
    In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
  34. Speaker Augmentation for Low Resource Speech Recognition
    Chenpeng Du , and Kai Yu
    In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
  35. Neural Lattice Search for Speech Recognition
    Rao Ma , Hao Li , Qi Liu , Lu Chen, and Kai Yu
    In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
  36. A Hierarchical Tracker for Multi-Domain Dialogue State Tracking
    Jieyu Li , Su Zhu , and Kai Yu
    In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
  37. Addressing the Polysemy Problem in Language Modeling with Attentional Multi-Sense Embeddings
    Rao Ma , Lesheng Jin , Qi Liu , Lu Chen, and Kai Yu
    In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020 , Oct 2020
  38. CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs
    Han Zhao , Weihao Cui , Quan Chen , Jingwen Leng , Kai Yu, Deze Zeng , Chao Li , and Minyi Guo
    In 40th IEEE International Conference on Distributed Computing Systems, ICDCS 2020, Singapore, November 29 - December 1, 2020 , Oct 2020
  39. Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding
    Chen Liu , Su Zhu , Zijian Zhao , Ruisheng Cao , Lu Chen, and Kai Yu
    In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020 , Oct 2020
  40. Memory Attention Neural Network for Multi-domain Dialogue State Tracking
    Zihan Xu , Zhi Chen , Lu Chen , Su Zhu , and Kai Yu
    In Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, October 14-18, 2020, Proceedings, Part I , Oct 2020
  41. Robust Spoken Language Understanding with RL-Based Value Error Recovery
    Chen Liu , Su Zhu , Lu Chen, and Kai Yu
    In Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, October 14-18, 2020, Proceedings, Part I , Oct 2020
  42. An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models
    Zihan Zhao , Yuncong Liu , Lu Chen, Qi Liu , Rao Ma , and Kai Yu
    In Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, October 14-18, 2020, Proceedings, Part I , Oct 2020
  43. An Investigation on Deep Learning with Beta Stabilizer
    Qi Liu , Tian Tan , and Kai Yu
    CoRR, Oct 2020
  44. Vector Projection Network for Few-shot Slot Tagging in Natural Language Understanding
    Su Zhu , Ruisheng Cao , Lu Chen, and Kai Yu
    CoRR, Oct 2020
  45. Deep Reinforcement Learning for On-line Dialogue State Tracking
    Zhi Chen , Lu Chen, Xiang Zhou , and Kai Yu
    CoRR, Oct 2020
  46. Structured Hierarchical Dialogue Policy with Graph Neural Networks
    Zhi Chen , Xiaoyuan Liu , Lu Chen, and Kai Yu
    CoRR, Oct 2020
  47. Dual Learning for Dialogue State Tracking
    Zhi Chen , Lu Chen, Yanbin Zhao , Su Zhu , and Kai Yu
    CoRR, Oct 2020
  48. CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking
    Zhi Chen , Lu Chen, Zihan Xu , Yanbin Zhao , Su Zhu , and Kai Yu
    CoRR, Oct 2020

2019

  1. Erratum to: Past review, current progress, and challenges ahead on the cocktail party problem
    Yanmin Qian , Chao Weng , Xuankai Chang , Shuai Wang , and Dong Yu
    Frontiers Inf. Technol. Electron. Eng., Oct 2019
  2. Binary neural networks for speech recognition
    Yanmin Qian , and Xu Xiang
    Frontiers Inf. Technol. Electron. Eng., Oct 2019
  3. Data augmentation using generative adversarial networks for robust speech recognition
    Yanmin Qian , Hu Hu , and Tian Tan
    Speech Commun., Oct 2019
  4. Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification
    Shuai Wang , Zili Huang , Yanmin Qian , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2019
  5. Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
    Xu Xiang , Shuai Wang , Houjun Huang , Yanmin Qian , and Kai Yu
    In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019, Lanzhou, China, November 18-21, 2019 , Oct 2019
  6. GANs for Children: A Generative Data Augmentation Strategy for Children Speech Recognition
    Peiyao Sheng , Zhuolin Yang , and Yanmin Qian
    In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
  7. MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition
    Xuankai Chang , Wangyou Zhang , Yanmin Qian , Jonathan Le Roux , and Shinji Watanabe
    In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
  8. Exploring Model Units and Training Strategies for End-to-End Speech Recognition
    Mingkun Huang , Yizhou Lu , Lan Wang , Yanmin Qian , and Kai Yu
    In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
  9. End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform
    Wangyou Zhang , Man Sun , Lan Wang , and Yanmin Qian
    In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
  10. Knowledge Distillation for Small Foot-print Deep Speaker Embedding
    Shuai Wang , Yexin Yang , Tianzhe Wang , Yanmin Qian , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019 , Oct 2019
  11. End-to-end Monaural Multi-speaker ASR System without Pretraining
    Xuankai Chang , Yanmin Qian , Kai Yu, and Shinji Watanabe
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019 , Oct 2019
  12. The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge
    Yexin Yang , Hongji Wang , Heinrich Dinkel , Zhengyang Chen , Shuai Wang , Yanmin Qian , and Kai Yu
    In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
  13. On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction
    Shuai Wang , Johan Rohdin , Lukás Burget , Oldrich Plchot , Yanmin Qian , Kai Yu, and Jan Cernocký
    In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
  14. Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification
    Zhanghao Wu , Shuai Wang , Yanmin Qian , and Kai Yu
    In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
  15. Joint Decoding of CTC Based Systems for Speech Recognition
    Jiaqi Guo , Yongbin You , Yanmin Qian , and Kai Yu
    In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
  16. Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System
    Wangyou Zhang , Xuankai Chang , and Yanmin Qian
    In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
  17. Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking
    Wangyou Zhang , Ying Zhou , and Yanmin Qian
    In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
  18. Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training
    Hongji Wang , Heinrich Dinkel , Shuai Wang , Yanmin Qian , and Kai Yu
    In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
  19. Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech
    Chenda Li , and Yanmin Qian
    In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019 , Oct 2019
  20. Audio Caption: Listen and Tell
    Mengyue Wu, Heinrich Dinkel , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019 , Oct 2019
  21. Text-based Depression Detection: What Triggers An Alert
    Heinrich Dinkel , Mengyue Wu, and Kai Yu
    CoRR, Oct 2019
  22. What does a Car-ssette tape tell?
    Xuenan Xu , Heinrich Dinkel , Mengyue Wu, and Kai Yu
    CoRR, Oct 2019
  23. AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning
    Lu Chen , Zhi Chen , Bowen Tan , Sishan Long , Milica Gasic , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2019
  24. Semantic Parsing with Dual Learning
    Ruisheng Cao , Su Zhu , Chen Liu , Jieyu Li , and Kai Yu
    In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers , Oct 2019
  25. Highly Efficient Neural Network Language Model Compression Using Soft Binarization Training
    Rao Ma , Qi Liu , and Kai Yu
    In IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019 , Oct 2019
  26. Data Augmentation with Atomic Templates for Spoken Language Understanding
    Zijian Zhao , Su Zhu , and Kai Yu
    In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019 , Oct 2019
  27. A Hierarchical Decoding Model for Spoken Language Understanding from Unaligned Data
    Zijian Zhao , Su Zhu , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019 , Oct 2019
  28. CATSLU: The 1st Chinese Audio-Textual Spoken Language Understanding Challenge
    Su Zhu , Zijian Zhao , Tiejun Zhao , Chengqing Zong , and Kai Yu
    In International Conference on Multimodal Interaction, ICMI 2019, Suzhou, China, October 14-18, 2019 , Oct 2019
  29. Robust Spoken Language Understanding with Acoustic and Domain Knowledge
    Hao Li , Chen Liu , Su Zhu , and Kai Yu
    In International Conference on Multimodal Interaction, ICMI 2019, Suzhou, China, October 14-18, 2019 , Oct 2019
  30. Cross Aggregation of Multi-head Attention for Neural Machine Translation
    Juncheng Cao , Hai Zhao , and Kai Yu
    In Natural Language Processing and Chinese Computing - 8th CCF International Conference, NLPCC 2019, Dunhuang, China, October 9-14, 2019, Proceedings, Part I , Oct 2019
  31. International Conference on Multimodal Interaction, ICMI 2019, Suzhou, China, October 14-18, 2019
    Oct 2019

2018

  1. Past review, current progress, and challenges ahead on the cocktail party problem
    Yanmin Qian , Chao Weng , Xuankai Chang , Shuai Wang , and Dong Yu
    Frontiers Inf. Technol. Electron. Eng., Oct 2018
  2. Erratum to: Past review, current progress, and challenges ahead on the cocktail party problem
    Yanmin Qian , Chao Weng , Xuankai Chang , Shuai Wang , and Dong Yu
    Frontiers Inf. Technol. Electron. Eng., Oct 2018
  3. Sequence discriminative training for deep learning based acoustic keyword spotting
    Zhehuai Chen , Yanmin Qian , and Kai Yu
    Speech Commun., Oct 2018
  4. Single-channel multi-talker speech recognition with permutation invariant training
    Yanmin Qian , Xuankai Chang , and Dong Yu
    Speech Commun., Oct 2018
  5. Adaptive Very Deep Convolutional Residual Network for Noise Robust Speech Recognition
    Tian Tan , Yanmin Qian , Hu Hu , Ying Zhou , Wen Ding , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2018
  6. Investigating Raw Wave Deep Neural Networks for End-to-End Speaker Spoofing Detection
    Heinrich Dinkel , Yanmin Qian , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2018
  7. Robust Mask Estimation By Integrating Neural Network-Based and Clustering-Based Approaches for Adaptive Acoustic Beamforming
    Ying Zhou , and Yanmin Qian
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
  8. Knowledge Transfer in Permutation Invariant Training for Single-Channel Multi-Talker Speech Recognition
    Tian Tan , Yanmin Qian , and Dong Yu
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
  9. Joint I-Vector with End-to-End System for Short Duration Text-Independent Speaker Verification
    Zili Huang , Shuai Wang , and Yanmin Qian
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
  10. Generative Adversarial Networks Based Data Augmentation for Noise Robust Speech Recognition
    Hu Hu , Tian Tan , and Yanmin Qian
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
  11. Focal KL-Divergence Based Dilated Convolutional Neural Networks for Co-Channel Speaker Identification
    Shuai Wang , Yanmin Qian , and Kai Yu
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
  12. Noise Robust Speech Recognition on Aurora4 by Humans and Machines
    Yanmin Qian , Tian Tan , Hu Hu , and Qi Liu
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
  13. Fast Adaptation on Deep Mixture Generative Network Based Acoustic Modeling
    Wen Ding , Tian Tan , and Yanmin Qian
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
  14. Adaptive Permutation Invariant Training with Auxiliary Information for Monaural Multi-Talker Speech Recognition
    Xuankai Chang , Yanmin Qian , and Dong Yu
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
  15. Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation
    Lianwu Chen , Meng Yu , Yanmin Qian , Dan Su , and Dong Yu
    In Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
  16. Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures
    Jun Wang , Jie Chen , Dan Su , Lianwu Chen , Meng Yu , Yanmin Qian , and Dong Yu
    In Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
  17. Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks
    Xuankai Chang , Yanmin Qian , and Dong Yu
    In Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
  18. Knowledge Distillation for Sequence Model
    Mingkun Huang , Yongbin You , Zhehuai Chen , Yanmin Qian , and Kai Yu
    In Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
  19. Covariance Based Deep Feature for Text-Dependent Speaker Verification
    Shuai Wang , Heinrich Dinkel , Yanmin Qian , and Kai Yu
    In Intelligence Science and Big Data Engineering - 8th International Conference, IScIDE 2018, Lanzhou, China, August 18-19, 2018, Revised Selected Papers , Oct 2018
  20. Data Augmentation using Conditional Generative Adversarial Networks for Robust Speech Recognition
    Peiyao Sheng , Zhuolin Yang , Hu Hu , Tian Tan , and Yanmin Qian
    In 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018, Taipei City, Taiwan, November 26-29, 2018 , Oct 2018
  21. Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition
    Shuai Wang , Zili Huang , Yanmin Qian , and Kai Yu
    In 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018, Taipei City, Taiwan, November 26-29, 2018 , Oct 2018
  22. Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification
    Yexin Yang , Shuai Wang , Man Sun , Yanmin Qian , and Kai Yu
    In 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018, Taipei City, Taiwan, November 26-29, 2018 , Oct 2018
  23. Rich Short Text Conversation Using Semantic-Key-Controlled Sequence Generation
    Kai Yu, Zijian Zhao , Xueyang Wu , Hongtao Lin , and Xuan Liu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2018
  24. Structured Dialogue Policy with Graph Neural Networks
    Lu Chen, Bowen Tan , Sishan Long , and Kai Yu
    In Proceedings of the 27th International Conference on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018 , Oct 2018
  25. Towards Universal Dialogue State Tracking
    Liliang Ren , Kaige Xie , Lu Chen, and Kai Yu
    In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018 , Oct 2018
  26. On Modular Training of Neural Acoustics-to-Word Model for LVCSR
    Zhehuai Chen , Qi Liu , Hao Li , and Kai Yu
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
  27. Semi-Supervised Training Using Adversarial Multi-Task Learning for Spoken Language Understanding
    Ouyu Lan , Su Zhu , and Kai Yu
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
  28. Policy Adaptation for Deep Reinforcement Learning-Based Dialogue Management
    Lu Chen, Cheng Chang , Zhi Chen , Bowen Tan , Milica Gasic , and Kai Yu
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
  29. Robust Spoken Language Understanding with Unsupervised ASR-Error Adaptation
    Su Zhu , Ouyu Lan , and Kai Yu
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018 , Oct 2018
  30. MLN: Moment Localization Network and Samples Selection for Moment Retrieval
    Bo Huang , Ya Zhang , and Kai Yu
    In Proceedings of the 2nd International Conference on Video and Image Processing, ICVIP 2018, Hong Kong, China, December 29-31, 2018 , Oct 2018
  31. Angular Softmax for Short-Duration Text-independent Speaker Verification
    Zili Huang , Shuai Wang , and Kai Yu
    In Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018 , Oct 2018
  32. Joint Spoken Language Understanding and Domain Adaptive Language Modeling
    Huifeng Zhang , Su Zhu , Shuai Fan , and Kai Yu
    In Intelligence Science and Big Data Engineering - 8th International Conference, IScIDE 2018, Lanzhou, China, August 18-19, 2018, Revised Selected Papers , Oct 2018
  33. Binarized LSTM Language Model
    Xuan Liu , Di Cao , and Kai Yu
    In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers) , Oct 2018
  34. Cost-Sensitive Active Learning for Dialogue State Tracking
    Kaige Xie , Cheng Chang , Liliang Ren , Lu Chen, and Kai Yu
    In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018 , Oct 2018
  35. Concept Transfer Learning for Adaptive Language Understanding
    Su Zhu , and Kai Yu
    In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, Melbourne, Australia, July 12-14, 2018 , Oct 2018
  36. Intelligence Science and Big Data Engineering - 8th International Conference, IScIDE 2018, Lanzhou, China, August 18-19, 2018, Revised Selected Papers
    Oct 2018

2017

  1. Phone Synchronous Speech Recognition With CTC Lattices
    Zhehuai Chen , Yimeng Zhuang , Yanmin Qian , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2017
  2. Deep Feature Engineering for Noise Robust Spoofing Detection
    Yanmin Qian , Nanxin Chen , Heinrich Dinkel , and Zhizheng Wu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2017
  3. Integrating online i-vector into GMM-UBM for text-dependent speaker verification
    Xiaowei Jiang , Shuai Wang , Xu Xiang , and Yanmin Qian
    In 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017, Kuala Lumpur, Malaysia, December 12-15, 2017 , Oct 2017
  4. Future vector enhanced LSTM language model for LVCSR
    Qi Liu , Yanmin Qian , and Kai Yu
    In 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, December 16-20, 2017 , Oct 2017
  5. Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR
    Yue Wu , Tianxing He , Zhehuai Chen , Yanmin Qian , and Kai Yu
    In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 16th China National Conference, CCL 2017, - and - 5th International Symposium, NLP-NABD 2017, Nanjing, China, October 13-15, 2017, Proceedings , Oct 2017
  6. End-to-end spoofing detection with raw waveform CLDNNs
    Heinrich Dinkel , Nanxin Chen , Yanmin Qian , and Kai Yu
    In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017 , Oct 2017
  7. Small-footprint convolutional neural network for spoofing detection
    Heinrich Dinkel , Yanmin Qian , and Kai Yu
    In 2017 International Joint Conference on Neural Networks, IJCNN 2017, Anchorage, AK, USA, May 14-19, 2017 , Oct 2017
  8. Binary Deep Neural Networks for Speech Recognition
    Xu Xiang , Yanmin Qian , and Kai Yu
    In Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 , Oct 2017
  9. What Does the Speaker Embedding Encode?
    Shuai Wang , Yanmin Qian , and Kai Yu
    In Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 , Oct 2017
  10. Recognizing Multi-Talker Speech with Permutation Invariant Training
    Dong Yu , Xuankai Chang , and Yanmin Qian
    In Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 , Oct 2017
  11. A Unified Confidence Measure Framework Using Auxiliary Normalization Graph
    Zhehuai Chen , Yanmin Qian , and Kai Yu
    In Intelligence Science and Big Data Engineering - 7th International Conference, IScIDE 2017, Dalian, China, September 22-23, 2017, Proceedings , Oct 2017
  12. Adaptation of Deep Neural Network Acoustic Models for Robust Automatic Speech Recognition
    Khe Chai Sim , Yanmin Qian , Gautam Mantena , Lahiru Samarakoon , Souvik Kundu , and Tian Tan
    In New Era for Robust Speech Recognition, Exploiting Deep Learning , Oct 2017
  13. On-line Dialogue Policy Learning with Companion Teaching
    Lu Chen, Runzhe Yang , Cheng Chang , Zihao Ye , Xiang Zhou , and Kai Yu
    In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, April 3-7, 2017, Volume 2: Short Papers , Oct 2017
  14. Affordable On-line Dialogue Policy Learning
    Cheng Chang , Runzhe Yang , Lu Chen, Xiang Zhou , and Kai Yu
    In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017 , Oct 2017
  15. Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning
    Lu Chen, Xiang Zhou , Cheng Chang , Runzhe Yang , and Kai Yu
    In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017 , Oct 2017
  16. Confidence measures for CTC-based phone synchronous decoding
    Zhehuai Chen , Yimeng Zhuang , and Kai Yu
    In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017 , Oct 2017
  17. Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding
    Su Zhu , and Kai Yu
    In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017 , Oct 2017
  18. Discrete Duration Model for Speech Synthesis
    Bo Chen , Tianling Bian , and Kai Yu
    In Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017 , Oct 2017
  19. Deep Attentive Structured Language Model Based on LSTM
    Di Cao , and Kai Yu
    In Intelligence Science and Big Data Engineering - 7th International Conference, IScIDE 2017, Dalian, China, September 22-23, 2017, Proceedings , Oct 2017
  20. splab at the NTCIR-13 STC-2 Task
    Xuan Liu , Xueyang Wu , Ruinian Chen , Zijian Zhao , Hongtao Lin , and Kai Yu
    In The 13th NTCIR Conference, Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, December 5-8, 2017 , Oct 2017

2016

  1. Deep features for automatic spoofing detection
    Yanmin Qian , Nanxin Chen , and Kai Yu
    Speech Commun., Oct 2016
  2. Cluster Adaptive Training for Deep Neural Network Based Acoustic Model
    Tian Tan , Yanmin Qian , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2016
  3. Neural Network Based Multi-Factor Aware Joint Training for Robust Speech Recognition
    Yanmin Qian , Tian Tan , and Dong Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2016
  4. Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition
    Yanmin Qian , Mengxiao Bi , Tian Tan , and Kai Yu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2016
  5. Overview of BTAS 2016 speaker anti-spoofing competition
    Pavel Korshunov , Sébastien Marcel , Hannah Muckenhirn , André R. Gonçalves , A. G. Souza Mello , Ricardo Paranhos Velloso Violato , Flávio Olmos Simões , M. U. Neto , Marcus Assis Angeloni , José Augusto Stuchi , Heinrich Dinkel , Nanxin Chen , Yanmin Qian , Dipjyoti Paul , Goutam Saha , and Md. Sahidullah
    In 8th IEEE International Conference on Biometrics Theory, Applications and Systems, BTAS 2016, Niagara Falls, NY, USA, September 6-9, 2016 , Oct 2016
  6. Joint acoustic factor learning for robust deep neural network based automatic speech recognition
    Souvik Kundu , Gautam Mantena , Yanmin Qian , Tian Tan , Marc Delcroix , and Khe Chai Sim
    In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
  7. Speaker-aware training of LSTM-RNNs for acoustic modelling
    Tian Tan , Yanmin Qian , Dong Yu , Souvik Kundu , Liang Lu , Khe Chai Sim , Xiong Xiao , and Yu Zhang
    In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
  8. Improved DNN-based segmentation for multi-genre broadcast audio
    Linlin Wang , Chao Zhang , Philip C. Woodland , Mark J. F. Gales , Panagiota Karanasou , Pierre Lanchantin , Xunying Liu , and Yanmin Qian
    In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
  9. An investigation into using parallel data for far-field speech recognition
    Yanmin Qian , Tian Tan , and Dong Yu
    In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
  10. Integrated adaptation with multi-factor joint-learning for far-field speech recognition
    Yanmin Qian , Tian Tan , Dong Yu , and Yu Zhang
    In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
  11. Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC
    Yimeng Zhuang , Xuankai Chang , Yanmin Qian , and Kai Yu
    In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016 , Oct 2016
  12. Multi-task joint-learning for robust voice activity detection
    Yimeng Zhuang , Sibo Tong , Maofan Yin , Yanmin Qian , and Kai Yu
    In 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, October 17-20, 2016 , Oct 2016
  13. Very deep convolutional neural networks for robust speech recognition
    Yanmin Qian , and Philip C. Woodland
    In 2016 IEEE Spoken Language Technology Workshop, SLT 2016, San Diego, CA, USA, December 13-16, 2016 , Oct 2016
  14. Evolvable dialogue state tracking for statistical dialogue management
    Kai Yu, Lu Chen, Kai Sun , Qizhe Xie , and Su Zhu
    Frontiers Comput. Sci., Oct 2016
  15. Discriminatively trained joint speaker and environment representations for adaptation of deep neural network acoustic models
    Maofan Yin , Sunil Sivadas , Kai Yu, and Bin Ma
    In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
  16. A comparative study of robustness of deep learning approaches for VAD
    Sibo Tong , Hao Gu , and Kai Yu
    In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , Oct 2016
  17. Phone Synchronous Decoding with CTC Lattice
    Zhehuai Chen , Wei Deng , Tao Xu , and Kai Yu
    In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016 , Oct 2016
  18. Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues
    Kai Sun , Su Zhu , Lu Chen, Siqiu Yao , Xueyang Wu , and Kai Yu
    In Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016 , Oct 2016
  19. On training bi-directional neural network language model with noise contrastive estimation
    Tianxing He , Yu Zhang , Jasha Droppo , and Kai Yu
    In 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, October 17-20, 2016 , Oct 2016
  20. Rich punctuations prediction using large-scale deep learning
    Xueyang Wu , Su Zhu , Yue Wu , and Kai Yu
    In 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, October 17-20, 2016 , Oct 2016
  21. Directed automatic speech transcription error correction using bidirectional LSTM
    Da Zheng , Zhehuai Chen , Yue Wu , and Kai Yu
    In 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016, Tianjin, China, October 17-20, 2016 , Oct 2016
  22. The splab at the NTCIR-12 Short Text Conversation Task
    Ke Wu , Xuan Liu , and Kai Yu
    In Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, June 7-10, 2016 , Oct 2016

2015

  1. Deep feature for text-dependent speaker verification
    Yuan Liu , Yanmin Qian , Nanxin Chen , Tianfan Fu , Ya Zhang , and Kai Yu
    Speech Commun., Oct 2015
  2. Multi-task joint-learning of deep neural networks for robust speech recognition
    Yanmin Qian , Maofan Yin , Yongbin You , and Kai Yu
    In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015 , Oct 2015
  3. Cambridge University transcription systems for the multi-genre broadcast challenge
    Philip C. Woodland , Xunying Liu , Yanmin Qian , Chao Zhang , Mark J. F. Gales , Penny Karanasou , Pierre Lanchantin , and Linlin Wang
    In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015 , Oct 2015
  4. The development of the Cambridge University alignment systems for the multi-genre broadcast challenge
    Pierre Lanchantin , Mark J. F. Gales , Penny Karanasou , Xunying Liu , Yanmin Qian , Linlin Wang , Philip C. Woodland , and Chao Zhang
    In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015 , Oct 2015
  5. Speaker diarisation and longitudinal linking in multi-genre broadcast data
    Penny Karanasou , Mark J. F. Gales , Pierre Lanchantin , Xunying Liu , Yanmin Qian , Linlin Wang , Philip C. Woodland , and Chao Zhang
    In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015, Scottsdale, AZ, USA, December 13-17, 2015 , Oct 2015
  6. Local trajectory based speech enhancement for robust speech recognition with deep neural network
    Yongbin You , Yanmin Qian , and Kai Yu
    In IEEE China Summit and International Conference on Signal and Information Processing, ChinaSIP 2015, Chengdu, China, July 12-15, 2015 , Oct 2015
  7. An investigation on DNN-derived bottleneck features for GMM-HMM based robust speech recognition
    Yongbin You , Yanmin Qian , Tianxing He , and Kai Yu
    In IEEE China Summit and International Conference on Signal and Information Processing, ChinaSIP 2015, Chengdu, China, July 12-15, 2015 , Oct 2015
  8. Cluster adaptive training for deep neural network
    Tian Tan , Yanmin Qian , Maofan Yin , Yimeng Zhuang , and Kai Yu
    In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015 , Oct 2015
  9. A novel static parameter calculation method for model compensation
    Suliang Bu , Yunxin Zhao , Yanmin Qian , and Kai Yu
    In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015 , Oct 2015
  10. Recurrent neural network language model with structured word embeddings for speech recognition
    Tianxing He , Xu Xiang , Yanmin Qian , and Kai Yu
    In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015 , Oct 2015
  11. Automatic model redundancy reduction for fast back-propagation for deep neural networks in speech recognition
    Yanmin Qian , Tianxing He , Wei Deng , and Kai Yu
    In 2015 International Joint Conference on Neural Networks, IJCNN 2015, Killarney, Ireland, July 12-17, 2015 , Oct 2015
  12. Multi-task learning for text-dependent speaker verification
    Nanxin Chen , Yanmin Qian , and Kai Yu
    In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015 , Oct 2015
  13. Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challenge
    Nanxin Chen , Yanmin Qian , Heinrich Dinkel , Bo Chen , and Kai Yu
    In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015 , Oct 2015
  14. Very deep convolutional neural networks for LVCSR
    Mengxiao Bi , Yanmin Qian , and Kai Yu
    In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015 , Oct 2015
  15. Paragraph vector based topic model for language model adaptation
    Wengong Jin , Tianxing He , Yanmin Qian , and Kai Yu
    In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015 , Oct 2015
  16. Constrained Markov Bayesian Polynomial for Efficient Dialogue State Tracking
    Kai Yu, Kai Sun , Lu Chen , and Su Zhu
    IEEE ACM Trans. Audio Speech Lang. Process., Oct 2015
  17. An investigation of context clustering for statistical speech synthesis with deep neural network
    Bo Chen , Zhehuai Chen , Jiachen Xu , and Kai Yu
    In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015 , Oct 2015
  18. Recurrent Polynomial Network for Dialogue State Tracking with Mismatched Semantic Parsers
    Qizhe Xie , Kai Sun , Su Zhu , Lu Chen, and Kai Yu
    In Proceedings of the SIGDIAL 2015 Conference, The 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2-4 September 2015, Prague, Czech Republic , Oct 2015
  19. Hyper-parameter Optimisation of Gaussian Process Reinforcement Learning for Statistical Dialogue Management
    Lu Chen, Pei-Hao Su , and Milica Gasic
    In Proceedings of the SIGDIAL 2015 Conference, The 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2-4 September 2015, Prague, Czech Republic , Oct 2015

2014

  1. Stochastic data sweeping for fast DNN training
    Wei Deng , Yanmin Qian , Yuchen Fan , Tianfan Fu , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014 , Oct 2014
  2. Reshaping deep neural network for fast decoding by node-pruning
    Tianxing He , Yuchen Fan , Yanmin Qian , Tian Tan , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014 , Oct 2014
  3. Second order vector Taylor series based robust speech recognition
    Suliang Bu , Yanmin Qian , Khe Chai Sim , Yongbin You , and Kai Yu
    In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2014, Florence, Italy, May 4-9, 2014 , Oct 2014
  4. Speaker verification with deep features
    Yuan Liu , Tianfan Fu , Yuchen Fan , Yanmin Qian , and Kai Yu
    In 2014 International Joint Conference on Neural Networks, IJCNN 2014, Beijing, China, July 6-11, 2014 , Oct 2014
  5. Tandem deep features for text-dependent speaker verification
    Tianfan Fu , Yanmin Qian , Yuan Liu , and Kai Yu
    In INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014 , Oct 2014
  6. A novel dynamic parameters calculation approach for model compensation
    Suliang Bu , Yanmin Qian , and Kai Yu
    In INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014 , Oct 2014
  7. Acoustic emotion recognition using deep neural network
    Jianwei Niu , Yanmin Qian , and Kai Yu
    In The 9th International Symposium on Chinese Spoken Language Processing, Singapore, September 12-14, 2014 , Oct 2014
  8. The SJTU System for Dialog State Tracking Challenge 2
    Kai Sun , Lu Chen , Su Zhu , and Kai Yu
    In Proceedings of the SIGDIAL 2014 Conference, The 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 18-20 June 2014, Philadelphia, PA, USA , Oct 2014
  9. A generalized rule based tracker for dialogue state tracking
    Kai Sun , Lu Chen , Su Zhu , and Kai Yu
    In 2014 IEEE Spoken Language Technology Workshop, SLT 2014, South Lake Tahoe, NV, USA, December 7-10, 2014 , Oct 2014
  10. Semantic parser enhancement for dialogue domain extension with little data
    Su Zhu , Lu Chen, Kai Sun , Da Zheng , and Kai Yu
    In 2014 IEEE Spoken Language Technology Workshop, SLT 2014, South Lake Tahoe, NV, USA, December 7-10, 2014 , Oct 2014

2013

  1. Combination of data borrowing strategies for low-resource LVCSR
    Yanmin Qian , Kai Yu, and Jia Liu
    In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, December 8-12, 2013 , Oct 2013
  2. MLP-HMM two-stage unsupervised training for low-resource languages on conversational telephone speech recognition
    Yanmin Qian and Jia Liu
    In INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013 , Oct 2013
  3. A New Word Language Model Evaluation Metric for Character Based Languages
    Peilu Wang , Ruihua Sun , Hai Zhao , and Kai Yu
    In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 12th China National Conference, CCL 2013 and First International Symposium, NLP-NABD 2013, Suzhou, China, October 10-12, 2013. Proceedings , Oct 2013

2012

  1. Introduction to the Issue on Advances in Spoken Dialogue Systems and Mobile Interface
    Jason D. Williams , Kai Yu, Brahim Chaib-draa , Oliver Lemon , Roberto Pieraccini , Olivier Pietquin , Pascal Poupart , and Steve J. Young
    IEEE J. Sel. Top. Signal Process., Oct 2012
  2. ICMI’12 grand challenge: haptic voice recognition
    Khe Chai Sim , Shengdong Zhao , Kai Yu, and Hank Liao
    In International Conference on Multimodal Interaction, ICMI ’12, Santa Monica, CA, USA, October 22-26, 2012 , Oct 2012
  3. Development of the 2012 SJTU HVR system
    Hainan Xu , Yuchen Fan , and Kai Yu
    In International Conference on Multimodal Interaction, ICMI ’12, Santa Monica, CA, USA, October 22-26, 2012 , Oct 2012