Title | Authors | Venue | Cited by | Year |
GeneFace: Generalized and high-fidelity audio-driven 3D talking face synthesis | Z Ye, Z Jiang, Y Ren, J Liu, J He, Z Zhao | ICLR 2023, 2023 | 118 | 2023 |
Mega-TTS: Zero-shot text-to-speech at scale with intrinsic inductive bias | Z Jiang, Y Ren, Z Ye, J Liu, C Zhang, Q Yang, S Ji, R Huang, C Wang, ... | Merged with Mega-TTS 2, 2023 | 77 | 2023 |
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis | Z Jiang, J Liu, Y Ren, J He, Z Ye, S Ji, Q Yang, C Zhang, P Wei, C Wang, ... | ICLR 2024, 2024 | 65* | 2024 |
TextrolSpeech: A text style control speech corpus with codec language text-to-speech models | S Ji, J Zuo, M Fang, Z Jiang, F Chen, X Duan, B Huai, Z Zhao | ICASSP 2024, 10301-10305, 2024 | 35 | 2024 |
GeneFace++: Generalized and stable real-time audio-driven 3D talking face generation | Z Ye, J He, Z Jiang, R Huang, J Huang, J Liu, Y Ren, X Yin, Z Ma, Z Zhao | arXiv preprint arXiv:2305.00787, 2023 | 35 | 2023 |
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners | R Huang, C Zhang, Y Wang, D Yang, J Tian, Z Ye, L Liu, Z Wang, Z Jiang, ... | ACL 2024, 10929-10942, 2024 | 34* | 2024 |
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Q Yang, J Xu, W Liu, Y Chu, Z Jiang, X Zhou, Y Leng, Y Lv, Z Zhao, ... | ACL 2024, 2024 | 32 | 2024 |
Real3D-Portrait: One-shot realistic 3D talking portrait synthesis | Z Ye, T Zhong, Y Ren, J Yang, W Li, J Huang, Z Jiang, J He, R Huang, ... | ICLR 2024, 2024 | 27 | 2024 |
WavTokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling | S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo, Q Yang, X Cheng, Z Wang, ... | arXiv preprint arXiv:2408.16532, 2024 | 25 | 2024 |
Self-Supervised Spoofing Audio Detection Scheme | J Ziyue, Z Hongcheng, P Li, D Wenbing, R Yanzhen | InterSpeech 2020, 4223-4227, 2020 | 25 | 2020 |
FedSpeech: Federated Text-to-Speech with Continual Learning | Z Jiang, Y Ren, M Lei, Z Zhao | IJCAI 2021, 3829-3835, 2021 | 22 | 2021 |
CLAPSpeech: Learning prosody from text context with contrastive language-audio pre-training | Z Ye, R Huang, Y Ren, Z Jiang, J Liu, J He, X Yin, Z Zhao | ACL 2023, 2023 | 19 | 2023 |
Language-Codec: Reducing the gaps between discrete codec representation and speech language models | S Ji, M Fang, Z Jiang, S Zheng, Q Chen, R Huang, J Zuo, S Wang, Z Zhao | arXiv preprint arXiv:2402.12208, 2024 | 15 | 2024 |
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models | Z Jiang, Q Yang, J Zuo, Z Ye, R Huang, Y Ren, Z Zhao | ACL 2023, 2023 | 13 | 2023 |
FastDiff 2: Revisiting and incorporating GANs and diffusion models in high-fidelity speech synthesis | R Huang, Y Ren, Z Jiang, C Cui, J Liu, Z Zhao | ACL 2023, 6994-7009, 2023 | 9 | 2023 |
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech | S Ji, Z Jiang, H Wang, J Zuo, Z Zhao | ACL 2024, 2024 | 8 | 2024 |
WavChat: A survey of spoken dialogue models | S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang, Z Jiang, L Zhou, S Liu, ... | arXiv preprint arXiv:2411.13577, 2024 | 7 | 2024 |
ControlSpeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style control with decoupled codec | S Ji, J Zuo, W Wang, M Fang, S Zheng, Q Chen, Z Jiang, H Huang, ... | arXiv preprint arXiv:2406.01205, 2024 | 7 | 2024 |
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech | Z Jiang, Z Su, Z Zhao, Q Yang, Y Ren, J Liu, Z Ye | NeurIPS 2022, 2022 | 5 | 2022 |
Ada-TTA: Towards adaptive high-quality text-to-talking avatar synthesis | Z Ye, Z Jiang, Y Ren, J Liu, C Zhang, X Yin, Z Ma, Z Zhao | arXiv preprint arXiv:2306.03504, 2023 | 4 | 2023 |