Zehan Wang
Title
Cited by
Year
Chat-3d: Data-efficiently tuning large language model for universal dialogue of 3d scenes
Z Wang, H Huang, Y Zhao, Z Zhang, Z Zhao
NAACL 2025, 2023
57 2023
Connecting Multi-modal Contrastive Representations
Z Wang, Y Zhao, X Cheng, H Huang, J Liu, L Tang, L Li, Y Wang, A Yin, ...
NeurIPS 2023, 2023
35 2023
Make-a-voice: Revisiting voice large language models as scalable multilingual and multitask learners
R Huang, C Zhang, Y Wang, D Yang, J Tian, Z Ye, L Liu, Z Wang, Z Jiang, ...
Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024
34* 2024
Wavtokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling
S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo, Q Yang, X Cheng, Z Wang, ...
ICLR 2025, 2024
33 2024
Chat-3d v2: Bridging 3d scene and large language models with object identifiers
H Huang*, Z Wang*, R Huang, L Liu, X Cheng, Y Zhao, T Jin, Z Zhao
arXiv preprint arXiv:2312.08168, 2023
31 2023
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
X Cheng, T Jin, R Huang, L Li, W Lin, Z Wang, Y Wang, H Liu, A Yin, ...
ICCV 2023, 15735-15745, 2023
25 2023
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
L Zhuo, R Du, H Xiao, Y Li, D Liu, R Huang, W Liu, L Zhao, FY Wang, ...
NeurIPS 2024, 2024
21* 2024
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
Z Wang, H Huang, Y Zhao, L Li, X Cheng, Y Zhu, A Yin, Z Zhao
EMNLP 2023, 2023
19 2023
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding
Z Wang, H Huang, Y Zhao, L Li, X Cheng, Y Zhu, A Yin, Z Zhao
ICCV 2023, 2023
19 2023
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
Z Wang, Z Zhang, X Cheng, R Huang, L Liu, Z Ye, H Huang, Y Zhao, T Jin, ...
ICML 2024, 2024
18* 2024
Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching
Y Wang, W Guo, R Huang, J Huang, Z Wang, F You, R Li, Z Zhao
NeurIPS 2024, 2024
17* 2024
Chat-scene: Bridging 3d scene and large language models with object identifiers
H Huang*, Y Chen*, Z Wang*, R Huang, R Xu, T Wang, L Liu, X Cheng, ...
NeurIPS 2024, 2024
15 2024
Wavchat: A survey of spoken dialogue models
S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang, Z Jiang, L Zhou, S Liu, ...
arXiv preprint arXiv:2411.13577, 2024
15 2024
Extending multi-modal contrastive representations
Z Wang, Z Zhang, L Liu, Y Zhao, H Huang, T Jin, Z Zhao
NeurIPS 2024, 2023
10 2023
Omnibind: Large-scale omni multimodal representation via binding spaces
Z Wang, Z Zhang, H Zhang, L Liu, R Huang, X Cheng, H Zhao, Z Zhao
ICLR 2025, 2024
9 2024
Controlspeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style control with decoupled codec
S Ji, J Zuo, W Wang, M Fang, S Zheng, Q Chen, Z Jiang, H Huang, ...
arXiv preprint arXiv:2406.01205, 2024
9 2024
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
X Cheng, R Huang, L Li, T Jin, Z Wang, A Yin, M Li, X Duan, Z Zhao
ACL 2024, 2023
8 2023
Scene-robust natural language video localization via learning domain-invariant representations
Z Wang, Y Zhao, H Huang, Y Xia, Z Zhao
ACL 2023, 144-160, 2023
6 2023
Action Imitation in Common Action Space for Customized Action Image Synthesis
W Lin, J Chen, J Shi, Z Guo, Y Zhu, Z Wang, T Jin, Z Zhao, F Wu, ...
NeurIPS 2024, 2024
3 2024
VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation
R Huang, Y Wang, R Hu, X Xu, Z Hong, D Yang, X Cheng, Z Wang, ...
Proceedings of the 32nd ACM International Conference on Multimedia, 10630-10639, 2024
2 2024