Yupan Huang

Cited by

	All	Since 2019
Citations	800	800
h-index	9	9
i10-index	9	9

Citations per year: 2019: 2, 2020: 13, 2021: 49, 2022: 180, 2023: 408, 2024: 146

Public access

5 articles

1 article

available

not available

Based on funding mandates

Co-authors

Bei Liu, Microsoft Research, verified email at microsoft.com
Jianlong Fu, Microsoft Research, verified email at microsoft.com
Furu Wei, Partner Research Manager, Microsoft Research, verified email at microsoft.com
Lei Cui, Microsoft Research Asia, verified email at microsoft.com
Qi Dai, Microsoft Research, verified email at microsoft.com
Nigel Collier, Professor of Natural Language Processing, University of Cambridge, verified email at cam.ac.uk

Yupan Huang

Sun Yat-sen University

Verified email at mail2.sysu.edu.cn

Multimodal AI, Computer Vision, Natural Language Processing


Title	Cited by	Year
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. Y Huang, T Lv, L Cui, Y Lu, F Wei. Proceedings of the 30th ACM International Conference on Multimedia, 2022	279	2022
Seeing out of the box: End-to-end pre-training for vision-language representation learning. Z Huang, Z Zeng, Y Huang*, B Liu, D Fu, J Fu. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021	249	2021
Probing inter-modality: Visual parsing with self-attention for vision-and-language pre-training. H Xue, Y Huang, B Liu, H Peng, J Fu, H Li, J Luo. Advances in Neural Information Processing Systems 34, 4514-4528, 2021	79	2021
Decoupling localization and classification in single shot temporal action detection. Y Huang, Q Dai, Y Lu. 2019 IEEE International Conference on Multimedia and Expo (ICME), 1288-1293, 2019	57	2019
Unifying multimodal transformer for bi-directional image and text generation. Y Huang, H Xue, B Liu, Y Lu. Proceedings of the 29th ACM International Conference on Multimedia, 1138-1147, 2021	55	2021
Reinforced short-length hashing. X Liu, X Nie, Q Dai, Y Huang, L Lian, Y Yin. IEEE Transactions on Circuits and Systems for Video Technology 31 (9), 3655-3668, 2020	21	2020
TextDiffuser: Diffusion models as text painters. J Chen, Y Huang, T Lv, L Cui, Q Chen, F Wei. Advances in Neural Information Processing Systems 36, 2024	16	2024
Kosmos-2.5: A Multimodal Literate Model. T Lv, Y Huang, J Chen, L Cui, S Ma, Y Chang, S Huang, W Wang, ... arXiv preprint arXiv:2309.11419, 2023	16	2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models. Y Huang, Z Meng, F Liu, Y Su, N Collier, Y Lu. arXiv preprint arXiv:2308.16463, 2023	12	2023
A picture is worth a thousand words: A unified system for diverse captions and rich images generation. Y Huang, B Liu, J Fu, Y Lu. Proceedings of the 29th ACM International Conference on Multimedia, 2792-2794, 2021	8	2021
Be specific, be clear: Bridging machine and human captions by scene-guided transformer. Y Huang, Z Zeng, Y Lu. Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia …, 2021	5	2021
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering. J Chen, Y Huang, T Lv, L Cui, Q Chen, F Wei. arXiv preprint arXiv:2311.16465, 2023	3	2023


Articles 1–12
