| Title | Authors | Venue | Cited by | Year |
|---|---|---|---:|---:|
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Y Huang, T Lv, L Cui, Y Lu, F Wei | Proceedings of the 30th ACM International Conference on Multimedia | 393 | 2022 |
| Seeing out of the box: End-to-end pre-training for vision-language representation learning | Z Huang*, Z Zeng*, Y Huang*, B Liu, D Fu, J Fu | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern … | 282 | 2021 |
| Probing inter-modality: Visual parsing with self-attention for vision-and-language pre-training | H Xue, Y Huang, B Liu, H Peng, J Fu, H Li, J Luo | Advances in Neural Information Processing Systems 34, 4514-4528 | 87 | 2021 |
| Unifying multimodal transformer for bi-directional image and text generation | Y Huang, H Xue, B Liu, Y Lu | Proceedings of the 29th ACM International Conference on Multimedia, 1138-1147 | 57 | 2021 |
| Decoupling localization and classification in single shot temporal action detection | Y Huang, Q Dai, Y Lu | 2019 IEEE International Conference on Multimedia and Expo (ICME), 1288-1293 | 57 | 2019 |
| TextDiffuser: Diffusion models as text painters | J Chen*, Y Huang*, T Lv, L Cui, Q Chen, F Wei | Advances in Neural Information Processing Systems 36 | 51 | 2024 |
| Kosmos-2.5: A Multimodal Literate Model | T Lv*, Y Huang*, J Chen*, L Cui*, S Ma, Y Chang, S Huang, W Wang, ... | arXiv preprint arXiv:2309.11419 | 31 | 2023 |
| Reinforced short-length hashing | X Liu, X Nie, Q Dai, Y Huang, L Lian, Y Yin | IEEE Transactions on Circuits and Systems for Video Technology 31 (9), 3655-3668 | 25 | 2020 |
| TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering | J Chen, Y Huang, T Lv, L Cui, Q Chen, F Wei | arXiv preprint arXiv:2311.16465 | 23 | 2023 |
| Sparkles: Unlocking chats across multiple images for multimodal instruction-following models | Y Huang, Z Meng, F Liu, Y Su, N Collier, Y Lu | arXiv preprint arXiv:2308.16463 | 17 | 2023 |
| A picture is worth a thousand words: A unified system for diverse captions and rich images generation | Y Huang, B Liu, J Fu, Y Lu | Proceedings of the 29th ACM International Conference on Multimedia, 2792-2794 | 7 | 2021 |
| Be specific, be clear: Bridging machine and human captions by scene-guided transformer | Y Huang, Z Zeng, Y Lu | Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia … | 7 | 2021 |