Uniformer: Unifying convolution and self-attention for visual recognition K Li, Y Wang, J Zhang, P Gao, G Song, Y Liu, H Li, Y Qiao IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 | 652* | 2023 |
Videochat: Chat-centric video understanding KC Li, Y He, Y Wang, Y Li, W Wang, P Luo, Y Wang, L Wang, Y Qiao SCIENCE CHINA Information Sciences, 2023 | 492 | 2023 |
Adaptive pyramid context network for semantic segmentation J He, Z Deng, L Zhou, Y Wang, Y Qiao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019 | 429 | 2019 |
Lstd: A low-shot transfer detector for object detection H Chen, Y Wang, G Wang, Y Qiao Proceedings of the AAAI conference on artificial intelligence 32 (1), 2018 | 389 | 2018 |
Learning attentive pairwise interaction for fine-grained classification P Zhuang, Y Wang, Y Qiao Proceedings of the AAAI conference on artificial intelligence 34 (07), 13130 …, 2020 | 377 | 2020 |
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking L Wang, B Huang, Z Zhao, Z Tong, Y He, Y Wang, Y Wang, Y Qiao CVPR 2023, 2023 | 335 | 2023 |
Internvideo: General video foundation models via generative and discriminative learning Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao, H Zhang, J Xu, Y Liu, Z Wang, ... arXiv preprint arXiv:2212.03191, 2022 | 294 | 2022 |
Recurrent spatial-temporal attention network for action recognition in videos W Du, Y Wang, Y Qiao IEEE Transactions on Image Processing 27 (3), 1347-1360, 2017 | 227 | 2017 |
Rpan: An end-to-end recurrent pose-attention network for action recognition in videos W Du, Y Wang, Y Qiao Proceedings of the IEEE international conference on computer vision, 3725-3734, 2017 | 221 | 2017 |
Mvbench: A comprehensive multi-modal video understanding benchmark K Li, Y Wang, Y He, Y Li, Y Wang, Y Liu, Z Wang, J Xu, G Chen, P Luo, ... CVPR 2024, 2024 | 185 | 2024 |
Internvid: A large-scale video-text dataset for multimodal understanding and generation Y Wang, Y He, Y Li, K Li, J Yu, X Ma, X Li, G Chen, X Chen, Y Wang, C He, ... ICLR 2024, 2024 | 175 | 2024 |
Uniformerv2: Spatiotemporal learning by arming image vits with video uniformer K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao ICCV 2023, 2023 | 161* | 2023 |
Unmasked teacher: Towards training-efficient video foundation models K Li, Y Wang, Y Li, Y Wang, Y He, L Wang, Y Qiao ICCV 2023, 2023 | 134 | 2023 |
Smallbignet: Integrating core and contextual views for video classification X Li, Y Wang, Z Zhou, Y Qiao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 119 | 2020 |
Videomamba: State space model for efficient video understanding K Li, X Li, Y Wang, Y He, Y Wang, L Wang, Y Qiao ECCV 2024, 2024 | 115 | 2024 |
Metacleaner: Learning to hallucinate clean representations for noisy-labeled visual recognition W Zhang, Y Wang, Y Qiao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019 | 114 | 2019 |
Mining Inter-Video Proposal Relations for Video Object Detection M Han, Y Wang, X Chang, Y Qiao European Conference on Computer Vision (ECCV), 2020 | 103 | 2020 |
Weakly supervised patchnets: Describing and aggregating local patches for scene recognition Z Wang, L Wang, Y Wang, B Zhang, Y Qiao IEEE Transactions on Image Processing 26 (4), 2028-2041, 2017 | 102 | 2017 |
PA3D: Pose-action 3D machine for video recognition A Yan, Y Wang, Z Li, Y Qiao Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019 | 100 | 2019 |
Context-transformer: Tackling object confusion for few-shot detection Z Yang, Y Wang, X Chen, J Liu, Y Qiao Proceedings of the AAAI Conference on Artificial Intelligence 34 (07), 12653 …, 2020 | 94 | 2020 |