WenLan: Bridging vision and language by large-scale multi-modal pre-training Y Huo, M Zhang, G Liu, H Lu, Y Gao, G Yang, J Wen, H Zhang, B Xu, ... arXiv preprint arXiv:2103.06561, 2021 | 140 | 2021 |
Unpaired cross-lingual image caption generation with self-supervised rewards Y Song, S Chen, Y Zhao, Q Jin Proceedings of the 27th ACM international conference on multimedia, 784-792, 2019 | 45 | 2019 |
Towards diverse paragraph captioning for untrimmed videos Y Song, S Chen, Q Jin Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021 | 44 | 2021 |
Unifying event detection and captioning as sequence generation via pre-training Q Zhang, Y Song, Q Jin European Conference on Computer Vision, 363-379, 2022 | 30 | 2022 |
Progressive learning for image retrieval with hybrid-modality queries Y Zhao, Y Song, Q Jin Proceedings of the 45th International ACM SIGIR Conference on Research and …, 2022 | 28 | 2022 |
Product-oriented machine translation with cross-modal cross-lingual pre-training Y Song, S Chen, Q Jin, W Luo, J Xie, F Huang Proceedings of the 29th ACM International Conference on Multimedia, 2843-2852, 2021 | 16 | 2021 |
ActivityNet 2019 task 3: Exploring contexts for dense captioning events in videos S Chen, Y Song, Y Zhao, Q Jin, Z Zeng, B Liu, J Fu, A Hauptmann arXiv preprint arXiv:1907.05092, 2019 | 12 | 2019 |
Enhancing neural machine translation with dual-side multimodal awareness Y Song, S Chen, Q Jin, W Luo, J Xie, F Huang IEEE Transactions on Multimedia 24, 3013-3024, 2021 | 10 | 2021 |
Accommodating audio modality in CLIP for multimodal processing L Ruan, A Hu, Y Song, L Zhang, S Zheng, Q Jin Proceedings of the AAAI Conference on Artificial Intelligence 37 (8), 9641-9649, 2023 | 8 | 2023 |
RUC_AIM3 at TRECVID 2020: Ad-hoc Video Search & Video to Text Description. Y Zhao, Y Song, S Chen, Q Jin TRECVID 1, 2, 2020 | 7 | 2020 |
RUC+CMU: system report for dense captioning events in videos S Chen, Y Song, Y Zhao, J Qiu, Q Jin, A Hauptmann arXiv preprint arXiv:1806.08854, 2018 | 7 | 2018 |
Team RUC_AIM3 technical report at ActivityNet 2020 task 2: Exploring sequential events detection for dense video captioning Y Song, S Chen, Y Zhao, Q Jin arXiv preprint arXiv:2006.07896, 2020 | 4 | 2020 |
RUC_AIM3 at TRECVID 2019: Video to Text. Y Song, Y Zhao, S Chen, Q Jin TRECVID, 2019 | 2 | 2019 |
Team RUC_AIM3 technical report at ActivityNet 2021: Entities object localization L Ruan, J Chen, Y Song, S Chen, Q Jin arXiv preprint arXiv:2106.06138, 2021 | 1 | 2021 |
iMakeup: Makeup Instructional Video Dataset for Fine-Grained Dense Video Captioning X Lin, Q Jin, S Chen, Y Song, Y Zhao Advances in Multimedia Information Processing–PCM 2018: 19th Pacific-Rim …, 2018 | 1 | 2018 |
Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019 S Chen, Y Zhao, Y Song, Q Jin, Q Wu arXiv preprint arXiv:1910.06737, 2019 | | 2019 |
Supplementary Material for “Unifying Event Detection and Captioning as Sequence Generation via Pre-Training” Q Zhang, Y Song, Q Jin | | |
RUC_AIM3 at TRECVID 2021: Video to Text L Zhang, Y Song, Q Jin | | |
Team RUC_AIM3 Technical Report at VMT Challenge 2020: Enhancing Neural Machine Translation with Multimodal Rewards Y Song, S Chen, Q Jin | | |
Supplementary Material for “Towards Diverse Paragraph Captioning for Untrimmed Videos” Y Song, S Chen, Q Jin | | |