Sipeng Zheng

Cited by

	All	Since 2019
Citations	158	158
h-index	8	8
i10-index	6	6

Citations per year
Year	2020	2021	2022	2023	2024
Citations	7	13	15	47	76

Beijing Academy of Artificial Intelligence (BAAI)

Verified email at baai.ac.cn

Computer Vision, Large Multimodal Model, Agent Learning


Title	Authors	Venue	Cited by	Year
Few-shot action recognition with hierarchical matching and contrastive learning	S Zheng, S Chen, Q Jin	European Conference on Computer Vision, 297-313, 2022	36	2022
Visual relation detection with multi-level attention	S Zheng, S Chen, Q Jin	Proceedings of the 27th ACM International Conference on Multimedia, 121-129, 2019	22	2019
Skeleton-based interactive graph network for human object interaction detection	S Zheng, S Chen, Q Jin	2020 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2020	18	2020
Relation understanding in videos	S Zheng, X Chen, S Chen, Q Jin	Proceedings of the 27th ACM International Conference on Multimedia, 2662-2666, 2019	16	2019
Vrdformer: End-to-end video visual relation detection with transformers	S Zheng, S Chen, Q Jin	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	12	2022
Open-category human-object interaction pre-training via language modeling framework	S Zheng, B Xu, Q Jin	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023	10	2023
Towards general computer control: A multimodal agent for red dead redemption ii as a case study	W Tan, Z Ding, W Zhang, B Li, B Zhou, J Yue, H Xia, J Jiang, L Zheng, ...	arXiv preprint arXiv:2403.03186, 2024	9	2024
Llama rider: Spurring large language models to explore the open world	Y Feng, Y Wang, J Liu, S Zheng, Z Lu	arXiv preprint arXiv:2310.08922, 2023	9	2023
Steve-eye: Equipping llm-based embodied agents with visual perception in open worlds	S Zheng, Y Feng, Z Lu	The Twelfth International Conference on Learning Representations, 2023	8	2023
Accommodating audio modality in CLIP for multimodal processing	L Ruan, A Hu, Y Song, L Zhang, S Zheng, Q Jin	Proceedings of the AAAI Conference on Artificial Intelligence 37 (8), 9641-9649, 2023	5	2023
Exploring anchor-based detection for ego4d natural language query	S Zheng, Q Zhang, B Liu, Q Jin, J Fu	arXiv preprint arXiv:2208.05375, 2022	5	2022
Unicode: Learning a unified codebook for multimodal large language models	S Zheng, B Zhou, Y Feng, Y Wang, Z Lu	arXiv preprint arXiv:2403.09072, 2024	4	2024
Anchor-based detection for natural language localization in ego-centric videos	B Liu, S Zheng, J Fu, WH Cheng	2023 IEEE International Conference on Consumer Electronics (ICCE), 01-04, 2023	2	2023
SPAFormer: Sequential 3D Part Assembly with Transformers	B Xu, S Zheng, Q Jin	arXiv preprint arXiv:2403.05874, 2024	1	2024
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World	B Xu, S Zheng, Q Jin	Proceedings of the 31st ACM International Conference on Multimedia, 2807-2816, 2023	1	2023
QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds	Y Wang, Y Mei, S Zheng, Q Jin	arXiv preprint arXiv:2406.16578, 2024		2024
EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?	B Xu, Z Wang, Y Du, S Zheng, Z Song, Q Jin	arXiv preprint arXiv:2405.17719, 2024		2024
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection	Q Zhang, S Zheng, Q Jin	arXiv preprint arXiv:2307.10567, 2023		2023
Supplementary Material for Open-Category Human-Object Interaction Pre-training via Language Modeling Framework	S Zheng, B Xu, Q Jin	relation 50 (100), 100, 0		
Supplementary Material for VRDFormer: End-to-End Video Visual Relation Detection with Transformers	S Zheng, S Chen, Q Jin			
