Sipeng Zheng
Beijing Academy of Artificial Intelligence (BAAI)
Verified email at baai.ac.cn
Title
Cited by
Year
Few-shot action recognition with hierarchical matching and contrastive learning
S Zheng, S Chen, Q Jin
European Conference on Computer Vision, 297-313, 2022
Cited by 34, 2022
Visual relation detection with multi-level attention
S Zheng, S Chen, Q Jin
Proceedings of the 27th ACM International Conference on Multimedia, 121-129, 2019
Cited by 23, 2019
Skeleton-based interactive graph network for human object interaction detection
S Zheng, S Chen, Q Jin
2020 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2020
Cited by 17, 2020
Relation understanding in videos
S Zheng, X Chen, S Chen, Q Jin
Proceedings of the 27th ACM International Conference on Multimedia, 2662-2666, 2019
Cited by 15, 2019
VRDFormer: End-to-end video visual relation detection with transformers
S Zheng, S Chen, Q Jin
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
Cited by 13, 2022
Open-category human-object interaction pre-training via language modeling framework
S Zheng, B Xu, Q Jin
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023
Cited by 9, 2023
LLaMA Rider: Spurring large language models to explore the open world
Y Feng, Y Wang, J Liu, S Zheng, Z Lu
arXiv preprint arXiv:2310.08922, 2023
Cited by 6, 2023
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
S Zheng, Y Feng, Z Lu
The Twelfth International Conference on Learning Representations, 2023
Cited by 6, 2023
Towards general computer control: A multimodal agent for Red Dead Redemption II as a case study
W Tan, Z Ding, W Zhang, B Li, B Zhou, J Yue, H Xia, J Jiang, L Zheng, ...
arXiv preprint arXiv:2403.03186, 2024
Cited by 5, 2024
Accommodating audio modality in CLIP for multimodal processing
L Ruan, A Hu, Y Song, L Zhang, S Zheng, Q Jin
Proceedings of the AAAI Conference on Artificial Intelligence 37 (8), 9641-9649, 2023
Cited by 5, 2023
Exploring anchor-based detection for Ego4D natural language query
S Zheng, Q Zhang, B Liu, Q Jin, J Fu
arXiv preprint arXiv:2208.05375, 2022
Cited by 5, 2022
UniCode: Learning a unified codebook for multimodal large language models
S Zheng, B Zhou, Y Feng, Y Wang, Z Lu
arXiv preprint arXiv:2403.09072, 2024
Cited by 3, 2024
Anchor-based detection for natural language localization in ego-centric videos
B Liu, S Zheng, J Fu, WH Cheng
2023 IEEE International Conference on Consumer Electronics (ICCE), 01-04, 2023
Cited by 2, 2023
SPAFormer: Sequential 3D Part Assembly with Transformers
B Xu, S Zheng, Q Jin
arXiv preprint arXiv:2403.05874, 2024
Cited by 1, 2024
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World
B Xu, S Zheng, Q Jin
Proceedings of the 31st ACM International Conference on Multimedia, 2807-2816, 2023
Cited by 1, 2023
EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?
B Xu, Z Wang, Y Du, S Zheng, Z Song, Q Jin
arXiv preprint arXiv:2405.17719, 2024
2024
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection
Q Zhang, S Zheng, Q Jin
arXiv preprint arXiv:2307.10567, 2023
2023
Supplementary Material for Open-Category Human-Object Interaction Pre-training via Language Modeling Framework
S Zheng, B Xu, Q Jin
Supplementary Material for VRDFormer: End-to-End Video Visual Relation Detection with Transformers
S Zheng, S Chen, Q Jin