Yuanhan Zhang
PhD Candidate, MMLab@NTU
Verified email at e.ntu.edu.sg
Title | Cited by | Year
MIMIC-IT: Multi-modal in-context instruction tuning
B Li, Y Zhang, L Chen, J Wang, F Pu, J Yang, C Li, Z Liu
arXiv preprint arXiv:2306.05425, 2023
422 | 2023
MMBench: Is your multi-modal model an all-around player?
Y Liu, H Duan, Y Zhang, B Li, S Zhang, W Zhao, Y Yuan, J Wang, C He, ...
arXiv preprint arXiv:2307.06281, 2023
293 | 2023
CelebA-Spoof: Large-scale face anti-spoofing dataset with rich annotations
Y Zhang, ZF Yin, Y Li, G Yin, J Yan, J Shao, Z Liu
Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020
175 | 2020
Neural Prompt Search
Y Zhang, K Zhou, Z Liu
arXiv preprint arXiv:2206.04673, 2022
131 | 2022
LLaVA-NeXT: Improved reasoning, OCR, and world knowledge
H Liu, C Li, Y Li, B Li, Y Zhang, S Shen, YJ Lee
80 | 2024
What makes good examples for visual in-context learning?
Y Zhang, K Zhou, Z Liu
Advances in Neural Information Processing Systems 36, 2024
53 | 2024
VBench: Comprehensive benchmark suite for video generative models
Z Huang, Y He, J Yu, F Zhang, C Si, Y Jiang, Y Zhang, T Wu, Q Jin, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
42 | 2024
Octopus: Embodied vision-language programmer from environmental feedback
J Yang, Y Dong, S Liu, B Li, Z Wang, C Jiang, H Tan, J Kang, Y Zhang, ...
arXiv preprint arXiv:2310.08588, 2023
28 | 2023
Learning without forgetting for vision-language models
DW Zhou, Y Zhang, J Ning, HJ Ye, DC Zhan, Z Liu
arXiv preprint arXiv:2305.19270, 2023
19 | 2023
Benchmarking omni-vision representation through the lens of visual realms
Y Zhang, Z Yin, J Shao, Z Liu
European Conference on Computer Vision, 594-611, 2022
19 | 2022
Bamboo: Building mega-scale vision dataset continually with human-machine synergy
Y Zhang, Q Sun, Y Zhou, Z He, Z Yin, K Wang, L Sheng, Y Qiao, J Shao, ...
arXiv preprint arXiv:2203.07845, 2022
15 | 2022
CelebA-Spoof Challenge 2020 on face anti-spoofing: Methods and results
Y Zhang, Z Yin, J Shao, Z Liu, S Yang, Y Xiong, W Xia, Y Xu, M Luo, J Liu, ...
arXiv preprint arXiv:2102.12642, 2021
14 | 2021
FunQA: Towards surprising video comprehension
B Xie, S Zhang, Z Zhou, B Li, Y Zhang, J Hessel, J Yang, Z Liu
arXiv preprint arXiv:2306.14899, 2023
9 | 2023
3D point cloud pre-training with knowledge distillation from 2D images
Y Yao, Y Zhang, Z Yin, J Luo, W Ouyang, X Huang
arXiv preprint arXiv:2212.08974, 2022
7 | 2022
Multimodal foundation models for zero-shot animal species recognition in camera trap images
Z Fabian, Z Miao, C Li, Y Zhang, Z Liu, A Hernández, A Montes-Rojas, ...
arXiv preprint arXiv:2311.01064, 2023
5 | 2023
On-device domain generalization
K Zhou, Y Zhang, Y Zang, J Yang, CC Loy, Z Liu
arXiv preprint arXiv:2209.07521, 2022
5 | 2022
Robust face anti-spoofing with dual probabilistic modeling
Y Zhang, Y Wu, Z Yin, J Shao, Z Liu
arXiv preprint arXiv:2204.12685, 2022
3 | 2022
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning
Y Zhang, K Zhang, B Li, F Pu, CA Setiadharma, J Yang, Z Liu
arXiv preprint arXiv:2405.03272, 2024
2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
R Zhang, L Gui, Z Sun, Y Feng, K Xu, Y Zhang, D Fu, C Li, A Hauptmann, ...
arXiv preprint arXiv:2404.01258, 2024
2024
Knowledge Augmented Instruction Tuning for Zero-shot Animal Species Recognition
Z Fabian, Z Miao, C Li, Y Zhang, Z Liu, A Hernandez, P Arbelaez, A Link, ...
NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following