Xiaotian Han
TikTok
Verified email at bytedance.com
Title
Cited by
Year
Exploring the reasoning abilities of multimodal large language models (mllms): A comprehensive survey on emerging trends in multimodal reasoning
Y Wang, W Chen, X Han, X Lin, H Zhao, Y Liu, B Zhai, J Yuan, Q You, ...
arXiv preprint arXiv:2401.06805, 2024
Cited by 68*, 2024
Real-time micro-scale temperature imaging at low cost based on fluorescent intensity ratio
J Xiong, M Zhao, X Han, Z Cao, X Wei, Y Chen, C Duan, M Yin
Scientific Reports 7 (1), 41311, 2017
Cited by 37, 2017
Mmptrack: Large-scale densely annotated multi-camera multiple people tracking benchmark
X Han, Q You, C Wang, Z Zhang, P Chu, H Hu, J Wang, Z Liu
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2023
Cited by 35*, 2023
Image scene graph generation (sgg) benchmark
X Han, J Yang, H Hu, L Zhang, J Gao, P Zhang
arXiv preprint arXiv:2107.12604, 2021
Cited by 33, 2021
InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
X Han, Q You, Y Liu, W Chen, H Zheng, K Mrini, X Lin, Y Wang, B Zhai, ...
arXiv e-prints, arXiv: 2311.11567, 2023
Cited by 12*, 2023
ViTAR: Vision Transformer with Any Resolution
Q Fan, Q You, X Han, Y Liu, Y Tao, H Huang, R He, H Yang
arXiv preprint arXiv:2403.18361, 2024
Cited by 9, 2024
Infimm-webmath-40b: Advancing multimodal pre-training for enhanced mathematical reasoning
X Han, Y Jian, X Hu, H Liu, Y Wang, Q Fan, Y Ai, H Huang, R He, Z Yang, ...
arXiv preprint arXiv:2409.12568, 2024
Cited by 5, 2024
InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model
H Liu, Q You, Y Wang, X Han, B Zhai, Y Liu, W Chen, Y Jian, Y Tao, ...
Findings of the Association for Computational Linguistics ACL 2024, 485-492, 2024
Cited by 5*, 2024
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
H Liu, Q You, X Han, Y Wang, B Zhai, Y Liu, Y Tao, H Huang, R He, ...
arXiv preprint arXiv:2403.01487, 2024
Cited by 5*, 2024
Dreamclear: High-capacity real-world image restoration with privacy-safe dataset curation
Y Ai, X Zhou, H Huang, X Han, Z Chen, ..., Q You, H Yang
NeurIPS 5 (6), 7, 2024
Cited by 3, 2024
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
H Liu, Q You, X Han, Y Liu, H Huang, R He, H Yang
arXiv preprint arXiv:2405.17815, 2024
Cited by 2, 2024
COCO is "ALL" You Need for Visual Instruction Fine-tuning
X Han, Y Wang, B Zhai, Q You, H Yang
arXiv preprint arXiv:2401.08968, 2024
Cited by 1, 2024
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Y Liu, P Li, Z Wei, C Xie, X Hu, X Xu, S Zhang, X Han, H Yang, F Wu
arXiv preprint arXiv:2501.04575, 2025
2025
BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data
X Wang, Q Cui, Y Tao, Y Wang, Z Chai, X Han, B Liu, J Yuan, J Su, ...
arXiv preprint arXiv:2410.00773, 2024
2024