Title | Authors | Venue | Cited by | Year |
Boundary proposal network for two-stage natural language video localization | S Xiao, L Chen, S Zhang, W Ji, J Shao, L Ye, J Xiao | Proceedings of the AAAI Conference on Artificial Intelligence 35 (4), 2986-2994, 2021 | 159 | 2021 |
Video relation detection with spatio-temporal graph | X Qian, Y Zhuang, Y Li, S Xiao, S Pu, J Xiao | Proceedings of the 27th ACM International Conference on Multimedia, 84-93, 2019 | 90 | 2019 |
Deep learning for weakly-supervised object detection and localization: A survey | F Shao, L Chen, J Shao, W Ji, S Xiao, L Ye, Y Zhuang, J Xiao | Neurocomputing 496, 192-207, 2022 | 76 | 2022 |
Natural language video localization with learnable moment proposals | S Xiao, L Chen, J Shao, Y Zhuang, J Xiao | arXiv preprint arXiv:2109.10678, 2021 | 42 | 2021 |
Deep learning for weakly-supervised object detection and object localization: A survey | F Shao, L Chen, J Shao, W Ji, S Xiao, L Ye, Y Zhuang, J Xiao | arXiv preprint arXiv:2105.12694, 2021 | 25 | 2021 |
Rethinking the evaluation of unbiased scene graph generation | X Li, L Chen, J Shao, S Xiao, S Zhang, J Xiao | arXiv preprint arXiv:2208.01909, 2022 | 12 | 2022 |
Hierarchical temporal fusion of multi-grained attention features for video question answering | S Xiao, Y Li, Y Ye, L Chen, S Pu, Z Zhao, J Shao, J Xiao | Neural Processing Letters 52, 993-1003, 2020 | 8 | 2020 |
Rethinking multi-modal alignment in video question answering from feature and sample perspectives | S Xiao, L Chen, K Gao, Z Wang, Y Yang, Z Zhang, J Xiao | arXiv preprint arXiv:2204.11544, 2022 | 6 | 2022 |
Rethinking multi-modal alignment in multi-choice videoQA from feature and sample perspectives | S Xiao, L Chen, K Gao, Z Wang, Y Yang, Z Zhang, J Xiao | Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022 | 4 | 2022 |
Video question answering via multi-granularity temporal attention network learning | S Xiao, Y Li, Y Ye, Z Zhao, J Xiao, F Wu, J Zhu, Y Zhuang | Proceedings of the 10th International Conference on Internet Multimedia …, 2018 | 2 | 2018 |