Pre-trained models: Past, present and future. X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu, Y Yao, A Zhang, et al. AI Open 2, 225-250, 2021. Cited by 966.
PPT: Pre-trained prompt tuning for few-shot learning. Y Gu, X Han, Z Liu, M Huang. arXiv preprint arXiv:2109.04332, 2021. Cited by 485.
Knowledge distillation of large language models. Y Gu, L Dong, F Wei, M Huang. arXiv preprint arXiv:2306.08543, 2023. Cited by 282*.
CPM: A large-scale generative Chinese pre-trained language model. Z Zhang, X Han, H Zhou, P Ke, Y Gu, D Ye, Y Qin, Y Su, H Ji, J Guan, F Qi, et al. AI Open 2, 93-99, 2021. Cited by 129.
Adapting meta knowledge graph information for multi-hop reasoning over few-shot relations. X Lv, Y Gu, X Han, L Hou, J Li, Z Liu. arXiv preprint arXiv:1908.11513, 2019. Cited by 112.
CPM-2: Large-scale cost-effective pre-trained language models. Z Zhang, Y Gu, X Han, S Chen, C Xiao, Z Sun, Y Yao, F Qi, J Guan, P Ke, et al. AI Open 2, 216-224, 2021. Cited by 97.
Train no evil: Selective masking for task-guided pre-training. Y Gu, Z Zhang, X Wang, Z Liu, M Sun. arXiv preprint arXiv:2004.09733, 2020. Cited by 70.
EVA: An open-domain Chinese dialogue system with large-scale generative pre-training. H Zhou, P Ke, Z Zhang, Y Gu, Y Zheng, C Zheng, Y Wang, CH Wu, H Sun, et al. arXiv preprint arXiv:2108.01547, 2021. Cited by 50.
Structured prompting: Scaling in-context learning to 1,000 examples. Y Hao, Y Sun, L Dong, Z Han, Y Gu, F Wei. arXiv preprint arXiv:2212.06713, 2022. Cited by 49.
EVA2.0: Investigating open-domain Chinese dialogue systems with large-scale pre-training. Y Gu, J Wen, H Sun, Y Song, P Ke, C Zheng, Z Zhang, J Yao, L Liu, X Zhu, et al. Machine Intelligence Research 20 (2), 207-219, 2023. Cited by 44.
Pre-training to learn in context. Y Gu, L Dong, F Wei, M Huang. arXiv preprint arXiv:2305.09137, 2023. Cited by 39.
Synthetic data (almost) from scratch: Generalized instruction tuning for language models. H Li, Q Dong, Z Tang, C Wang, X Zhang, H Huang, S Huang, X Huang, et al. arXiv preprint arXiv:2402.13064, 2024. Cited by 30.
When does further pre-training MLM help? An empirical study on task-oriented dialog pre-training. Q Zhu, Y Gu, L Luo, B Li, C Li, W Peng, M Huang, X Zhu. Proceedings of the Second Workshop on Insights from Negative Results in NLP ..., 2021. Cited by 23.
Instruction pre-training: Language models are supervised multitask learners. D Cheng, Y Gu, S Huang, J Bi, M Huang, F Wei. arXiv preprint arXiv:2406.14491, 2024. Cited by 17*.
CUGE: A Chinese language understanding and generation evaluation benchmark. Y Yao, Q Dong, J Guan, B Cao, Z Zhang, C Xiao, X Wang, F Qi, J Bao, et al. arXiv preprint arXiv:2112.13610, 2021. Cited by 16.
Learning instructions with unlabeled data for zero-shot cross-task generalization. Y Gu, P Ke, X Zhu, M Huang. arXiv preprint arXiv:2210.09175, 2022. Cited by 12.
NVILA: Efficient frontier visual language models. Z Liu, L Zhu, B Shi, Z Zhang, Y Lou, S Yang, H Xi, S Cao, Y Gu, D Li, X Li, et al. arXiv preprint arXiv:2412.04468, 2024. Cited by 5.
Direct preference knowledge distillation for large language models. Y Li, Y Gu, L Dong, D Wang, Y Cheng, F Wei. arXiv preprint arXiv:2406.19774, 2024. Cited by 2.
MiniPLM: Knowledge distillation for pre-training language models. Y Gu, H Zhou, F Meng, J Zhou, M Huang. arXiv preprint arXiv:2410.17215, 2024. Cited by 1.
Data selection via optimal control for language models. Y Gu, L Dong, H Wang, Y Hao, Q Dong, F Wei, M Huang. arXiv preprint arXiv:2410.07064, 2024. Cited by 1.