Daily Paper Cast
YuLan-Mini: An Open Data-efficient Language Model

Update: 2024-12-28

Description

🤗 Upvotes: 27 | cs.CL



Authors:

Yiwen Hu, Huatong Song, Jia Deng, Jiapeng Wang, Jie Chen, Kun Zhou, Yutao Zhu, Jinhao Jiang, Zican Dong, Wayne Xin Zhao, Ji-Rong Wen



Title:

YuLan-Mini: An Open Data-efficient Language Model



Arxiv:

http://arxiv.org/abs/2412.17743v2



Abstract:

Effective pre-training of large language models (LLMs) has been challenging due to the immense resource demands and the complexity of the technical processes involved. This paper presents a detailed technical report on YuLan-Mini, a highly capable base model with 2.42B parameters that achieves top-tier performance among models of similar parameter scale. Our pre-training approach focuses on enhancing training efficacy through three key technical contributions: an elaborate data pipeline that combines data cleaning with data scheduling strategies, a robust optimization method to mitigate training instability, and an effective annealing approach that incorporates targeted data selection and long context training. Remarkably, YuLan-Mini, trained on 1.08T tokens, achieves performance comparable to industry-leading models that require significantly more data. To facilitate reproduction, we release the full details of the data composition for each training phase. Project details can be accessed at the following link: https://github.com/RUC-GSAI/YuLan-Mini.
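The abstract highlights an annealing phase near the end of pre-training. The details of YuLan-Mini's schedule are in the paper and repository, not reproduced here; the sketch below is only a minimal, generic warmup-stable-decay learning-rate schedule of the kind such annealing approaches typically build on. The function name lr_at_step and all constants (peak_lr, min_lr, warmup_frac, anneal_frac) are hypothetical placeholders, not values from the paper.

# Illustrative sketch (not from the paper): a generic warmup-stable-decay
# learning-rate schedule, where a final annealing phase lowers the rate
# over the last portion of training. All constants are placeholders.
def lr_at_step(step, total_steps, peak_lr=3e-4, min_lr=3e-5,
               warmup_frac=0.01, anneal_frac=0.1):
    """Return the learning rate for a given training step."""
    warmup_steps = int(total_steps * warmup_frac)
    anneal_start = int(total_steps * (1.0 - anneal_frac))
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    if step < anneal_start:
        # Stable phase: hold the peak learning rate constant.
        return peak_lr
    # Annealing phase: decay linearly from peak_lr down to min_lr.
    progress = (step - anneal_start) / max(1, total_steps - anneal_start)
    return peak_lr + (min_lr - peak_lr) * progress

if __name__ == "__main__":
    # Example: sample the schedule at a few points of a 100k-step run.
    for s in (0, 500, 50_000, 95_000, 100_000):
        print(s, round(lr_at_step(s, 100_000), 6))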


Hosts: Jingwen Liang, Gengyu Wang