DiscoverDaily Paper Cast

Xmodel-2 Technical Report

Update: 2025-01-03

Description

🤗 Upvotes: 13 | cs.AI



Authors:

Wang Qun, Liu Yang, Lin Qingquan, Qu Zhijiu, Jiang Ling



Title:

Xmodel-2 Technical Report



Arxiv:

http://arxiv.org/abs/2412.19638v1



Abstract:

Xmodel-2 is a 1.2-billion-parameter large language model designed specifically for reasoning tasks. Its architecture enables different model scales to share a unified set of hyperparameters, allowing for extensive experimentation on smaller models and seamless transfer of optimal configurations to larger models. To maximize training efficiency and stability, Xmodel-2 employs the WSD learning rate scheduler from MiniCPM. Pretrained on 1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art performance in complex reasoning and agent-based tasks, while maintaining low training costs. These results highlight the potential of efficient model design and training strategies in advancing reasoning capabilities. Model checkpoints and code are publicly available on GitHub at https://github.com/XiaoduoAILab/Xmodel-2
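The abstract credits the WSD (Warmup-Stable-Decay) learning rate scheduler from MiniCPM for training efficiency and stability. As a rough illustration, a WSD schedule warms the learning rate up linearly, holds it constant for most of training, then decays it rapidly at the end. The sketch below is a minimal, assumed implementation — the function name, the linear decay shape, and all step counts are illustrative, not taken from the Xmodel-2 or MiniCPM code:

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Warmup-Stable-Decay schedule (illustrative sketch).

    Phase 1: linear warmup from ~0 to max_lr over warmup_steps.
    Phase 2: hold at max_lr for stable_steps.
    Phase 3: decay from max_lr toward min_lr over decay_steps
             (linear here; the original may use a different curve).
    """
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        return max_lr
    # Decay phase: clamp progress at 1.0 so the rate stays at min_lr afterward.
    t = min((step - warmup_steps - stable_steps) / decay_steps, 1.0)
    return max_lr + (min_lr - max_lr) * t
```

A practical appeal of the stable phase is that training can be extended (or a decayed checkpoint branched off) without re-planning the whole schedule, which fits the report's theme of transferring configurations across model scales cheaply.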


Jingwen Liang, Gengyu Wang