LLAMA PRO: Progressive LLaMA with Block Expansion

1The University of Hong Kong, 2ARC Lab, Tencent PCG, 3Shanghai Jiao Tong University, 4Beijing Language and Culture University
🔥What's New
  • [2024/02/23] We release MetaMath-Mistral-Pro, which surpasses the previous 7B models in the MetaMath series on both GSM8K and MATH. The evaluation follows the official MetaMath repo.
  • [2024/02/23] We release the evaluation code of Mistral-Pro-8B-v0.1 in lm-evaluation-harness.
  • [2024/02/23] We release Mistral-Pro-8B-v0.1, which achieves superior performance on a range of benchmarks. It enhances the code and math capabilities of Mistral and matches the performance of the recently dominant model, Gemma. ![Mistral-Pro performance](assets/mistral_pro_performance.png)
  • [2024/01/18] We add the training code in open-instruct.
  • [2024/01/07] We add instructions for running the Gradio demo locally in demo.
  • [2024/01/06] We open-source the LLaMA-Pro repository along with the demo and model (see the minimal loading sketch after this list).
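To try a released checkpoint, a minimal loading sketch with 🤗 Transformers might look like the following. The hub id `TencentARC/LLaMA-Pro-8B` and the generation settings are assumptions on our part; please check the model card for the exact identifier, prompt format, and recommended usage.

```python
# A minimal inference sketch. The hub id below is an assumption; check the model card
# for the exact identifier and the recommended prompt format / generation settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TencentARC/LLaMA-Pro-8B"  # assumed Hugging Face hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Write a Python function that returns the n-th Fibonacci number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```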

An overview of the two stages of LLaMA-Pro:

Abstract

Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model’s knowledge without catastrophic forgetting. In this paper, we experiment on corpora of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B that excels in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance across various benchmarks, demonstrating superiority over existing open models in the LLaMA family and immense potential for reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
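To make the block-expansion idea concrete, here is a minimal sketch, assuming a Hugging Face LLaMA-style model whose decoder blocks live in `model.model.layers` (the layer names `self_attn.o_proj` and `mlp.down_proj` follow transformers' `LlamaDecoderLayer`). The "one new block per four original blocks" grouping is inferred from the 32-layer LLaMA2-7B expanding to the 40-layer LLaMA Pro-8.3B; this is an illustrative sketch under those assumptions, not the exact training code.

```python
# A minimal sketch of block expansion, assuming a Hugging Face LLaMA-style model whose
# decoder blocks live in model.model.layers (names follow transformers' LlamaDecoderLayer).
import copy

from torch import nn
from transformers import AutoModelForCausalLM


def expand_blocks(model, add_every: int = 4):
    """Interleave one identity-initialized copy after every `add_every` original blocks."""
    model.requires_grad_(False)  # freeze the original network; only new blocks are tuned
    expanded = nn.ModuleList()
    for i, layer in enumerate(model.model.layers):
        expanded.append(layer)
        if (i + 1) % add_every == 0:
            new_layer = copy.deepcopy(layer)
            # Zero the output projections so the copied block initially contributes
            # nothing to the residual stream, i.e. it starts as an identity mapping.
            nn.init.zeros_(new_layer.self_attn.o_proj.weight)
            nn.init.zeros_(new_layer.mlp.down_proj.weight)
            new_layer.requires_grad_(True)
            expanded.append(new_layer)
    model.model.layers = expanded
    model.config.num_hidden_layers = len(expanded)
    # Note: depending on the transformers version, per-layer attributes such as
    # self_attn.layer_idx may need re-indexing before generation with a KV cache.
    return model


base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = expand_blocks(base, add_every=4)  # 32 original blocks -> 40 blocks
```

During continued pretraining on the new corpus, only the zero-initialized blocks receive gradients; the original blocks stay frozen, which is what avoids catastrophic forgetting of the base model's knowledge.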

Video

Thanks to Yannic Kilcher for sharing our work.

BibTeX

@article{wu2024llama,
  title={{LLaMA Pro}: Progressive {LLaMA} with Block Expansion},
  author={Wu, Chengyue and Gan, Yukang and Ge, Yixiao and Lu, Zeyu and Wang, Jiahao and Feng, Ye and Luo, Ping and Shan, Ying},
  journal={arXiv preprint arXiv:2401.02415},
  year={2024}
}