LLaMA Factory

Features

Various models: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Qwen2-VL, Yi, Gemma, Baichuan, ChatGLM, Phi, etc.
Integrated methods: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO, etc.
Scalable resources: 16-bit full-tuning, freeze-tuning, LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
Advanced algorithms: GaLore, BAdam, Adam-mini, DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ, PiSSA and Agent tuning.
Practical tricks: FlashAttention-2, Unsloth, Liger Kernel, RoPE scaling, NEFTune and rsLoRA.
Experiment monitors: LlamaBoard, TensorBoard, Wandb, MLflow, etc.
Faster inference: OpenAI-style API, Gradio UI and CLI with vLLM worker.

Benchmark

Compared to ChatGLM's P-Tuning, LLaMA Factory's LoRA tuning offers up to 3.7 times faster training speed with a better Rouge score on the advertising text generation task. By leveraging 4-bit quantization technique, LLaMA Factory's QLoRA further improves the efficiency regarding the GPU memory.

Definitions

Changelog

[24/09/19] We support fine-tuning the Qwen2.5 models.

[24/08/30] We support fine-tuning the Qwen2-VL models. Thank @simonJJJ's PR.

[24/08/27] We support Liger Kernel. Try enable_liger_kernel: true for efficient training.

[24/08/09] We support Adam-mini optimizer. See examples for usage. Thank @relic-yuexi's PR.

Full Changelog

Supported Models

Model	Model size	Template
Baichuan 2	7B/13B	baichuan2
BLOOM/BLOOMZ	560M/1.1B/1.7B/3B/7.1B/176B	-
ChatGLM3	6B	chatglm3
Command R	35B/104B	cohere
DeepSeek (Code/MoE)	7B/16B/67B/236B	deepseek
Falcon	7B/11B/40B/180B	falcon
Gemma/Gemma 2/CodeGemma	2B/7B/9B/27B	gemma
GLM-4	9B	glm4
InternLM2/InternLM2.5	7B/20B	intern2
Llama	7B/13B/33B/65B	-
Llama 2	7B/13B/70B	llama2
Llama 3-3.2	1B/3B/8B/70B	llama3
LLaVA-1.5	7B/13B	llava
LLaVA-NeXT	7B/8B/13B/34B/72B/110B	llava_next
LLaVA-NeXT-Video	7B/34B	llava_next_video
MiniCPM	1B/2B/4B	cpm/cpm3
Mistral/Mixtral	7B/8x7B/8x22B	mistral
OLMo	1B/7B	-
PaliGemma	3B	paligemma
Phi-1.5/Phi-2	1.3B/2.7B	-
Phi-3	4B/7B/14B	phi
Qwen (1-2.5) (Code/Math/MoE)	0.5B/1.5B/3B/7B/14B/32B/72B/110B	qwen
Qwen2-VL	2B/7B/72B	qwen2_vl
StarCoder 2	3B/7B/15B	-
XVERSE	7B/13B/65B	xverse
Yi/Yi-1.5 (Code)	1.5B/6B/9B/34B	yi
Yi-VL	6B/34B	yi_vl
Yuan 2	2B/51B/102B	yuan

Note

For the "base" models, the template argument can be chosen from default, alpaca, vicuna etc. But make sure to use the corresponding template for the "instruct/chat" models.

Remember to use the SAME template in training and inference.

Please refer to constants.py for a full list of models we supported.

You also can add a custom chat template to template.py.

Supported Training Approaches

Approach	Full-tuning	Freeze-tuning	LoRA	QLoRA
Pre-Training	✅	✅	✅	✅
Supervised Fine-Tuning	✅	✅	✅	✅
Reward Modeling	✅	✅	✅	✅
PPO Training	✅	✅	✅	✅
DPO Training	✅	✅	✅	✅
KTO Training	✅	✅	✅	✅
ORPO Training	✅	✅	✅	✅
SimPO Training	✅	✅	✅	✅

Tip

The implementation details of PPO can be found in this blog.

Provided Datasets

Pre-training datasets

Supervised fine-tuning datasets

Preference datasets

Some datasets require confirmation before using them, so we recommend logging in with your Hugging Face account using these commands.

pip install --upgrade huggingface_hub
huggingface-cli login

Requirement

Mandatory	Minimum	Recommend
python	3.8	3.11
torch	1.13.1	2.4.0
transformers	4.41.2	4.43.4
datasets	2.16.0	2.20.0
accelerate	0.30.1	0.32.0
peft	0.11.1	0.12.0
trl	0.8.6	0.9.6

Optional	Minimum	Recommend
CUDA	11.6	12.2
deepspeed	0.10.0	0.14.0
bitsandbytes	0.39.0	0.43.1
vllm	0.4.3	0.5.0
flash-attn	2.3.0	2.6.3

Hardware Requirement

* estimated

Method	Bits	7B	13B	30B	70B	110B	8x7B	8x22B
Full	AMP	120GB	240GB	600GB	1200GB	2000GB	900GB	2400GB
Full	16	60GB	120GB	300GB	600GB	900GB	400GB	1200GB
Freeze	16	20GB	40GB	80GB	200GB	360GB	160GB	400GB
LoRA/GaLore/BAdam	16	16GB	32GB	64GB	160GB	240GB	120GB	320GB
QLoRA	8	10GB	20GB	40GB	80GB	140GB	60GB	160GB
QLoRA	4	6GB	12GB	24GB	48GB	72GB	30GB	96GB
QLoRA	2	4GB	8GB	16GB	24GB	48GB	18GB	48GB

Getting Started

Installation

Important

Installation is mandatory.

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"

Extra dependencies available: torch, torch-npu, metrics, deepspeed, liger-kernel, bitsandbytes, hqq, eetq, gptq, awq, aqlm, vllm, galore, badam, adam-mini, qwen, modelscope, quality

Tip

Use pip install --no-deps -e . to resolve package conflicts.

For Windows users

For Ascend NPU users

Data Preparation

Please refer to data/README.md for checking the details about the format of dataset files. You can either use datasets on HuggingFace / ModelScope hub or load the dataset in local disk.

Note

Please update data/dataset_info.json to use your custom dataset.

Quickstart

Use the following 3 commands to run LoRA fine-tuning, inference and merging of the Llama3-8B-Instruct model, respectively.

llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml

See examples/README.md for advanced usage (including distributed training).

Tip

Use llamafactory-cli help to show help information.

Fine-Tuning with LLaMA Board GUI (powered by Gradio)

llamafactory-cli webui

Build Docker

For CUDA users:

cd docker/docker-cuda/
docker compose up -d
docker compose exec llamafactory bash

For Ascend NPU users:

cd docker/docker-npu/
docker compose up -d
docker compose exec llamafactory bash

For AMD ROCm users:

cd docker/docker-rocm/
docker compose up -d
docker compose exec llamafactory bash

Build without Docker Compose

Details about volume

Deploy with OpenAI-style API and vLLM

API_PORT=8000 llamafactory-cli api examples/inference/llama3_vllm.yaml

Tip

Visit this page for API document.

Download from ModelScope Hub

If you have trouble with downloading models and datasets from Hugging Face, you can use ModelScope.

export USE_MODELSCOPE_HUB=1 # `set USE_MODELSCOPE_HUB=1` for Windows

Train the model by specifying a model ID of the ModelScope Hub as the model_name_or_path. You can find a full list of model IDs at ModelScope Hub, e.g., LLM-Research/Meta-Llama-3-8B-Instruct.

Use W&B Logger

To use Weights & Biases for logging experimental results, you need to add the following arguments to yaml files.

report_to: wandb
run_name: test_run # optional

Set WANDB_API_KEY to your key when launching training tasks to log in with your W&B account.

Projects using LLaMA Factory

If you have a project that should be incorporated, please contact via email or create a pull request.

Click to show

License

This repository is licensed under the Apache-2.0 License.

https://github.com/hiyouga/L