mirror of
https://github.com/QwenLM/Qwen.git
synced 2026-05-21 00:45:48 +08:00
update readme
@@ -395,11 +395,9 @@ sh finetune/finetune_lora_single_gpu.sh
sh finetune/finetune_lora_ds.sh
```

In comparison with full-parameter finetuning, LoRA ([paper](https://arxiv.org/abs/2106.09685)) updates only the parameters of the adapter layers and keeps the original large language model layers frozen. This greatly reduces memory cost and, with it, computation cost. If memory is still insufficient, consider Q-LoRA ([paper](https://arxiv.org/abs/2305.14314)), which uses a quantized large language model together with other techniques such as paged optimizers to reduce memory cost even further. To run Q-LoRA, run the following script:

In comparison with full-parameter finetuning, LoRA ([paper](https://arxiv.org/abs/2106.09685)) updates only the parameters of the adapter layers and keeps the original large language model layers frozen. This greatly reduces memory cost and, with it, computation cost. If memory is still insufficient, consider Q-LoRA ([paper](https://arxiv.org/abs/2305.14314)), which uses a quantized large language model together with other techniques such as paged optimizers to reduce memory cost even further. To run Q-LoRA, run the following script (for Q-LoRA, we have temporarily found problems with mixed-precision training in the single-GPU setup; we will fix this as soon as possible):

```bash
# Single GPU training
sh finetune/finetune_qlora_single_gpu.sh
# Distributed training
sh finetune/finetune_qlora_ds.sh
```
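
The choice between the two Q-LoRA scripts above can be sketched as a small wrapper. This is only an illustrative sketch and not part of the repository: the function name `pick_qlora_script` is hypothetical, and it assumes that "more than one GPU" is the right condition for using the distributed script.

```bash
# Hypothetical helper (not part of the Qwen repository):
# print the Q-LoRA script path that matches the given GPU count.
pick_qlora_script() {
    gpu_count="$1"
    if [ "$gpu_count" -gt 1 ]; then
        # More than one GPU: use the distributed training script
        echo "finetune/finetune_qlora_ds.sh"
    else
        # One GPU (or an unknown count): use the single-GPU script
        echo "finetune/finetune_qlora_single_gpu.sh"
    fi
}
```

One possible usage, assuming `nvidia-smi` is installed and that `nvidia-smi -L` prints one line per GPU, is `sh "$(pick_qlora_script "$(nvidia-smi -L | wc -l)")"`.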