Merge branch 'main' of github.com:QwenLM/Qwen-7B

2026-05-20 16:35:47 +08:00 · 2023-09-18 11:51:06 +08:00
parents b86a0f2c8a, 44ce0781a9
commit fb3180d8f0
5 changed files with 8 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -15,6 +15,10 @@
 </p>
 <br><br>

+__Will be back soon...__
+
+---
+
 We opensource **Qwen-7B** and **Qwen-7B-Chat** on both **🤖 ModelScope** and **🤗 Hugging Face** (Click the logos on top to the repos with codes and checkpoints). This repo includes the brief introduction to Qwen-7B, the usage guidance, and also a technical memo [link](tech_memo.md) that provides more information.

 Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc. Additionally, based on the pretrained Qwen-7B, we release Qwen-7B-Chat, a large-model-based AI assistant, which is trained with alignment techniques. The features of the Qwen-7B series include:
--- a/finetune/finetune_lora_ds.sh
+++ b/finetune/finetune_lora_ds.sh
@@ -44,4 +44,5 @@ torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_max_length 2048 \
    --lazy_preprocess True \
    --use_lora \
+    --gradient_checkpointing \
    --deepspeed finetune/ds_config_zero2.json
--- a/finetune/finetune_lora_single_gpu.sh
+++ b/finetune/finetune_lora_single_gpu.sh
@@ -32,4 +32,5 @@ python finetune.py \
  --report_to "none" \
  --model_max_length 2048 \
  --lazy_preprocess True \
+  --gradient_checkpointing \
  --use_lora
--- a/finetune/finetune_qlora_ds.sh
+++ b/finetune/finetune_qlora_ds.sh
@@ -46,4 +46,5 @@ torchrun $DISTRIBUTED_ARGS finetune.py \
    --lazy_preprocess True \
    --use_lora \
    --q_lora \
+    --gradient_checkpointing \
    --deepspeed finetune/ds_config_zero2.json
--- a/finetune/finetune_qlora_single_gpu.sh
+++ b/finetune/finetune_qlora_single_gpu.sh
@@ -32,5 +32,6 @@ python finetune.py \
  --report_to "none" \
  --model_max_length 2048 \
  --lazy_preprocess True \
+  --gradient_checkpointing \
  --use_lora \
  --q_lora