init commit of recipes (#1027)

Add recipes
2026-05-20 16:35:47 +08:00 · 2024-01-30 01:57:09 -06:00
parent d275e5b91a
commit ee01f36ed9
30 changed files with 5146 additions and 0 deletions
--- a/recipes/finetune/deepspeed/finetune_lora_multi_gpu.ipynb
+++ b/recipes/finetune/deepspeed/finetune_lora_multi_gpu.ipynb
@@ -0,0 +1,267 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "6e6981ab-2d9a-4280-923f-235a166855ba",
+   "metadata": {},
+   "source": [
+    "# LoRA Fine-Tuning Qwen-Chat Large Language Model (Multiple GPUs)\n",
+    "\n",
+    "Tongyi Qianwen (Qwen) is a large language model developed by Alibaba Cloud. It is based on the Transformer architecture and pre-trained on a large, diverse corpus that includes internet text, specialized books, and code. Qwen-Chat is an AI assistant built on the pre-trained model using alignment techniques.\n",
+    "\n",
+    "This notebook uses Qwen-1.8B-Chat as an example to show how to fine-tune a Qwen model with LoRA using DeepSpeed.\n",
+    "\n",
+    "## Environment Requirements\n",
+    "\n",
+    "Please refer to **requirements.txt** to install the required dependencies.\n",
+    "\n",
+    "## Preparation\n",
+    "\n",
+    "### Download Qwen-1.8B-Chat\n",
+    "\n",
+    "First, download the model files. You can download them directly from ModelScope."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "248488f9-4a86-4f35-9d56-50f8e91a8f11",
+   "metadata": {
+    "ExecutionIndicator": {
+     "show": true
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from modelscope.hub.snapshot_download import snapshot_download\n",
+    "model_dir = snapshot_download('Qwen/Qwen-1_8B-Chat', cache_dir='.', revision='master')"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "7b2a92b1-f08e-4413-9f92-8f23761e6e1f",
+   "metadata": {},
+   "source": [
+    "### Download Example Training Data\n",
+    "\n",
+    "Download the data required for training. Here we provide a tiny example dataset sampled from [Belle](https://github.com/LianjiaTech/BELLE).\n",
+    "\n",
+    "Disclaimer: the dataset may be used only for research purposes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ce195f08-fbb2-470e-b6c0-9a03457458c7",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/qwen_recipes/Belle_sampled_qwen.json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7226bed0-171b-4d45-a3f9-b3d81ec2bb9f",
+   "metadata": {},
+   "source": [
+    "You can also prepare your own dataset in this format. Below is a minimal example list containing one sample:\n",
+    "\n",
+    "```json\n",
+    "[\n",
+    "  {\n",
+    "    \"id\": \"identity_0\",\n",
+    "    \"conversations\": [\n",
+    "      {\n",
+    "        \"from\": \"user\",\n",
+    "        \"value\": \"你好\"\n",
+    "      },\n",
+    "      {\n",
+    "        \"from\": \"assistant\",\n",
+    "        \"value\": \"我是一个语言模型，我叫通义千问。\"\n",
+    "      }\n",
+    "    ]\n",
+    "  }\n",
+    "]\n",
+    "```\n",
+    "\n",
+    "You can also use multi-turn conversations as the training set. Here is a simple example:\n",
+    "\n",
+    "```json\n",
+    "[\n",
+    "  {\n",
+    "    \"id\": \"identity_0\",\n",
+    "    \"conversations\": [\n",
+    "      {\n",
+    "        \"from\": \"user\",\n",
+    "        \"value\": \"你好，能告诉我遛狗的最佳时间吗？\"\n",
+    "      },\n",
+    "      {\n",
+    "        \"from\": \"assistant\",\n",
+    "        \"value\": \"当地最佳遛狗时间因地域差异而异，请问您所在的城市是哪里？\"\n",
+    "      },\n",
+    "      {\n",
+    "        \"from\": \"user\",\n",
+    "        \"value\": \"我在纽约市。\"\n",
+    "      },\n",
+    "      {\n",
+    "        \"from\": \"assistant\",\n",
+    "        \"value\": \"纽约市的遛狗最佳时间通常在早晨6点至8点和晚上8点至10点之间，因为这些时间段气温较低，遛狗更加舒适。但具体时间还需根据气候、气温和季节变化而定。\"\n",
+    "      }\n",
+    "    ]\n",
+    "  }\n",
+    "]\n",
+    "```\n",
+    "\n",
+    "## Fine-Tune the Model\n",
+    "\n",
+    "You can run the prepared training script directly to fine-tune the model. **nproc_per_node** is the number of GPUs used for training."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7ab0581e-be85-45e6-a5b7-af9c42ea697b",
+   "metadata": {
+    "ExecutionIndicator": {
+     "show": true
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "!torchrun --nproc_per_node 2 --nnodes 1 --node_rank 0 --master_addr localhost --master_port 6601 ../../finetune.py \\\n",
+    "    --model_name_or_path \"Qwen/Qwen-1_8B-Chat/\" \\\n",
+    "    --data_path \"Belle_sampled_qwen.json\" \\\n",
+    "    --bf16 True \\\n",
+    "    --output_dir \"output_qwen\" \\\n",
+    "    --num_train_epochs 5 \\\n",
+    "    --per_device_train_batch_size 1 \\\n",
+    "    --per_device_eval_batch_size 1 \\\n",
+    "    --gradient_accumulation_steps 16 \\\n",
+    "    --evaluation_strategy \"no\" \\\n",
+    "    --save_strategy \"steps\" \\\n",
+    "    --save_steps 1000 \\\n",
+    "    --save_total_limit 10 \\\n",
+    "    --learning_rate 1e-5 \\\n",
+    "    --weight_decay 0.1 \\\n",
+    "    --adam_beta2 0.95 \\\n",
+    "    --warmup_ratio 0.01 \\\n",
+    "    --lr_scheduler_type \"cosine\" \\\n",
+    "    --logging_steps 1 \\\n",
+    "    --report_to \"none\" \\\n",
+    "    --model_max_length 512 \\\n",
+    "    --gradient_checkpointing True \\\n",
+    "    --lazy_preprocess True \\\n",
+    "    --deepspeed \"../../finetune/ds_config_zero2.json\" \\\n",
+    "    --use_lora"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "35acf008-1dfe-4d32-8cf5-7022e042aadb",
+   "metadata": {},
+   "source": [
+    "## Merge Weights\n",
+    "\n",
+    "Training with LoRA or Q-LoRA saves only the adapter parameters. You can load the fine-tuned model and merge the weights as shown below:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "61021499-4a44-45af-a682-943ed63c2fcb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from transformers import AutoModelForCausalLM\n",
+    "from peft import PeftModel\n",
+    "import torch\n",
+    "\n",
+    "model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-1_8B-Chat/\", torch_dtype=torch.float16, device_map=\"auto\", trust_remote_code=True)\n",
+    "model = PeftModel.from_pretrained(model, \"output_qwen/\")  # attach the trained LoRA adapter\n",
+    "merged_model = model.merge_and_unload()  # fold the adapter weights into the base model\n",
+    "merged_model.save_pretrained(\"output_qwen_merged\", max_shard_size=\"2048MB\", safe_serialization=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0dfbd261-6451-4532-82e8-3ae19ed93ee1",
+   "metadata": {},
+   "source": [
+    "This step does not save the tokenizer files to the new directory. You can copy them manually or use the following code:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ddcba069-340b-4a93-a145-2028b425dd23",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from transformers import AutoTokenizer\n",
+    "\n",
+    "tokenizer = AutoTokenizer.from_pretrained(\n",
+    "    \"Qwen/Qwen-1_8B-Chat/\",\n",
+    "    trust_remote_code=True\n",
+    ")\n",
+    "\n",
+    "tokenizer.save_pretrained(\"output_qwen_merged\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fe9f2878-79d3-4b1c-ba95-ac2f73aa6e1b",
+   "metadata": {},
+   "source": [
+    "## Test the Model\n",
+    "\n",
+    "After merging the weights, we can test the model as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
+    "from transformers.generation import GenerationConfig\n",
+    "\n",
+    "tokenizer = AutoTokenizer.from_pretrained(\"output_qwen_merged\", trust_remote_code=True)\n",
+    "model = AutoModelForCausalLM.from_pretrained(\n",
+    "    \"output_qwen_merged\",\n",
+    "    device_map=\"auto\",\n",
+    "    trust_remote_code=True\n",
+    ").eval()\n",
+    "\n",
+    "response, history = model.chat(tokenizer, \"你好\", history=None)  # \"你好\" means \"Hello\"\n",
+    "print(response)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.13"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}