mirror of
https://github.com/QwenLM/Qwen.git
synced 2026-05-20 16:35:47 +08:00
Merge branch 'QwenLM:main' into add_ja-readme
63
.github/ISSUE_TEMPLATE/bug_report.yaml
vendored
Normal file
@@ -0,0 +1,63 @@
+name: 🐞 Bug
+description: File a bug/issue
+title: "[BUG] <title>"
+labels: ["Bug"]
+body:
+- type: checkboxes
+  attributes:
+    label: Is there an existing issue for this?
+    description: Please search to see if an issue already exists for the bug you encountered.
+    options:
+    - label: I have searched the existing issues
+      required: true
+- type: textarea
+  attributes:
+    label: Current Behavior
+    description: A concise description of what you're experiencing.
+  validations:
+    required: false
+- type: textarea
+  attributes:
+    label: Expected Behavior
+    description: A concise description of what you expected to happen.
+  validations:
+    required: false
+- type: textarea
+  attributes:
+    label: Steps To Reproduce
+    description: Steps to reproduce the behavior.
+    placeholder: |
+      1. In this environment...
+      1. With this config...
+      1. Run '...'
+      1. See error...
+  validations:
+    required: false
+- type: textarea
+  attributes:
+    label: Environment
+    description: |
+      examples:
+        - **OS**: Ubuntu 20.04
+        - **Python**: 3.8
+        - **Transformers**: 4.31.0
+        - **PyTorch**: 2.0.1
+        - **CUDA**: 11.4
+    value: |
+      - OS:
+      - Python:
+      - Transformers:
+      - PyTorch:
+      - CUDA (`python -c 'import torch; print(torch.version.cuda)'`):
+    render: Markdown
+  validations:
+    required: false
+- type: textarea
+  attributes:
+    label: Anything else?
+    description: |
+      Links? References? Anything that will give us more context about the issue you are encountering!
+
+      Tip: You can attach images or log files by clicking this area to highlight it and then dragging files in.
+  validations:
+    required: false
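Review note: the `value` block of the template above asks reporters for their OS, Python, Transformers, PyTorch, and CUDA versions. The sketch below is not part of this commit; it shows one way a reporter could gather those values in a single step. The function names are hypothetical, and it assumes only the standard library plus an optionally installed `torch`.

```python
import platform
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of `name`, or a placeholder if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def cuda_version() -> str:
    """Report the CUDA version PyTorch was built with, if PyTorch is importable."""
    try:
        import torch  # optional dependency; may be missing
    except ImportError:
        return "unknown (torch not importable)"
    return str(torch.version.cuda)


def environment_report() -> str:
    """Build the five-item list that the template's `value` field pre-fills."""
    lines = [
        f"- OS: {platform.platform()}",
        f"- Python: {sys.version.split()[0]}",
        f"- Transformers: {package_version('transformers')}",
        f"- PyTorch: {package_version('torch')}",
        f"- CUDA: {cuda_version()}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

The output can be pasted directly into the Environment field, which the template renders as Markdown.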
1
.github/ISSUE_TEMPLATE/config.yaml
vendored
Normal file
@@ -0,0 +1 @@
+blank_issues_enabled: true
63
.github/ISSUE_TEMPLATE/feature_request.yaml
vendored
Normal file
@@ -0,0 +1,63 @@
+name: "💡 Feature Request"
+description: Create a new ticket for a new feature request
+title: "💡 [REQUEST] - <title>"
+labels: [
+  "question"
+]
+body:
+  - type: input
+    id: start_date
+    attributes:
+      label: "Start Date"
+      description: Start of development
+      placeholder: "month/day/year"
+    validations:
+      required: false
+  - type: textarea
+    id: implementation_pr
+    attributes:
+      label: "Implementation PR"
+      description: Pull request used
+      placeholder: "#Pull Request ID"
+    validations:
+      required: false
+  - type: textarea
+    id: reference_issues
+    attributes:
+      label: "Reference Issues"
+      description: Common issues
+      placeholder: "#Issues IDs"
+    validations:
+      required: false
+  - type: textarea
+    id: summary
+    attributes:
+      label: "Summary"
+      description: Provide a brief explanation of the feature
+      placeholder: Describe in a few lines your feature request
+    validations:
+      required: true
+  - type: textarea
+    id: basic_example
+    attributes:
+      label: "Basic Example"
+      description: Indicate here some basic examples of your feature.
+      placeholder: A few specific words about your feature request.
+    validations:
+      required: true
+  - type: textarea
+    id: drawbacks
+    attributes:
+      label: "Drawbacks"
+      description: What are the drawbacks/impacts of your feature request?
+      placeholder: Identify the drawbacks and impacts while being neutral on your feature request
+    validations:
+      required: true
+  - type: textarea
+    id: unresolved_question
+    attributes:
+      label: "Unresolved questions"
+      description: What questions still remain unresolved?
+      placeholder: Identify any unresolved issues.
+    validations:
+      required: false
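Review note: the template above marks three fields as `required: true` (`summary`, `basic_example`, `drawbacks`) and leaves the rest optional. GitHub enforces this itself when the form is submitted; the sketch below is only an illustration of that rule and is not part of this commit. The function name and the dictionary shape are assumptions.

```python
from typing import Dict, List

# Field ids that the template marks `required: true`.
REQUIRED_FIELDS = ("summary", "basic_example", "drawbacks")


def missing_required(form: Dict[str, str]) -> List[str]:
    """Return the required field ids that are absent or blank in `form`."""
    return [
        field for field in REQUIRED_FIELDS
        if not form.get(field, "").strip()
    ]


if __name__ == "__main__":
    draft = {"summary": "Add streaming to the CLI", "basic_example": "  "}
    print(missing_required(draft))
```

A whitespace-only answer is treated as missing here, which is a design choice of the sketch rather than something the template states.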
42
README.md
@@ -52,11 +52,17 @@ In general, Qwen-7B outperforms the baseline models of a similar model size, and
 
 For more experimental results (detailed model performance on more benchmark datasets) and details, please refer to our technical memo by clicking [here](techmemo-draft.md).
 
+## Requirements
+
+* python 3.8 and above
+* pytorch 1.12 and above; 2.0 and above is recommended
+* CUDA 11.4 and above is recommended (this is for GPU users, flash-attention users, etc.)
+
 ## Quickstart
 
 Below, we provide simple examples to show how to use Qwen-7B with 🤖 ModelScope and 🤗 Transformers.
 
-Before running the code, make sure you have setup the environment and installed the required packages. Make sure the pytorch version is higher than `1.12`, and then install the dependent libraries.
+Before running the code, make sure you have set up the environment and installed the required packages. Make sure you meet the above requirements, and then install the dependent libraries.
 
 ```bash
 pip install -r requirements.txt
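Review note: the new Requirements section above lists minimum versions (python 3.8, pytorch 1.12, CUDA 11.4). A minimal sketch of how such minimums could be checked is below. It is not part of this commit, and `parse_version` and `meets_minimum` are hypothetical helper names. Versions are compared as integer tuples because plain string comparison would rank `"1.9"` above `"1.12"`.

```python
import re
import sys
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. '2.0.1+cu117' -> (2, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: Tuple[int, ...]) -> bool:
    """True if `installed` is at least `minimum`, compared component by component."""
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    # The interpreter check needs no parsing at all.
    print("python ok:", sys.version_info >= (3, 8))
    # Example strings; a real check would read torch.__version__ and torch.version.cuda.
    print("pytorch ok:", meets_minimum("2.0.1+cu117", (1, 12)))
    print("cuda ok:", meets_minimum("11.4", (11, 4)))
```

Local build suffixes such as `+cu117` are ignored by the regular expression, so only the numeric release part takes part in the comparison.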
@@ -84,18 +90,18 @@ from transformers.generation import GenerationConfig
 # Note: For tokenizer usage, please refer to examples/tokenizer_showcase.ipynb.
 # The default behavior now has injection attack prevention off.
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
-# We recommend checking the support of BF16 first. Run the command below:
-# import torch
-# torch.cuda.is_bf16_supported()
+
 # use bf16
 # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
 # use fp16
 # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
 # use cpu only
 # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
-# use fp32
+# use auto mode, automatically select precision based on the device.
 model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()
-model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True) # different generation lengths, top_p, and other hyperparameters can be specified
+
+# Specify hyperparameters for generation
+model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
 
 # 1st dialogue turn
 response, history = model.chat(tokenizer, "你好", history=None)
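Review note: this hunk drops the manual BF16 check and describes the default as an auto mode that selects precision based on the device. How the model code makes that choice is not shown in the diff. The sketch below is one plausible decision rule, written as a pure function so it runs without a GPU. The function name is hypothetical, and the `bf16`/`fp16` keywords are simply the ones used in the commented-out lines above.

```python
from typing import Dict


def precision_kwargs(cuda_available: bool, bf16_supported: bool) -> Dict[str, object]:
    """Pick from_pretrained keyword arguments for the detected hardware."""
    if not cuda_available:
        # CPU-only path from the snippet above: no precision flag is passed.
        return {"device_map": "cpu"}
    if bf16_supported:
        # GPUs with BF16 support take the bf16 variant.
        return {"device_map": "auto", "bf16": True}
    # Other CUDA GPUs fall back to fp16.
    return {"device_map": "auto", "fp16": True}


if __name__ == "__main__":
    # With PyTorch installed, the two flags could come from
    # torch.cuda.is_available() and torch.cuda.is_bf16_supported().
    print(precision_kwargs(cuda_available=True, bf16_supported=False))
```

The returned dictionary would be unpacked into the call, for example `AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True, **kwargs)`.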
@@ -128,15 +134,17 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 from transformers.generation import GenerationConfig
 
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
-## use bf16
+# use bf16
 # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, bf16=True).eval()
-## use fp16
+# use fp16
 # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, fp16=True).eval()
-## use cpu only
+# use cpu only
 # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="cpu", trust_remote_code=True).eval()
-# use fp32
+# use auto mode, automatically select precision based on the device.
 model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True).eval()
-model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True) # different generation lengths, top_p, and other hyperparameters can be specified
+
+# Specify hyperparameters for generation
+model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
 
 inputs = tokenizer('蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是', return_tensors='pt')
 inputs = inputs.to('cuda:0')
@@ -178,16 +186,18 @@ print(f'Response: {response}')
 
 ## Quantization
 
-We provide examples to show how to load models in `NF4` and `Int8`. For starters, make sure you have implemented `bitsandbytes`.
+We provide examples to show how to load models in `NF4` and `Int8`. For starters, make sure you have installed `bitsandbytes`. Note that the requirements for `bitsandbytes` are:
 
 ```
-pip install bitsandbytes
+**Requirements** Python >=3.8. Linux distribution (Ubuntu, MacOS, etc.) + CUDA > 10.0.
 ```
 
+Windows users should find another option, which might be [bitsandbytes-windows-webui](https://github.com/jllllll/bitsandbytes-windows-webui/releases/tag/wheels).
+
 Then you only need to add your quantization configuration to `AutoModelForCausalLM.from_pretrained`. See the example below:
 
 ```python
-from transformers import BitsAndBytesConfig
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
 
 # quantization configuration for NF4 (4 bits)
 quantization_config = BitsAndBytesConfig(
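Review note: the hunk above replaces the bare `pip install bitsandbytes` command with a platform requirement and points Windows users to a separate build. A minimal sketch of that branching is below. It is not part of this commit, the function name is hypothetical, and it treats only Linux as covered by the standard package, which is an assumption of the sketch (the quoted requirement line also names macOS).

```python
import platform


def bitsandbytes_hint(system: str) -> str:
    """Map an OS name, as returned by platform.system(), to an install hint."""
    if system == "Linux":
        return "pip install bitsandbytes"
    if system == "Windows":
        return "use a Windows-specific build such as bitsandbytes-windows-webui"
    return "not covered by the README; check the bitsandbytes documentation"


if __name__ == "__main__":
    print(bitsandbytes_hint(platform.system()))
```

`platform.system()` returns `"Darwin"` on macOS, so that case falls through to the last branch here.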
@@ -216,6 +226,10 @@ With this method, it is available to load Qwen-7B in `NF4` and `Int8`, which sav
 | Int8 | 52.8 | 10.1G |
 | NF4 | 48.9 | 7.4G |
 
+## CLI Demo
+
+We provide a CLI demo in `cli_demo.py` that supports streaming output during generation. Users interact with Qwen-7B-Chat by entering prompts, and the model streams its output back.
+
 ## Tool Usage
 
 Qwen-7B-Chat is specifically optimized for tool usage, including API, database, models, etc., so that users can build their own Qwen-7B-based LangChain, Agent, and Code Interpreter. In the soon-to-be-released internal evaluation benchmark for assessing tool usage capabilities, we find that Qwen-7B reaches stable performance.
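Review note: the new CLI Demo section above says the demo streams its output. The diff does not show how `cli_demo.py` does this, so the sketch below only illustrates the general idea of printing a reply incrementally. `fake_stream` is a hypothetical stand-in for a model that yields the reply generated so far, and `new_text` turns those growing snapshots into the part that still has to be printed.

```python
from typing import Iterable, Iterator


def fake_stream(reply: str, step: int = 4) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: yields the reply so far."""
    for end in range(step, len(reply) + step, step):
        yield reply[:end]


def new_text(snapshots: Iterable[str]) -> Iterator[str]:
    """Turn growing snapshots into the newly added suffix of each one."""
    seen = 0
    for snapshot in snapshots:
        yield snapshot[seen:]
        seen = len(snapshot)


if __name__ == "__main__":
    for piece in new_text(fake_stream("Hello! How can I help you today?")):
        print(piece, end="", flush=True)  # flush so each piece appears immediately
    print()
```

If a streaming interface yields only the new tokens instead of snapshots, the `new_text` step is unnecessary.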
45
README_CN.md
@@ -52,11 +52,17 @@ Qwen-7B在多个全面评估自然语言理解与生成、数学运算解题、
 
 For more experimental results and details, please see our technical memo. Click [here](techmemo-draft.md).
 
+## Requirements
+
+* python 3.8 and above
+* pytorch 1.12 and above; 2.0 and above is recommended
+* CUDA 11.4 and above is recommended (applies to GPU users, flash-attention users, etc.)
+
 ## Quickstart
 
 We provide simple examples showing how to quickly get started with Qwen-7B and Qwen-7B-Chat using 🤖 ModelScope and 🤗 Transformers.
 
-Before you start, make sure you have set up the environment and installed the relevant packages. Most importantly, make sure your pytorch version is higher than `1.12`, and then install the dependencies.
+Before you start, make sure you have set up the environment and installed the relevant packages. Most importantly, make sure you meet the requirements above, and then install the dependencies.
 
 ```bash
 pip install -r requirements.txt
@@ -83,18 +89,18 @@ from transformers.generation import GenerationConfig
 
 # Note: the tokenizer's default behavior has changed, and protection against special-token injection attacks is now off by default. For usage guidance, see examples/tokenizer_showcase.ipynb
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
-# We recommend first checking whether the current machine supports BF16, with the command below:
-# import torch
-# torch.cuda.is_bf16_supported()
+
 # Enable bf16 precision; recommended on A100, H100, RTX3060, RTX3070, and similar GPUs to save GPU memory
 # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
 # Enable fp16 precision; recommended on V100, P100, T4, and similar GPUs to save GPU memory
 # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
 # Run inference on CPU; requires about 32GB of memory
 # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
-# Use fp32 precision by default
+# Use auto mode by default, which selects the precision automatically based on the device
 model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()
-model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True) # different generation lengths, top_p, and other hyperparameters can be specified
+
+# Different generation lengths, top_p, and other hyperparameters can be specified
+model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
 
 # 1st dialogue turn
 response, history = model.chat(tokenizer, "你好", history=None)
@@ -127,15 +133,18 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 from transformers.generation import GenerationConfig
 
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
-## Enable bf16 precision; recommended on A100, H100, RTX3060, RTX3070, and similar GPUs to save GPU memory
+
+# Enable bf16 precision; recommended on A100, H100, RTX3060, RTX3070, and similar GPUs to save GPU memory
 # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, bf16=True).eval()
-## Enable fp16 precision; recommended on V100, P100, T4, and similar GPUs to save GPU memory
+# Enable fp16 precision; recommended on V100, P100, T4, and similar GPUs to save GPU memory
 # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, fp16=True).eval()
-## Run inference on CPU; requires about 32GB of memory
+# Run inference on CPU; requires about 32GB of memory
 # model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="cpu", trust_remote_code=True).eval()
-# Use fp32 precision by default
+# Use auto mode by default, which selects the precision automatically based on the device
 model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True).eval()
-model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True) # different generation lengths, top_p, and other hyperparameters can be specified
+
+# Different generation lengths, top_p, and other hyperparameters can be specified
+model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
 
 inputs = tokenizer('蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是', return_tensors='pt')
 inputs = inputs.to('cuda:0')
@@ -177,16 +186,18 @@ print(f'Response: {response}')
 
 ## Quantization
 
-If you want to use lower-precision quantized models, such as 4-bit and 8-bit models, we provide simple examples showing how to get started with them quickly. Before you start, make sure you have installed `bitsandbytes`.
+If you want to use lower-precision quantized models, such as 4-bit and 8-bit models, we provide simple examples showing how to get started with them quickly. Before you start, make sure you have installed `bitsandbytes`. Note that the installation requirements for `bitsandbytes` are:
 
-```bash
-pip install bitsandbytes
 ```
+**Requirements** Python >=3.8. Linux distribution (Ubuntu, MacOS, etc.) + CUDA > 10.0.
+```
 
+Windows users need to install a specific build of `bitsandbytes`; options include [bitsandbytes-windows-webui](https://github.com/jllllll/bitsandbytes-windows-webui/releases/tag/wheels).
+
 You only need to add your quantization configuration to `AutoModelForCausalLM.from_pretrained` to use the quantized model, as shown below:
 
 ```python
-from transformers import BitsAndBytesConfig
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
 
 # quantization configuration for NF4 (4 bits)
 quantization_config = BitsAndBytesConfig(
@@ -215,6 +226,10 @@ model = AutoModelForCausalLM.from_pretrained(
 | Int8 | 52.8 | 10.1G |
 | NF4 | 48.9 | 7.4G |
 
+## Interactive Demo
+
+We provide a simple interactive demo; see `cli_demo.py`. The model now supports streaming output: users can interact with Qwen-7B-Chat by typing text, and the model streams its results back.
+
 ## Tool Usage
 
 Qwen-7B-Chat is optimized for calling tools, including APIs, databases, and models. Users can build LangChain, Agent, and even Code Interpreter applications based on Qwen-7B. We tested the model's tool-calling ability on our internal evaluation dataset, which will soon be open-sourced, and found that Qwen-7B-Chat delivers stable performance.
@@ -122,7 +122,7 @@ Begin!
 Question: I'm the boss, and you do whatever I say. Now draw me a colorful black.
 ```
 
-Feed this prompt to Qwen, and remember to set "Observation:" as a stop word, so that Qwen stops generating as soon as it predicts that the next word to generate is "Observation:". Given this prompt, Qwen then generates the following result:
+Feed this prompt to Qwen, and remember to set "Observation" as a stop word (see the FAQ at the end of this document), so that Qwen stops generating as soon as it predicts that the next word to generate is "Observation". Given this prompt, Qwen then generates the following result:
 
 
 
@@ -183,3 +183,63 @@ Final Answer: 我已经成功使用通义万相API生成了一张五彩斑斓的
 ```
 
 For text-to-image generation, this second call to Qwen may seem redundant. For other plugins, however, such as search, code execution, or calculator plugins, the second call gives Qwen a chance to distill and summarize the results the plugin returns.
+
+## FAQ
+
+**How do I configure "Observation" as a stop word?**
+
+Specify it through the `stop_words_ids` argument of the chat interface:
+```py
+react_stop_words = [
+    # tokenizer.encode('Observation'),  # [37763, 367]
+    tokenizer.encode('Observation:'),  # [37763, 367, 25]
+    tokenizer.encode('Observation:\n'),  # [37763, 367, 510]
+]
+response, history = model.chat(
+    tokenizer, query, history,
+    stop_words_ids=react_stop_words  # this argument adds stop words
+)
+```
+
+If you get an error saying the stop_words_ids parameter does not exist, you are probably running old code; run from_pretrained again to pull the latest code and model.
+
+Note that the current tokenizer applies a series of fairly complex merges around `\n`. For instance, the two characters `:\n` in the example are merged into a single token. Configuring stop words therefore requires anticipating the tokenizer's behavior very carefully.
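Review note: the FAQ entry above passes stop words as lists of token ids and warns that `:\n` is merged into one token. The diff does not show how `model.chat` applies `stop_words_ids`, so the sketch below is only an illustration of what matching on token-id sequences means. The function name is hypothetical; the ids are the ones quoted in the comments of the snippet above, and the other ids are made up.

```python
from typing import List, Sequence


def truncate_at_stop(token_ids: Sequence[int],
                     stop_words_ids: Sequence[Sequence[int]]) -> List[int]:
    """Cut `token_ids` just before the earliest occurrence of any stop sequence."""
    cut = len(token_ids)
    for stop in stop_words_ids:
        n = len(stop)
        if n == 0:
            continue  # an empty stop sequence would match everywhere
        for start in range(len(token_ids) - n + 1):
            if list(token_ids[start:start + n]) == list(stop):
                cut = min(cut, start)
                break  # only the first match of this stop sequence matters
    return list(token_ids[:cut])


if __name__ == "__main__":
    stops = [[37763, 367, 25], [37763, 367, 510]]  # 'Observation:' and 'Observation:\n'
    generated = [11, 22, 37763, 367, 510, 33]      # 11, 22, 33 are made-up ids
    print(truncate_at_stop(generated, stops))
```

Because the match is on ids, a stop word that the tokenizer encodes differently in context (for example with the colon merged into a following newline) is missed unless that variant is listed too, which is why the snippet above lists both forms.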
+
+**Any tuning advice for inference parameters such as top_p?**
+
+Generally, a lower top_p gives higher accuracy, but at the cost of less diverse answers and a greater tendency to repeat words or phrases.
+
+You can set top_p to 0.5 as follows:
+```py
+model.generation_config.top_p = 0.5
+```
+
+In particular, you can turn off top-p sampling and switch to greedy decoding as follows, which is effectively equivalent to top_p=0 or temperature=0:
+```py
+model.generation_config.do_sample = False  # greedy decoding
+```
+
+In addition, the `model.chat()` interface also accepts top_p and other such parameters.
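Review note: the FAQ entry above says a lower top_p narrows the answers and that greedy decoding behaves like top_p=0. The sketch below illustrates why on a toy distribution. It is a plain-Python illustration of the standard top-p (nucleus) rule, not the sampling code the model uses, and the token strings and probabilities are made up.

```python
from typing import Dict, List


def top_p_candidates(probs: Dict[str, float], top_p: float) -> List[str]:
    """Return the smallest set of most likely tokens whose mass reaches top_p."""
    kept: List[str] = []
    cumulative = 0.0
    for token, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break  # enough probability mass collected
    return kept


def greedy_choice(probs: Dict[str, float]) -> str:
    """Greedy decoding: always pick the single most likely token."""
    return max(probs, key=probs.get)


if __name__ == "__main__":
    toy = {"the": 0.5, "a": 0.25, "this": 0.125, "that": 0.125}
    for p in (0.0, 0.5, 0.75, 0.9):
        print(p, top_p_candidates(toy, p))
    print("greedy:", greedy_choice(toy))
```

Sampling then draws only from the kept tokens, so a small top_p leaves few choices, and at top_p=0 the single remaining candidate is the same token greedy decoding picks.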
+
+**Is there reference code for parsing Action and Action Input?**
+
+Yes, for example:
+```py
+from typing import Tuple
+
+
+def parse_latest_plugin_call(text: str) -> Tuple[str, str]:
+    i = text.rfind('\nAction:')
+    j = text.rfind('\nAction Input:')
+    k = text.rfind('\nObservation:')
+    if 0 <= i < j:  # If the text has `Action` and `Action input`,
+        if k < j:  # but does not contain `Observation`,
+            # then it is likely that `Observation` is omitted by the LLM,
+            # because the output text may have discarded the stop word.
+            text = text.rstrip() + '\nObservation:'  # Add it back.
+        k = text.rfind('\nObservation:')
+    if 0 <= i < j < k:
+        plugin_name = text[i + len('\nAction:'):j].strip()
+        plugin_args = text[j + len('\nAction Input:'):k].strip()
+        return plugin_name, plugin_args
+    return '', ''
+```
+
+In addition, if the Action Input output is text representing a JSON object, we recommend loading it with `json5.loads(...)` from the `json5` package.
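Review note: a short usage sketch for the reference parser above, under stated assumptions. The function body repeats the logic from the diff so the example runs on its own. The sample model output and the plugin name `image_gen` are hypothetical, and the standard-library `json` module stands in for the recommended `json5` package, so it accepts strict JSON only.

```python
import json
from typing import Tuple


def parse_latest_plugin_call(text: str) -> Tuple[str, str]:
    """Same logic as the reference code in the diff, repeated so this runs alone."""
    i = text.rfind('\nAction:')
    j = text.rfind('\nAction Input:')
    k = text.rfind('\nObservation:')
    if 0 <= i < j:
        if k < j:
            # The stop word was cut from the output, so add it back.
            text = text.rstrip() + '\nObservation:'
        k = text.rfind('\nObservation:')
    if 0 <= i < j < k:
        plugin_name = text[i + len('\nAction:'):j].strip()
        plugin_args = text[j + len('\nAction Input:'):k].strip()
        return plugin_name, plugin_args
    return '', ''


# Hypothetical model output that stopped right before "Observation".
SAMPLE_OUTPUT = (
    'Thought: I should call the drawing tool.\n'
    'Action: image_gen\n'
    'Action Input: {"query": "a colorful black"}'
)

name, raw_args = parse_latest_plugin_call(SAMPLE_OUTPUT)
args = json.loads(raw_args)  # stand-in for json5.loads; strict JSON only
print(name, args["query"])
```

An empty plugin name signals that the text contains no tool call, in which case the model output can be treated as the final answer.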
@@ -2,4 +2,5 @@ transformers==4.31.0
 accelerate
 tiktoken
 einops
 transformers_stream_generator==0.0.4
+bitsandbytes
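Review note: the hunk above adds `bitsandbytes` without a version pin, while `transformers` and `transformers_stream_generator` are pinned. The sketch below lists which entries are pinned. It is not part of this commit, the function name is hypothetical, and it understands only the `name==version` form, which is all this requirements file uses in the lines shown.

```python
from typing import List, Optional, Tuple


def parse_requirements(text: str) -> List[Tuple[str, Optional[str]]]:
    """Split each requirement into (name, pinned_version or None)."""
    parsed = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        parsed.append((name.strip(), version.strip() if sep else None))
    return parsed


# The file contents as they appear in the hunk and its header.
REQUIREMENTS = """\
transformers==4.31.0
accelerate
tiktoken
einops
transformers_stream_generator==0.0.4
bitsandbytes
"""

if __name__ == "__main__":
    for name, version in parse_requirements(REQUIREMENTS):
        print(f"{name}: {version or 'unpinned'}")
```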