mirror of
https://github.com/QwenLM/Qwen.git
synced 2026-05-20 16:35:47 +08:00
43
README.md
43
README.md
@@ -27,7 +27,6 @@ Qwen-7B is the 7B-parameter version of the large language model series, Qwen (ab
|
|||||||
|
|
||||||
The following sections include information that you might find it helpful. Specifically, we advise you to read the FAQ section before you launch issues.
|
The following sections include information that you might find it helpful. Specifically, we advise you to read the FAQ section before you launch issues.
|
||||||
|
|
||||||
|
|
||||||
## News
|
## News
|
||||||
|
|
||||||
* 2023.8.3 We release both Qwen-7B and Qwen-7B-Chat on ModelScope and Hugging Face. We also provide a technical memo for more details about the model, including training details and model performance.
|
* 2023.8.3 We release both Qwen-7B and Qwen-7B-Chat on ModelScope and Hugging Face. We also provide a technical memo for more details about the model, including training details and model performance.
|
||||||
@@ -251,7 +250,7 @@ Note: The GPU memory usage profiling in the above table is performed on single A
|
|||||||
We measured the average inference speed of generating 2K tokens under BF16 precision and Int8 or NF4 quantization levels, respectively.
|
We measured the average inference speed of generating 2K tokens under BF16 precision and Int8 or NF4 quantization levels, respectively.
|
||||||
|
|
||||||
| Quantization Level | Inference Speed with flash_attn (tokens/s) | Inference Speed w/o flash_attn (tokens/s) |
|
| Quantization Level | Inference Speed with flash_attn (tokens/s) | Inference Speed w/o flash_attn (tokens/s) |
|
||||||
| ------ | :---------------------------: | :---------------------------: |
|
| ---------------------- | :----------------------------------------: | :---------------------------------------: |
|
||||||
| BF16 (no quantization) | 30.06 | 27.55 |
|
| BF16 (no quantization) | 30.06 | 27.55 |
|
||||||
| Int8 (bnb) | 7.94 | 7.86 |
|
| Int8 (bnb) | 7.94 | 7.86 |
|
||||||
| NF4 (bnb) | 21.43 | 20.37 |
|
| NF4 (bnb) | 21.43 | 20.37 |
|
||||||
@@ -265,7 +264,7 @@ We also profile the peak GPU memory usage for encoding 2048 tokens as context (a
|
|||||||
When using flash attention, the memory usage is:
|
When using flash attention, the memory usage is:
|
||||||
|
|
||||||
| Quantization Level | Peak Usage for Encoding 2048 Tokens | Peak Usage for Generating 8192 Tokens |
|
| Quantization Level | Peak Usage for Encoding 2048 Tokens | Peak Usage for Generating 8192 Tokens |
|
||||||
| --- | :---: | :---: |
|
| ------------------ | :---------------------------------: | :-----------------------------------: |
|
||||||
| BF16 | 18.11GB | 23.52GB |
|
| BF16 | 18.11GB | 23.52GB |
|
||||||
| Int8 | 12.17GB | 17.60GB |
|
| Int8 | 12.17GB | 17.60GB |
|
||||||
| NF4 | 9.52GB | 14.93GB |
|
| NF4 | 9.52GB | 14.93GB |
|
||||||
@@ -273,7 +272,7 @@ When using flash attention, the memory usage is:
|
|||||||
When not using flash attention, the memory usage is:
|
When not using flash attention, the memory usage is:
|
||||||
|
|
||||||
| Quantization Level | Peak Usage for Encoding 2048 Tokens | Peak Usage for Generating 8192 Tokens |
|
| Quantization Level | Peak Usage for Encoding 2048 Tokens | Peak Usage for Generating 8192 Tokens |
|
||||||
| --- | :---: | :---: |
|
| ------------------ | :---------------------------------: | :-----------------------------------: |
|
||||||
| BF16 | 18.11GB | 24.40GB |
|
| BF16 | 18.11GB | 24.40GB |
|
||||||
| Int8 | 12.18GB | 18.47GB |
|
| Int8 | 12.18GB | 18.47GB |
|
||||||
| NF4 | 9.52GB | 15.81GB |
|
| NF4 | 9.52GB | 15.81GB |
|
||||||
@@ -282,13 +281,6 @@ The above speed and memory profiling are conducted using [this script](https://q
|
|||||||
|
|
||||||
## Demo
|
## Demo
|
||||||
|
|
||||||
### CLI Demo
|
|
||||||
|
|
||||||
We provide a CLI demo example in `cli_demo.py`, which supports streaming output for the generation. Users can interact with Qwen-7B-Chat by inputting prompts, and the model returns model outputs in the streaming mode. Run the command below:
|
|
||||||
|
|
||||||
```
|
|
||||||
python cli_demo.py
|
|
||||||
```
|
|
||||||
|
|
||||||
### Web UI
|
### Web UI
|
||||||
|
|
||||||
@@ -304,16 +296,40 @@ Then run the command below and click on the generated link:
|
|||||||
python web_demo.py
|
python web_demo.py
|
||||||
```
|
```
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<br>
|
||||||
|
<img src="assets/web_demo.gif" width="600" />
|
||||||
|
<br>
|
||||||
|
<p>
|
||||||
|
|
||||||
|
### CLI Demo
|
||||||
|
|
||||||
|
We provide a CLI demo example in `cli_demo.py`, which supports streaming output for the generation. Users can interact with Qwen-7B-Chat by inputting prompts, and the model returns model outputs in the streaming mode. Run the command below:
|
||||||
|
|
||||||
|
```
|
||||||
|
python cli_demo.py
|
||||||
|
```
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<br>
|
||||||
|
<img src="assets/cli_demo.gif" width="600" />
|
||||||
|
<br>
|
||||||
|
<p>
|
||||||
|
|
||||||
## API
|
## API
|
||||||
|
|
||||||
We provide methods to deploy local API based on OpenAI API (thanks to @hanpenggit). Before you start, install the required packages:
|
We provide methods to deploy local API based on OpenAI API (thanks to @hanpenggit). Before you start, install the required packages:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
pip install fastapi uvicorn openai pydantic sse_starlette
|
pip install fastapi uvicorn openai pydantic sse_starlette
|
||||||
```
|
```
|
||||||
|
|
||||||
Then run the command to deploy your API:
|
Then run the command to deploy your API:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python openai_api.py
|
python openai_api.py
|
||||||
```
|
```
|
||||||
|
|
||||||
You can change your arguments, e.g., `-c` for checkpoint name or path, `--cpu-only` for CPU deployment, etc. If you meet problems launching your API deployment, updating the packages to the latest version can probably solve them.
|
You can change your arguments, e.g., `-c` for checkpoint name or path, `--cpu-only` for CPU deployment, etc. If you meet problems launching your API deployment, updating the packages to the latest version can probably solve them.
|
||||||
|
|
||||||
Using the API is also simple. See the example below:
|
Using the API is also simple. See the example below:
|
||||||
@@ -345,6 +361,11 @@ response = openai.ChatCompletion.create(
|
|||||||
print(response.choices[0].message.content)
|
print(response.choices[0].message.content)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<br>
|
||||||
|
<img src="assets/openai_api.gif" width="600" />
|
||||||
|
<br>
|
||||||
|
<p>
|
||||||
|
|
||||||
## Tool Usage
|
## Tool Usage
|
||||||
|
|
||||||
|
|||||||
40
README_CN.md
40
README_CN.md
@@ -280,19 +280,10 @@ model = AutoModelForCausalLM.from_pretrained(
|
|||||||
| Int8 | 12.18GB | 18.47GB |
|
| Int8 | 12.18GB | 18.47GB |
|
||||||
| NF4 | 9.52GB | 15.81GB |
|
| NF4 | 9.52GB | 15.81GB |
|
||||||
|
|
||||||
|
|
||||||
以上测速和显存占用情况,均可通过该[评测脚本](https://qianwen-res.oss-cn-beijing.aliyuncs.com/profile.py)测算得到。
|
以上测速和显存占用情况,均可通过该[评测脚本](https://qianwen-res.oss-cn-beijing.aliyuncs.com/profile.py)测算得到。
|
||||||
|
|
||||||
## Demo
|
## Demo
|
||||||
|
|
||||||
### 交互式Demo
|
|
||||||
|
|
||||||
我们提供了一个简单的交互式Demo示例,请查看`cli_demo.py`。当前模型已经支持流式输出,用户可通过输入文字的方式和Qwen-7B-Chat交互,模型将流式输出返回结果。运行如下命令:
|
|
||||||
|
|
||||||
```
|
|
||||||
python cli_demo.py
|
|
||||||
```
|
|
||||||
|
|
||||||
### Web UI
|
### Web UI
|
||||||
|
|
||||||
我们提供了Web UI的demo供用户使用 (感谢 @wysaid 支持)。在开始前,确保已经安装如下代码库:
|
我们提供了Web UI的demo供用户使用 (感谢 @wysaid 支持)。在开始前,确保已经安装如下代码库:
|
||||||
@@ -307,16 +298,41 @@ pip install -r requirements_web_demo.txt
|
|||||||
python web_demo.py
|
python web_demo.py
|
||||||
```
|
```
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<br>
|
||||||
|
<img src="assets/web_demo.gif" width="600" />
|
||||||
|
<br>
|
||||||
|
<p>
|
||||||
|
|
||||||
|
|
||||||
|
### 交互式Demo
|
||||||
|
|
||||||
|
我们提供了一个简单的交互式Demo示例,请查看`cli_demo.py`。当前模型已经支持流式输出,用户可通过输入文字的方式和Qwen-7B-Chat交互,模型将流式输出返回结果。运行如下命令:
|
||||||
|
|
||||||
|
```
|
||||||
|
python cli_demo.py
|
||||||
|
```
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<br>
|
||||||
|
<img src="assets/cli_demo.gif" width="600" />
|
||||||
|
<br>
|
||||||
|
<p>
|
||||||
|
|
||||||
## API
|
## API
|
||||||
|
|
||||||
我们提供了OpenAI API格式的本地API部署方法(感谢@hanpenggit)。在开始之前先安装必要的代码库:
|
我们提供了OpenAI API格式的本地API部署方法(感谢@hanpenggit)。在开始之前先安装必要的代码库:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
pip install fastapi uvicorn openai pydantic sse_starlette
|
pip install fastapi uvicorn openai pydantic sse_starlette
|
||||||
```
|
```
|
||||||
|
|
||||||
随后即可运行以下命令部署你的本地API:
|
随后即可运行以下命令部署你的本地API:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python openai_api.py
|
python openai_api.py
|
||||||
```
|
```
|
||||||
|
|
||||||
你也可以修改参数,比如`-c`来修改模型名称或路径, `--cpu-only`改为CPU部署等等。如果部署出现问题,更新上述代码库往往可以解决大多数问题。
|
你也可以修改参数,比如`-c`来修改模型名称或路径, `--cpu-only`改为CPU部署等等。如果部署出现问题,更新上述代码库往往可以解决大多数问题。
|
||||||
|
|
||||||
使用API同样非常简单,示例如下:
|
使用API同样非常简单,示例如下:
|
||||||
@@ -348,6 +364,11 @@ response = openai.ChatCompletion.create(
|
|||||||
print(response.choices[0].message.content)
|
print(response.choices[0].message.content)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<br>
|
||||||
|
<img src="assets/openai_api.gif" width="600" />
|
||||||
|
<br>
|
||||||
|
<p>
|
||||||
|
|
||||||
## 工具调用
|
## 工具调用
|
||||||
|
|
||||||
@@ -405,7 +426,6 @@ For how to write and use prompts for ReAct Prompting, please refer to [the ReAct
|
|||||||
|
|
||||||
如遇到问题,敬请查阅[FAQ](FAQ_zh.md)以及issue区,如仍无法解决再提交issue。
|
如遇到问题,敬请查阅[FAQ](FAQ_zh.md)以及issue区,如仍无法解决再提交issue。
|
||||||
|
|
||||||
|
|
||||||
## 使用协议
|
## 使用协议
|
||||||
|
|
||||||
研究人员与开发者可使用Qwen-7B和Qwen-7B-Chat或进行二次开发。我们同样允许商业使用,具体细节请查看[LICENSE](LICENSE)。如需商用,请填写[问卷](https://dashscope.console.aliyun.com/openModelApply/qianwen)申请。
|
研究人员与开发者可使用Qwen-7B和Qwen-7B-Chat或进行二次开发。我们同样允许商业使用,具体细节请查看[LICENSE](LICENSE)。如需商用,请填写[问卷](https://dashscope.console.aliyun.com/openModelApply/qianwen)申请。
|
||||||
|
|||||||
35
README_JA.md
35
README_JA.md
@@ -285,14 +285,6 @@ Flash attentionを使用しない場合、メモリ使用量は次のように
|
|||||||
|
|
||||||
## デモ
|
## デモ
|
||||||
|
|
||||||
### CLI デモ
|
|
||||||
|
|
||||||
`cli_demo.py` に CLI のデモ例を用意しています。ユーザはプロンプトを入力することで Qwen-7B-Chat と対話することができ、モデルはストリーミングモードでモデルの出力を返します。以下のコマンドを実行する:
|
|
||||||
|
|
||||||
```
|
|
||||||
python cli_demo.py
|
|
||||||
```
|
|
||||||
|
|
||||||
### ウェブ UI
|
### ウェブ UI
|
||||||
|
|
||||||
ウェブUIデモを構築するためのコードを提供します(@wysaidに感謝)。始める前に、以下のパッケージがインストールされていることを確認してください:
|
ウェブUIデモを構築するためのコードを提供します(@wysaidに感謝)。始める前に、以下のパッケージがインストールされていることを確認してください:
|
||||||
@@ -307,7 +299,28 @@ pip install -r requirements_web_demo.txt
|
|||||||
python web_demo.py
|
python web_demo.py
|
||||||
```
|
```
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<br>
|
||||||
|
<img src="assets/web_demo.gif" width="600" />
|
||||||
|
<br>
|
||||||
|
<p>
|
||||||
|
|
||||||
|
### CLI デモ
|
||||||
|
|
||||||
|
`cli_demo.py` に CLI のデモ例を用意しています。ユーザはプロンプトを入力することで Qwen-7B-Chat と対話することができ、モデルはストリーミングモードでモデルの出力を返します。以下のコマンドを実行する:
|
||||||
|
|
||||||
|
```
|
||||||
|
python cli_demo.py
|
||||||
|
```
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<br>
|
||||||
|
<img src="assets/cli_demo.gif" width="600" />
|
||||||
|
<br>
|
||||||
|
<p>
|
||||||
|
|
||||||
## API
|
## API
|
||||||
|
|
||||||
OpenAI APIをベースにローカルAPIをデプロイする方法を提供する(@hanpenggitに感謝)。始める前に、必要なパッケージをインストールしてください:
|
OpenAI APIをベースにローカルAPIをデプロイする方法を提供する(@hanpenggitに感謝)。始める前に、必要なパッケージをインストールしてください:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
@@ -351,6 +364,12 @@ response = openai.ChatCompletion.create(
|
|||||||
print(response.choices[0].message.content)
|
print(response.choices[0].message.content)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
<p align="center">
|
||||||
|
<br>
|
||||||
|
<img src="assets/openai_api.gif" width="600" />
|
||||||
|
<br>
|
||||||
|
<p>
|
||||||
|
|
||||||
## ツールの使用
|
## ツールの使用
|
||||||
|
|
||||||
Qwen-7B-Chat は、API、データベース、モデルなど、ツールの利用に特化して最適化されており、ユーザは独自の Qwen-7B ベースの LangChain、エージェント、コードインタプリタを構築することができます。ツール利用能力を評価するための評価[ベンチマーク](eval/EVALUATION.md)では、Qwen-7B は安定した性能に達しています。
|
Qwen-7B-Chat は、API、データベース、モデルなど、ツールの利用に特化して最適化されており、ユーザは独自の Qwen-7B ベースの LangChain、エージェント、コードインタプリタを構築することができます。ツール利用能力を評価するための評価[ベンチマーク](eval/EVALUATION.md)では、Qwen-7B は安定した性能に達しています。
|
||||||
|
|||||||
BIN
assets/cli_demo.gif
Normal file
BIN
assets/cli_demo.gif
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 1.9 MiB |
BIN
assets/openai_api.gif
Normal file
BIN
assets/openai_api.gif
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 1.1 MiB |
BIN
assets/web_demo.gif
Normal file
BIN
assets/web_demo.gif
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 18 MiB |
Reference in New Issue
Block a user