update readme

2026-05-20 08:25:47 +08:00 · 2023-08-09 14:05:40 +08:00
parent f9870f5ce7
commit 6b8fd32248
4 changed files with 153 additions and 71 deletions
--- a/README_CN.md
+++ b/README_CN.md
@@ -33,18 +33,18 @@

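 Qwen-7B surpasses similarly sized large language models on a range of benchmarks that evaluate natural language understanding and generation, mathematical problem solving, and code generation, including MMLU, C-Eval, GSM8K, HumanEval, and WMT22, and even surpasses larger models such as those with 12-13B parameters.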

-| Model        | MMLU     |   C-Eval |    GSM8K | HumanEval | WMT22 (en-zh) |
-| :------------- | ---------- | ---------: | ---------: | ----------: | --------------: |
-| LLaMA-7B     | 35.1     |        - |     11.0 |      10.5 |           8.7 |
-| LLaMA 2-7B   | 45.3     |        - |     14.6 |      12.8 |          17.9 |
-| Baichuan-7B  | 42.3     |     42.8 |      9.7 |       9.2 |          26.6 |
-| ChatGLM2-6B  | 47.9     |     51.7 |     32.4 |       9.2 |             - |
-| InternLM-7B  | 51.0     |     52.8 |     31.2 |      10.4 |          14.8 |
-| Baichuan-13B | 51.6     |     53.6 |     26.6 |      12.8 |          30.0 |
-| LLaMA-13B    | 46.9     |     35.5 |     17.8 |      15.8 |          12.0 |
-| LLaMA 2-13B  | 54.8     |        - |     28.7 |      18.3 |          24.2 |
-| ChatGLM2-12B | 56.2     | **61.6** |     40.9 |         - |             - |
-| **Qwen-7B**  | **56.7** |     59.6 | **51.6** |  **24.4** |      **30.6** |
+| Model             | MMLU           |         C-Eval |          GSM8K |      HumanEval |  WMT22 (en-zh) |
+| :---------------- | :------------: | :------------: | :------------: | :------------: | :------------: |
+| LLaMA-7B          | 35.1           |              - |           11.0 |           10.5 |            8.7 |
+| LLaMA 2-7B        | 45.3           |              - |           14.6 |           12.8 |           17.9 |
+| Baichuan-7B       | 42.3           |           42.8 |            9.7 |            9.2 |           26.6 |
+| ChatGLM2-6B       | 47.9           |           51.7 |           32.4 |            9.2 |              - |
+| InternLM-7B       | 51.0           |           52.8 |           31.2 |           10.4 |           14.8 |
+| Baichuan-13B      | 51.6           |           53.6 |           26.6 |           12.8 |           30.0 |
+| LLaMA-13B         | 46.9           |           35.5 |           17.8 |           15.8 |           12.0 |
+| LLaMA 2-13B       | 54.8           |              - |           28.7 |           18.3 |           24.2 |
+| ChatGLM2-12B      | 56.2           |       **61.6** |           40.9 |              - |              - |
+| **Qwen-7B**       | **56.7**       |           59.6 |       **51.6** |       **24.4** |       **30.6** |

 <p align="center">
    <img src="assets/performance.png" width="1000"/>
@@ -83,7 +83,7 @@ cd flash-attention && pip install .

 #### 🤗 Transformers

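-To run inference with Qwen-7B-chat, you only need a few lines of code, as shown below:
+To run inference with Qwen-7B-chat, you only need a few lines of code, as shown below. **Please make sure you are using the latest code.**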

 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -192,7 +192,6 @@ print(f'Response: {response}')

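 Our tokenizer is based on tiktoken and differs from other tokenizers, such as the sentencepiece tokenizer. Pay particular attention to special tokens, especially during fine-tuning. For more information about the tokenizer and its use in fine-tuning, see the [documentation](tokenization_note_zh.md).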

-
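 ## Quantization

 To use lower-precision quantized models, such as 4-bit and 8-bit models, we provide a simple example showing how to get started quickly. Before you begin, make sure `bitsandbytes` is installed. Note that the requirements for `bitsandbytes` are: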
@@ -201,6 +200,12 @@ print(f'Response: {response}')
 **Requirements** Python >=3.8. Linux distribution (Ubuntu, MacOS, etc.) + CUDA > 10.0.
 ```

+Then run the following command to install `bitsandbytes`:
+
+```
+pip install bitsandbytes
+```
+
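 Windows users need to install a Windows-specific build of `bitsandbytes`; one option is [bitsandbytes-windows-webui](https://github.com/jllllll/bitsandbytes-windows-webui/releases/tag/wheels).

 To use a quantized model, simply pass your quantization configuration to `AutoModelForCausalLM.from_pretrained`, as shown below: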
@@ -229,25 +234,45 @@ model = AutoModelForCausalLM.from_pretrained(

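 With this approach, the model can be loaded in `NF4` or `Int8` precision, which reduces GPU memory usage. We also provide the related performance figures. Although quantization costs some accuracy, it substantially reduces memory usage.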

-| Precision | MMLU | Memory |
-| :---------: | -------: | -----: |
-|   BF16   |  56.7 |   16.2G |
-|   Int8   |  52.8 |   10.1G |
-|    NF4    |  48.9 |    7.4G |
+| Precision   |   MMLU   |  Memory  |
+| :---------: | :------: | :------: |
+|   BF16      |   56.7   |   16.2G  |
+|   Int8      |   52.8   |   10.1G  |
+|    NF4      |   48.9   |   7.4G   |

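-## Interactive Demo
+## Demo

-We provide a simple interactive demo; see `cli_demo.py`. The model now supports streaming output: users can chat with Qwen-7B-Chat by typing text, and the model streams its response.
+### Interactive Demo
+
+We provide a simple interactive demo; see `cli_demo.py`. The model now supports streaming output: users can chat with Qwen-7B-Chat by typing text, and the model streams its response. Run the following command: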
+
+```
+python cli_demo.py
+```
+
+### Web UI
+
+We provide a Web UI demo (thanks to @wysiad for the support). Before you begin, make sure the following packages are installed:
+
+```
+pip install gradio mdtex2html
+```
+
+Then run the following command and click the generated link:
+
+```
+python web_demo.py
+```

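 ## Tool Usage

 Qwen-7B-Chat is optimized for calling tools such as APIs, databases, and models. Users can build LangChain applications, agents, and even a code interpreter on top of Qwen-7B. We tested the model's tool-calling ability on our open-source [evaluation benchmark](eval/EVALUATION.md) and found that Qwen-7B-Chat performs consistently well.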

-| Model       | Tool Selection (Acc.↑) | Tool Input (Rouge-L↑) | False Positive Error↓ |
-| ------------- | ------------------------- | ------------------------ | ------------------------ |
-| GPT-4       | 95%                     | **0.90**               | 15%                    |
-| GPT-3.5     | 85%                     | 0.88                   | 75%                    |
-| **Qwen-7B** | **99%**                 | 0.89                   | **9.7%**               |
+| Model       | Tool Selection (Acc.↑) | Tool Input (Rouge-L↑)  | False Positive Error↓  |
+|:------------|:----------------------:|:----------------------:|:----------------------:|
+| GPT-4       | 95%                    | **0.90**               | 15%                    |
+| GPT-3.5     | 85%                    | 0.88                   | 75%                    |
+| **Qwen-7B** | **99%**                | 0.89                   | **9.7%**               |

 We provide documentation on how to write your prompts following the principles of ReAct Prompting.

@@ -255,12 +280,12 @@ For how to write and use prompts for ReAct Prompting, please refer to [the ReAct

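 In addition, we provide experimental results that demonstrate the model's ability to act as an agent. See the related [documentation](https://huggingface.co/docs/transformers/transformers_agents) for more information. The model's results on the evaluation benchmark provided by Hugging Face are as follows: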

-| Model           | Tool Selection↑ | Tool Used↑ | Code↑    |
-| ----------------- | ------------------ | ------------- | ----------- |
-| GPT-4           | **100**          | **100**     | **97.41** |
-| GPT-3.5         | 95.37            | 96.30       | 87.04     |
-| StarCoder-15.5B | 87.04            | 87.96       | 68.89     |
-| **Qwen-7B**     | 90.74            | 92.59       | 74.07     |
+| Model          | Tool Selection↑ | Tool Used↑  |   Code↑   |
+|:---------------|:---------------:|:-----------:|:---------:|
+|GPT-4           |     **100**     |   **100**   | **97.41** |
+|GPT-3.5         |      95.37      |    96.30    |   87.04   |
+|StarCoder-15.5B |      87.04      |    87.96    |   68.89   |
+| **Qwen-7B**    |      90.74      |    92.59    |   74.07   |

 ## Long-Context Understanding

@@ -298,3 +323,4 @@ For how to write and use prompts for ReAct Prompting, please refer to [the ReAct
 ## Contact Us

 To leave a message for our research or product team, email us at qianwen_opensource@alibabacloud.com.
+
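
As a quick sanity check on the quantization table in the diff above (BF16, Int8, NF4), the trade-off it describes can be worked out with a few lines of Python. This is a minimal sketch and not part of the repository: the dictionary `QUANT_RESULTS` and the helper `compare_to_baseline` are hypothetical names introduced for illustration, and the numbers are copied from the table as reported.

```python
# Figures copied from the quantization table: precision -> (MMLU score, memory in GB).
# QUANT_RESULTS and compare_to_baseline are illustrative names, not repository code.
QUANT_RESULTS = {
    "BF16": (56.7, 16.2),
    "Int8": (52.8, 10.1),
    "NF4": (48.9, 7.4),
}


def compare_to_baseline(results, baseline="BF16"):
    """Return the MMLU drop (points) and memory saved (%) relative to the baseline."""
    base_mmlu, base_mem = results[baseline]
    summary = {}
    for name, (mmlu, mem) in results.items():
        summary[name] = {
            "mmlu_drop": round(base_mmlu - mmlu, 1),
            "memory_saved_pct": round(100 * (base_mem - mem) / base_mem, 1),
        }
    return summary


if __name__ == "__main__":
    for name, row in compare_to_baseline(QUANT_RESULTS).items():
        print(f"{name}: MMLU drop {row['mmlu_drop']}, memory saved {row['memory_saved_pct']}%")
```

Under these figures, Int8 gives up 3.9 MMLU points for about 37.7% less memory, and NF4 gives up 7.8 points for about 54.3% less.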