Mirror of https://github.com/QwenLM/Qwen.git (synced 2026-05-21 00:45:48 +08:00)
Update README.md
Changed file: README.md (11 lines: +3, -8)
````diff
@@ -889,18 +889,13 @@ The statistics are listed below:
 
 For deployment and fast inference, we suggest using vLLM.
 
-If you use cuda 12.1 and pytorch 2.1, you can directly use the following command to install vLLM.
+If you use **CUDA 12.1 and PyTorch 2.1**, you can directly use the following command to install vLLM.
 
 ```bash
-# pip install vllm # This line is faster but it does not support quantization models.
-
-# The below lines support int4 quantization (int8 will be supported soon). The installation are slower (~10 minutes).
-git clone https://github.com/QwenLM/vllm-gptq
-cd vllm-gptq
-pip install -e .
+pip install vllm
 ```
 
-Otherwise, please refer to the official vLLM [Installation Instructions](https://docs.vllm.ai/en/latest/getting_started/installation.html), or our [vLLM repo for GPTQ quantization](https://github.com/QwenLM/vllm-gptq).
+Otherwise, please refer to the official vLLM [Installation Instructions](https://docs.vllm.ai/en/latest/getting_started/installation.html).
 
 #### vLLM + Transformer-like Wrapper
 
````
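After this change, the README makes the one-line `pip install vllm` route conditional on **CUDA 12.1 and PyTorch 2.1**, and sends everyone else to the official vLLM instructions. The bash sketch below shows one way to check that condition before installing. It is not part of the Qwen repository: the function name `prebuilt_route_applies` is hypothetical, and it assumes that "PyTorch 2.1" means any 2.1.x release and that `python3` (with PyTorch, if installed) is on the `PATH`. It only reports which route applies and does not install anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the Qwen repo): report whether the README's
# prebuilt `pip install vllm` route applies to this machine.
# Assumption: "applies" means PyTorch 2.1.x built against CUDA 12.1.

# prebuilt_route_applies TORCH_VERSION CUDA_VERSION
# Exit status 0 for PyTorch 2.1.x together with CUDA 12.1, non-zero otherwise.
prebuilt_route_applies() {
  local torch_ver="${1%%+*}"   # drop a local build suffix such as "+cu121"
  local cuda_ver="$2"
  [[ "$torch_ver" == "2.1" || "$torch_ver" == 2.1.* ]] && [[ "$cuda_ver" == "12.1" ]]
}

# Ask the local PyTorch for its version and the CUDA version it was built
# against; fall back to "none none" if Python or torch is unavailable.
read -r torch_ver cuda_ver < <(
  python3 -c 'import torch; print(torch.__version__, torch.version.cuda or "none")' 2>/dev/null \
    || echo "none none"
)

if prebuilt_route_applies "$torch_ver" "$cuda_ver"; then
  echo "PyTorch $torch_ver / CUDA $cuda_ver: run 'pip install vllm'"
else
  echo "PyTorch $torch_ver / CUDA $cuda_ver: follow the official vLLM installation instructions"
fi
```

The sketch reads `torch.version.cuda` rather than `nvcc --version` because the former reports the CUDA version the installed PyTorch wheel was built against, which can differ from the system CUDA toolkit.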