mirror of
https://github.com/QwenLM/Qwen.git
synced 2026-05-20 16:35:47 +08:00
Update README.md
@@ -159,8 +159,12 @@ print(f'Response: {response}')
 
 ## Quantization
 
-To load the model in lower precision, e.g., 4 bits and 8 bits, we provide examples to show how to load by adding quantization configuration:
+We provide examples to show how to load models in `NF4` and `Int8`. First, make sure you have installed `bitsandbytes`.
+```
+pip install bitsandbytes
+```
 
+Then you only need to add your quantization configuration to `AutoModelForCausalLM.from_pretrained`. See the example below:
 ```python
 from transformers import BitsAndBytesConfig
 
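The hunk ends right after the `BitsAndBytesConfig` import, so the README's own example is not visible here. As a rough sketch only, assuming the standard `transformers` options `load_in_4bit`, `bnb_4bit_quant_type`, `bnb_4bit_compute_dtype`, and `load_in_8bit`, a configuration for the two modes might look like the following. The helper names `quantization_kwargs` and `load_quantized`, the `"nf4"`/`"int8"` mode labels, the `bfloat16` compute dtype, and the `device_map="auto"` / `trust_remote_code=True` arguments are assumptions for illustration, not taken from the commit.

```python
# Sketch only: build keyword arguments for transformers.BitsAndBytesConfig.
# Helper names and mode labels are hypothetical, chosen for illustration.


def quantization_kwargs(mode: str) -> dict:
    """Return keyword arguments for BitsAndBytesConfig for the given mode."""
    if mode == "nf4":
        return {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            # Assumed compute dtype; pick one your hardware supports.
            "bnb_4bit_compute_dtype": "bfloat16",
        }
    if mode == "int8":
        return {"load_in_8bit": True}
    raise ValueError(f"unsupported quantization mode: {mode!r}")


def load_quantized(model_name: str, mode: str):
    """Load a causal LM with a bitsandbytes quantization config."""
    # Imported lazily so quantization_kwargs works without transformers.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    config = BitsAndBytesConfig(**quantization_kwargs(mode))
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=config,
        device_map="auto",       # assumption: requires `accelerate`
        trust_remote_code=True,  # assumption: model ships custom code
    )
```

Calling `load_quantized(name, "nf4")` or `load_quantized(name, "int8")` with a model identifier of your choice would then pass the matching configuration to `AutoModelForCausalLM.from_pretrained`; this needs `transformers`, `bitsandbytes`, and a supported GPU.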