mirror of
https://github.com/QwenLM/Qwen.git
synced 2026-05-21 00:45:48 +08:00
Update README.md
@@ -249,7 +249,7 @@ We also profile the peak GPU memory usage for encoding 2048 tokens as context (a
 | Quantization | Peak Usage for Encoding 2048 Tokens | Peak Usage for Generating 8192 Tokens |
 | -------------- | :-----------------------------------: | :-------------------------------------: |
 | BF16 | 18.99GB | 24.40GB |
-| In4 | 10.20GB | 15.61GB |
+| Int4 | 10.20GB | 15.61GB |
 
 The above speed and memory profiling are conducted using [this script](https://qianwen-res.oss-cn-beijing.aliyuncs.com/profile.py).
 