release latest models

yangapku
2023-09-25 10:41:59 +08:00
parent fb3180d8f0
commit fc57dea277
13 changed files with 938 additions and 235 deletions

18
FAQ.md

@@ -4,7 +4,7 @@
#### Failure in installing flash attention
Flash attention is an option for accelerating training and inference. Only NVIDIA GPUs of Turing, Ampere, Ada, and Hopper architecture, e.g., H100, A100, RTX 3090, T4, RTX 2080, can support flash attention. You can use our models without installing it.
Flash attention is an option for accelerating training and inference. Only NVIDIA GPUs of Turing, Ampere, Ada, and Hopper architecture, e.g., H100, A100, RTX 3090, T4, RTX 2080, can support flash attention. **You can use our models without installing it.**
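
A minimal sketch of the architecture check described above, taking the FAQ's list at face value. It assumes the listed architectures correspond to CUDA compute capability 7.5 (Turing) and newer; the threshold constant and the helper name `supports_flash_attention` are illustrative and not part of the repository, and the flash attention package itself may impose stricter requirements.

```python
# Assumption: Turing = 7.5, Ampere = 8.0/8.6, Ada = 8.9, Hopper = 9.0,
# so "Turing or newer" is modeled as compute capability >= (7, 5).
MIN_CAPABILITY = (7, 5)


def supports_flash_attention(major: int, minor: int) -> bool:
    """Return True if the compute capability falls in the range the FAQ lists."""
    return (major, minor) >= MIN_CAPABILITY


# Compute capabilities of the GPUs named in the FAQ, plus V100 (Volta) for contrast.
EXAMPLE_GPUS = {
    "H100": (9, 0),
    "A100": (8, 0),
    "RTX 3090": (8, 6),
    "T4": (7, 5),
    "RTX 2080": (7, 5),
    "V100": (7, 0),
}

if __name__ == "__main__":
    for name, capability in EXAMPLE_GPUS.items():
        print(name, supports_flash_attention(*capability))
```

On a machine with PyTorch and CUDA available, the capability pair can be obtained from `torch.cuda.get_device_capability()` and passed to the helper.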
#### Which version of transformers should I use?
@@ -20,7 +20,7 @@ This is the merge file of the tokenizer. You have to download it. Note that if y
#### transformers_stream_generator/tiktoken/accelerate not found
Run the command `pip install -r requirements.txt`. You can find the file at [https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt](https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt).
Run the command `pip install -r requirements.txt`. You can find the file at [https://github.com/QwenLM/Qwen-7B/blob/main/requirements.txt](https://github.com/QwenLM/Qwen/blob/main/requirements.txt).
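
To see which of the packages named in this entry are actually missing before reinstalling, a small check like the following may help. It is a sketch using only the standard library; the helper name `missing_packages` is hypothetical, and it assumes each package's import name matches the name in the heading.

```python
import importlib.util

# Import names taken from the FAQ heading (assumed to match the installed module names).
REQUIRED = ["transformers_stream_generator", "tiktoken", "accelerate"]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try: pip install -r requirements.txt")
    else:
        print("All required packages were found.")
```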
<br><br>
@@ -32,7 +32,6 @@ Run the command `pip install -r requirements.txt`. You can find the file at [htt
Yes, see `web_demo.py` for web demo and `cli_demo.py` for CLI demo. See README for more information.
#### Can I use CPU only?
Yes, running `python cli_demo.py --cpu-only` loads the model and runs inference on CPU only.
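
As an illustration of how a `--cpu-only` flag can select the device, here is a minimal sketch. It is not the actual code of `cli_demo.py`; the helper names and the choice of `"cpu"` versus `"auto"` as a `device_map` value are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a --cpu-only switch (hypothetical, for illustration)."""
    parser = argparse.ArgumentParser(description="Sketch of a CPU-only switch")
    parser.add_argument("--cpu-only", action="store_true",
                        help="load the model and run inference on CPU only")
    return parser


def pick_device_map(cpu_only: bool) -> str:
    """Map the flag to a device_map value: CPU when requested, automatic placement otherwise."""
    return "cpu" if cpu_only else "auto"


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("device_map =", pick_device_map(args.cpu_only))
```

The resulting string could then be passed as the `device_map` argument when loading the model with `transformers`.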
@@ -47,19 +46,16 @@ This is because tokens represent bytes and a single token may be a meaningless s
#### It seems that the generation is not related to the instruction...
Please check if you are loading Qwen-7B-Chat instead of Qwen-7B. Qwen-7B is the base model without alignment, which behaves differently from the SFT/Chat model.
Please check if you are loading Qwen-Chat instead of Qwen. Qwen is the base model without alignment, which behaves differently from the SFT/Chat model.
#### Is quantization supported?
Yes, quantization is supported through `bitsandbytes`. We are working on an improved version and will release the quantized model checkpoints.
Yes, quantization is supported through AutoGPTQ.
#### Errors in running quantized models: `importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes`
For Linux users, running `pip install bitsandbytes` directly can solve the problem. For Windows users, you can run `python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui`.
#### Slow when processing long sequences
We solved this problem. Updating the code to the latest version can help.
Updating the code to the latest version can help.
#### Unsatisfactory performance in processing long sequences
@@ -72,7 +68,9 @@ Please ensure that NTK is applied. `use_dynamic_ntk` and `use_logn_attn` in `conf
#### Can Qwen support SFT or even RLHF?
We do not provide finetuning or RLHF code for now. However, some projects support finetuning; see [FastChat](https://github.com/lm-sys/FastChat), [Firefly](https://github.com/yangjianxin1/Firefly), [**LLaMA Efficient Tuning**](https://github.com/hiyouga/LLaMA-Efficient-Tuning), etc. We will update the relevant code soon.
Yes, we now support SFT, including full-parameter finetuning, LoRA, and Q-LoRA. You can also check other projects such as [FastChat](https://github.com/lm-sys/FastChat), [Firefly](https://github.com/yangjianxin1/Firefly), [**LLaMA Efficient Tuning**](https://github.com/hiyouga/LLaMA-Efficient-Tuning), etc.
However, we do not support RLHF for now. We will provide the code in the near future.
<br><br>