mirror of
https://github.com/QwenLM/Qwen.git
synced 2026-05-20 16:35:47 +08:00
first commit
45
eval/EVALUATION.md
Normal file
@@ -0,0 +1,45 @@
## Reproducing the Evaluation

- CEVAL

```Shell
wget https://huggingface.co/datasets/ceval/ceval-exam/resolve/main/ceval-exam.zip
mkdir -p data/ceval
mv ceval-exam.zip data/ceval
cd data/ceval; unzip ceval-exam.zip
cd ../../
python evaluate_ceval.py -d data/ceval/
```

- MMLU

```Shell
wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
mkdir -p data/mmlu
mv data.tar data/mmlu
cd data/mmlu; tar xf data.tar
cd ../../
python evaluate_mmlu.py -d data/mmlu/data/
```

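The CEVAL and MMLU steps both end by passing an extracted directory to an evaluation script. A minimal sketch of a sanity check to run first is shown below; `check_dataset_dir` is a hypothetical helper, not part of the repository, and it only confirms that the directory exists and is non-empty, not that its contents are what the scripts expect.

```Shell
# Hypothetical helper (not part of the repository): check that a dataset
# directory exists and contains at least one file before evaluating.
check_dataset_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # Count regular files at any depth; tr strips padding that some wc builds add.
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "empty: $dir"
        return 1
    fi
    echo "ok: $dir ($count files)"
}

# Assumed usage, with the paths from the commands above:
#   check_dataset_dir data/ceval && python evaluate_ceval.py -d data/ceval/
#   check_dataset_dir data/mmlu/data && python evaluate_mmlu.py -d data/mmlu/data/
```

Returning a non-zero status on failure lets the check be chained with `&&`, so the evaluation only starts when the download and extraction left something behind.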
- HumanEval

Get the `HumanEval.jsonl` file from [here](https://github.com/openai/human-eval/tree/master/data). It is distributed there as `HumanEval.jsonl.gz`, so decompress it first.

```Shell
python evaluate_humaneval.py -f HumanEval.jsonl -o HumanEval_res.jsonl
git clone https://github.com/openai/human-eval
pip install -e human-eval
evaluate_functional_correctness HumanEval_res.jsonl
```

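Before scoring, it can help to confirm that the problem file and the results file hold the expected number of records. The sketch below assumes both files use one JSON record per line; `count_jsonl_records` is a hypothetical helper, and how many samples per problem `evaluate_humaneval.py` writes is not stated here, so treat the comparison as a rough check only.

```Shell
# Hypothetical helper: count non-empty lines, i.e. one JSON record per line.
count_jsonl_records() {
    # grep -c prints how many lines match; '.' matches any line with a character.
    grep -c . "$1"
}

# Assumed usage, around the commands above:
#   gunzip -c HumanEval.jsonl.gz > HumanEval.jsonl   # only if the file is gzipped
#   count_jsonl_records HumanEval.jsonl              # the benchmark has 164 problems
#   count_jsonl_records HumanEval_res.jsonl
```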
When installing the `human-eval` package, please note its disclaimer:

This program exists to run untrusted model-generated code. Users are strongly encouraged not to do so outside of a robust security sandbox. The execution call in execution.py is deliberately commented out to ensure users read this disclaimer before running code in a potentially unsafe manner. See the comment in execution.py for more information and instructions.

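The disclaimer refers to a commented-out execution call in `execution.py`. One way to locate it for review is sketched below. `find_commented_exec` is a hypothetical helper, and both the search pattern and the file path are assumptions about the cloned repository. The helper only prints matching lines; it does not modify the file.

```Shell
# Hypothetical helper: print, with line numbers, lines that look like a
# commented-out exec(...) call. The pattern is an assumption about the file.
find_commented_exec() {
    grep -nE '^[[:space:]]*#[[:space:]]*exec\(' "$1"
}

# Assumed path after the git clone above:
#   find_commented_exec human-eval/human_eval/execution.py
```

Keeping this step read-only is deliberate: the edit itself is left as a manual action, taken after reading the comment in the file and setting up a sandbox.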
- GSM8K

```Shell
python evaluate_gsm8k.py
```