| Author | Commit | Message | Date |
|---|---|---|---|
| Sean | 2b565da220 | Update evaluate_plugin.py; change Old Evaluation Dataset (Version 20230803) to new version | 2024-02-17 17:27:49 +08:00 |
| 兼欣 | 7eb9016908 | update agent benchmarks and add qwen-72b results | 2023-12-06 12:57:11 +08:00 |
| yangapku | e8e15962d8 | add 72B and 1.8B Qwen models, add Ascend 910 and Hygon DCU support, add docker support | 2023-11-30 15:29:13 +08:00 |
| yangapku | c00209f932 | update evaluate scripts | 2023-10-30 19:13:14 +08:00 |
| yangapku | f076e2fa42 | specify repetition penalty | 2023-10-13 11:44:48 +08:00 |
| yangapku | fc57dea277 | release latest models | 2023-09-25 10:41:59 +08:00 |
| yangapku | b86a0f2c8a | update EVALUATION.md | 2023-09-18 11:50:53 +08:00 |
| feihu.hf | 4864f7b278 | fix format problems in evaluation code; update ceval extraction rules | 2023-08-25 22:44:07 +08:00 |
| Yang An | 677180a653 | Merge pull request #185 from Owen-Qin/fix_ceval; fix bug for ceval | 2023-08-15 17:55:23 +08:00 |
| qinxy3 | 543ffaf617 | fix code | 2023-08-15 11:03:24 +08:00 |
| qinxy3 | bff91b3305 | fix bug for ceval | 2023-08-14 14:47:12 +08:00 |
| Haonan Li | e7072a49c0 | add CMMLU evaluation results | 2023-08-13 20:58:52 +04:00 |
| 兼欣 | 9139fbdf99 | release the evaluation benchmark for tool use; update tool use results to that of the hf version | 2023-08-08 17:45:41 +08:00 |
| feihu.hf | 680a3e8bb8 | update EVALUATION.md | 2023-08-04 10:54:19 +08:00 |
| feihu.hf | 1134e08be7 | add evaluation code for Qwen-7B-Chat | 2023-08-03 23:27:48 +08:00 |
| JustinLin610 | ba2d85a13b | first commit | 2023-08-03 12:57:53 +08:00 |