4 Commits

Author SHA1 Message Date
Sean
2b565da220 Update evaluate_plugin.py 2024-02-17 17:27:49 +08:00
change Old Evaluation Dataset (Version 20230803) to new version
兼欣
7eb9016908 update agent benchmarks and add qwen-72b results 2023-12-06 12:57:11 +08:00
feihu.hf
4864f7b278 fix format problems in evaluation code; update ceval extraction rules 2023-08-25 22:44:07 +08:00
兼欣
9139fbdf99 release the evaluation benchmark for tool use; update tool use results to that of the hf version 2023-08-08 17:45:41 +08:00