🏆 CHARM Leaderboard 🏆

Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations.

Accuracy on CHARM reasoning tasks

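Chinese Commonsense Domain

| LLM | AJ | TU | SqU | MMR | SpU | NLI | RC | Avg. |
|---|---|---|---|---|---|---|---|---|
| GPT-3.5-0125 | 80.0 | 45 | 65 | 40 | 82.0 | 65 | 53.0 | 61.43 |
| GPT-4o-240513 | 98.67 | 77 | 91 | 80 | 93.5 | 74 | 60.5 | 82.1 |
| Gemini-1.5-flash | 94.0 | 52 | 84 | 68 | 78.0 | 79 | 67.0 | 74.57 |
| LLaMA-3-8B | 78.67 | 29 | 58 | 38 | 65.5 | 58 | 47.0 | 53.45 |
| LLaMA-3-70B | 92.67 | 44 | 82 | 52 | 77.0 | 74 | 65.5 | 69.6 |
| InternLM2-1.8B | 46.67 | 31 | 33 | 28 | 53.0 | 59 | 26.5 | 39.6 |
| InternLM2-7B | 79.33 | 43 | 62 | 52 | 78.0 | 76 | 27.0 | 59.62 |
| InternLM2-20B | 90.67 | 51 | 58 | 46 | 75.0 | 76 | 26.0 | 60.38 |
| Yi1.5-6B | 88.0 | 35 | 64 | 48 | 75.5 | 70 | 39.0 | 59.93 |
| Yi1.5-34B | 96.0 | 49 | 87 | 80 | 85.5 | 79 | 44.5 | 74.43 |
| Qwen1.5-1.8B | 41.33 | 37 | 39 | 40 | 56.0 | 47 | 36.5 | 42.4 |
| Qwen1.5-7B | 82.0 | 32 | 58 | 56 | 76.0 | 66 | 43.0 | 59.0 |
| Qwen1.5-14B | 95.33 | 48 | 74 | 60 | 78.5 | 80 | 51.0 | 69.55 |
| Qwen1.5-72B | 96.67 | 51 | 91 | 78 | 87.5 | 86 | 66.0 | 79.45 |

Global Commonsense Domain

| LLM | AJ | TU | SqU | MMR | SpU | NLI | RC | Avg. |
|---|---|---|---|---|---|---|---|---|
| GPT-3.5-0125 | 90.0 | 94 | 80 | 58 | 89.5 | 56 | 48.0 | 73.64 |
| GPT-4o-240513 | 96.0 | 98 | 98 | 74 | 94.5 | 72 | 65.0 | 85.36 |
| Gemini-1.5-flash | 89.33 | 98 | 96 | 60 | 90.5 | 70 | 68.5 | 81.76 |
| LLaMA-3-8B | 83.33 | 85 | 84 | 68 | 78.5 | 63 | 39.5 | 71.62 |
| LLaMA-3-70B | 84.0 | 97 | 94 | 64 | 83.0 | 66 | 64.5 | 78.93 |
| InternLM2-1.8B | 37.33 | 51 | 43 | 22 | 52.5 | 65 | 23.5 | 42.05 |
| InternLM2-7B | 70.67 | 77 | 65 | 48 | 77.0 | 77 | 36.5 | 64.45 |
| InternLM2-20B | 82.67 | 82 | 74 | 30 | 77.5 | 75 | 27.0 | 64.02 |
| Yi1.5-6B | 81.33 | 71 | 74 | 60 | 75.0 | 59 | 42.0 | 66.05 |
| Yi1.5-34B | 86.67 | 91 | 89 | 54 | 88.0 | 73 | 49.5 | 75.88 |
| Qwen1.5-1.8B | 42.67 | 42 | 45 | 26 | 60.5 | 53 | 32.0 | 43.02 |
| Qwen1.5-7B | 74.0 | 74 | 74 | 36 | 71.5 | 66 | 40.0 | 62.21 |
| Qwen1.5-14B | 86.0 | 81 | 84 | 34 | 83.5 | 78 | 50.5 | 71.0 |
| Qwen1.5-72B | 93.33 | 91 | 95 | 52 | 91.0 | 76 | 72.5 | 81.55 |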

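The leaderboard above reports the accuracy (%) of each LLM on the seven CHARM reasoning tasks: Anachronisms Judgment (AJ), Time Understanding (TU), Sequence Understanding (SqU), Movie and Music Recommendation (MMR), Sport Understanding (SpU), Natural Language Inference (NLI), and Reading Comprehension (RC). Each model is evaluated with its empirically optimal prompt strategy: XLT for English LLMs and ZH-CoT for Chinese-oriented LLMs. For detailed experimental results, please refer to the paper.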
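As a quick sanity check on the Avg. columns, the sketch below recomputes them for the GPT-3.5-0125 row. It assumes Avg. is the unweighted mean of the seven per-task accuracies rounded to two decimals, which is consistent with the numbers shown but is not stated explicitly here. The function name `average_accuracy` and the dictionary layout are hypothetical and chosen only for illustration.

```python
from statistics import mean

# The seven reasoning-task columns of the leaderboard, in table order.
TASKS = ["AJ", "TU", "SqU", "MMR", "SpU", "NLI", "RC"]

# Per-task accuracies for GPT-3.5-0125, copied from the leaderboard.
gpt35_chinese = {"AJ": 80.0, "TU": 45, "SqU": 65, "MMR": 40,
                 "SpU": 82.0, "NLI": 65, "RC": 53.0}
gpt35_global = {"AJ": 90.0, "TU": 94, "SqU": 80, "MMR": 58,
                "SpU": 89.5, "NLI": 56, "RC": 48.0}


def average_accuracy(scores):
    """Unweighted mean over the seven tasks, rounded to two decimals.

    Assumption: this mirrors how the Avg. column is derived.
    """
    return round(mean(scores[task] for task in TASKS), 2)


print(average_accuracy(gpt35_chinese))  # 61.43
print(average_accuracy(gpt35_global))   # 73.64
```

Both values match the Avg. entries listed for GPT-3.5-0125, so the same function can be used to check any other row.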

🖊️ Citation

    @misc{sun2024benchmarking,
          title={Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations}, 
          author={Jiaxing Sun and Weiquan Huang and Jiang Wu and Chenya Gu and Wei Li and Songyang Zhang and Hang Yan and Conghui He},
          year={2024},
          eprint={2403.14112},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }