OpenHuEval: Evaluating Large Language Model on Hungarian Specifics

Haote Yang1*, Xingjian Wei1*, Jiang Wu1*†, Noémi Ligeti-Nagy2*, Jiaxing Sun3*, Yinfan Wang4*, Zijian Győző Yang2*, Junyuan Gao5*, Jingchao Wang6, Bowen Jiang7, Shasha Wang1, Nanjun Yu6, Zihao Zhang8, Shixin Hong9, Hongwei Liu1, Wei Li1, Songyang Zhang1, Dahua Lin1,10, Lijun Wu1, Gábor Prószéky2, Conghui He1‡
1 Shanghai Artificial Intelligence Laboratory, 2 Hungarian Research Centre for Linguistics, 3 Wuhan University, 4 Shanghai University, 5 University of Chinese Academy of Sciences, 6 East China Normal University, 7 Peking University, 8 Harbin Institute of Technology, 9 Tsinghua University, 10 Chinese University of Hong Kong
* Equal Contribution, † Project Lead, ‡ Corresponding Author
Correspondence: heconghui@pjlab.org.cn

Overview of OpenHuEval.

Comparison of related benchmarks. The table compares WildBench; SimpleQA and ChineseSimpleQA; MAPS; MARC, MMMLU, et al.; BenchMAX; MILQA; HuLU; and OpenHuEval (ours) along six dimensions: real user query, self-awareness evaluation, proverb reasoning, generative task with LLM-as-judge, Hungarian language, and comprehensive Hungarian-specific coverage. OpenHuEval is the only benchmark covering all six.

OpenHuEval

We introduce OpenHuEval, the first benchmark for LLMs focusing on the Hungarian language and its specifics. OpenHuEval is constructed from a vast collection of Hungarian-specific materials sourced from multiple origins. Its construction incorporates the latest design principles for evaluating LLMs, such as using real user queries from the internet, emphasizing the assessment of LLMs' generative capabilities, and employing LLM-as-judge to make evaluation more multidimensional and accurate. As a result, OpenHuEval provides a comprehensive, in-depth, and scientifically accurate assessment of LLM performance on the Hungarian language and its specifics. We evaluated current mainstream LLMs, including both traditional LLMs and the recently developed Large Reasoning Models (LRMs). The results demonstrate a significant need for evaluation and model optimization tailored to the Hungarian language and its specifics. Using OpenHuEval, we also established a framework for analyzing the thinking processes of LRMs, revealing intrinsic patterns and mechanisms of these models in non-English languages, with Hungarian serving as a representative example.
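
To make the LLM-as-judge setup above concrete, here is a minimal sketch of scoring one open-ended answer against a per-question checklist, in the spirit of HuWildBench. It is an illustration under stated assumptions, not the paper's implementation: the prompt wording, the 1-10 scale, the JSON reply format, and the `call_llm` stub are all placeholders.

```python
# Minimal LLM-as-judge sketch (illustrative; not the paper's actual
# prompts, scale, or judge model).
import json

def call_llm(prompt: str) -> str:
    """Stand-in for a call to a judge model via an API client of choice.
    Returns a canned reply here so the sketch runs end to end."""
    return json.dumps({"checklist_results": [True, False], "score": 5})

def judge_response(question: str, answer: str, checklist: list[str]) -> dict:
    """Ask the judge model to verify each checklist item and assign an
    overall score to one open-ended answer."""
    prompt = (
        "You are grading an answer to a Hungarian-language question.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "For each checklist item, return true/false, then an overall 1-10 score.\n"
        + "\n".join(f"- {item}" for item in checklist)
        + '\nReply as JSON: {"checklist_results": [...], "score": ...}'
    )
    return json.loads(call_llm(prompt))

if __name__ == "__main__":
    verdict = judge_response(
        # "What are the steps to report a change of address?"
        question="Mik a lakcímváltozás bejelentésének lépései?",
        answer="Először időpontot kell foglalni a kormányablaknál...",
        checklist=[
            "Mentions booking an appointment at a government office",
            "Lists the documents required for the procedure",
        ],
    )
    print(verdict)  # {'checklist_results': [True, False], 'score': 5}
```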

Construction of OpenHuEval
Construction of HuWildBench
Construction of HuProverbRea
Construction of HuSimpleQA
Examples of OpenHuEval
Examples of OpenHuEval. The original text is in black, while the English translation is in blue. In HuMatchingFIB and HuStandardFIB, the blanks are highlighted.
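
To make the fill-in-the-blank formats concrete, the snippet below mocks up what a HuMatchingFIB-style item could look like: a Hungarian text with numbered blanks and a shared option pool, each blank keyed to its correct option label. The field names and content are hypothetical, not the released schema.

```python
# Hypothetical shape of a HuMatchingFIB-style item; field names and content
# are illustrative, not the dataset's actual schema.
example_item = {
    # "The #1# is Hungary's longest river; it flows past the city of #2#."
    "text": "A #1# Magyarország leghosszabb folyója; #2# városa mellett folyik el.",
    "options": {"A": "Tisza", "B": "Duna", "C": "Balaton", "D": "Szeged"},
    "answers": {"1": "A", "2": "D"},  # the Tisza; it passes Szeged
}

# A model fills each highlighted blank with an option label, so a
# rule-based judge can score blanks by exact comparison with `answers`.
print(example_item["answers"])
```
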
Data statistics of OpenHuEval

The Hungarian-Specific Dimensions (HuSpecificDim).

| HuSpecificDim | Definition | #Question |
| --- | --- | --- |
| Language (𝓛) | Basic knowledge of the Hungarian language, and Hungarian proverbs and sayings | 1333 |
| History (𝓗) | Historical events and the historical development of Hungary | 324 |
| Life, Culture, and Custom (𝓛𝓒𝓒) | Religion, rituals, culture, holidays, and the daily life of Hungarians | 622 |
| Education and Profession (𝓔𝓟) | The education system in Hungary and related professions | 282 |
| Geography and Place (𝓖𝓟) | Geographical knowledge of Hungary, its cities, and locations | 165 |
| Figure (𝓕) | Famous figures of Hungary | 409 |
| Politics, Policy, and Law (𝓟𝓟𝓛) | Politics, policies, and laws of Hungary | 485 |
| Business and Finance (𝓑𝓕) | Business and finance in Hungary | 333 |

Tasks of OpenHuEval

| Task | HuSpecificDim | Judge | Question Type | #Question |
| --- | --- | --- | --- | --- |
| HuWildBench | 𝓛𝓒𝓒, 𝓔𝓟, 𝓟𝓟𝓛, 𝓑𝓕 | LLM + checklist | OE | 1154 |
| HuSimpleQA | 𝓛, 𝓗, 𝓛𝓒𝓒, 𝓔𝓟, 𝓖𝓟, 𝓕, 𝓟𝓟𝓛, 𝓑𝓕 | LLM | OE | 1293 |
| HuProverbRea | 𝓛 | rule, LLM | 2CQ / OE | 1135 |
| HuMatchingFIB | 𝓛, 𝓗 | rule | Matching fill-in-the-blank | 278 |
| HuStandardFIB | 𝓛, 𝓗 | rule, similarity matching | Standard fill-in-the-blank | 93 |

OE: open-ended question; 2CQ: two-choice question.
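
The rule-based and similarity-based judges listed above can be pictured with a short sketch. This is a guess at their general shape, not the paper's scoring code: exact label matching for HuMatchingFIB blanks, and a string-similarity check for HuStandardFIB's free-text fills, where the 0.8 threshold and the use of difflib are assumed.

```python
# Illustrative blank-level scorers for the two fill-in-the-blank tasks.
from difflib import SequenceMatcher

def score_matching_fib(pred: list[str], gold: list[str]) -> float:
    """HuMatchingFIB-style rule judge: blanks are filled with option
    labels, so each blank is scored by exact label match."""
    hits = sum(p.strip().upper() == g.strip().upper() for p, g in zip(pred, gold))
    return hits / len(gold)

def score_standard_fib(pred: list[str], gold: list[str],
                       threshold: float = 0.8) -> float:
    """HuStandardFIB-style similarity judge: free-text fills count as
    correct when sufficiently similar to the reference (threshold assumed)."""
    hits = sum(
        SequenceMatcher(None, p.strip().lower(), g.strip().lower()).ratio() >= threshold
        for p, g in zip(pred, gold)
    )
    return hits / len(gold)

if __name__ == "__main__":
    print(score_matching_fib(["A", "C", "B"], ["A", "B", "B"]))  # ~0.667
    print(score_standard_fib(["Arany Janos"], ["Arany János"]))  # 1.0: a small typo still passes
```
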
🖊️ Citation
@misc{yang2025openhuevalevaluatinglargelanguage,
    title={OpenHuEval: Evaluating Large Language Model on Hungarian Specifics}, 
    author={Haote Yang and Xingjian Wei and Jiang Wu and Noémi Ligeti-Nagy and Jiaxing Sun and Yinfan Wang and Zijian Győző Yang and Junyuan Gao and Jingchao Wang and Bowen Jiang and Shasha Wang and Nanjun Yu and Zihao Zhang and Shixin Hong and Hongwei Liu and Wei Li and Songyang Zhang and Dahua Lin and Lijun Wu and Gábor Prószéky and Conghui He},
    year={2025},
    eprint={2503.21500},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2503.21500}, 
}