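ReLE Evaluation (ReLE评测): Chinese AI Large Model Capability Evaluation (continuously updated). Currently covers 359 large models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as step3.5-flash, kimi-k2.5, ernie4.5, MiniMax-M2.5, deepseek-v3.2, Qwen3.5, llama4, Zhipu GLM-5, GLM-4.7, LongC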
Expert Video Review by SEOGANT · March 2026
Chinese LLM Benchmark (ReLE评测) is a comprehensive, continuously updated evaluation platform for Chinese-language large language models. It currently tracks 359+ models in two groups: international models (GPT, Claude, Gemini) evaluated on Chinese tasks, and domestic Chinese models (DeepSeek, Qwen, Kimi, Doubao, Baidu ERNIE).
The benchmark covers capabilities critical for Chinese-language AI applications, including classical Chinese comprehension, character recognition, idiom usage, legal and medical domain knowledge, and cultural reasoning tasks that English-centric benchmarks do not address.
The evaluation methodology spans multiple capability dimensions: language understanding (reading comprehension, information extraction, semantic similarity), language generation (summarization, translation, creative writing), logical reasoning (mathematical problem-solving, commonsense inference), professional knowledge (law, medicine, finance, education), and safety alignment (toxicity detection, bias evaluation, instruction following).
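To make the dimension structure concrete, here is a minimal Python sketch of how per-task scores could be rolled up into the five dimensions listed above. The task identifiers, the unweighted averaging, and the function names `dimension_scores` and `overall_score` are illustrative assumptions; the benchmark's actual aggregation rules are not described here and may differ.

```python
from statistics import mean

# Capability dimensions and sub-tasks as named in the description above.
# The snake_case identifiers are hypothetical labels, not the benchmark's own keys.
DIMENSIONS = {
    "language_understanding": ["reading_comprehension", "information_extraction", "semantic_similarity"],
    "language_generation": ["summarization", "translation", "creative_writing"],
    "logical_reasoning": ["math_problem_solving", "commonsense_inference"],
    "professional_knowledge": ["law", "medicine", "finance", "education"],
    "safety_alignment": ["toxicity_detection", "bias_evaluation", "instruction_following"],
}


def dimension_scores(task_scores):
    """Average the available task scores within each dimension (assumed unweighted)."""
    result = {}
    for dim, tasks in DIMENSIONS.items():
        present = [task_scores[t] for t in tasks if t in task_scores]
        if present:
            result[dim] = mean(present)
    return result


def overall_score(task_scores):
    """Macro-average over dimensions, so dimensions with more tasks do not dominate."""
    per_dim = dimension_scores(task_scores)
    if not per_dim:
        raise ValueError("no known task scores supplied")
    return mean(list(per_dim.values()))
```

With made-up scores such as `{"law": 80, "medicine": 60, "summarization": 90}`, `dimension_scores` averages law and medicine into one professional-knowledge figure before `overall_score` averages across dimensions. A macro-average is only one reasonable choice; a weighted scheme would be equally plausible.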
Results are presented with statistical significance indicators and methodology documentation to enable reproducible comparison across model versions and providers.
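The description does not say which statistical test lies behind the significance indicators. As one common option, the sketch below uses a paired bootstrap over per-question scores to estimate a 95% confidence interval for the score difference between two models. The function name `paired_bootstrap` and its parameters are hypothetical and are not taken from the benchmark's scripts.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Return (mean_diff, (lo, hi)): the mean of a - b and a 95% percentile bootstrap CI.

    scores_a and scores_b hold per-question scores for two models
    on the same questions, in the same order.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equal-length score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    means = []
    for _ in range(n_resamples):
        # Resample questions with replacement, keeping each pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int(0.025 * n_resamples)]
    hi = means[int(0.975 * n_resamples) - 1]
    return sum(diffs) / n, (lo, hi)
```

If the resulting interval excludes zero, the gap between the two models is unlikely to be an artifact of which questions happened to be sampled. Resampling whole question pairs, rather than each model's scores separately, accounts for the fact that both models answered the same questions.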
The benchmark is maintained by the Chinese AI research community with regular updates as new model versions are released, providing a living leaderboard that reflects the current state of Chinese-language AI capability.
It serves as a primary reference for Chinese enterprises evaluating which models to deploy for customer-facing applications, for researchers studying multilingual model capabilities, and for the Chinese AI developer community tracking progress relative to international frontier models.
The evaluation data and scoring scripts are publicly available for independent verification of results.