Listed on SEOGANT Developer Tools

chinese llm benchmark

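ReLE Evaluation: capability evaluation of Chinese AI large models (continuously updated). It currently covers 359 large models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as step3.5-flash, kimi-k2.5, ernie4.5, MiniMax-M2.5, deepseek-v3.2, Qwen3.5, llama4, Zhipu GLM-5, GLM-4.7, LongC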

Listed Mar 2026

MoM Growth: +12%

EXPERT REVIEW

Expert Video Review by SEOGANT · March 2026

Distribution Score: 84/100

SEO & Organic Traffic

Affiliate Program

Product-Market Fit

Community & Social

Retention / Churn

What is chinese llm benchmark?

Chinese LLM Benchmark (ReLE评测) is a comprehensive, continuously updated evaluation platform for Chinese-language large language models. It currently tracks 359+ models: international models (GPT, Claude, Gemini) evaluated on Chinese tasks, and domestic Chinese models (DeepSeek, Qwen, Kimi, Doubao, Baidu ERNIE).

The benchmark covers capabilities critical for Chinese-language AI applications, including classical Chinese comprehension, character recognition, idiom usage, legal and medical domain knowledge, and cultural reasoning tasks that English-centric benchmarks do not address.

The evaluation methodology spans multiple capability dimensions: language understanding (reading comprehension, information extraction, semantic similarity), language generation (summarization, translation, creative writing), logical reasoning (mathematical problem-solving, commonsense inference), professional knowledge (law, medicine, finance, education), and safety alignment (toxicity detection, bias evaluation, instruction following).

Results are presented with statistical significance indicators and methodology documentation to enable reproducible comparison across model versions and providers.
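The listing does not say which significance test the benchmark uses. As an illustration only, here is a minimal sketch of one common approach, a paired bootstrap over per-question results, for checking whether one model's lead over another is likely to survive resampling. The function name `paired_bootstrap` and the 0/1 result lists are hypothetical and are not taken from the benchmark's own scripts.

```python
import random


def paired_bootstrap(correct_a, correct_b, n_resamples=2000, seed=0):
    """Return the fraction of resamples in which model A beats model B.

    correct_a and correct_b are hypothetical per-question results (1 = correct,
    0 = wrong) for two models scored on the same questions in the same order.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same questions")
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    n = len(correct_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample question indices with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(correct_a[i] - correct_b[i] for i in idx)
        if diff > 0:
            wins += 1
    return wins / n_resamples


if __name__ == "__main__":
    # Made-up results for two models on eight questions.
    model_a = [1, 1, 0, 1, 0, 1, 1, 1]
    model_b = [1, 0, 0, 1, 0, 1, 0, 1]
    print(paired_bootstrap(model_a, model_b))
```

A value close to 1.0 suggests the lead holds up under resampling, while a value near 0.5 suggests the two models are hard to separate on that question set.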

The benchmark is maintained by the Chinese AI research community with regular updates as new model versions are released, providing a living leaderboard that reflects the current state of Chinese-language AI capability.

It serves as a primary reference for Chinese enterprises evaluating which models to deploy for customer-facing applications, for researchers studying multilingual model capabilities, and for the Chinese AI developer community tracking progress relative to international frontier models.

The evaluation data and scoring scripts are publicly available for independent verification of results.

Who is chinese llm benchmark for?

→Chinese AI practitioners and researchers who need up-to-date benchmark comparisons across 359+ Chinese and international LLMs

→Product teams evaluating which LLM to deploy for Chinese-language applications who need objective, regularly updated performance data

→Researchers studying Chinese language model capabilities who want a comprehensive, community-maintained evaluation leaderboard

→International teams assessing Chinese AI models (DeepSeek, Qwen, ChatGLM) alongside Western models on standardized benchmarks

Pricing & Access

Free Monthly

Pricing details on provider page.

Alternatives to

→Supabase CMS: Coding & Dev Tools · Score 80/100

→SiteSignal: Coding & Dev Tools · Score 49/100

→AI Video API.ai: Coding & Dev Tools · Score 80/100

Frequently Asked Questions

What is the Chinese LLM Benchmark?

The Chinese LLM Benchmark (ReLE评测) is a continuously updated evaluation leaderboard covering 359+ AI language models — including ChatGPT, GPT-5.2, o4-mini, Google Gemini, Claude, and major Chinese models — assessed on Chinese-language tasks and general benchmarks.

What Chinese models are covered?

The benchmark covers DeepSeek, Qwen (Alibaba), GLM (Zhipu), Baichuan, Yi, Ernie (Baidu), MiniMax, Kimi, Hunyuan, and many other Chinese AI models alongside international models.

What tasks are evaluated?

Evaluations cover Chinese language understanding, reasoning, coding, math, knowledge QA, instruction following, and other tasks — with a focus on capabilities most relevant to Chinese-language applications.

How frequently is the leaderboard updated?

The project is described as continuously updated (持续更新). New models are added as they are released, making it one of the most current Chinese AI model evaluation resources.

Is the benchmark open source?

Yes — the benchmark data and leaderboard are publicly available on GitHub. Evaluation methodology is documented and the community can contribute new model results.
