HELM Alternatives

HELM (Holistic Evaluation of Language Models) is Stanford's framework for evaluating language models across many scenarios and metrics. Here are 12 similar model benchmarks worth considering as alternatives.

LMArena: blind-vote arena where users compare LLM responses head to head; the votes feed its community leaderboard
OpenCompass: Model Benchmark
SuperCLUE: Model Benchmark
C-Eval: Model Benchmark
CMMLU: Model Benchmark
FlagEval: Model Benchmark
AGI-Eval: Model Benchmark
MMLU: Model Benchmark
Open LLM Leaderboard: Model Benchmark
MMBench: Model Benchmark
MagicArena: Model Benchmark
LLMEval3: Model Benchmark