HELM (Holistic Evaluation of Language Models) is Stanford's holistic evaluation framework for language models. Here are 12 similar model benchmarks worth considering as HELM alternatives.