Lighteval

πŸ€— Lighteval is your all-in-one toolkit for evaluating Large Language Models (LLMs) across multiple backends with ease. Dive deep into your model’s performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up.

Key Features

πŸš€ Multi-Backend Support

Evaluate your models using the most popular and efficient inference backends.
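
In practice, the backend is picked through the CLI subcommand. A minimal sketch, assuming the vllm subcommand is available in your installed version alongside accelerate:

# Same task run on two different backends
# (the vllm subcommand is an assumption; check `lighteval --help`
#  for the backends your version exposes)
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0"

lighteval vllm \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0"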

πŸ“Š Comprehensive Evaluation

  • Extensive Task Library: Thousands of pre-built evaluation tasks
  • Custom Task Creation: Build your own evaluation tasks
  • Flexible Metrics: Support for custom metrics and scoring
  • Detailed Analysis: Sample-by-sample results for deep insights (see the sketch after this list)
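
As a sketch of the last point, per-sample details can be written out alongside the aggregate scores. This assumes the --save-details flag; check `lighteval accelerate --help` in your version:

# Save per-sample predictions and scores in addition to aggregate metrics
# (--save-details is an assumption here)
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0" \
    --output-dir ./results \
    --save-details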

πŸ”§ Easy Customization

Customization at your fingertips: create new tasks, metrics, or models tailored to your needs, or browse all our existing tasks and metrics.
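
As an example, a custom task module can be passed straight to the CLI. This sketch assumes the --custom-tasks option; the file community_tasks/my_tasks.py and the task name my_task are hypothetical placeholders:

# Run a task defined in your own Python module
# (file path and task name below are hypothetical)
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "community|my_task|0" \
    --custom-tasks community_tasks/my_tasks.py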

☁️ Seamless Integration

Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.

Quick Start

Installation

pip install lighteval
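
Backend-specific dependencies can be pulled in as extras. The extra names below are assumptions; check the project's pyproject.toml for the ones your version ships:

# Optional extras for specific backends (extra names are assumptions)
pip install "lighteval[accelerate]"
pip install "lighteval[vllm]"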

Basic Usage

# Evaluate a model using Transformers backend
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0"

Save Results

# Save locally
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0" \
    --output-dir ./results

# Push to Hugging Face Hub
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0" \
    --push-to-hub \
    --results-org your-username