Lighteval documentation

Using LiteLLM as Backend

Lighteval allows you to use LiteLLM as a backend, enabling you to call all LLM APIs using the OpenAI format. LiteLLM supports various providers including Bedrock, Hugging Face, Vertex AI, Together AI, Azure, OpenAI, Groq, and many others.

Documentation for available APIs and compatible endpoints can be found in the LiteLLM documentation.

Basic Usage

lighteval endpoint litellm \
    "provider=openai,model_name=gpt-3.5-turbo" \
    "lighteval|gsm8k|0"

Using a Configuration File

LiteLLM allows generation with any OpenAI-compatible endpoint. For example, you can evaluate a model running on a local VLLM server.

To do so, you will need to use a configuration file like this:

model_parameters:
    model_name: "openai/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
    base_url: "URL_OF_THE_ENDPOINT_YOU_WANT_TO_USE"
    api_key: "" # Remove or keep empty as needed
    generation_parameters:
      temperature: 0.5
      max_new_tokens: 256
      stop_tokens: [""]
      top_p: 0.9
      seed: 0
      repetition_penalty: 1.0
      frequency_penalty: 0.0
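
The configuration file is then passed to the CLI in place of the inline model arguments (the file name below is just an example):

lighteval endpoint litellm \
    litellm_model_config.yaml \
    "lighteval|gsm8k|0"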

Supported Providers

LiteLLM supports a wide range of LLM providers:

Cloud Providers

All supported cloud providers are listed in the LiteLLM documentation.
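
As an illustration, switching to another hosted provider only changes the inline model arguments; the Anthropic model id below is illustrative, and the corresponding API key is again read from the environment (ANTHROPIC_API_KEY in this case):

lighteval endpoint litellm \
    "provider=anthropic,model_name=claude-3-5-sonnet-20240620" \
    "lighteval|gsm8k|0"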

Local/On-Premise

  • VLLM: Local VLLM servers
  • Hugging Face: Local Hugging Face models
  • Custom endpoints: Any OpenAI-compatible API

Using with Local Models

VLLM Server

To use with a local VLLM server:

  1. Start your VLLM server:
vllm serve HuggingFaceH4/zephyr-7b-beta --host 0.0.0.0 --port 8000
  2. Configure LiteLLM to use the local server:
model_parameters:
    provider: "openai"
    model_name: "HuggingFaceH4/zephyr-7b-beta"
    base_url: "http://localhost:8000/v1"
    api_key: ""

For more detailed error handling and debugging, refer to the LiteLLM documentation.
