Contributing to Multilingual Evaluations

Lighteval supports multilingual evaluations through a comprehensive system of translation literals and language-adapted templates.

Contributing Translation Literals

What Are Translation Literals?

We define a set of translation literals: the basic keywords and punctuation signs (such as yes, no, and because) used when building evaluation prompts automatically.

These literals are essential for:

  • Consistent prompt formatting across languages
  • Automatic prompt generation for multilingual tasks
  • Proper localization of evaluation templates

How to Contribute Translations

We welcome translations in your language! To contribute:

  1. Open the translation literals file: translation_literals.py

  2. Edit the file to add or expand the literals for your language of interest

  3. Open a PR with your modifications

Translation Literals Structure

Language.ENGLISH: TranslationLiterals(
    language=Language.ENGLISH,
    question_word="question",  # Usage: "Question: How are you?"
    answer="answer",  # Usage: "Answer: I am fine"
    confirmation_word="right",  # Usage: "He is smart, right?"
    yes="yes",  # Usage: "Yes, he is"
    no="no",  # Usage: "No, he is not"
    also="also",  # Usage: "Also, she is smart."
    cause_word="because",  # Usage: "She is smart, because she is tall"
    effect_word="therefore",  # Usage: "He is tall therefore he is smart"
    or_word="or",  # Usage: "He is tall or small"
    true="true",  # Usage: "He is smart, true, false or neither?"
    false="false",  # Usage: "He is smart, true, false or neither?"
    neither="neither",  # Usage: "He is smart, true, false or neither?"
    # Punctuation and spacing: only adjust these if your language's conventions differ from English
    full_stop=".",
    comma=",",
    question_mark="?",
    exclamation_mark="!",
    word_space=" ",
    sentence_space=" ",
    colon=":",
    # The first characters of your alphabet used in enumerations, if different from English
    indices=["A", "B", "C", ...]
)
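
For example, a contribution for French might look like the sketch below. The field values are illustrative; verify them against the existing entries in translation_literals.py and with a native speaker before opening a PR:

Language.FRENCH: TranslationLiterals(
    language=Language.FRENCH,
    question_word="question",  # Usage: "Question : Comment vas-tu ?"
    answer="réponse",  # Usage: "Réponse : Je vais bien"
    confirmation_word="n'est-ce pas",  # Usage: "Il est intelligent, n'est-ce pas ?"
    yes="oui",
    no="non",
    also="de plus",
    cause_word="parce que",
    effect_word="donc",
    or_word="ou",
    true="vrai",
    false="faux",
    neither="aucun des deux",
    # French typography places a space before two-part punctuation marks
    full_stop=".",
    comma=",",
    question_mark=" ?",
    exclamation_mark=" !",
    word_space=" ",
    sentence_space=" ",
    colon=" :",
    indices=["A", "B", "C", ...]
)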

Contributing New Multilingual Tasks

Prerequisites

Before creating a new multilingual task, you should:

  1. Read the custom task guide: Adding a Custom Task
  2. Understand multilingual task structure: Review the multilingual tasks file
  3. Browse available templates: Check the templates directory

Key Concepts

Language-Adapted Templates

For multilingual evaluations, the prompt_function should be implemented using language-adapted templates (a minimal sketch of how they use the translation literals follows the list below). These templates handle:

  • Correct formatting for each language
  • Consistent usage of language-adjusted prompt anchors (e.g., Question/Answer)
  • Proper punctuation and spacing conventions
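
To make this concrete, here is a minimal sketch of how a template assembles a prompt from the translation literals instead of hard-coding English strings. It assumes the literals mapping is exposed as TRANSLATION_LITERALS in translation_literals.py; check that file for the exact name and import path:

from lighteval.tasks.templates.utils.translation_literals import TRANSLATION_LITERALS
from lighteval.utils.language import Language

# Literals for the target language drive anchors, punctuation, and spacing
literals = TRANSLATION_LITERALS[Language.FRENCH]

# With French literals (colon=" :", question_mark=" ?"), this renders as
# "Question : Quelle est la capitale de la France ?"
prompt = (
    f"{literals.question_word.capitalize()}{literals.colon}{literals.word_space}"
    f"Quelle est la capitale de la France{literals.question_mark}"
)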

Template Types

Available template types include:

Formulation Types

Multiple Choice Formulation (MCF)

Used for standard multiple choice questions where the model selects from lettered options:

MCFFormulation()

Example output:

Question: What is the capital of France?
A. London
B. Paris
C. Berlin
D. Rome
Answer: | A/B/C/D

Classification Formulation (CF)

Used for classification tasks where the model generates the answer directly:

CFFormulation()

Example output:

Question: What is the capital of France?
Answer: | Paris

Hybrid Formulation

Used for tasks that present choices but expect the full answer text:

HybridFormulation()

Example output:

Question: What is the capital of France?
A. London
B. Paris
C. Berlin
D. Rome
Answer: | Paris

Creating Your Multilingual Task

Step 1: Create the Task File

Create a Python file following the custom task guide structure.

Step 2: Import Required Components

from lighteval.metrics.dynamic_metrics import loglikelihood_acc_metric
from lighteval.metrics.normalizations import LogProbCharNorm, LogProbTokenNorm
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.multilingual.utils.task_utils import get_metrics_for_formulation
from lighteval.tasks.templates.multichoice import get_mcq_prompt_function
from lighteval.tasks.templates.utils.formulation import (
    CFFormulation,
    HybridFormulation,
    MCFFormulation,
)
from lighteval.utils.language import Language

Step 3: Define Your Tasks

your_tasks = [
    LightevalTaskConfig(
        # Name of your evaluation
        name=f"evalname_{language.value}_{formulation.name.lower()}",
        # The evaluation is community contributed
        suite=["community"],
        # This will automatically get the correct metrics for your chosen formulation
        metric=get_metrics_for_formulation(
            formulation,
            [
                loglikelihood_acc_metric(normalization=None),
                loglikelihood_acc_metric(normalization=LogProbTokenNorm()),
                loglikelihood_acc_metric(normalization=LogProbCharNorm()),
            ],
        ),
        # In this function, you choose which template to follow and for which language and formulation
        prompt_function=get_mcq_prompt_function(
            language=language,
            # Use the adapter to define the mapping between the
            # keys of the template (left), and the keys of your dataset
            # (right)
            # To know which template keys are required and available,
            # consult the appropriate adapter type and doc-string.
            adapter=lambda line: {
                "question": line["question"],
                "choices": line["choices"],
                "gold_idx": int(line["label"]),
            },
            formulation=formulation,
        ),
        # You can also add specific filters to remove irrelevant samples
        hf_filter=lambda line: line["label"] in <condition>,
        # You then select your huggingface dataset as well as
        # the splits available for evaluation
        hf_repo=<dataset>,
        hf_subset=<subset>,
        evaluation_splits=["train"],
        hf_avail_splits=["train"],
    )
    for language in [
        Language.YOUR_LANGUAGE,  # Add your target languages
        # Language.SPANISH,
        # Language.FRENCH,
        # etc.
    ]
    for formulation in [MCFFormulation(), CFFormulation(), HybridFormulation()]
]
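
To make these tasks discoverable by lighteval, expose them through the module-level TASKS_TABLE variable, following the convention described in the custom task guide:

TASKS_TABLE = your_tasks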

Step 4: Test Your Implementation

Follow the custom task guide to test if your task is correctly implemented.
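
For a quick smoke test, you can point the lighteval CLI at your task file. The exact flags and task-spec format vary across lighteval versions, and the model name and task name below are placeholders, so treat this as a sketch rather than a guaranteed command:

lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "community|evalname_fra_mcf|0|0" \
    --custom-tasks path/to/your_task_file.py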

All LightevalTaskConfig parameters are strongly typed, including the inputs to the template function. Take advantage of your IDE's type hints and autocompletion to fill them in correctly.

Validation Checklist

  • Translation literals are accurate and complete
  • Task works correctly across all target languages
  • Metrics are appropriate for the task type
  • Documentation is clear and comprehensive
  • Code follows project conventions

Getting Help

  • GitHub Issues: Report bugs or ask questions
  • Discussions: Join community discussions
  • Documentation: Review existing guides and examples