Contributing to Multilingual Evaluations
Lighteval supports multilingual evaluations through a comprehensive system of translation literals and language-adapted templates.
Contributing Translation Literals
What Are Translation Literals?
We define 19 literals: basic keywords or punctuation signs used when creating evaluation prompts in an automatic manner, such as yes, no, because, etc.
These literals are essential for:
- Consistent prompt formatting across languages
- Automatic prompt generation for multilingual tasks
- Proper localization of evaluation templates
How to Contribute Translations
We welcome translations in your language! To contribute:
1. Open the translation literals file: translation_literals.py
2. Edit the file to add or expand the literal for your language of interest
3. Open a PR with your modifications
Translation Literals Structure
Language.ENGLISH: TranslationLiterals(
    language=Language.ENGLISH,
    question_word="question",  # Usage: "Question: How are you?"
    answer="answer",  # Usage: "Answer: I am fine"
    confirmation_word="right",  # Usage: "He is smart, right?"
    yes="yes",  # Usage: "Yes, he is"
    no="no",  # Usage: "No, he is not"
    also="also",  # Usage: "Also, she is smart."
    cause_word="because",  # Usage: "She is smart, because she is tall"
    effect_word="therefore",  # Usage: "He is tall therefore he is smart"
    or_word="or",  # Usage: "He is tall or small"
    true="true",  # Usage: "He is smart, true, false or neither?"
    false="false",  # Usage: "He is smart, true, false or neither?"
    neither="neither",  # Usage: "He is smart, true, false or neither?"
    # Punctuation and spacing: only adjust if your language uses something different from English
    full_stop=".",
    comma=",",
    question_mark="?",
    exclamation_mark="!",
    word_space=" ",
    sentence_space=" ",
    colon=":",
    # The first characters of your alphabet used in enumerations, if different from English
    indices=["A", "B", "C", ...]
)
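A new entry for your language is added alongside the English one. As a rough illustration, a French entry might look like the sketch below; the French values are assumptions made for this example, so verify them with a native speaker before opening a PR.

Language.FRENCH: TranslationLiterals(
    language=Language.FRENCH,
    question_word="question",  # "Question : Comment vas-tu ?"
    answer="réponse",  # "Réponse : Je vais bien"
    confirmation_word="n'est-ce pas",
    yes="oui",
    no="non",
    also="de plus",
    cause_word="parce que",
    effect_word="donc",
    or_word="ou",
    true="vrai",
    false="faux",
    neither="aucun des deux",
    # French typography places a space before some punctuation marks, so the
    # spacing defaults may need adjusting; keep the English defaults only if they apply.
    indices=["A", "B", "C", ...]
)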
Contributing New Multilingual Tasks
Prerequisites
Before creating a new multilingual task, you should:
- Read the custom task guide: Adding a Custom Task
- Understand multilingual task structure: Review the multilingual tasks file
- Browse available templates: Check the templates directory
Key Concepts
Language-Adapted Templates
For multilingual evaluations, the prompt_function should be implemented using language-adapted templates. These templates handle:
- Correct formatting for each language
- Consistent usage of language-adjusted prompt anchors (e.g., Question/Answer)
- Proper punctuation and spacing conventions
Template Types
Available template types include:
- XNLI: Natural language inference tasks - get_nli_prompt_function
- COPA: Causal reasoning tasks - get_copa_prompt_function
- Multiple Choice: Standard multiple choice questions - get_mcq_prompt_function
- Question Answering: Open-ended question answering - get_qa_prompt_function
- Custom: Specialized task templates
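As an illustration, the multiple choice template can be wired to a dataset roughly as in the sketch below. The module hosting get_mcq_prompt_function and the dataset column names (question, choices, answer_index) are assumptions made for this sketch; check the templates directory for the exact location and the adapter docstring for the required keys.

# Sketch only: the import path of the template function and the dataset
# column names are assumptions; check the templates directory.
from lighteval.tasks.multilingual.language import Language
from lighteval.tasks.multilingual.templates import get_mcq_prompt_function

# Build a prompt function that renders Swahili multiple-choice questions.
# The adapter maps the template's expected keys (left) to the dataset's columns (right).
swahili_mcq_prompt = get_mcq_prompt_function(
    Language.SWAHILI,
    adapter=lambda line: {
        "question": line["question"],
        "choices": line["choices"],
        "gold_idx": line["answer_index"],
    },
)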
Formulation Types
Multiple Choice Formulation (MCF)
Used for standard multiple choice questions where the model selects from lettered options:
MCFFormulation()
Example output (the text after | is the answer the model is expected to produce):
Question: What is the capital of France?
A. London
B. Paris
C. Berlin
D. Rome
Answer: | A/B/C/D
Classification Formulation (CF)
Used for classification tasks where the model generates the answer directly:
CFFormulation()
Example output:
Question: What is the capital of France?
Answer: | Paris
Hybrid Formulation
Used for tasks that present choices but expect the full answer text:
HybridFormulation()
Example output:
Question: What is the capital of France?
A. London
B. Paris
C. Berlin
D. Rome
Answer: | Paris
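Concretely, the formulation is just another argument to the template function, so the three outputs above come from the same template and adapter. A minimal sketch, reusing the assumed dataset columns from the earlier example and the import paths shown in Step 2 below:

from lighteval.tasks.multilingual.language import Language
from lighteval.tasks.multilingual.formulations import MCFFormulation, CFFormulation, HybridFormulation
from lighteval.tasks.multilingual.templates import get_mcq_prompt_function


def adapter(line):
    # Hypothetical dataset columns; adjust to your dataset.
    return {
        "question": line["question"],
        "choices": line["choices"],
        "gold_idx": line["answer_index"],
    }


# One prompt function per formulation, all sharing the same template and adapter.
prompt_functions = {
    "mcf": get_mcq_prompt_function(Language.ENGLISH, adapter, formulation=MCFFormulation()),
    "cf": get_mcq_prompt_function(Language.ENGLISH, adapter, formulation=CFFormulation()),
    "hybrid": get_mcq_prompt_function(Language.ENGLISH, adapter, formulation=HybridFormulation()),
}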
Creating Your Multilingual Task
Step 1: Create the Task File
Create a Python file following the custom task guide structure.
Step 2: Import Required Components
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.multilingual.language import Language
from lighteval.tasks.multilingual.formulations import MCFFormulation, CFFormulation, HybridFormulation
from lighteval.tasks.multilingual.templates import get_template_prompt_function
from lighteval.tasks.multilingual.metrics import get_metrics_for_formulation, loglikelihood_acc_metric
from lighteval.tasks.multilingual.normalization import LogProbTokenNorm, LogProbCharNorm
Step 3: Define Your Tasks
your_tasks = [
    LightevalTaskConfig(
        # Name of your evaluation
        name=f"evalname_{language.value}_{formulation.name.lower()}",
        # The evaluation is community contributed
        suite=["community"],
        # This will automatically get the correct metrics for your chosen formulation
        metric=get_metrics_for_formulation(
            formulation,
            [
                loglikelihood_acc_metric(normalization=None),
                loglikelihood_acc_metric(normalization=LogProbTokenNorm()),
                loglikelihood_acc_metric(normalization=LogProbCharNorm()),
            ],
        ),
        # In this function, you choose which template to follow and for which language and formulation
        prompt_function=get_template_prompt_function(
            language=language,
            # Use the adapter to define the mapping between the keys of the
            # template (left) and the keys of your dataset (right).
            # To know which template keys are required and available,
            # consult the appropriate adapter type and its docstring.
            adapter=lambda line: {
                "key": line["relevant_key"],
                # Add more mappings as needed
            },
            formulation=formulation,
        ),
        # You can also add specific filters to remove irrelevant samples
        hf_filter=lambda line: line["label"] in <condition>,
        # You then select your huggingface dataset as well as
        # the splits available for evaluation
        hf_repo=<dataset>,
        hf_subset=<subset>,
        evaluation_splits=["train"],
        hf_avail_splits=["train"],
    )
    for language in [
        Language.YOUR_LANGUAGE,  # Add your target languages
        # Language.SPANISH,
        # Language.FRENCH,
        # etc.
    ]
    for formulation in [MCFFormulation(), CFFormulation(), HybridFormulation()]
]
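As described in the custom task guide, the file then needs to expose the tasks so lighteval can discover them when you pass it via --custom-tasks; by convention this is a module-level TASKS_TABLE:

# Expose the generated tasks for discovery by lighteval.
TASKS_TABLE = your_tasks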
Step 4: Test Your Implementation
Follow the custom task guide to test if your task is correctly implemented.
All LightevalTaskConfig parameters are strongly typed, including the inputs to the template function. Make sure to take advantage of your IDE’s functionality to make it easier to correctly fill these parameters.
Validation Checklist
- Translation literals are accurate and complete
- Task works correctly across all target languages
- Metrics are appropriate for the task type
- Documentation is clear and comprehensive
- Code follows project conventions
Getting Help
- GitHub Issues: Report bugs or ask questions
- Discussions: Join community discussions
- Documentation: Review existing guides and examples