Contributing to Multilingual Evaluations
Lighteval supports multilingual evaluations through a comprehensive system of translation literals and language-adapted templates.
Contributing Translation Literals
What Are Translation Literals?
We define 19 literals: basic keywords or punctuation signs used when creating evaluation prompts in an automatic manner, such as yes, no, because, etc.
These literals are essential for:
- Consistent prompt formatting across languages
- Automatic prompt generation for multilingual tasks
- Proper localization of evaluation templates
How to Contribute Translations
We welcome translations in your language! To contribute:
1. Open the translation literals file: translation_literals.py
2. Edit the file to add or expand the literal for your language of interest
3. Open a PR with your modifications
Translation Literals Structure
Language.ENGLISH: TranslationLiterals(
    language=Language.ENGLISH,
    question_word="question",  # Usage: "Question: How are you?"
    answer="answer",  # Usage: "Answer: I am fine"
    confirmation_word="right",  # Usage: "He is smart, right?"
    yes="yes",  # Usage: "Yes, he is"
    no="no",  # Usage: "No, he is not"
    also="also",  # Usage: "Also, she is smart."
    cause_word="because",  # Usage: "She is smart, because she is tall"
    effect_word="therefore",  # Usage: "He is tall therefore he is smart"
    or_word="or",  # Usage: "He is tall or small"
    true="true",  # Usage: "He is smart, true, false or neither?"
    false="false",  # Usage: "He is smart, true, false or neither?"
    neither="neither",  # Usage: "He is smart, true, false or neither?"
    # Punctuation and spacing: only adjust if your language uses something different from English
    full_stop=".",
    comma=",",
    question_mark="?",
    exclamation_mark="!",
    word_space=" ",
    sentence_space=" ",
    colon=":",
    # The first characters of your alphabet used in enumerations, if different from English
    indices=["A", "B", "C", ...]
)
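A new entry for your language is added alongside the English one. As a rough illustration, a French entry might look like the sketch below; the French values are assumptions made for this example, so verify them with a native speaker before opening a PR.

Language.FRENCH: TranslationLiterals(
    language=Language.FRENCH,
    question_word="question",  # "Question : Comment vas-tu ?"
    answer="réponse",  # "Réponse : Je vais bien"
    confirmation_word="n'est-ce pas",
    yes="oui",
    no="non",
    also="de plus",
    cause_word="parce que",
    effect_word="donc",
    or_word="ou",
    true="vrai",
    false="faux",
    neither="aucun des deux",
    # French typography places a space before some punctuation marks, so the
    # spacing defaults may need adjusting; keep the English defaults only if they apply.
    indices=["A", "B", "C", ...]
)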
Contributing New Multilingual Tasks
Prerequisites
Before creating a new multilingual task, you should:
- Read the custom task guide: Adding a Custom Task
- Understand multilingual task structure: Review the multilingual tasks file
- Browse available templates: Check the templates directory
Key Concepts
Language-Adapted Templates
For multilingual evaluations, the prompt_function should be implemented using language-adapted templates. These templates handle:
- Correct formatting for each language
- Consistent usage of language-adjusted prompt anchors (e.g., Question/Answer)
- Proper punctuation and spacing conventions
Template Types
Available template types include:
- XNLI: Natural language inference tasks - get_nli_prompt_function
- COPA: Causal reasoning tasks - get_copa_prompt_function
- Multiple Choice: Standard multiple choice questions - get_mcq_prompt_function
- Question Answering: Open-ended question answering - get_qa_prompt_function
- Custom: Specialized task templates
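As an illustration, the multiple choice template can be wired to a dataset roughly as in the sketch below. The module hosting get_mcq_prompt_function and the dataset column names (question, choices, answer_index) are assumptions made for this sketch; check the templates directory for the exact location and the adapter docstring for the required keys.

# Sketch only: the import path of the template function and the dataset
# column names are assumptions; check the templates directory.
from lighteval.tasks.multilingual.language import Language
from lighteval.tasks.multilingual.templates import get_mcq_prompt_function

# Build a prompt function that renders Swahili multiple-choice questions.
# The adapter maps the template's expected keys (left) to the dataset's columns (right).
swahili_mcq_prompt = get_mcq_prompt_function(
    Language.SWAHILI,
    adapter=lambda line: {
        "question": line["question"],
        "choices": line["choices"],
        "gold_idx": line["answer_index"],
    },
)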
Formulation Types
Multiple Choice Formulation (MCF)
Used for standard multiple choice questions where the model selects from lettered options:
MCFFormulation()
Example output (the text after | is the answer the model is expected to produce):
Question: What is the capital of France?
A. London
B. Paris
C. Berlin
D. Rome
Answer: | A/B/C/D
Classification Formulation (CF)
Used for classification tasks where the model generates the answer directly:
CFFormulation()
Example output:
Question: What is the capital of France?
Answer: | Paris
Hybrid Formulation
Used for tasks that present choices but expect the full answer text:
HybridFormulation()
Example output:
Question: What is the capital of France?
A. London
B. Paris
C. Berlin
D. Rome
Answer: | Paris
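Concretely, the formulation is just another argument to the template function, so the three outputs above come from the same template and adapter. A minimal sketch, reusing the assumed dataset columns from the earlier example and the import paths shown in Step 2 below:

from lighteval.tasks.multilingual.language import Language
from lighteval.tasks.multilingual.formulations import MCFFormulation, CFFormulation, HybridFormulation
from lighteval.tasks.multilingual.templates import get_mcq_prompt_function


def adapter(line):
    # Hypothetical dataset columns; adjust to your dataset.
    return {
        "question": line["question"],
        "choices": line["choices"],
        "gold_idx": line["answer_index"],
    }


# One prompt function per formulation, all sharing the same template and adapter.
prompt_functions = {
    "mcf": get_mcq_prompt_function(Language.ENGLISH, adapter, formulation=MCFFormulation()),
    "cf": get_mcq_prompt_function(Language.ENGLISH, adapter, formulation=CFFormulation()),
    "hybrid": get_mcq_prompt_function(Language.ENGLISH, adapter, formulation=HybridFormulation()),
}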
Creating Your Multilingual Task
Step 1: Create the Task File
Create a Python file following the custom task guide structure.
Step 2: Import Required Components
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.multilingual.language import Language
from lighteval.tasks.multilingual.formulations import MCFFormulation, CFFormulation, HybridFormulation
from lighteval.tasks.multilingual.templates import get_template_prompt_function
from lighteval.tasks.multilingual.metrics import get_metrics_for_formulation, loglikelihood_acc_metric
from lighteval.tasks.multilingual.normalization import LogProbTokenNorm, LogProbCharNorm
Step 3: Define Your Tasks
your_tasks = [
    LightevalTaskConfig(
        # Name of your evaluation
        name=f"evalname_{language.value}_{formulation.name.lower()}",
        # The evaluation is community contributed
        suite=["community"],
        # This will automatically get the correct metrics for your chosen formulation
        metric=get_metrics_for_formulation(
            formulation,
            [
                loglikelihood_acc_metric(normalization=None),
                loglikelihood_acc_metric(normalization=LogProbTokenNorm()),
                loglikelihood_acc_metric(normalization=LogProbCharNorm()),
            ],
        ),
        # In this function, you choose which template to follow and for which language and formulation
        prompt_function=get_template_prompt_function(
            language=language,
            # Use the adapter to define the mapping between the keys of the
            # template (left) and the keys of your dataset (right).
            # To know which template keys are required and available,
            # consult the appropriate adapter type and its docstring.
            adapter=lambda line: {
                "key": line["relevant_key"],
                # Add more mappings as needed
            },
            formulation=formulation,
        ),
        # You can also add specific filters to remove irrelevant samples
        hf_filter=lambda line: line["label"] in <condition>,
        # You then select your huggingface dataset as well as
        # the splits available for evaluation
        hf_repo=<dataset>,
        hf_subset=<subset>,
        evaluation_splits=["train"],
        hf_avail_splits=["train"],
    )
    for language in [
        Language.YOUR_LANGUAGE,  # Add your target languages
        # Language.SPANISH,
        # Language.FRENCH,
        # etc.
    ]
    for formulation in [MCFFormulation(), CFFormulation(), HybridFormulation()]
]
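As described in the custom task guide, the file then needs to expose the tasks so lighteval can discover them when you pass it via --custom-tasks; by convention this is a module-level TASKS_TABLE:

# Expose the generated tasks for discovery by lighteval.
TASKS_TABLE = your_tasks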
Step 4: Test Your Implementation
Follow the custom task guide to test if your task is correctly implemented.
All LightevalTaskConfig parameters are strongly typed, including the inputs to the template function. Make sure to take advantage of your IDE’s functionality to make it easier to correctly fill these parameters.
Validation Checklist
- Translation literals are accurate and complete
- Task works correctly across all target languages
- Metrics are appropriate for the task type
- Documentation is clear and comprehensive
- Code follows project conventions
Getting Help
- GitHub Issues: Report bugs or ask questions
- Discussions: Join community discussions
- Documentation: Review existing guides and examples