Data Preparation

Each benchmark requires its own data. Generation benchmarks (DPG Bench, GenEval, WISE) ship their data in the repository. Understanding benchmarks follow the InternVL data preparation pipeline. Downloaded data is stored under data/ at the repository root.

Cloud users

If you are running evaluations via Modal, datasets are cached in persistent volumes. Use modal run modal/download.py --dataset <name> to download datasets to the cloud. You do not need to prepare data locally.
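To fetch several datasets in one go, the command above can be wrapped in a small loop. The following is a minimal sketch, not part of the repository: the function name download_modal_datasets and the DRY_RUN switch are hypothetical, and the dataset names (ueval, uni_mmmu, gedit) are simply the ones that appear on this page.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repository): run the download script
# once per dataset name. Set DRY_RUN=1 to print the commands instead of running them.
download_modal_datasets() {
  local name
  for name in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "modal run modal/download.py --dataset $name"
    else
      # Stop at the first failing download so errors are not silently skipped.
      modal run modal/download.py --dataset "$name" || return 1
    fi
  done
}

# Preview the commands for the dataset names mentioned on this page.
DRY_RUN=1 download_modal_datasets ueval uni_mmmu gedit
```

Dropping DRY_RUN=1 runs the downloads for real; this assumes the modal CLI is installed and authenticated.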

Understanding Benchmarks

Data for the understanding benchmarks is prepared following the InternVL evaluation data preparation guide.

MME

mkdir -p data/mme
cd data/mme
wget https://huggingface.co/OpenGVLab/InternVL/resolve/main/MME_Benchmark_release_version.zip
unzip MME_Benchmark_release_version.zip
cd -

MMMU

MMMU is auto-downloaded from HuggingFace (MMMU/MMMU) at evaluation time and cached in data/MMMU/. No manual download is needed.

MMBench

mkdir -p data/mmbench
cd data/mmbench
wget https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_dev_20230712.tsv
wget https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_dev_cn_20231003.tsv
wget https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_dev_en_20231003.tsv
wget https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_test_cn_20231003.tsv
wget https://download.openmmlab.com/mmclassification/datasets/mmbench/mmbench_test_en_20231003.tsv
cd -
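Because five separate files are fetched, it can be worth confirming that none is missing or empty before starting an evaluation. This is a sketch of such a check; the helper name missing_mmbench_files is hypothetical, and the file names are taken from the wget commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the name of every
# MMBench TSV file that is missing or empty in the given directory.
missing_mmbench_files() {
  local dir="${1:-data/mmbench}"
  local f
  for f in \
    mmbench_dev_20230712.tsv \
    mmbench_dev_cn_20231003.tsv \
    mmbench_dev_en_20231003.tsv \
    mmbench_test_cn_20231003.tsv \
    mmbench_test_en_20231003.tsv
  do
    # -s is true only when the file exists and has a size greater than zero.
    [ -s "$dir/$f" ] || echo "$f"
  done
}

# Prints nothing when all five files are present and non-empty.
missing_mmbench_files data/mmbench
```

Run it from the repository root; any file name it prints should be downloaded again.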

MM-Vet

mkdir -p data/mm-vet
cd data/mm-vet
wget https://github.com/yuweihao/MM-Vet/releases/download/v1/mm-vet.zip
unzip mm-vet.zip
wget https://huggingface.co/OpenGVLab/InternVL/raw/main/llava-mm-vet.jsonl
cd -
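Re-running the commands above downloads everything again. If that is undesirable, a guard along the following lines skips files that are already present. This is only a sketch: fetch_if_missing is a hypothetical helper, not something the repository provides, and it assumes wget is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): download a URL into a
# directory unless a non-empty file with the same name is already there.
fetch_if_missing() {
  local url="$1" dir="$2"
  local name="${url##*/}"   # file name = last path component of the URL
  mkdir -p "$dir"
  if [ -s "$dir/$name" ]; then
    echo "skip $name (already present)"
  else
    # Download to a temporary name first so an interrupted transfer
    # is never mistaken for a complete file on the next run.
    wget -O "$dir/$name.part" "$url" && mv "$dir/$name.part" "$dir/$name"
  fi
}
```

For example, `fetch_if_missing https://github.com/yuweihao/MM-Vet/releases/download/v1/mm-vet.zip data/mm-vet` would replace the first wget call above; unzipping remains a separate step.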

MathVista

mkdir -p data/MathVista
cd data/MathVista
wget https://huggingface.co/datasets/AI4Math/MathVista/raw/main/annot_testmini.json
cd -
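On Hugging Face, a raw/ URL returns the Git LFS pointer instead of the real payload when the file is tracked by LFS, whereas resolve/ returns the payload. Whether the files fetched above are LFS-tracked is not stated here, so a quick check after downloading can be useful. The helper name is_lfs_pointer is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): succeed if the file is a
# Git LFS pointer rather than real data. Pointer files begin with a fixed
# "version" line defined by the Git LFS specification.
is_lfs_pointer() {
  head -n 1 "$1" | grep -qxF 'version https://git-lfs.github.com/spec/v1'
}

# Check the MathVista annotation file if it has been downloaded.
file=data/MathVista/annot_testmini.json
if [ -f "$file" ] && is_lfs_pointer "$file"; then
  echo "$file is an LFS pointer; try the same URL with resolve/ instead of raw/"
fi
```

The same check applies to llava-mm-vet.jsonl in the MM-Vet section, which is also fetched through a raw/ URL.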

Generation Benchmarks

These benchmarks include their data in the repository. No additional download is needed.

DPG Bench

Prompts are stored in eval/generation/dpg_bench/prompts/ (100 prompt files). No download required.
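A quick way to confirm that the checkout is complete is to count the prompt files. This sketch assumes the prompts are regular files somewhere under that directory; count_files is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): count regular files
# under a directory, including any subdirectories.
count_files() {
  find "$1" -type f | wc -l | tr -d '[:space:]'
}

# Report the count when run from the repository root.
dir=eval/generation/dpg_bench/prompts
if [ -d "$dir" ]; then
  echo "$(count_files "$dir") prompt files found (this page states 100)"
fi
```

A different count may point to an incomplete clone or to extra files in that directory.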

GenEval

Metadata and prompts are included in model/geneval/. No download required.

WISE

Benchmark data is included in model/WISE/. No download required.

Other Benchmarks

UEval

UEval data is auto-downloaded from HuggingFace (primerL/UEval-all) at evaluation time. No manual download is needed for local execution.

For Modal cloud execution:

modal run modal/download.py --dataset ueval

Uni-MMMU

Uni-MMMU data is hosted on HuggingFace (Vchitect/Uni-MMMU-Eval). Reference: Vchitect/Uni-MMMU.

For Modal cloud execution:

modal run modal/download.py --dataset uni_mmmu

See eval/generation/uni_mmmu/README.md for details.

GEdit-Bench

GEdit-Bench data is hosted on HuggingFace (stepfun-ai/GEdit-Bench). It is auto-downloaded at evaluation time if not pre-downloaded.

For Modal cloud execution:

modal run modal/download.py --dataset gedit

Scoring uses Qwen2.5-VL-72B-Instruct, the same evaluator used for WISE.