CompGen-MLIP: Compositional Generalisation for ML Interatomic Potentials

Benchmark of 4 tasks evaluating compositional generalization of ML interatomic potentials (MLIPs): whether models learn transferable chemistry or merely interpolate training patterns. Relevant to molecular-dynamics-based drug design.

Composite

39.8

Experimental validation

Retrospective

Stages

Hit ID, Lead ID / ADMET

Modalities

small molecule

Task types

regression

Size

tasks: 4
molecules: unknown — compositional split evaluation

License

Other

First release

2026-05-09

Last updated

2026-05-09

Official site

→ project page

Leaderboard

→ leaderboard

Dataset

→ dataset

Code / GitHub

→ repository

HuggingFace

→ HF

Paper

Benchmarking Compositional Generalisation for Machine Learning Interatomic Potentials · Amir Masoud Nourollah, Irtaza Khalid, Stefano Leoni, Steven Schockaert · 2026 · paper · doi:N/A — preprint · 0 citations

Flags

none

Experts

—

Groups

—

Hosted by

—

Related benchmarks

MatBench, PDBbind, CASF-2016

Rubric (7-criterion)

rigor

coverage

maintenance

adoption

quality

accessibility

industry_relevance

Notes

Addresses an important gap in MLIP evaluation: out-of-distribution (OOD) generalization to unseen molecular compositions. Shows that current models struggle (10x error on OOD). Closer to computational chemistry than to direct drug discovery, but relevant to the free energy calculations and MD simulations used in drug design. Narrow scope (4 tasks only).
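
To make the idea of a compositional split concrete, here is a minimal sketch. It assumes a split that holds out one combination of elements while every individual element still appears in training; the benchmark's actual protocol may differ. The records, error values, and the names `compositional_split` and `mae` are hypothetical and are not benchmark data or the authors' code.

```python
from statistics import mean

# Hypothetical toy records: (element set of a molecule, absolute error of some model).
# The values are invented for illustration only; they are NOT benchmark results.
records = [
    (frozenset({"C", "H"}), 0.010),
    (frozenset({"C", "H", "O"}), 0.012),
    (frozenset({"C", "H", "N"}), 0.011),
    (frozenset({"C", "H", "N", "O"}), 0.095),
    (frozenset({"C", "H", "N", "O"}), 0.120),
]


def compositional_split(rows, held_out):
    """Split rows into in-distribution and OOD by exact element-set match.

    Assumption: a molecule counts as OOD when its element set equals the
    held-out composition, even though each element occurs in training.
    """
    in_dist = [r for r in rows if r[0] != held_out]
    ood = [r for r in rows if r[0] == held_out]
    return in_dist, ood


def mae(rows):
    """Mean absolute error over a list of (composition, error) rows."""
    return mean(err for _, err in rows)


held_out = frozenset({"C", "H", "N", "O"})
in_dist, ood = compositional_split(records, held_out)

# Ratio of OOD error to in-distribution error: values well above 1 suggest
# the model interpolates training compositions rather than transferring.
ratio = mae(ood) / mae(in_dist)
print(f"ID MAE={mae(in_dist):.4f}  OOD MAE={mae(ood):.4f}  ratio={ratio:.1f}x")
```

The OOD-to-ID error ratio computed here is one simple way to summarise a gap like the one described in the notes; other summaries (per-task ratios, force errors) are equally plausible.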
