CompGen-MLIP: Compositional Generalisation for ML Interatomic Potentials

Benchmark of 4 tasks that evaluates the compositional generalisation of ML interatomic potentials (MLIPs): whether models learn transferable chemistry or merely interpolate training patterns. Relevant to molecular dynamics-based drug design.
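
To make the idea of a compositional split concrete, here is a minimal Python sketch. It is not the benchmark's own splitting code: the helper names `element_pairs` and `compositional_split`, the choice of element pairs as the unit of composition, and the toy molecules are all assumptions made for illustration. The held-out pair never appears in training, although each element on its own does.

```python
from itertools import combinations


def element_pairs(elements):
    """Return the set of unordered element pairs present in one structure."""
    return {frozenset(pair) for pair in combinations(sorted(set(elements)), 2)}


def compositional_split(structures, held_out_pairs):
    """Put every structure containing a held-out element pair into the test set.

    `structures` is a list of (name, elements) tuples (hypothetical format).
    """
    held_out = {frozenset(pair) for pair in held_out_pairs}
    train, test = [], []
    for name, elements in structures:
        if element_pairs(elements) & held_out:
            test.append(name)   # contains an unseen combination
        else:
            train.append(name)  # only combinations allowed in training
    return train, test


# Toy data, invented for this example only.
structures = [
    ("methane", ["C", "H"]),
    ("water", ["O", "H"]),
    ("methanol", ["C", "H", "O"]),
    ("ammonia", ["N", "H"]),
]

# Hold out every structure in which C and O occur together.
train, test = compositional_split(structures, held_out_pairs=[("C", "O")])
print(train, test)
```

In this toy setup the model would see C-H and O-H chemistry during training but would be tested on a molecule that combines C and O, which is the kind of unseen composition the description refers to.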

Composite
39.8
Experimental validation
Retrospective
Stages
Hit ID, Lead ID / ADMET
Modalities
small molecule
Task types
regression
Size
tasks: 4
molecules: unknown — compositional split evaluation
License
Other
First release
2026-05-09
Last updated
2026-05-09
Official site
→ project page
Leaderboard
→ leaderboard
Dataset
→ dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
Benchmarking Compositional Generalisation for Machine Learning Interatomic Potentials · Amir Masoud Nourollah, Irtaza Khalid, Stefano Leoni, Steven Schockaert · 2026 · paper · doi:N/A — preprint · 0 citations
Flags
none
Related benchmarks
MatBench, PDBbind, CASF-2016

Rubric (7-criterion)

rigor
4
coverage
2
maintenance
3
adoption
1
quality
3
accessibility
2
industry_relevance
2

Notes

Addresses an important gap in MLIP evaluation: out-of-distribution (OOD) generalisation to unseen molecular compositions. Shows that current models struggle, with 10x higher error on OOD data. Closer to computational chemistry than to direct drug discovery, but relevant to the free energy calculations and MD simulations used in drug design. Narrow scope (only 4 tasks).
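
One way to express the "10x error on OOD" observation is as a ratio of mean absolute errors on out-of-distribution versus in-distribution data. The sketch below assumes MAE as the metric, which this entry does not confirm; the function names and the numbers are hypothetical and chosen only so the ratio comes out near 10.

```python
def mean_absolute_error(predicted, reference):
    """Mean absolute error between two equal-length sequences of numbers."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def ood_error_ratio(id_pred, id_ref, ood_pred, ood_ref):
    """Ratio of OOD error to in-distribution error (assumed metric: MAE)."""
    id_mae = mean_absolute_error(id_pred, id_ref)
    if id_mae == 0:
        raise ValueError("in-distribution error is zero; ratio is undefined")
    return mean_absolute_error(ood_pred, ood_ref) / id_mae


# Made-up energies (arbitrary units), not results from the benchmark.
ratio = ood_error_ratio(
    id_pred=[1.0, 2.0], id_ref=[1.1, 1.9],    # small errors on seen compositions
    ood_pred=[1.0, 2.0], ood_ref=[2.0, 3.0],  # large errors on unseen compositions
)
print(f"OOD / ID error ratio: {ratio:.1f}")
```

A ratio close to 1 would suggest the model transfers to unseen compositions; a ratio far above 1 suggests it is mostly interpolating within the training distribution.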
