ScaleBench: Molecular Property Prediction

A comprehensive benchmark that assesses whether larger pre-trained models outperform compact models across 26 ADME, toxicity, safety, and bioactivity endpoints, evaluated over 78 endpoint-split combinations.

Composite
69.5
Experimental validation
Retrospective
Stages
Lead ID / ADMET, Hit ID
Modalities
small-molecule
Task types
classification, regression
Size
endpoints: 26
endpoint_split_entries: 78
model_families: 4
License
Other
First release
2026-05-04
Last updated
2026-05-15
Official site
→ project page
Leaderboard
→ leaderboard
Dataset
→ dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling · Unknown (bioRxiv preprint) · 2026 · paper · doi:10.64898/2026.04.29.721568 · 0 citations
Flags
none
Related benchmarks
TDC ADMET Group, BOOM (Benchmarking Out-Of-Distribution Molecular Predictions), MoleculeNet

Rubric (7-criterion)

rigor: 4
coverage: 4
maintenance: 4
adoption: 2
quality: 4
accessibility: 3
industry_relevance: 4
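
The rubric scores above can be tallied in a few lines of Python. This is only a minimal sketch: the `summarize` helper is hypothetical, and the unweighted mean is an assumption made for illustration. The page does not state how the Composite score of 69.5 is derived, so the sketch does not attempt to reproduce it.

```python
# Rubric scores exactly as listed on this page.
rubric = {
    "rigor": 4,
    "coverage": 4,
    "maintenance": 4,
    "adoption": 2,
    "quality": 4,
    "accessibility": 3,
    "industry_relevance": 4,
}


def summarize(scores):
    """Hypothetical helper: return (total, unweighted mean, weakest criterion).

    Assumption: every criterion counts equally. The catalog's real
    weighting, if any, is not documented on this page.
    """
    total = sum(scores.values())
    mean = total / len(scores)
    weakest = min(scores, key=scores.get)
    return total, mean, weakest


total, mean, weakest = summarize(rubric)
print(f"total={total}, mean={mean:.2f}, weakest={weakest}")
# prints: total=25, mean=3.57, weakest=adoption
```

The lowest-scoring criterion, adoption, matches the remark in the Notes that the benchmark is very early in adoption.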

Notes

A timely study showing that compact, specialized models remain competitive with large foundation models for molecular property prediction. Key finding: performance depends on the fit between model, task, and validation scheme, not on scale alone. Adoption is still at a very early stage.
