ScaleBench: Molecular Property Prediction
Comprehensive benchmark assessing whether larger pre-trained models outperform compact models across 26 ADME, toxicity, safety, and bioactivity endpoints with 78 endpoint-split combinations.
Composite
69.5
Experimental validation
Retrospective
Stages
Lead ID / ADMET, Hit ID
Modalities
small-molecule
Task types
classification, regression
Size
endpoints: 26
endpoint_split_entries: 78
model_families: 4
License
Other
First release
2026-05-04
Last updated
2026-05-15
Official site
Leaderboard
→ leaderboard
Dataset
→ dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling · Unknown · bioRxiv preprint · 2026 · paper · doi:10.64898/2026.04.29.721568 · 0 citations
Flags
none
Experts
—
Groups
—
Hosted by
—
Related benchmarks
Rubric (7-criterion)
rigor
4
coverage
4
maintenance
4
adoption
2
quality
4
accessibility
3
industry_relevance
4
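The card does not say how the composite score is derived from the seven rubric criteria. As a rough sanity check only, the sketch below rescales the plain average of the rubric scores to 0-100, assuming each criterion is scored out of 5 (an assumption, not stated on the card). The function names and the optional weights are hypothetical. The unweighted result does not match the listed composite of 69.5, which suggests the listing applies a weighting or scale that is not given here.

```python
# Rubric scores copied from the card.
rubric = {
    "rigor": 4,
    "coverage": 4,
    "maintenance": 4,
    "adoption": 2,
    "quality": 4,
    "accessibility": 3,
    "industry_relevance": 4,
}

MAX_SCORE = 5  # assumption: each criterion is scored out of 5


def unweighted_composite(scores, max_score=MAX_SCORE):
    """Plain mean of the rubric scores, rescaled to 0-100."""
    return 100.0 * sum(scores.values()) / (max_score * len(scores))


def weighted_composite(scores, weights, max_score=MAX_SCORE):
    """Weighted mean rescaled to 0-100; `weights` are hypothetical."""
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return 100.0 * weighted_sum / (max_score * total_weight)


print(round(unweighted_composite(rubric), 1))  # 71.4, not the listed 69.5
```

Passing equal weights to `weighted_composite` reproduces the unweighted value; any weights that would yield 69.5 would have to come from the catalog's own documentation.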
Notes
Timely study showing that compact, specialized models remain competitive with large foundation models for molecular property prediction. Key finding: performance depends on the fit between model, task, and validation scheme, not on scale alone. Adoption is still at a very early stage.