AssayBench
Assay-level virtual cell benchmark for LLMs and agents, built from 1,920 CRISPR screens in BioGRID ORCS. Given a free-text description of an experiment, a model predicts a ranking of genes by how likely they are to modulate the screened phenotype. The benchmark tests whether LLMs can perform phenotypic screens in silico.
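The Python sketch below makes the task format concrete by showing how one item could be represented and scored. It assumes each item pairs a free-text assay description with the set of hit genes reported by the screen. It uses recall@k purely for illustration. The `ScreenTask` structure, the `recall_at_k` function and the gene names are all hypothetical. They are not AssayBench's schema or its official metric.

```python
from dataclasses import dataclass


@dataclass
class ScreenTask:
    """Hypothetical benchmark item: a free-text assay description plus the
    genes the screen reported as hits (illustrative, not AssayBench's schema)."""
    description: str
    hits: set[str]


def recall_at_k(ranking: list[str], hits: set[str], k: int) -> float:
    """Fraction of true hits that appear among the top-k predicted genes.

    Illustrative metric only; the benchmark's actual scoring may differ.
    """
    if not hits:
        raise ValueError("a task needs at least one hit to score")
    top_k = set(ranking[:k])  # duplicates in the ranking are collapsed
    return len(top_k & hits) / len(hits)


# Toy usage with placeholder gene names (not real screen data).
task = ScreenTask(
    description="Toy example: genome-wide knockout screen for resistance to a hypothetical compound",
    hits={"GENE_A", "GENE_B", "GENE_C"},
)
ranking = ["GENE_A", "GENE_X", "GENE_B", "GENE_Y", "GENE_C"]  # model output, best first
print(recall_at_k(ranking, task.hits, k=2))  # 1 of 3 hits in the top 2
print(recall_at_k(ranking, task.hits, k=5))  # all 3 hits in the top 5
```

Recall@k rewards a model for placing true hits near the top of its ranking, but it ignores their order within the top k. A rank-sensitive metric such as average precision or NDCG would also penalize hits that appear late in that window.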
Composite
63.3
Experimental validation
None
Stages
Target ID, Hit ID
Modalities
text, genetic_perturbation, single_cell
Task types
phenotypic_screening, gene_ranking, llm_evaluation
Size
CRISPR_screens: 1,920
License
Unknown
First release
2026-05
Last updated
2026-05
Official site
→ project page
Leaderboard
→ leaderboard
Dataset
→ dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents · 2026 · paper · 0 citations
Flags
virtual_cell, llm_benchmark, phenotypic
Experts
—
Groups
—
Hosted by
—
Related benchmarks
Rubric (7-criterion)
rigor
4
coverage
3
maintenance
2
adoption
2
quality
4
accessibility
3
industry_relevance
4
Notes
Novel framing: gene-ranking prediction from natural-language descriptions of experiments. Part of the broader push toward virtual cell models. Tests whether LLMs can stand in for physical CRISPR screens when generating hypotheses.