AssayBench

An assay-level virtual cell benchmark for LLMs and agents, built from 1,920 CRISPR screens in BioGRID ORCS. Given a free-text description of an experiment, a model ranks genes by how likely they are to modulate the phenotype under study. The benchmark tests whether LLMs can perform phenotypic screens in silico.
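To make the gene-ranking task concrete, here is a minimal sketch of how a predicted ranking could be compared against the hits of one screen. Recall@k is an assumption chosen for illustration; this entry does not state which metric AssayBench uses. The function name `recall_at_k` and the gene symbols are hypothetical examples and are not drawn from the benchmark data.

```python
def recall_at_k(ranked_genes, true_hits, k):
    """Fraction of a screen's true hits found in the model's top-k genes.

    Illustrative metric only; not confirmed to be the benchmark's own.
    """
    hits = set(true_hits)
    if not hits:
        raise ValueError("true_hits must contain at least one gene")
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = set(ranked_genes[:k])
    return len(top_k & hits) / len(hits)


# Hypothetical model output: genes ordered from most to least likely hit.
predicted = ["TP53", "MYC", "KRAS", "EGFR", "BRCA1"]
# Hypothetical hit set for one screen.
screen_hits = {"TP53", "KRAS", "PTEN"}

score = recall_at_k(predicted, screen_hits, k=3)
print(f"recall@3 = {score:.3f}")
```

A set-based metric like this ignores the order within the top k; a rank-sensitive measure such as nDCG would be a reasonable alternative if ordering matters.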

Composite
63.3
Experimental validation
None
Stages
Target ID, Hit ID
Modalities
text, genetic_perturbation, single_cell
Task types
phenotypic_screening, gene_ranking, llm_evaluation
Size
entries: 1,920 CRISPR screens
License
Unknown
First release
2026-05
Last updated
2026-05
Official site
→ project page
Leaderboard
→ leaderboard
Dataset
→ dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents · 2026 · paper · 0 citations
Flags
virtual_cell, llm_benchmark, phenotypic
Experts
Groups
Hosted by
Related benchmarks
Virtual Cell Benchmark Suite 2026, CZ Virtual Cell Challenge

Rubric (7-criterion)

rigor
4
coverage
3
maintenance
2
adoption
2
quality
4
accessibility
3
industry_relevance
4

Notes

Novel framing: the model predicts a gene ranking from a natural-language description of the experiment. The benchmark is part of the broader push toward virtual cell models, and it tests whether LLMs can stand in for actual CRISPR screens during hypothesis generation.
