AssayBench

Assay-level virtual cell benchmark for LLMs and agents, built from 1,920 CRISPR screens in BioGRID ORCS. Given a free-text description of an experiment, models predict a ranking of genes that modulate the phenotype. Tests whether LLMs can perform in silico phenotypic screens.

Composite

63.3

Experimental validation

None

Stages

Target ID, Hit ID

Modalities

text, genetic_perturbation, single_cell

Task types

phenotypic_screening, gene_ranking, llm_evaluation

Size

CRISPR_screens: 1,920

License

Unknown

First release

2026-05

Last updated

2026-05

Official site

→ project page

Leaderboard

→ leaderboard

Dataset

→ dataset

Code / GitHub

→ repository

HuggingFace

→ HF

Paper

AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents · 2026 · paper · 0 citations

Flags

virtual_cell, llm_benchmark, phenotypic

Experts

—

Groups

—

Hosted by

—

Related benchmarks

Virtual Cell Benchmark Suite 2026, CZ Virtual Cell Challenge

Rubric (7-criterion)

rigor

coverage

maintenance

adoption

quality

accessibility

industry_relevance

Notes

Novel framing: gene rank prediction from natural-language experiment descriptions. Part of the broader virtual cell effort. Tests whether LLMs can stand in for actual CRISPR screens during hypothesis generation.
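To make the gene-ranking task concrete, here is a minimal sketch of how a predicted ranking might be compared with the hit list of a screen. It is an illustration only: this entry does not state how AssayBench scores predictions or how the composite value is computed, so precision@k and recall@k are assumed stand-ins, and the function names and gene lists below are hypothetical.

```python
def precision_at_k(ranked_genes, hit_genes, k):
    """Fraction of the top-k predicted genes that are hits in the screen."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = set(hit_genes)
    return sum(1 for gene in ranked_genes[:k] if gene in hits) / k


def recall_at_k(ranked_genes, hit_genes, k):
    """Fraction of the screen's hits that appear in the top-k predictions."""
    hits = set(hit_genes)
    if not hits:
        return 0.0
    return len(set(ranked_genes[:k]) & hits) / len(hits)


# Hypothetical model output and screen hits (not taken from AssayBench).
predicted = ["TP53", "MDM2", "KRAS", "ATM", "BRCA1"]
screen_hits = {"TP53", "ATM", "CHEK2"}

p3 = precision_at_k(predicted, screen_hits, 3)  # one of the top 3 is a hit
r5 = recall_at_k(predicted, screen_hits, 5)     # two of three hits recovered
print(f"precision@3 = {p3:.2f}, recall@5 = {r5:.2f}")
```

Top-k metrics of this kind reward placing true hits early in the list, which matches the hypothesis-generation use case, where only the first few candidates would be followed up experimentally.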
