The Arc Institute, in partnership with 10x Genomics and Ultima Genomics, is building something extraordinary: a Virtual Cell Atlas that could transform how we understand and treat disease. By combining cutting-edge biology, high-throughput technology, and artificial intelligence, they aim to create detailed digital models that simulate how cells behave, respond to treatment, and evolve over time.

Launched in February 2025 with data from over 300 million single cells, the Atlas has already surpassed 400 million cells—a staggering scale that echoes the ambition of the Human Genome Project. Rather than mapping DNA, this initiative is building a computational version of the cell, enabling scientists to ask “what if?” questions in silico before heading to the lab.

Figure: Expansion of the Arc Virtual Cell Atlas in early 2025.


🧬 Tahoe-100M: The Engine Behind Predictive Biology

At the heart of this effort is Tahoe-100M, a powerful dataset developed by Vevo Therapeutics. It profiles 100 million individual cells treated with over 1,100 different drug compounds across 50 cancer cell lines (Zhang et al., 2025).

What sets Tahoe-100M apart is its level of detail. Instead of averaging responses across millions of cells, it captures how each individual cell reacts to a specific treatment. That’s especially important for diseases like cancer, where different cells in the same tumor can respond very differently.
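To see why per-cell detail matters, here is a minimal sketch in Python. The response scores and the "sensitive" and "resistant" labels are invented for illustration and are not drawn from Tahoe-100M. The sketch only shows how a single average across all cells can hide two subpopulations that respond very differently.

```python
from statistics import mean

# Hypothetical per-cell response scores for one drug (made-up numbers):
# negative means a marker gene goes down, zero means no change.
responses = {
    "sensitive": [-2.0, -1.8, -2.2, -2.0],
    "resistant": [0.1, -0.1, 0.0, 0.0],
}


def bulk_average(groups):
    """Average over every cell at once, as a bulk measurement effectively does."""
    all_cells = [score for cells in groups.values() for score in cells]
    return mean(all_cells)


def per_group_average(groups):
    """Average within each subpopulation, which single-cell data makes possible."""
    return {name: mean(cells) for name, cells in groups.items()}


bulk = bulk_average(responses)
by_group = per_group_average(responses)

print(f"bulk average: {bulk:.2f}")
for name, value in by_group.items():
    print(f"{name} average: {value:.2f}")
```

In this toy case the bulk average suggests a moderate response everywhere, while the per-group view shows that half the cells respond strongly and the other half barely respond at all.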

This dataset dramatically expands the possibilities for drug discovery, offering 31 times more combinations of drug and cell line than previous efforts. That breadth helps researchers build models that better reflect the real complexity of human biology, which in turn supports faster and more targeted drug development.


🧠 The Role of AI: Making Sense of the Cellular Universe

Processing data from hundreds of millions of cells takes more than powerful computers; it takes smart algorithms. Arc uses AI and machine learning to sift through the data, find patterns, and build models that can predict how cells will behave in different scenarios (Arc Institute, 2025).

Arc has even trained AI agents to search public databases for new single-cell data, clean it up, and add it to the Atlas. By bringing together multiple layers of biological information, such as genes, proteins, and cell location, AI helps create a richer, more integrated understanding of how life works at the cellular level.


🔬 Observation vs. Intervention: Why Perturbation Matters

Much of biology has been built on observation—watching how cells behave in their natural state. But to truly understand how cells work, scientists need to intervene and see how things change. That’s where perturbation data comes in.

By exposing cells to drugs, mutations, or other experimental changes, researchers can track how their internal systems respond. This helps reveal cause-and-effect relationships, which are essential for building models that don’t just describe biology—they can predict and manipulate it.
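As a rough illustration of how a perturbation effect can be quantified, the sketch below compares treated cells with untreated controls using a log2 fold change. This is one common summary statistic, not necessarily the one used by the Atlas or its models. The gene names, counts, and the `pseudocount` parameter are all hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of mean treated to mean control expression.

    The pseudocount (an assumed value) avoids division by zero for
    genes with no counts in the control cells.
    """
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


# Hypothetical expression counts per cell for two genes (made-up numbers).
control = {"GENE_A": [7, 7, 7], "GENE_B": [3, 3, 3]}
treated = {"GENE_A": [1, 1, 1], "GENE_B": [3, 3, 3]}

effects = {gene: log2_fold_change(treated[gene], control[gene]) for gene in control}

for gene, effect in effects.items():
    print(f"{gene}: log2 fold change = {effect:+.2f}")
```

Because the drug is the only thing deliberately changed between the two groups, a shift like the one in GENE_A can be attributed to the treatment, which is the cause-and-effect signal that purely observational data cannot provide.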


🌍 From One Cell Type to Many: Expanding Biological Diversity

The Virtual Cell Atlas doesn’t just focus on one kind of cell. It includes data from 21 different species and 72 distinct tissues, making it one of the most diverse biological resources ever assembled. This wide scope ensures that the models it powers aren’t limited to a single condition or organism, but can support broad research—from cancer to aging to immune disorders.

Still, this diversity raises an important question: Whose cell is the “virtual cell”? The team acknowledges this and suggests future versions will include more human-specific models, including cells from genetically diverse populations and pluripotent stem cells that can transform into many different tissue types.


💊 New Possibilities for Drug Discovery

With a well-trained virtual cell, researchers can begin to simulate how new drugs might affect human cells before they’re ever made in a lab. That’s a game-changer.

This means:

  • Identifying which drugs work best for which patients
  • Spotting side effects early
  • Speeding up the discovery of new therapies
  • Reducing the need for animal and human testing

Patrick Hsu, a co-founder of Arc, describes it as building a “scientific co-pilot”: an AI system that can help scientists ask better questions and find faster answers.


📣 Advance Your Expertise with GMDP Academy’s Module 8

The cutting-edge technologies and data strategies driving the Virtual Cell Atlas are directly aligned with the competencies explored in Module 8: Digital Technology in Medicines Development at GMDP Academy.

This six-week, expert-led course helps medical and regulatory professionals:

  • Understand how digital tools are transforming medicines development
  • Explore the role of AI and data science in clinical and regulatory workflows
  • Examine current regulations around digital health technologies
  • Anticipate future trends in digital therapeutics and diagnostics

By completing Module 8, you’ll be better equipped to engage with the future of pharmaceutical development—one increasingly driven by the type of innovation highlighted in this article.



References

Arc Institute. (2025, February). Arc Virtual Cell Atlas launches, combining data from over 300 million cells. https://arcinstitute.org/news/news/arc-virtual-cell-atlas-launch

Grinstein, J. D. (2025, April 28). Arc Institute teams with 10x and Ultima Genomics to evolve virtual cell atlas. Inside Precision Medicine. https://www.insideprecisionmedicine.com

Zhang, J., Ubas, A. A., de Borja, R., Svensson, V., et al. (2025). Tahoe-100M: A giga-scale single-cell perturbation atlas for context-dependent gene function and cellular modeling. bioRxiv. https://doi.org/10.1101/2025.02.20.639398

Disclaimers

  • The material in these reviews is from various public open-access sources, meant for educational and informational purposes only
  • Any personal opinions expressed are those of only the author(s) and are not intended to represent the position of any organization(s)
  • No official support by any organization(s) has been provided or should be inferred