Anthropic and OpenAI Expand Their AI Battle Into the World of Scientific Discovery
Anthropic launched Claude Science, a research workbench connecting over 60 scientific databases, while OpenAI released GeneBench-Pro, a benchmark testing AI performance on complex biology tasks — both on the same day.
On the same day, two of the biggest names in artificial intelligence, Anthropic and OpenAI, pushed their rivalry into a new domain: scientific research. Anthropic unveiled Claude Science, a dedicated workbench built for researchers, while OpenAI introduced GeneBench-Pro, a benchmark designed to evaluate how well AI models handle computational biology tasks.
The simultaneous yet competing releases signal a broader shift in the AI race, one that now extends well beyond chatbots and code generation into the work happening inside laboratories. Anthropic chose to ship a practical tool scientists can use immediately; OpenAI opted to define and measure how far the technology still needs to go.
**Inside Anthropic's Claude Science**
Claude Science consolidates the tools and resources researchers rely on (databases, code execution environments, and computational infrastructure) into a single application. It integrates more than 60 scientific databases spanning genomics, proteomics, and cheminformatics, making it one of the more comprehensive AI-based research environments available.
Importantly, Claude Science is an application layer, not a new underlying model. Anthropic's most advanced models — Fable 5 and Mythos 5 — remain subject to US export restrictions and are not part of this release. A key feature of the platform is full auditability: every result can be traced directly back to the code that generated it, addressing a major concern around scientific reproducibility.
The launch builds on a life sciences initiative Anthropic began in October 2025. Early beta users included Jérôme Lecoq of the Allen Institute, who reported using the tool to compress literature reviews that previously took up to two years. To further support scientific adoption, Anthropic announced plans to fund up to 50 research projects, offering each team as much as $30,000 in computing credits.
**What OpenAI's GeneBench-Pro Measures**
Shortly after Anthropic's announcement, OpenAI released GeneBench-Pro — a benchmark containing 129 research-grade problems spanning genomics, quantitative biology, and translational medicine. Unlike standard AI tests, GeneBench-Pro is designed to assess whether models can navigate ambiguous biological data, select appropriate analytical paths, and exercise the kind of judgment real research demands.
OpenAI's current flagship model, GPT-5.6 Sol, solved 28.7% of the problems at its standard reasoning setting, climbing to 31.5% in Pro mode. By comparison, GPT-5 scored below 5% on the original GeneBench, while Anthropic's Opus 4.8 reached 16% on the harder GeneBench-Pro version. OpenAI noted that each problem in the benchmark would take a human expert between 20 and 40 hours to complete, at a cost of thousands of dollars — whereas its model performs the same analysis for just a few dollars.
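To put those percentages in concrete terms, the short sketch below converts the quoted solve rates into approximate problem counts out of 129 and totals the expert hours the full benchmark would represent. It uses only the figures reported above; the function and variable names are illustrative, and the counts are estimates, since the published rates may be averaged over multiple runs rather than reflecting a single whole-number tally.

```python
# Figures quoted in the article; everything derived below is an estimate.
TOTAL_PROBLEMS = 129
SOLVE_RATES = {
    "GPT-5.6 Sol (standard)": 0.287,
    "GPT-5.6 Sol (Pro mode)": 0.315,
    "Opus 4.8": 0.16,
}
EXPERT_HOURS_PER_PROBLEM = (20, 40)  # low and high estimates per problem


def implied_solved(rate: float, total: int = TOTAL_PROBLEMS) -> float:
    """Approximate number of problems solved at a given solve rate."""
    return rate * total


def expert_hours_for_benchmark(total: int = TOTAL_PROBLEMS) -> tuple[int, int]:
    """Low and high total expert hours to work every problem by hand."""
    low, high = EXPERT_HOURS_PER_PROBLEM
    return low * total, high * total


if __name__ == "__main__":
    for name, rate in SOLVE_RATES.items():
        solved = implied_solved(rate)
        print(f"{name}: about {solved:.1f} of {TOTAL_PROBLEMS} problems")
    low, high = expert_hours_for_benchmark()
    print(f"Full benchmark by hand: {low} to {high} expert hours")
```

By this rough estimate, the standard setting corresponds to about 37 solved problems, Pro mode to about 41, and Opus 4.8 to about 21, while working all 129 problems by hand would take human experts roughly 2,580 to 5,160 hours.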
**Two Approaches, One Destination**
The dueling releases reflect two distinct but complementary strategies for winning scientific credibility. Anthropic is focused on immediate utility — giving researchers a working environment today. OpenAI is drawing a map of where the technology currently stands and how far it still needs to travel.
Both launches arrive at a moment of intensifying geopolitical pressure. Chinese AI models are increasingly competitive in research contexts, and US export controls have already prompted Anthropic to evaluate alternative host countries for deploying its most capable systems.
Notably, OpenAI's own benchmark results serve as a sobering self-assessment: its best model still fails the majority of GeneBench-Pro tasks, suggesting the technology's scientific potential remains largely unrealized.
**Experts Weigh In**
Biomedical gerontologist Aubrey de Grey, President and Chief Science Officer of the Longevity Escape Velocity Foundation, acknowledged the transformative potential while urging caution about timelines. Speaking on a BeInCrypto podcast, he predicted that AI will soon eliminate bottlenecks in drug development, though he stressed that translating faster research into approved treatments still depends heavily on regulatory frameworks and public appetite for risk.
Dr. Derya Unutmaz, a Professor of Immunology, offered a more direct take in the same discussion. Unutmaz, who has spent 35 years in his field, said he now trusts AI over his own instincts and predicted that failing to integrate AI into clinical practice will soon constitute medical malpractice.
Whether researchers broadly adopt these tools — and whether GeneBench-Pro scores begin rising meaningfully — will be the real test of these ambitions in the months ahead.


