The Invisible Work of Computer-Assisted Drug Design

by Corin Wagen · Aug 28, 2025

The Stone Breakers, Gustave Courbet (1849)

Scientists who work in computer-assisted drug design (CADD) must be comfortable with a vast variety of skills. In modern drug-design organizations, CADD scientists shoulder a large and ever-growing list of responsibilities.

The diversity of software tools required for state-of-the-art computational drug design means that scientists often spend a surprising fraction of their time away from actual drug design. Jim Snyder, a "world-class modeler and scientist" with "high scientific success in academia and industry" (per Ash Jogalekar), wrote a fascinating overview of the state of computer-assisted drug design in the 1980s. Here's what he wrote about this particular topic (emphasis added):

On the invisible side of the ledger—about 30-50% of the group’s time—is the effort that permits the CADD group to maintain state-of-the-art status. In the current late 1980s-early 1990s environment, major software packages often incorporating new methodology are generally purchased from commercial vendors. These are now generally second or third generation, sophisticated and expensive ($50,000–150,000). Still, no commercial house can anticipate all the needs of a given applications’ environment. It remains necessary to treat problems specific to a given research project and to locally extend known methodology. This means that new capabilities delivered in advanced versions of commercial software need careful evaluation.

Although Snyder was writing about the 1980s, his observations are no less true today. Commercial software solutions must be evaluated, benchmarked, and tested on internal data, a process which is slow and labor-intensive. The problem is even worse for academic code, whose authors often have little experience with industry use cases or conventional software practices.
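In practice, this internal validation is often as simple in concept as it is tedious in execution: run the vendor's model on in-house data and check whether the rankings hold up. Here's a minimal sketch of that loop in Python, where the file name, column names, and `vendor_predict` stub are all hypothetical placeholders for a real commercial API:

```python
# Minimal internal-validation sketch: compare a commercial model's
# predictions against in-house assay data. File name, column names,
# and the vendor_predict stub are hypothetical placeholders.
import random

import pandas as pd
from scipy.stats import spearmanr

def vendor_predict(smiles: str) -> float:
    # Stand-in for the commercial model's actual API call.
    return random.random()

df = pd.read_csv("internal_assay_results.csv")  # columns: smiles, pic50
df["predicted"] = df["smiles"].map(vendor_predict)

# Rank correlation is often more informative than R^2 when a model is
# used to triage compounds rather than predict absolute values.
rho, p = spearmanr(df["pic50"], df["predicted"])
print(f"Spearman rho on internal data: {rho:.2f} (p = {p:.2g})")
```

The code itself is trivial; the invisible work lies in assembling clean internal data, repeating this exercise for every candidate tool, and deciding what the resulting numbers actually mean.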

The rise of machine learning has made the work of benchmarking and internal validation even more important, particularly as public benchmarks become contaminated by data leakage and overfitting. A recent study by Ajay Jain, Ann Cleves, and Pat Walters benchmarking DiffDock discusses the time and effort that the CADD community collectively spends vetting new methods (emphasis added):

Publication of studies such as the DiffDock report are not cost-free to the CADD field. Magical sounding claims generate interest and take time for groups to investigate and debunk. Many groups must independently test and understand the validity of such claims. This is because most groups, certainly those focused primarily on developing new drugs, do not have the time to publish extensive rebuttals such as this. Therefore their effort in validation/debunking is replicated many fold. The waste of time and effort is substantial, and the process of drug discovery is difficult enough without additional unnecessary challenges.
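One concrete piece of this validation work is checking whether a published benchmark's test set is genuinely distinct from a model's training data. Here's a minimal sketch of a scaffold-overlap check with RDKit, using placeholder SMILES lists; scaffold overlap is only one of several possible leakage checks:

```python
# Minimal train/test leakage check via Bemis-Murcko scaffold overlap.
# The SMILES lists here are illustrative placeholders.
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_set(smiles_list):
    scaffolds = set()
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is not None:
            scaffolds.add(MurckoScaffold.MurckoScaffoldSmiles(mol=mol))
    return scaffolds

train = ["CCOc1ccccc1C(=O)N", "c1ccc2[nH]ccc2c1"]  # placeholder data
test = ["CCOc1ccccc1C(=O)NC", "C1CCNCC1"]          # placeholder data

overlap = scaffold_set(train) & scaffold_set(test)
print(f"{len(overlap)} scaffold(s) shared between train and test sets")
```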

Even when authors report high-quality benchmarks and clearly disclose when a method will and won't work, considerable work remains before a given method can be integrated into production CADD usage. Most scientific tasks require more than a single computation or model-inference step, necessitating integration into a larger software ecosystem. (I've written about this previously in the context of ML-powered workflows.) Building this state-of-the-art software infrastructure can still be challenging, as Snyder describes (emphasis added):

No single piece of software is ordinarily sufficient to address a routine but multistep modeling task. For example, conformation generation, optimization, and least-squares fitting can involve three separate computer programs. The XYZ coordinate output from the first is the input for the second; output from the latter is input for the third. With an evolving library of 40-50 active codes, the task of assuring comprehensive and smooth coordinate interconversion is a demanding and ongoing one.
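The modern version of this glue work looks much the same, just with different programs. Here's a hedged sketch of the three-step pipeline Snyder describes, chaining RDKit conformer generation, an external xtb geometry optimization, and a least-squares fit; it assumes xtb is installed on your PATH and writes its result to xtbopt.xyz, as current versions do:

```python
# Sketch of the multistep "glue" Snyder describes: conformer
# generation (RDKit), geometry optimization (xtb, via subprocess),
# and least-squares fitting (RDKit), with coordinates passed between
# programs through XYZ files. Assumes xtb is on PATH.
import subprocess

from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign

# Step 1: generate an initial 3D conformer.
mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMolecule(mol, randomSeed=7)

# Step 2: hand the coordinates to the external optimizer via XYZ.
Chem.MolToXYZFile(mol, "conf.xyz")
subprocess.run(["xtb", "conf.xyz", "--opt"], check=True)

# Step 3: read the optimized geometry back and least-squares fit it
# to the starting structure. xtb preserves atom order, so an explicit
# one-to-one atom map avoids any substructure matching.
opt = Chem.MolFromXYZFile("xtbopt.xyz")
atom_map = [(i, i) for i in range(mol.GetNumAtoms())]
rmsd = rdMolAlign.AlignMol(opt, mol, atomMap=atom_map)
print(f"RMSD after optimization: {rmsd:.3f} Å")
```

Every arrow in a pipeline like this is a place where atom ordering, units, or file-format quirks can silently corrupt a result, which is exactly why keeping dozens of codes interoperating was (and is) an ongoing job.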

Most scientific software tools don't make integration easy. Modern packaging and code-deployment processes are rarely followed in science, forcing the CADD practitioner to go through the painful and time-consuming task of manually creating a minimal environment capable of running a given model or algorithm.
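As a sketch of what that manual work can look like on a Unix-like system, here's a Python snippet that builds an isolated virtual environment and pins dependencies by hand; the package names and version pins are illustrative, not any real model's requirements:

```python
# Sketch of manually building a minimal environment for an academic
# code that ships no lockfile. All package names and version pins
# below are illustrative placeholders.
import subprocess
import venv

ENV_DIR = "model-env"
venv.EnvironmentBuilder(with_pip=True).create(ENV_DIR)

# Each pin is typically discovered by trial and error: install, run,
# hit an ImportError or ABI mismatch, pin the offending package, repeat.
pinned = [
    "numpy==1.26.4",
    "torch==2.3.1",
    "academic-model==0.1.0",  # hypothetical package
]
subprocess.run([f"{ENV_DIR}/bin/pip", "install", *pinned], check=True)
```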

For methods requiring specialized hardware like GPUs, things become still more complex—and some modern methods, like protein–ligand co-folding, require external resources like an MSA server which must be provisioned, creating additional opportunities for failure. Solving all these issues requires CADD scientists to essentially become "ML DevOps" experts, a skillset which most do not naturally have.
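Here's a hedged sketch of the kind of preflight checks this implies, with a hypothetical MSA-server URL standing in for whatever endpoint a given co-folding tool actually requires:

```python
# Preflight checks before launching a hypothetical co-folding job:
# confirm a CUDA GPU is visible and the MSA server is reachable.
# The server URL is a placeholder, not a real endpoint.
import requests
import torch

MSA_SERVER = "https://msa.internal.example.com"  # hypothetical endpoint

if not torch.cuda.is_available():
    raise RuntimeError("This co-folding model requires a CUDA-capable GPU.")

try:
    requests.get(MSA_SERVER, timeout=10).raise_for_status()
except requests.RequestException as exc:
    raise RuntimeError(f"MSA server unreachable: {exc}") from exc

print("GPU and MSA server available; safe to submit the co-folding job.")
```

None of these checks is hard on its own; the burden is that every tool needs its own set, and someone has to write and maintain them all.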

Building tools to run calculations is only half the problem. To be impactful, CADD scientists must also integrate their predictions into the experimental design–make–test–analyze cycle, which necessitates communicating results with medicinal chemists. Many large pharmaceutical companies have invested in building some sort of internal graphical platform to simplify communication and allow scientists across the organization to run and view calculations, but these platforms are often costly to maintain and accumulate technical debt quickly. (We've talked to a lot of teams that had a fantastic internal platform for running calculations until the maintainer switched roles and left the platform to die a slow and ignominious death.)

At Rowan, we're working to build a CADD platform that addresses all these issues. Our goal is to free scientists from worrying about software so they can focus on their science, cutting down on the invisible work that goes into CADD and letting our users do what they're good at.

Building a top-tier CADD team used to mean spending millions on software licenses and developers to build a bespoke internal platform; with Rowan, we're building this platform for all our customers. If you'd like to be one of them, make an account or reach out to our team!
