Guest Column | July 14, 2023

Here's A Realistic Outlook For AI In Biomanufacturing

By Camille Bilodeau, Ph.D., University of Virginia

The promise of applying artificial intelligence to solve engineering design problems has always been compelling. What if, instead of testing hundreds of drug candidates, a computer could tell us which candidate is best? What if, instead of exploring thousands of process design parameters, a model could propose an optimal process design? While these ideas may have seemed farfetched only a decade ago, recent advances in AI algorithm development along with dramatic increases in the availability of digital data have brought them within reach. This leads us to the question: is AI-guided biomanufacturing just around the corner? What kinds of advances are realistically achievable using artificial intelligence?

Before we think about answering these questions, it is useful to establish a common vocabulary. In the media, words like artificial intelligence and deep learning have been used to refer to all kinds of algorithms and methodologies. Let’s begin by formalizing a few definitions:

Artificial intelligence (AI) is defined as any algorithm that automates intellectual tasks normally performed by humans. Using this definition, artificial intelligence is a broad field that encompasses a wide range of algorithms, including everything from the basic operations of a calculator to ChatGPT. In this way, artificial intelligence as a field has been around for a long time and could accurately be used to describe algorithms and machines dating back to the “analytical engines” of the 1800s.

Machine learning (ML) is a subfield of artificial intelligence defined as any algorithm that automates intellectual tasks without being provided explicit rules for carrying out that task. Instead of using human-defined rules, machine learning models approximate rules by analyzing the relationships between datapoints. This is particularly powerful because it means that to use machine learning, we don’t need to have a complete understanding of the property we are trying to predict; all we need is enough datapoints to teach a model about this property by example. Machine learning methods include techniques like clustering, regression, principal component analysis (PCA), and other statistical methods.

Deep learning (DL) is an even narrower subfield of machine learning that uses multiple successive representations of an input to learn to perform a task, giving it the ability to learn more complex nonlinear relationships. While deep learning methods have existed since the 1970s, they have historically been less popular than shallow machine learning methods for two reasons: 1) they are more computationally expensive to train and 2) they require larger training data sets. In recent years, however, a range of algorithmic and hardware advances, most notably advances in GPU computing, have dramatically reduced the computational cost of training deep models and therefore made them more accessible. Additionally, increases in the degree of data digitization across fields have made it easier to access larger amounts of data quickly. As a result, deep learning has become increasingly common over the last decade, powering famous models including ChatGPT and AlphaFold.

What Kinds Of Problems Can Artificial Intelligence Solve?

While artificial intelligence and deep learning models are powerful tools, it is important to apply them in the right contexts in order to get the most out of them. Specifically, artificial intelligence and deep learning models are most successful when they are applied to problems for which there is a measurable objective function that we are interested in maximizing or minimizing. Because of this, it is useful to ask how we can recast biomanufacturing problems as measurable, multi-objective optimization problems.

To help guide this process, we can break biomanufacturing applications into four broad categories of optimization problems.

Product Design

Designing any biological product, ranging from mAbs to mRNA, is an inherently multi-objective optimization problem. While the primary objectives may be to develop a product that is safe and efficacious, maximizing developability and manufacturability can be considered as secondary objectives that also constrain the design process. In this way, decisions regarding product design can be cast as measurable optimization problems, allowing for the development and integration of AI models into the product design process.

Process Design

When developing a new process for a given product, engineers often need to consider and test hundreds or even thousands of process parameters. In these cases, the primary objectives are typically to maximize purity, yield, and process robustness while minimizing complexity or cost. Beyond these primary objectives, it may also be valuable to identify “meta-objectives” that govern the overall process design strategy, such as minimizing the number of experiments or the amount of material required to qualify or validate a process. Because each of these objectives is well-defined and measurable, they are good candidates for machine learning.
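To make the idea of a measurable, multi-objective problem concrete, here is a minimal sketch in Python. The candidate processes, their numbers, and the equal weights are all hypothetical; the point is only that once yield, purity, and cost are measured, candidates can be compared by Pareto dominance or ranked by a weighted score.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    yield_pct: float   # objective to maximize
    purity_pct: float  # objective to maximize
    cost: float        # objective to minimize (arbitrary units)

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    at_least = (a.yield_pct >= b.yield_pct and a.purity_pct >= b.purity_pct
                and a.cost <= b.cost)
    strictly = (a.yield_pct > b.yield_pct or a.purity_pct > b.purity_pct
                or a.cost < b.cost)
    return at_least and strictly

def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

def weighted_score(c: Candidate, w_yield=1.0, w_purity=1.0, w_cost=1.0) -> float:
    """Collapse the objectives into one number; the weights are an assumption."""
    return w_yield * c.yield_pct + w_purity * c.purity_pct - w_cost * c.cost

# Hypothetical process candidates, not real data.
candidates = [
    Candidate("A", 80, 95, 10),
    Candidate("B", 85, 90, 12),
    Candidate("C", 75, 90, 15),  # worse than A on every objective
    Candidate("D", 85, 95, 20),
]

front = pareto_front(candidates)
best = max(front, key=weighted_score)
print([c.name for c in front], best.name)
```

The Pareto front shows which trade-offs are worth discussing at all, while the weighted score forces an explicit statement of how much each objective matters. In practice the weights would come from business and quality requirements rather than being set to 1.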

Process Controls

Controlling a process is one of the most straightforward examples of a machine learning problem in biomanufacturing because it has a single clear objective: to keep the manufacturing process from going out of spec. Process control problems also have the convenient property that as the process runs for a longer time, we have more data to train the model with. A major challenge with process control, however, is that it is a higher-risk application: if a control model fails, the result could mean large financial losses or even harm to patients. Therefore, process control problems are distinct from product and process design problems in that they require the use of more robust and transparent deep learning models.

Process Analytics

Process analytical technologies (PAT) are tools that are used alongside the manufacturing process to monitor process behaviors and make process decisions in real time. The opportunity for AI in PAT is to help interpret complex analytical data to make a process decision. While this may not seem like a well-defined, measurable goal, we can formulate it for modeling such that our goal is to predict some unobservable behavior of the process (e.g., fouling of a chromatography column) given some observable signal (e.g., online UV signal).
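As a rough illustration of this formulation, the sketch below fits a straight line that maps an observable signal to a quantity that is normally unobservable during a run. Everything in it is an assumption made for illustration: the "UV peak width" feature, the "fouling index" label, the calibration values, and the action threshold. A real application would likely need a richer model and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical calibration runs: UV peak width (observable online) paired with
# a fouling index measured offline afterwards (not observable during a run).
uv_peak_width = [1.0, 1.2, 1.4, 1.6, 1.8]
fouling_index = [0.10, 0.20, 0.30, 0.40, 0.50]

slope, intercept = fit_line(uv_peak_width, fouling_index)

def predict_fouling(width):
    """Estimate the unobserved fouling index from the observed UV peak width."""
    return slope * width + intercept

FOULING_LIMIT = 0.55  # assumed action threshold, not an industry value
needs_cleaning = predict_fouling(2.0) > FOULING_LIMIT
print(round(predict_fouling(2.0), 3), needs_cleaning)
```

The process decision (here, whether to clean the column) then becomes a simple rule applied to the model's prediction, which keeps the decision logic easy to inspect.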

Obstacles Facing Artificial Intelligence In Biomanufacturing

Artificial intelligence and deep learning clearly have the potential to solve major problems in biomanufacturing. So, what’s stopping pharmaceutical companies from using artificial intelligence models at every stage of the drug development pipeline?

The biggest obstacle facing artificial intelligence in the pharmaceutical industry is the lack of data required to train deep learning models. While most pharmaceutical companies have begun the process of digitizing their internal data sets, a lot of historical data continues to be stored in non-digital formats. Data that has been recorded digitally is often “unstructured,” meaning it has no consistent underlying organization. This makes it unreliable and labor-intensive to use for model training. Finally, even data that is digitally stored and structured is often not independent and well distributed. Machine learning models make predictions by learning from examples, so if most of these examples are related in some way (e.g., you have measurements of the same molecule under many different conditions), it becomes difficult to make predictions about less related examples (e.g., another molecule). In this way, a company may have a large amount of data, but this data may not be well-suited for model building.
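One practical consequence of related examples is that a random train/test split can make a model look better than it is, because measurements of the same molecule end up on both sides. A common safeguard is to hold out whole groups. The sketch below assumes hypothetical records keyed by a "molecule" field; the helper name and record layout are illustrative only.

```python
import random

def split_by_group(records, group_key, test_fraction=0.25, seed=0):
    """Hold out whole groups so that no group appears in both train and test."""
    groups = sorted({r[group_key] for r in records})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    held_out = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in held_out]
    test = [r for r in records if r[group_key] in held_out]
    return train, test

# Hypothetical records: four molecules, each measured at three pH values.
# Measured values are omitted because only the grouping matters here.
records = [
    {"molecule": molecule, "ph": ph}
    for molecule in ("mAb-1", "mAb-2", "mAb-3", "mAb-4")
    for ph in (5.0, 6.0, 7.0)
]

train, test = split_by_group(records, "molecule")
train_molecules = {r["molecule"] for r in train}
test_molecules = {r["molecule"] for r in test}
print(sorted(train_molecules), sorted(test_molecules))
```

Evaluating on molecules the model has never seen gives a more honest estimate of how it will behave on the next new molecule, which is usually the question that matters.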

One alternative to leveraging digitized historical data for model building is to use high-throughput experiments or simulations to build the data set. High-throughput methods have the advantage that the data generated will almost always be structured, digitally stored, and independent and well distributed. The main disadvantage of high-throughput methods is that they often don’t capture the exact property of interest but rather measure a surrogate endpoint. For example, in process design applications, most high-throughput experiments involve using a scaled-down model system whose measurements might not directly reflect the behavior of the real manufacturing system.

One promising strategy for obtaining the best of both worlds is to develop a model that can leverage both historical data and high-throughput data as inputs. This can be accomplished using a class of techniques called transfer learning, which aims to apply knowledge from one modeling task to a second task. Using transfer learning, a model can first be trained using large amounts of high-throughput data and then knowledge from this task can be transferred to the task of predicting system behavior from historical data.
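A deliberately tiny sketch of the idea follows. It assumes the scaled-down and manufacturing-scale systems share the same trend (slope) but differ by an offset, which is a strong simplifying assumption, and all numbers are made up. The slope is learned from plentiful high-throughput data and frozen; only the offset is refit on the scarce manufacturing-scale data. This mirrors, in miniature, the common deep learning practice of freezing early layers and retraining the final layer.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def refit_intercept(slope, xs, ys):
    """Keep the transferred slope fixed and re-estimate only the offset."""
    return sum(y - slope * x for x, y in zip(xs, ys)) / len(xs)

# Source task: plentiful, hypothetical high-throughput (scaled-down) data.
ht_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ht_y = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]

# Target task: only two hypothetical manufacturing-scale measurements.
mfg_x = [2.0, 5.0]
mfg_y = [8.0, 14.0]

pretrained_slope, pretrained_intercept = fit_line(ht_x, ht_y)
transferred_intercept = refit_intercept(pretrained_slope, mfg_x, mfg_y)

def predict_manufacturing(x):
    """Predict at manufacturing scale using the transferred slope."""
    return pretrained_slope * x + transferred_intercept

print(pretrained_slope, transferred_intercept, predict_manufacturing(3.0))
```

With only two target measurements, fitting every parameter from scratch would leave no data to spare for checking the model. Reusing what was learned from the larger data set reduces how much the scarce data has to carry.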

A second promising strategy for increasing the quantity of high-quality data available for model building is to develop large, centralized data sets that can be shared across the pharmaceutical industry. Historically, this has proven impractical because most pharmaceutical companies face intellectual property barriers to sharing their internal data sets. One way to sidestep these barriers is a technique called federated learning. Federated learning allows models to be trained across multiple devices or locations without the need for centralized data storage. This is powerful because it can enable multiple companies to collaborate while maintaining data privacy and security, partially alleviating the need for large, centralized data sets.
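The sketch below illustrates one widely used federated scheme, federated averaging, on a toy linear model. The two "sites," their data, the learning rate, and the number of rounds are all assumptions chosen for illustration. Each site runs a few gradient steps on its own data, and only the resulting model parameters, never the raw measurements, are sent back and averaged.

```python
def local_update(w, b, xs, ys, lr=0.02, steps=5):
    """Run a few gradient-descent steps on one site's private data."""
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def federated_average(sites, rounds=300):
    """Average locally trained parameters, weighted by each site's sample count."""
    w, b = 0.0, 0.0
    total = sum(len(xs) for xs, _ in sites)
    for _ in range(rounds):
        # Each site starts from the current global model and trains locally.
        updates = [(local_update(w, b, xs, ys), len(xs)) for xs, ys in sites]
        # Only parameters leave the sites; raw data stays where it is.
        w = sum(w_k * n for (w_k, _), n in updates) / total
        b = sum(b_k * n for (_, b_k), n in updates) / total
    return w, b

# Hypothetical private data sets at two companies, both following y = 2x + 1.
site_a = ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
site_b = ([3.0, 4.0], [7.0, 9.0])

w, b = federated_average([site_a, site_b])
print(round(w, 3), round(b, 3))
```

Sharing parameters instead of data does not by itself guarantee privacy, since model updates can still leak information, so real deployments typically add further protections on top of this basic loop.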

Summary And Outlook

Overall, the integration of AI into biomanufacturing has the potential to enhance efficiency, productivity, and innovation in the biopharmaceutical industry, leading to improved product quality, reduced costs, and faster development cycles. While identifying and curating data sets remains a challenge, increasing quantities of digitized data, improvements in high-throughput methods, advances in deep learning techniques, and increases in cross-industry cooperation all have the potential to alleviate the existing barriers to model development.

About the Author:

Camille Bilodeau is an assistant professor in the Chemical Engineering Department at the University of Virginia. She received her B.S. and M.S. from Northwestern University and her Ph.D. from Rensselaer Polytechnic Institute, both in chemical and biological engineering. During her Ph.D., she received the Lawrence Livermore Advanced Simulations and Computation Graduate Fellowship, through which she carried out research at Lawrence Livermore National Laboratory. Her research explores the intersection between artificial intelligence and molecular simulations with the goal of designing new molecules and materials. Reach her by email at or follow her on Twitter at @clbilodeau.