Bringing AI-driven protein design tools to biologists everywhere | MIT News

Artificial intelligence is already proving that it can accelerate drug development and improve our understanding of disease. But to turn AI into new medicines, we need to get the latest, most powerful models into the hands of scientists.
The problem is that most scientists are not machine learning experts. Now the company OpenProtein.AI is helping scientists stay at the cutting edge of AI with a code-free platform that gives them access to powerful foundation models and a range of tools to design proteins, predict protein structure and function, and train models.
The company, founded by Tristan Bepler PhD ’20 and former MIT professor Tim Lu PhD ’07, already equips researchers at pharmaceutical and biotech companies of all sizes with its tools, including foundation models developed in-house for protein engineering. OpenProtein.AI also offers its platform to academic scientists for free.
“It’s a really exciting time right now because these models can not only make protein engineering more efficient, shortening development cycles for medicines and industrial applications, but also improve our ability to design new proteins with specific properties,” Bepler said. “We are also thinking about applying these methods to non-protein pathways. The big picture is that we are creating a language that describes biological processes.”
Improving biology with AI
Bepler came to MIT in 2014 as part of the Computational and Systems Biology PhD Program, studying under Bonnie Berger, MIT’s Simons Professor of Applied Mathematics. There, he realized how little was understood about the molecules that serve as the building blocks of biology.
“We hadn’t seen enough biomolecules and proteins to build good predictive models of what, say, a gene region would do, or how a network of protein interactions would behave,” Bepler recalls. “It made me interested in understanding proteins at a finer level.”
Bepler began exploring ways to model the amino acid sequences that make up proteins by learning from evolutionary data. This was before Google DeepMind released AlphaFold, a powerful predictive model of protein structure. The work led to one of the first AI models for protein understanding and design, what the team calls a protein language model.
“I was very excited about the classic paradigm of proteins: the relationship between their sequence, their structure, and their function. We don’t understand those links very well,” Bepler said. “So how can we use these foundation models to skip the ‘structure’ part and go directly from sequence to function?”
After receiving his PhD in 2020, Bepler joined Lu’s lab in MIT’s Department of Biological Engineering as a postdoc.
“This was right around the time when the idea of combining AI and biology was starting to emerge,” Lu recalls. “Tristan helped us build better computer models for biological design. We also saw that there was a disconnect between the most advanced tools available and biologists, who would like to use these things but don’t know how to code. OpenProtein came from the idea of expanding access to these tools.”
Bepler had worked at the frontier of AI during his PhD, and he knew the technology could help scientists speed up their work.
“We started with the idea of building a general-purpose platform for machine learning-guided protein engineering,” Bepler said. “We wanted to build something that was easy to use, because the ideas of machine learning are kind of esoteric. It requires implementation work, GPUs, fine-tuning, designing sequence libraries. Especially at that time, there was a lot for biologists to learn.”
The OpenProtein platform, in contrast, features a web interface where biologists can upload data and perform protein engineering with machine learning. It includes a variety of open-source models, among them PoET, OpenProtein’s flagship protein language model.
PoET, short for Protein Evolutionary Transformer, is trained on groups of related proteins to generate sets of related proteins. Bepler and his collaborators have shown that it can generalize across the evolutionary constraints on proteins and incorporate new protein sequence information without retraining, allowing other researchers to add their own experimental data to improve the model’s predictions.
“Researchers can use their data to train models and optimize protein sequences, and then use some of our tools to analyze those proteins,” Bepler said. “People are generating libraries of protein sequences in silico [on computers] and then using predictive models for structure prediction and validation. It’s essentially no-code end to end, but we also have APIs for people who want to access it in code.”
The models help researchers design proteins quickly and then decide which ones are promising enough for further testing in the lab. Researchers can clone proteins of interest, and the models can generate new ones with similar properties.
Since its founding, the OpenProtein team has continued to add tools to the platform, aiming to serve researchers regardless of their lab’s size or resources.
“We’ve tried really hard to make the platform an open toolbox,” Bepler said. “It serves specific tasks, but it isn’t tied to the function of a single protein or class of proteins. One of the great things about these methods is that they’re very good at understanding proteins broadly. They’re learning about the universe of possible proteins.”
Enabling the next generation of therapy
The large pharmaceutical company Boehringer Ingelheim started using the OpenProtein platform in early 2025. Recently, the companies announced an expanded partnership that will embed the OpenProtein platform and models in Boehringer Ingelheim’s work as it develops proteins to treat diseases such as cancer and immune or inflammatory conditions.
Last year, OpenProtein also released a new version of its protein language model, PoET-2, which outperforms larger models while using a fraction of the computing resources and experimental data.
“We really want to solve the question of how we specify proteins,” Bepler said. “What is the important, domain-specific language of protein constraints that we use when we engineer them? How can we introduce other evolutionary constraints? How can we describe the enzymatic reaction that a protein performs such that the model can generate the sequence to perform that reaction?”
Moving forward, the founders hope to build models that capture the dynamic, interconnected nature of protein function.
“The area I’m excited about is going beyond protein binding events to use these models to predict and design dynamic properties, where a protein must engage two, three, or four biological pathways at the same time, or change its function after binding,” said Lu, who currently serves in an advisory role for the company.
As progress in AI races forward, OpenProtein continues to see its mission as providing scientists with the best tools to rapidly develop new treatments.
“As the work becomes more complex, with approaches that include things like protein logic and adaptive medicine, the available screening tools are limited,” Lu said. “It’s really important to create an open ecosystem around AI and biology. There’s a risk that AI resources can become so concentrated that the average researcher can’t use them. Open access is very important for the field of science to progress.”

