
Talk Abstracts

Jump to: Sebastian Ament (Meta) · Anand Chandrasekaran (Schrödinger) · Olivier Elemento (Weill Cornell Medicine) · Jake Gardner (University of Pennsylvania) · Stefano Martiniani (New York University) · Saeed Najafi (Grove Biopharma) · Matthew Tamasi & Matt Barker (Plexymer & P&G) · Venkat Venkatasubramanian (Columbia University) · Michael Webb (Princeton University)

Sebastian Ament

Meta
Sustainable concrete via Bayesian optimization

Eight percent of global carbon dioxide emissions can be attributed to the production of cement, the main component of concrete, which is also the dominant source of CO2 emissions in the construction of data centers. The discovery of lower-carbon concrete formulae is therefore of high significance for sustainability. However, experimenting with new concrete formulae is time-consuming and labor-intensive, as one usually has to wait to record the concrete's 28-day compressive strength, a quantity whose measurement cannot, by definition, be accelerated. This provides an opportunity for experimental design methodology like Bayesian Optimization (BO) to accelerate the search for strong and sustainable concrete formulae. We 1) propose modeling steps that make concrete strength amenable to accurate prediction by a Gaussian process model with relatively few measurements, 2) formulate the search for sustainable concrete as a multi-objective optimization problem, and 3) leverage the proposed model to carry out multi-objective BO with real-world strength measurements of the algorithmically proposed mixes. Our experimental results show improved trade-offs between the mixtures' global warming potential (GWP) and their associated compressive strengths, compared to mixes based on current industry practices. In May 2025, one of our optimized mixes was deployed at one of Meta's data center construction sites [link].
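The multi-objective selection step can be pictured with a minimal sketch: given candidate mixes scored on GWP and compressive strength (toy numbers invented here, not measurements from the talk), the non-dominated set is the GWP-strength trade-off frontier that multi-objective BO tries to push outward.

```python
import numpy as np

def pareto_front(objectives):
    """Boolean mask of non-dominated rows (all columns to be minimized)."""
    n = len(objectives)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        others = np.delete(objectives, i, axis=0)
        dominated = np.any(
            np.all(others <= objectives[i], axis=1)
            & np.any(others < objectives[i], axis=1)
        )
        keep[i] = not dominated
    return keep

# Toy candidate mixes: column 0 = GWP (minimize),
# column 1 = negative 28-day strength (minimizing it maximizes strength).
mixes = np.array([
    [300.0, -40.0],
    [250.0, -35.0],
    [320.0, -42.0],
    [260.0, -30.0],  # dominated: mix 2 has both lower GWP and higher strength
])
front = pareto_front(mixes)
print(mixes[front])  # the GWP-strength trade-off frontier
```

In the actual workflow, a Gaussian process surrogate would supply predicted (rather than measured) objectives for unseen mixes, and an acquisition function would choose which mixes to batch for 28-day testing.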

Anand Chandrasekaran

Schrödinger
Advancing formulation design with combined physics-based and machine learning approaches

Machine learning (ML) is transforming formulation design across pharmaceuticals, cosmetics, electronics, and consumer packaged goods by enabling data-driven predictions of critical properties such as solubility, viscosity, and stability, among others. Chemistry-informed AI and ML models provide a powerful framework for accelerating materials innovation, extending beyond active ingredients to complex multi-component formulations. The ability of machine learning to analyze large volumes of data and make predictions about novel formulations allows for the rapid exploration and screening of broad chemical spaces, significantly reducing reliance on traditional trial-and-error experimentation. Automated workflows and deep learning can integrate both chemical composition and molecular structure to build predictive models that optimize formulation properties with much greater speed and efficiency.

At the same time, Schrödinger’s physics-based modeling tools, including molecular dynamics (MD), density functional theory (DFT), and machine learning force fields, offer atomistic insights that connect molecular-level interactions with macroscopic properties. These tools help uncover causal relationships within hierarchical materials and multilayered systems such as OLED devices, and support the rational design of advanced formulations, composites, and device architectures.

Olivier Elemento

Weill Cornell Medicine
Towards an AI blueprint for designing effective drug cocktails

Designing effective and safe multi-drug formulations is a critical bottleneck in therapeutic development. To address this, we have developed a comprehensive AI strategy that predicts both synergistic efficacy and adverse drug-drug interactions (DDIs). For efficacy, our PAIRWISE model employs a multimodal deep learning architecture that integrates drug chemical structures, known targets, and tumor-specific transcriptomes to predict personalized drug synergy. PAIRWISE has been successfully benchmarked against other models and its predictions for Bruton's Tyrosine Kinase inhibitor (BTKi) combinations have been validated in vitro in lymphoma cell lines. To address formulation safety, we created DDI-GPT, a novel framework that predicts DDIs by enhancing a Large Language Model (LLM) with a biomedical Knowledge Graph (KG). This model transforms complex KG relationships into structured "sentence trees" that are fed into a fine-tuned BioGPT-2 architecture. DDI-GPT achieved a state-of-the-art AUROC of 0.964 on the TwoSIDES benchmark dataset. More importantly, it demonstrates superior zero-shot prediction capability, attaining a 0.84 AUROC on a recently curated dataset of FDA-reported DDIs, a 14% improvement over prior methods. An explainability module pinpointed CYP3A-mediated mechanisms for predicted acalabrutinib DDIs, showcasing its utility for mechanistic discovery. Together, these AI-driven tools provide a powerful, technically advanced approach to accelerate the rational design of safe and effective combination therapies.
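The knowledge-graph-to-text step can be illustrated with a toy linearization: graph edges around a query drug are rendered as plain sentences that an LLM can condition on. The relation names and templates below are hypothetical; DDI-GPT's actual sentence-tree construction is more structured.

```python
def linearize_triples(drug, triples):
    """Flatten knowledge-graph edges around one drug into plain sentences
    for an LLM prompt. Relations and templates are illustrative only."""
    templates = {
        "metabolized_by": "{h} is metabolized by {t}.",
        "inhibits": "{h} inhibits {t}.",
        "interacts_with": "{h} interacts with {t}.",
    }
    return " ".join(
        templates[r].format(h=h, t=t) for h, r, t in triples if h == drug
    )

# Hypothetical facts of the kind a biomedical KG would supply.
facts = [
    ("acalabrutinib", "metabolized_by", "CYP3A4"),
    ("ketoconazole", "inhibits", "CYP3A4"),
    ("acalabrutinib", "interacts_with", "BTK"),
]
prompt = linearize_triples("acalabrutinib", facts)
print(prompt)
```

Conditioning the language model on such structured context is what lets an explainability module trace a predicted interaction back to a shared mechanism, e.g. CYP3A-mediated metabolism.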

Jake Gardner

University of Pennsylvania
Extracting knowledge priors from scientific texts for de novo molecular design

AI-driven design within chemistry and biochemistry has become one of the most exciting areas of growth for the field. This success is enabled in part by a myriad of repositories that provide data on the structure, function, and activity of proteins, small molecules, and other entities of interest. However, despite this wealth of curated data, the majority of our knowledge of chemistry, biology, and medicine remains “locked” in natural-language text found in publications, patents, and other articles. In this talk, I'll discuss a large-scale effort we are undertaking to exploit the natural-language understanding capabilities of large language models to extract this knowledge directly from the scientific literature. I'll show how, leveraging tens of millions of facts about molecules and proteins extracted from the literature, we achieve state-of-the-art performance, both supervised and zero-shot, on an enormous variety of molecule property prediction tasks: even small 15-million-parameter models pretrained on our large dataset outperform much larger 2B+ parameter models released by Google. We'll also show how these predictors can be incorporated as constraints into de novo molecular Bayesian optimization design pipelines, moving us toward constraining molecular design with knowledge extracted directly from the literature.

Stefano Martiniani

New York University
Property-guided generative AI for molecules and materials

A core objective of the chemical sciences is to discover compounds and materials that address pressing societal needs. The design space of possible compositions and structures is combinatorially vast, making exhaustive experimental or computational screening intractable. The key challenge is to sample efficiently from the distribution of chemically valid structures under constraints on composition and target properties. Generative AI provides a principled solution by learning a transport map from a simple base distribution to this target distribution, enabling both the discovery of new molecules and materials and their conditional generation guided by desired properties. In this talk, I will introduce Open Materials Generation (OMatG) and PropMolFlow, state-of-the-art frameworks for inorganic crystals and small molecules, together with strategies for property guidance. I will conclude with a critical assessment of existing datasets and benchmarks and outline practical remedies we have developed.
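The transport-map idea can be sketched in one dimension with conditional flow matching: regress a velocity field on linear interpolants between base and target samples, then integrate that field to transport fresh base samples. This is a generic illustration with a Gaussian stand-in for the data distribution, not the OMatG or PropMolFlow implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
x0 = rng.standard_normal(n)        # base distribution: N(0, 1)
x1 = 3.0 + rng.standard_normal(n)  # stand-in "data" distribution: N(3, 1)

# Flow matching: regress a velocity field on linear interpolation paths.
t = rng.uniform(0.0, 1.0, n)
xt = (1.0 - t) * x0 + t * x1       # point on the interpolation path
v = x1 - x0                        # conditional velocity target

def feats(x, t):
    # simple polynomial features in (x, t) serve as the velocity model
    return np.stack([np.ones_like(x), x, t, x * t, t * t, x * t * t], axis=1)

theta, *_ = np.linalg.lstsq(feats(xt, t), v, rcond=None)

# Transport fresh base samples by Euler-integrating the learned field.
x = rng.standard_normal(2000)
steps = 100
for i in range(steps):
    ti = np.full_like(x, i / steps)
    x = x + (feats(x, ti) @ theta) / steps

print(round(x.mean(), 2))  # should land near the data mean of 3
```

Conditional generation follows the same recipe with the velocity model additionally conditioned on a target property, which is how property guidance enters these frameworks.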

Saeed Najafi

Grove Biopharma
Multi-objective AI-guided brush-like polymer–peptide design enables tunable peptide-based drug formulation optimization

Rapid advances in AI are reshaping peptide-based drug discovery by linking sequence and structure to developability traits such as solubility, aggregation, permeability, stability, and release kinetics. In this talk, I’ll give a concise, cross-disciplinary primer on the methods we use—protein language models and generative sequence design; structure-aware scoring (e.g., AlphaFold-based pipelines); physics-informed predictors; and multi-objective optimization with uncertainty and active-learning feedback—and outline where each class of model is most and least useful.

I’ll then illustrate these ideas with peptide therapeutics and polymer–peptide conjugates (PLPs): brush-like scaffolds that display multiple peptide arms with tunable properties. Our workflow predicts aggregation, cell permeability, and uptake directly from sequence, integrates structural and biophysical filters, and performs composition optimization to balance potency with formulation performance.

Case studies from the Alzheimer’s design space demonstrate how the approach generalizes across targets and modalities. Together, these tools provide a practical path to AI-guided multifunctional PLP design that is structurally robust, biologically active, and optimized for stability—all within a scalable and designable framework.

Matthew Tamasi & Matt Barker

Plexymer & Procter & Gamble
Accelerating innovation: The journey toward high-throughput methods & models for tailored water-soluble polymers

Co-author: Erica Locke (Procter & Gamble)

To remain competitive in a world characterized by rapidly changing consumer needs, regulatory requirements, and supply chain constraints, P&G requires faster methods for discovering new materials. Through a partnership between Plexymer and P&G, our team is exploring opportunities across the innovation workflow to leverage high-throughput methods and advanced modeling techniques to accelerate polymer discovery. This poster will showcase the adoption of semi-automated high-throughput synthesis and characterization methods paired with sequential and explainable machine learning techniques to prepare, model, and learn from a model library of industrially relevant water-soluble polymer derivatives.

We will share insights from our journey and highlight how, using a limited library of fewer than 100 polymers, we built and validated explainable AI models across multiple layers of structure-property and process information. These models enabled us to deploy active machine learning strategies to target specific compositions with desired viscosities and thermogravimetric properties, as well as enhance model learning by targeting low-information polymer designs. As our models connected synthetic conditions to design information, targeted polymer designs could then be rapidly synthesized and characterized by this semi-automated high-throughput workflow. Additionally, application of explainable AI models to this system aided understanding of the key underlying structure-property relationships driving desired performance. This work lays the foundation for future efforts to rapidly develop water-soluble polymers tailored to excel in key areas of sustainability, performance, and cost.
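The target-seeking loop described above can be pictured with a toy sequential-learning sketch: a Gaussian-process surrogate over a 1-D composition variable, with an acquisition that balances closeness of the predicted property to a target viscosity against residual model uncertainty. The ground-truth function, kernel, and acquisition weights below are invented for illustration and are not the partnership's models.

```python
import numpy as np

rng = np.random.default_rng(0)

def k(a, b, ell=0.3):
    """Squared-exponential kernel on a 1-D composition variable."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

def gp_posterior(X, y, Xs, noise=1e-3):
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ij->j", Ks, np.linalg.solve(K, Ks))
    return mu, np.sqrt(np.maximum(var, 1e-9))

def true_viscosity(x):
    # invented ground truth standing in for a measured viscosity response
    return np.sin(3.0 * x) + 0.5 * x

target = 0.8                      # desired (normalized) viscosity
X = rng.uniform(0.0, 2.0, 4)      # initial random designs
y = true_viscosity(X)
grid = np.linspace(0.0, 2.0, 200)

for _ in range(10):
    mu, sd = gp_posterior(X, y, grid)
    # target-seeking acquisition: near the target, but still uncertain
    acq = -np.abs(mu - target) + 0.3 * sd
    x_next = grid[np.argmax(acq)]
    X = np.append(X, x_next)
    y = np.append(y, true_viscosity(x_next))

best = X[np.argmin(np.abs(y - target))]
```

Swapping the acquisition for pure posterior standard deviation recovers the "low-information design" strategy the abstract mentions, where samples are chosen to improve the model rather than to hit a property target.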

Venkat Venkatasubramanian

Columbia University
Combining LLMs and first-principles knowledge for pharmaceutical applications

The surprising success of generative AI models in applications such as natural language processing and image synthesis has sparked considerable interest among researchers, particularly in pharmaceutical systems engineering (PSE). However, there is an essential difference between such applications and PSE. PSE is governed by fundamental laws of physics, chemistry, and biology, constitutive relations, and highly technical knowledge about materials, processes, and systems. While purely data-driven machine learning has its immediate uses, I believe the long-term success of AI in scientific and engineering domains will depend on effectively leveraging first principles and technical knowledge. ChatGPT's "hallucinations" may be amusing in certain applications, but they are potentially dangerous in highly technical domains, such as PSE. In this talk, I will discuss the challenges and opportunities that lie ahead.

Michael Webb

Princeton University
Generative active learning of polymer additives for tuning solution rheology

Polymers are essential to a wide range of technologies, yet designing them with targeted structural and functional properties remains a grand challenge. For example, controlling solution flow behavior is critical to applications such as sprayable coatings and paints, printable inks, consumer formulations (e.g., shampoos), battery-electrode slurries, and injectable biomedical fluids. A major opportunity lies in applying machine learning to help navigate the vast combinatorial design space—spanning sequence, composition, architecture, morphology, processing, and more—to discover new formulations or replace existing ones with more sustainable alternatives. However, this complexity, combined with data scarcity and characterization challenges, limits the effectiveness of purely rational design and/or high-throughput screening. In this talk, I will describe some of our recent efforts to integrate molecular simulation, machine learning, and theory to map and navigate structure–function relationships in chemically and topologically diverse polymeric materials. A focal example will examine the design of complex polymer additives that tune material rheology, with a particular focus on shear-thinning fluids. This effort depends on the development of a computational framework coupling physics-guided generative machine learning, Bayesian optimization, and multiparticle collision dynamics simulation. I will illustrate the performance of the approach based on targeting thirty distinct rheological profiles of varying difficulty, with results conveying both strengths and limitations of our approach. I will further highlight how this data-driven effort enables new physical insights and how its limitations can be overcome with a little human thought. Ultimately, this work provides a foundation and integrated paradigm to pursue polymer solution formulations for a variety of technological needs.