PromptSE Leverages AI Reasoning to Predict Drug Side Effects

Ivan Kairatov is a distinguished figure in the biopharmaceutical industry, possessing a rare blend of deep technical expertise and hands-on experience in research and development. Throughout his career, he has stood at the crossroads of biology and innovation, witnessing firsthand how the industry struggles to navigate the complexities of drug safety. His insights into the nuances of pharmacological pathways and his understanding of how artificial intelligence can be harnessed to protect patients make him an essential voice in this field. Today, he joins us to discuss a breakthrough in side-effect prediction—PromptSE—and how shifting our focus from symptoms to biological mechanisms could redefine the future of healthcare.

The following discussion centers on the evolving landscape of drug-safety screening, specifically exploring why traditional computational models have historically fallen short due to poor data quality and a lack of mechanistic depth. We explore the technical architecture of PromptSE, a hybrid framework that utilizes large language models to “reason” through biological clues like metabolism and target selectivity. Furthermore, the conversation highlights the significance of mathematical precision in evaluating these models, the challenges posed by sparse and skewed datasets, and the potential for this technology to expand into drug repurposing and complex interaction modeling.

Adverse drug reactions currently rank as the fourth leading cause of mortality, yet traditional computational models still struggle to move beyond superficial symptom reports. Why has it been so difficult to bridge the gap between simple data tracking and truly understanding the biological “why” behind these reactions?

The reality on the ground is that for decades, we have been drowning in a sea of unstructured information. While we have beautiful, crystal-clear data on chemical structures, the actual human experience of a drug—the side effects—is often buried deep within messy clinical narratives and spontaneous reports that lack a cohesive structure. When you look at the statistics, where these reactions trail only behind massive killers like cardiovascular disease and cancer, it becomes clear that our current reliance on symptom-matching is a reactive rather than a proactive strategy. Traditional machine learning algorithms are essentially glorified pattern recognizers; they see that a symptom like “dizziness” is mentioned frequently and flag it, but they possess zero understanding of the metabolic pathway or the target selectivity that caused the blood pressure drop in the first place. This superficiality is what makes identifying risks in a laboratory setting so prohibitively expensive and time-consuming, leaving us in a position where we are often guessing until a drug is already in the hands of thousands of patients.

In the context of PromptSE, how does the transition from using AI as a basic text encoder to a “reasoning” framework fundamentally change the way we approach pharmacological screening?

The shift we are seeing with PromptSE is about moving from “what” to “how” by implementing a multi-stage prompting technique that forces the model to think like a pharmacologist. Instead of just absorbing text, the framework evaluates a drug across four specific pillars: its administration route, metabolism pathways, structural properties, and target selectivity. By guiding the AI to infer these mechanism-relevant explanations, we are essentially teaching it to understand that a side effect isn’t just a label in a database, but a biological consequence of a specific chemical interaction. This level of biological plausibility is what was missing from older models that would often overlook rare but critical mechanisms simply because they weren’t frequently mentioned in the training set. It feels much more like a collaborative brainstorming session with a biological expert than just running a script, which adds a layer of interpretability that is vital for trust in clinical settings.
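
To make the multi-stage idea concrete, here is a minimal sketch of how prompts for the four pillars might be assembled and then merged into one mechanistic profile. The prompt wording, the function names, and the `echo_llm` stand-in are all assumptions made for illustration; the interview does not give PromptSE's actual prompts or the model it calls.

```python
# Hypothetical sketch of multi-stage prompting. The prompt wording and all
# function names are illustrative assumptions, not PromptSE's actual prompts.
from typing import Callable, Dict

# The four pillars named in the interview.
ASPECTS = (
    "administration route",
    "metabolism pathways",
    "structural properties",
    "target selectivity",
)


def build_aspect_prompt(drug: str, aspect: str) -> str:
    """Stage 1: ask about one mechanism-relevant aspect at a time."""
    return (
        f"You are a pharmacologist. Considering only {aspect}, "
        f"explain how the drug '{drug}' could plausibly cause side effects. "
        f"Be concise."
    )


def build_summary_prompt(drug: str, answers: Dict[str, str]) -> str:
    """Stage 2: merge the per-aspect answers into one mechanistic profile."""
    notes = "\n".join(f"- {aspect}: {text}" for aspect, text in answers.items())
    return (
        f"Combine the notes below into a short mechanistic profile "
        f"of the drug '{drug}'.\n{notes}"
    )


def mechanistic_profile(drug: str, llm: Callable[[str], str]) -> str:
    """Run both stages with any text-in, text-out model passed as `llm`."""
    answers = {a: llm(build_aspect_prompt(drug, a)) for a in ASPECTS}
    return llm(build_summary_prompt(drug, answers))


def echo_llm(prompt: str) -> str:
    """Stand-in model so the sketch runs offline; it returns no pharmacology."""
    return f"[model reply to a {len(prompt)}-character prompt]"


if __name__ == "__main__":
    print(mechanistic_profile("example-drug", echo_llm))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM provider; a real client would replace `echo_llm`.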

The study utilized a dataset of 1,020 drugs and 5,599 side effects, but noted that known positive associations were incredibly rare. How does the architecture of PromptSE, specifically tools like BioBERT and the Hierarchical Graph Convolutional Network, handle such a skewed and sparse data landscape?

When you are dealing with a dataset where only 2.34% of possible drug and side-effect pairs are known positive associations, you are effectively looking for a needle in a haystack. This is where the integration of BioBERT becomes a game-changer; it takes the descriptive, mechanistic profiles generated by the LLM and converts that human-readable text into numerical vectors that a deep learning module can actually digest. For those “rare” drugs or side effects where we have very little documentation, we utilize the Hierarchical Graph Convolutional Network, or HiGCN, which allows these low-frequency entities to essentially “borrow” contextual clues from more common, well-documented medications. It creates a neighborhood of relationships, so if a new drug shares a target selectivity or a metabolic pathway with a known entity, the model can make an educated inference rather than failing due to a lack of direct data. This prevents the model’s accuracy from degrading when it encounters something it hasn’t seen a thousand times before.
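
The sparsity and the “borrowing” idea can be sketched in a few lines. The first part only multiplies the figures quoted above, so the positive-pair count it prints is implied by those figures rather than reported in the interview. The second part is a deliberately simplified one-step neighbor average, not the HiGCN itself; the toy graph, the two-dimensional vectors, and the `self_weight` parameter are assumptions made for illustration.

```python
# Illustrative sketch only: a one-step neighbor average, far simpler than the
# HiGCN described in the interview. The toy graph and vectors are made up.
from typing import Dict, List

N_DRUGS = 1_020          # quoted in the interview
N_SIDE_EFFECTS = 5_599   # quoted in the interview
POSITIVE_RATE = 0.0234   # 2.34% of pairs are known positives

total_pairs = N_DRUGS * N_SIDE_EFFECTS
approx_positives = round(total_pairs * POSITIVE_RATE)  # implied, not reported

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def borrow_from_neighbors(
    node: str,
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    self_weight: float = 0.5,
) -> Vector:
    """Blend a node's own vector with the mean of its neighbors' vectors."""
    own = features[node]
    linked = [features[n] for n in neighbors.get(node, [])]
    if not linked:
        return own  # nothing to borrow from
    avg = mean(linked)
    return [self_weight * o + (1 - self_weight) * a for o, a in zip(own, avg)]


if __name__ == "__main__":
    # Toy 2-d "embeddings": the rare entity starts with an uninformative vector.
    features = {
        "rare_drug": [0.0, 0.0],
        "common_drug_a": [1.0, 0.0],
        "common_drug_b": [0.0, 1.0],
    }
    # Edges stand for a shared target or metabolic pathway (hypothetical).
    neighbors = {"rare_drug": ["common_drug_a", "common_drug_b"]}

    print(total_pairs, approx_positives)
    print(borrow_from_neighbors("rare_drug", features, neighbors))
```

In the toy graph the rare drug's vector moves toward its two well-documented neighbors, which is the intuition behind letting low-frequency entities inherit context. A graph convolutional network would also apply learned weights and nonlinearities, and could stack several such steps.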

Looking at the performance metrics, PromptSE achieved an AUPR of 0.6551, outperforming traditional baselines significantly. What do these numbers, along with the Kolmogorov-Smirnov test results, tell us about the model’s ability to distinguish real biological relationships from mere linguistic coincidences?

The numbers tell a story of significant refinement: with an AUPR of 0.6551, PromptSE outperformed the strongest non-drug-informed baseline by a substantial 9.26%. When we went a step further with PromptSE+ by incorporating multi-modal drug information, we saw another 1.81% jump to an AUPR of 0.6878, which is a massive win in a field where every percentage point represents potential lives saved. But the most telling statistic for me is the Kolmogorov-Smirnov (KS) score of 0.3939 compared to the meager 0.0195 achieved by basic textual descriptions. This score measures how well the model separates related from unrelated side-effect pairs, and the fact that the LLM-derived representations scored so much higher indicates that the AI is actually grouping side effects by their pharmacological drivers. It isn’t just seeing the words “headache” and “migraine” and putting them together; it’s understanding the underlying vascular or neurological triggers that link them.
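
For readers unfamiliar with the two metrics, the sketch below computes both on toy inputs. It assumes AUPR is estimated with the average-precision sum, one common estimator that may differ from the study's exact procedure, and it uses the standard two-sample KS statistic (the largest gap between two empirical distribution functions). The function names are illustrative, and tied scores are not handled specially.

```python
# Minimal pure-Python sketch of the two metrics on toy inputs; these are not
# the study's data, and tied scores are not handled specially.
from bisect import bisect_right
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Estimate the area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive
    return precision_sum / hits if hits else 0.0


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample A that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample B that is <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


if __name__ == "__main__":
    # Toy predictions: 1 marks a known drug and side-effect association.
    labels = [1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.2]
    print("average precision:", average_precision(labels, scores))

    # Toy embedding similarities for related vs. unrelated side-effect pairs.
    related = [0.7, 0.8, 0.9]
    unrelated = [0.1, 0.2, 0.3]
    print("KS statistic:", ks_statistic(related, unrelated))
```

A KS value near 0 means the similarity distributions for related and unrelated pairs overlap almost completely, while a value near 1 means they barely overlap, which is why the gap between 0.3939 and 0.0195 is treated as meaningful.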

While this framework was tested on side effects, there is a clear potential for it to be applied to drug-drug interactions or even drug repurposing. How close are we to seeing this integrated into the standard pipeline of drug discovery?

We are standing on the threshold of a very exciting era, but we must tread carefully; while the paradigm of guided reasoning is powerful, we still need rigorous validation using external datasets and curated pharmacological knowledge bases. The potential to use this framework for drug-drug interactions is the logical next step because those interactions are essentially side effects born from the collision of two different metabolic pathways. I can also see a future where we use this “biological reasoning” to discover new therapeutic uses for existing medications—what we call drug repurposing—by identifying beneficial off-target effects that were previously ignored. However, to truly integrate this into the standard pipeline, we need to ensure that the AI’s “biological grounding” is strengthened by published evidence and not just internal logic. It’s about moving from a successful research paper to a tool that a lab technician can rely on with 100% confidence.

What is your forecast for the role of large language models in the broader pharmacological landscape over the next decade?

I believe that over the next ten years, LLMs will transition from being simple assistants to becoming the foundational “reasoning core” of all pharmacological screening. We will likely move away from isolated models toward a more holistic, multi-modal ecosystem where an AI can simultaneously analyze a drug’s molecular structure, its impact on the transcriptome, and the vast, messy world of clinical narratives all at once. I expect we will see a dramatic reduction in early-stage clinical trial failures because we will be able to “pre-reason” the potential adverse effects with a degree of accuracy that currently feels like science fiction. Ultimately, this isn’t just about faster computing; it’s about a fundamental shift in how we perceive the relationship between chemical compounds and human biology, turning the “black box” of drug reactions into a transparent, predictable roadmap for patient safety.
