CoTox: Unlocking Interpretable Drug Toxicity Predictions with AI

Drug toxicity remains a significant hurdle in pharmaceutical development, often leading to costly failures and potential patient harm. Traditional machine learning models have made strides in predicting these risks, but their reliance on large annotated datasets and their lack of transparency have limited their utility. A new AI framework called CoTox aims to address both problems by using the reasoning capabilities of large language models (LLMs) to produce more interpretable, biologically grounded toxicity predictions.

Developed by researchers at Korea University, CoTox addresses key limitations of existing methods. Instead of solely relying on chemical structures represented by complex strings like SMILES, CoTox integrates this information with crucial biological context, including cellular pathways and Gene Ontology (GO) terms. This holistic approach allows the AI to “reason” through the potential mechanisms by which a drug might cause harm, mimicking a more human-like analytical process.

A key innovation in CoTox is the use of IUPAC names for chemical structures. Unlike SMILES strings, which can be challenging for LLMs to interpret, IUPAC names are human-readable and closer to natural language. This helps LLMs connect structural features to biological pathways and predict toxicity more accurately. For example, the paper shows that for Etodolac, the IUPAC name clearly indicates an “indole-pyran fused ring” and a “carboxylic acid group,” features that directly inform reasoning about the drug’s lipophilicity and potential bioactivation. By contrast, the SMILES string encodes the same structure only as atom-by-atom connectivity, from which these features are much harder to read off.
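The contrast can be made concrete with a small sketch. It is not part of CoTox; it only illustrates that the structural features mentioned above appear as literal word fragments in Etodolac's IUPAC name but not in its SMILES string. The two strings are the commonly listed identifiers for Etodolac and should be checked against a chemical database before reuse, and the cue table `NAME_CUES` and the function `readable_cues` are hypothetical names introduced here.

```python
# Commonly listed identifiers for Etodolac (verify against a database before reuse).
ETODOLAC_IUPAC = "2-(1,8-diethyl-4,9-dihydro-3H-pyrano[3,4-b]indol-1-yl)acetic acid"
ETODOLAC_SMILES = "CCC1=C2C(=CC=C1)C3=C(N2)C(OCC3)(CC)CC(=O)O"

# Hypothetical lookup table: name fragment -> plain-language structural feature.
# This is an illustration only, not a chemistry parser and not the paper's method.
NAME_CUES = {
    "pyrano": "pyran ring",
    "indol": "indole ring system",
    "acetic acid": "carboxylic acid group",
    "diethyl": "two ethyl substituents",
}


def readable_cues(text: str) -> list[str]:
    """Return the structural features whose name fragment occurs literally in text."""
    lowered = text.lower()
    return [label for fragment, label in NAME_CUES.items() if fragment in lowered]


if __name__ == "__main__":
    print("IUPAC :", readable_cues(ETODOLAC_IUPAC))
    print("SMILES:", readable_cues(ETODOLAC_SMILES))
```

Simple substring matching is obviously cruder than what an LLM does, but it shows why a name built from chemical vocabulary gives a language model more to work with than a connectivity string.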

The CoTox framework employs a “chain-of-thought” (CoT) prompting strategy, guiding the LLM to produce step-by-step explanations for its predictions. This means that instead of simply outputting a “toxic” or “non-toxic” label, the AI details which biological pathways are implicated, how specific gene ontology terms relate to toxicity, and how structural features identified in the IUPAC name contribute to these effects. This transparency is vital for drug developers, enabling them to understand the “why” behind a prediction and make more informed decisions.
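A prompt of this kind might be assembled as in the sketch below. It is a minimal illustration under stated assumptions, not the prompt used in the paper: the function name `build_cot_prompt`, the wording and order of the steps, and the example pathway and GO-term strings are all hypothetical placeholders. Only the overall idea (IUPAC name plus pathways plus GO terms, followed by step-by-step reasoning and a per-organ verdict) comes from the description above.

```python
from typing import Sequence


def build_cot_prompt(
    iupac_name: str,
    pathways: Sequence[str],
    go_terms: Sequence[str],
    organs: Sequence[str],
) -> str:
    """Assemble a chain-of-thought style prompt (hypothetical wording and layout)."""
    pathway_lines = "\n".join(f"- {p}" for p in pathways)
    go_lines = "\n".join(f"- {g}" for g in go_terms)
    organ_lines = "\n".join(f"- {o}: toxic / non-toxic" for o in organs)
    return (
        f"Compound (IUPAC name): {iupac_name}\n\n"
        f"Associated pathways:\n{pathway_lines}\n\n"
        f"Associated Gene Ontology terms:\n{go_lines}\n\n"
        "Reason step by step before answering.\n"
        "Step 1: Identify which of the listed pathways are implicated in toxicity.\n"
        "Step 2: Explain how the listed GO terms relate to toxicity.\n"
        "Step 3: Explain how structural features in the IUPAC name contribute.\n"
        "Step 4: Give a verdict for each organ system:\n"
        f"{organ_lines}\n"
    )


if __name__ == "__main__":
    prompt = build_cot_prompt(
        iupac_name="2-(1,8-diethyl-4,9-dihydro-3H-pyrano[3,4-b]indol-1-yl)acetic acid",
        pathways=["<pathway name 1>", "<pathway name 2>"],  # placeholders, not real data
        go_terms=["<GO term 1>"],                           # placeholder, not real data
        organs=["cardiotoxicity", "liver toxicity", "hematological toxicity"],
    )
    print(prompt)  # the resulting string would then be sent to an LLM of choice
```

Keeping the reasoning steps explicit in the prompt is what lets a reader audit the model's answer: each verdict can be traced back to a stated pathway, GO term, or structural feature.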

In experiments on the UniTox dataset, CoTox was run with advanced LLMs such as GPT-4o and Gemini-2.5-Pro. It predicted several organ-specific toxicities, including cardiotoxicity, liver toxicity, and hematological toxicity, substantially better than both traditional machine learning models and LLMs given structural data alone. The framework’s ability to integrate biological context proved particularly impactful.

Furthermore, the study showcases CoTox’s adaptability through a case study involving the drug Entecavir. By analyzing gene expression changes induced by Entecavir in specific cell lines, CoTox was able to predict liver and lung toxicity accurately. Intriguingly, it also flagged a potential for renal toxicity, a signal that, while not yet in the official drug label, is supported by recent clinical observations, underscoring CoTox’s potential to identify latent safety concerns.

In essence, CoTox is a step forward in drug safety assessment: an interpretable, LLM-driven approach that connects complex chemical structures and biological mechanisms to explanations people can follow. By enabling earlier and more reliable identification of potential toxicities, the framework could help accelerate drug development.