DES Ph.D. Student Unveils Breakthrough Research at KDD 2026: Proposing ToxiMol, the World's First Molecular Toxicity Repair Benchmark

Published: 2026/06/10

Lin Fei, a third-year Ph.D. student in Intelligent Science and Systems in the Faculty of Innovation Engineering (FIE) at the Macau University of Science and Technology (MUST), has made a significant mark at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026). As first author, Lin presented the paper "Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?", which was selected for an Oral Presentation, an honor reserved for the top 20% of accepted papers. The study was supervised by corresponding author Professor Fei-Yue Wang of MUST's Faculty of Innovation Engineering and conducted in collaboration with Shanghai Jiao Tong University, the Chinese Academy of Sciences (Institute of Automation and Institute of Process Engineering), the Shanghai Artificial Intelligence Laboratory, and Ningbo University.

▲ Ph.D. Student Lin Fei

The ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) is a premier venue rated Tier-A by the China Computer Federation (CCF), and it consistently ranks among the top venues in Google Scholar Metrics for data mining and artificial intelligence. Securing an Oral Presentation in the highly competitive "AI for Sciences" Track is a major milestone. Notably, this is the first time a paper with MUST as the lead institution has been published at KDD, underscoring the university's growing research strength in the interdisciplinary field of AI for Science.

The research addresses a major bottleneck in early-stage drug development: toxicity. Many drug candidates fail because of poor ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profiles. Avoiding toxicity has traditionally relied on experienced medicinal chemists running extensive, costly trial-and-error experiments, which makes large-scale automation nearly impossible. Although Multimodal Large Language Models (MLLMs) have recently shown strong potential in complex reasoning, their ability to identify toxicity mechanisms and redesign molecular structures accordingly had not been systematically evaluated.

To bridge this gap, the team developed ToxiMol, the world's first benchmark designed to test how well general-purpose MLLMs can "repair" toxic molecules. ToxiMol covers 11 primary toxicity repair tasks, including LD50 (acute toxicity), DILI (drug-induced liver injury), and AMES (Ames mutagenicity), and features a dataset of 660 real-world toxic molecules with complex structures and diverse toxicity mechanisms. To measure success rigorously, the researchers also introduced ToxiEval, a multi-dimensional evaluation framework. ToxiEval applies a strict "all-constraints-passed" rule: an AI-generated molecule counts as a success only if it simultaneously satisfies the criteria for structural validity, safety, quantitative estimate of drug-likeness (QED), and synthetic accessibility score (SAS). Together, ToxiMol and ToxiEval provide standardized infrastructure for future research in AI-driven molecular detoxification.
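The "all-constraints-passed" rule amounts to a logical AND over every criterion: failing any single check makes the whole candidate a failure. The Python sketch below illustrates that idea only. The field names, the `passes_all_constraints` and `success_rate` helpers, and the QED and SAS thresholds are hypothetical placeholders rather than ToxiEval's actual implementation or settings, and the scores are assumed to have been computed beforehand by separate tools.

```python
from dataclasses import dataclass


@dataclass
class CandidateScores:
    """Precomputed scores for one AI-generated molecule (hypothetical fields)."""
    is_valid: bool  # assumed: the generated structure is chemically valid
    is_safe: bool   # assumed: the toxicity predictor no longer flags the target endpoint
    qed: float      # quantitative estimate of drug-likeness, in [0, 1]
    sas: float      # synthetic accessibility score, roughly 1 (easy) to 10 (hard)


# Hypothetical thresholds for illustration only; not taken from ToxiEval.
QED_MIN = 0.5
SAS_MAX = 6.0


def passes_all_constraints(s: CandidateScores) -> bool:
    """Return True only if every criterion passes at the same time."""
    checks = [
        s.is_valid,
        s.is_safe,
        s.qed >= QED_MIN,  # higher QED means more drug-like
        s.sas <= SAS_MAX,  # lower SAS means easier to synthesize
    ]
    return all(checks)


def success_rate(candidates: list[CandidateScores]) -> float:
    """Fraction of candidates that pass every constraint (0.0 for an empty list)."""
    if not candidates:
        return 0.0
    return sum(passes_all_constraints(c) for c in candidates) / len(candidates)
```

For example, a candidate scored `CandidateScores(True, True, 0.62, 3.1)` passes, while `CandidateScores(True, False, 0.8, 2.0)` fails despite its good drug-likeness and synthesizability, because it is still predicted to be toxic.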

▲ The ToxiMol Molecular Toxicity Repair Task and the ToxiEval Multi-Criteria Evaluation Chain

This publication highlights MUST's continuous innovation and its competitive edge on the global AI for Science stage. Moving forward, MUST remains committed to supporting cutting-edge research, driving the integration of AI with healthcare and the life sciences, and contributing to global technological advancements for a smarter society.

Paper Links

Paper: https://arxiv.org/abs/2506.10912

GitHub Project: https://github.com/HydroSophy/ToxiMol

Dataset: https://huggingface.co/datasets/HydroSophyTech/ToxiMol-benchmark
