News Title: Hunan University’s t-SMILES Molecular Representation Framework Aids De Novo Molecular Design
Keywords: Hunan University, Molecular Representation, AI-Assisted Molecular Discovery
News Content:
Hunan University’s Novel Molecular Representation Framework Addresses Challenges in AI-Assisted Molecular Discovery
A research team at Hunan University recently reported a significant advance in AI-assisted molecular discovery: a fragment-based, multi-scale molecular representation framework called t-SMILES. The framework addresses key challenges in molecular modeling, particularly in representing molecules for AI models.
Molecular descriptors are widely used in molecular modeling. In AI-assisted molecular discovery, however, the lack of a naturally applicable, complete, and raw molecular representation remains a major challenge that directly affects the performance and interpretability of AI models. The Hunan University team studied this problem in depth and proposed a new solution.
The t-SMILES framework aims to answer two central questions in applying natural language processing to chemistry: what counts as a “chemical word,” and how such words should be encoded into “chemical sentences.” The framework describes molecules using SMILES-type strings and proposes three coding algorithms: TSSA, TSDY, and TSID. Experimental results show that t-SMILES models can generate highly novel molecules with a theoretical validity of up to 100%, a significant advantage over models based on traditional SMILES.
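To make the validity and novelty figures above more concrete, the sketch below computes three metrics commonly reported for generative molecular models: validity, uniqueness, and novelty. It is an illustration under stated assumptions, not the evaluation code used by the Hunan University team, and exact metric definitions vary between benchmarks. The function names are hypothetical, `toy_is_valid` is only a stand-in for a real chemistry parser (in practice a toolkit such as RDKit would typically be used), and all strings are assumed to be canonical SMILES so that plain string comparison identifies identical molecules.

```python
from typing import Callable, Iterable


def generation_metrics(
    generated: Iterable[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> dict:
    """Return validity, uniqueness and novelty as fractions in [0, 1].

    Assumption: every string is already a canonical SMILES, so equal
    strings mean equal molecules. `is_valid` is supplied by the caller.
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        # share of generated strings accepted by the validity check
        "validity": len(valid) / len(generated) if generated else 0.0,
        # share of valid molecules that are distinct
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # share of distinct valid molecules absent from the training set
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles: str) -> bool:
    """Toy check (assumption): non-empty string with balanced parentheses.

    A real validity test must parse the string as a molecule.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


if __name__ == "__main__":
    train = ["CCO", "c1ccccc1"]
    sampled = ["CCO", "CCN", "CCN", "CC(C", "CC(=O)O"]
    print(generation_metrics(sampled, train, toy_is_valid))
```

In this toy run, four of the five sampled strings pass the check (validity 0.8), three of those four are distinct (uniqueness 0.75), and two of the three distinct molecules do not appear in the training list (novelty of about 0.67). A model that reproduces its training data would score high on validity but low on novelty, which is why the two figures are reported together.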
Moreover, the team reports that its models avoid overfitting and achieve higher novelty scores while maintaining reasonable similarity. Even on low-resource datasets, the models perform well, whether in their original form or after data augmentation or pre-training followed by fine-tuning.
Published under the title “t-SMILES: A Fragment-Based Molecular Representation Framework for De Novo Ligand Design,” the research offers new ideas and methods for applying AI in chemistry. The work is expected to advance the use of AI in areas such as drug discovery and development.
Source: https://www.jiqizhixin.com/articles/2024-07-05-4