Unsupervised Machine Learning-Based Image Recognition of Raw Infrared Spectra: Toward Chemist-like Chemical Structural Classification and Beyond Numerical Data.

Author information

Fuku Kentarou, Yoshida Takefumi

Affiliations

Faculty of Advanced Engineering, Tokyo University of Science, Tokyo 125-8585, Japan.

Cluster of Nanomaterials, Graduate School of Systems Engineering, Wakayama University, 930 Sakaedani, Wakayama 640-8510, Japan.

Publication information

J Chem Inf Model. 2025 Apr 14;65(7):3176-3185. doi: 10.1021/acs.jcim.4c01644. Epub 2025 Mar 19.

Abstract

Recent advances in artificial intelligence have significantly improved spectral data analysis. In this study, we used unsupervised machine learning to classify chemical compounds based on infrared (IR) spectral images, without relying on prior chemical knowledge. The potential of machine learning for chemical classification was demonstrated by extracting IR spectral images from the Spectral Database for Organic Compounds and converting them into 208,620-dimensional vector data. Hierarchical clustering of 230 compounds revealed distinct main clusters (-), each with specific subclusters exhibiting higher intracluster similarities. Despite the challenges, including sensitivity to spectral deviations and difficulty of distinguishing delicate chemical structures in spectra with low transparency in the fingerprint area, the proposed image recognition approach exhibits good potential. Both principal component analysis and k-means clustering produced similar results. Furthermore, the method demonstrated high robustness to noise. The Tanimoto coefficient was used to evaluate the molecular similarity, providing valuable insights. However, some results deviated from chemists' intuitions. The study also highlighted that the scaling composition formulas and molecular weights did not affect the classification results because high-dimensional features dominated the process. A comparison of the clustering results obtained from molecular fingerprints, using the adjusted Rand index as a metric, indicated that the image data provided better classification performance than numerical data of the same resolution. Overall, this study demonstrates the feasibility of using machine learning with IR spectral image data for chemical classification and offers a novel perspective that complements traditional methods, although the classifications may not always align with chemists' intuitions. This approach has broader implications for fields such as drug discovery, materials science, and automated spectral analysis, where handling large, raw spectral data sets is essential.
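The abstract names two evaluation quantities without defining them: the Tanimoto coefficient (molecular similarity) and the adjusted Rand index (agreement between two clusterings). The sketch below illustrates their standard textbook definitions in plain Python. It is not the authors' code; the function names, the set-of-bit-indices fingerprint representation, the handling of empty or degenerate inputs, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import comb


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' bit indices."""
    union = len(a | b)
    # Assumed convention: two empty fingerprints score 0.0.
    return len(a & b) / union if union else 0.0


def adjusted_rand_index(labels_a, labels_b) -> float:
    """Adjusted Rand index between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    if n < 2:
        # Assumed convention: fewer than two items counts as perfect agreement.
        return 1.0
    # Pairs of items placed together in both clusterings, in A only, in B only.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        # Degenerate partitions (e.g. everything in one cluster on both sides).
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical fingerprints given as sets of set-bit positions.
    print(tanimoto({1, 4, 7, 9}, {1, 4, 8}))
    # Hypothetical cluster labels for six compounds from two methods.
    print(adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```

The adjusted Rand index ignores how clusters are numbered, so two methods that group the same compounds together score 1.0 even if their label values differ; values near 0 indicate agreement no better than chance.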
