

Using Machine Learning to Classify Bioactivity for 3486 Per- and Polyfluoroalkyl Substances (PFASs) from the OECD List.

Author information

Department of Civil and Environmental Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States.

Secondary Appointment, Department of Environmental and Occupational Health, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States.

Publication information

Environ Sci Technol. 2019 Dec 3;53(23):13970-13980. doi: 10.1021/acs.est.9b04833. Epub 2019 Nov 19.

Abstract

A recent OECD report estimated that more than 4000 per- and polyfluorinated alkyl substances (PFASs) have been produced and used in a broad range of industrial and consumer applications. However, little is known about the potential hazards (e.g., bioactivity, bioaccumulation, and toxicity) of most PFASs. Here, we built machine-learning-based quantitative structure-activity relationship (QSAR) models to predict the bioactivity of those PFASs. By examining a number of available molecular data sets, we constructed the first PFAS-specific database that contains the bioactivity information on 1012 PFASs for 26 bioassays. On the basis of the collected PFAS data set, we trained 5 different machine learning models that cover a variety of conventional models (e.g., random forest and multitask neural network (MNN)) and advanced graph-based models (e.g., graph convolutional network). Those models were evaluated based on the validation data set. Both MNN and graph-based models demonstrated the best performance. The average of the best area-under-the-curve score for each bioassay is 0.916. For predictions on the OECD list, most of the biologically active PFASs have perfluoroalkyl chain lengths less than 12 and are categorized into fluorotelomer-related compounds and perfluoroalkyl acids and their precursors.
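The summary statistic quoted in the abstract (the average, over bioassays, of the best area-under-the-curve score achieved by any model) can be illustrated with a minimal sketch. This is not the authors' code: it assumes the score is ROC AUC, and the assay names, model names, labels, and predicted scores below are hypothetical toy values chosen only to show the calculation.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC AUC by pairwise comparison: the fraction of (active, inactive)
    pairs in which the active compound gets the higher score (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_best_auc(labels_by_assay, scores_by_model):
    """For each assay keep the best AUC across models, then average over assays."""
    best = {}
    for assay, labels in labels_by_assay.items():
        best[assay] = max(
            roc_auc(labels, model_scores[assay])
            for model_scores in scores_by_model.values()
        )
    return mean(best.values()), best


# Hypothetical validation labels (1 = active, 0 = inactive) for two toy assays.
labels_by_assay = {
    "assay_A": [1, 0, 1, 0],
    "assay_B": [0, 0, 1, 1],
}

# Hypothetical predicted activity scores from two toy models.
scores_by_model = {
    "random_forest": {
        "assay_A": [0.9, 0.2, 0.4, 0.5],
        "assay_B": [0.1, 0.8, 0.7, 0.3],
    },
    "graph_conv": {
        "assay_A": [0.8, 0.1, 0.7, 0.3],
        "assay_B": [0.6, 0.2, 0.5, 0.9],
    },
}

average, best = mean_best_auc(labels_by_assay, scores_by_model)
print(best)
print(average)
```

Because the maximum is taken per assay before averaging, the resulting figure describes the best available model for each bioassay rather than the performance of any single model across all of them.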

