Sinclair Gabriel, Charest Nathaniel, Wetmore Barbara A, Frazier Olivia, Fisher Hunter A, Tornero-Velez Rogelio
Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States.
Oak Ridge Associated Universities, Assigned to Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States.
J Chem Inf Model. 2025 Aug 11;65(15):7994-8005. doi: 10.1021/acs.jcim.5c01040. Epub 2025 Jul 22.
Fraction unbound in plasma is a crucial parameter in physiologically based toxicokinetic (PBTK) models, representing the fraction of a chemical compound that is not sequestered by plasma proteins when present in the bloodstream. It is often used as a proxy for the quantity of the compound that is bioavailable for metabolism or the exertion of physiological effects; on the other hand, a low fraction unbound is also a predictor of bioaccumulative potential. In this work, we propose and investigate a new machine learning methodology to improve our quantitative structure-activity relationship (QSAR) modeling of fraction unbound for specific chemical classes, including per- and polyfluoroalkyl substances (PFAS). We evaluate a novel transfer learning strategy across chemical space, in which a deep learning model is trained on a broad chemical library and then fine-tuned on a small data set of PFAS, and we assess its added value relative to a global random forest model presented in a prior publication. Our results demonstrate increased statistical performance after fine-tuning when the approach is applied to other similarly small chemical families; however, due to the sparsity and imbalance of the data, the prior global model remains the most competitive for PFAS. We conclude with an investigation of the PFAS structural space in relation to the activity of interest, formulating recommendations for future experimental characterization to expand the knowledge space for modeling. Measuring these data will inform our PFAS models and may ultimately yield enough data amenable to modeling to improve the viability of local and transfer learning approaches for this class of chemicals.
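The pretrain-then-fine-tune strategy described above can be illustrated with a minimal sketch. It does not reproduce the authors' architecture, descriptors, or data: the tiny NumPy network, the synthetic "broad library" and "small family" data sets, and every name below (`init_params`, `train`, `freeze_hidden`, and so on) are hypothetical and chosen only for illustration. The sketch pretrains on a large source set, then fine-tunes only the output layer on a small, shifted target set while keeping the learned hidden representation frozen.

```python
import numpy as np

def init_params(n_in, n_hidden, rng):
    # One hidden layer with tanh activation and a linear output unit.
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 0.5, size=n_hidden),
        "b2": 0.0,
    }

def hidden(params, X):
    return np.tanh(X @ params["W1"] + params["b1"])

def predict(params, X):
    return hidden(params, X) @ params["w2"] + params["b2"]

def mse(params, X, y):
    return float(np.mean((predict(params, X) - y) ** 2))

def train(params, X, y, lr, steps, freeze_hidden=False):
    """Full-batch gradient descent on 0.5 * mean squared error.

    With freeze_hidden=True only the output layer (w2, b2) is updated,
    which is the fine-tuning mode used in this sketch.
    """
    p = {k: np.copy(v) for k, v in params.items()}
    n = len(y)
    for _ in range(steps):
        H = hidden(p, X)
        err = H @ p["w2"] + p["b2"] - y  # residuals
        if not freeze_hidden:
            # Backpropagate through tanh using the current output weights.
            dH = np.outer(err, p["w2"]) * (1.0 - H ** 2)
            p["W1"] = p["W1"] - lr * (X.T @ dH) / n
            p["b1"] = p["b1"] - lr * dH.mean(axis=0)
        p["w2"] = p["w2"] - lr * (H.T @ err) / n
        p["b2"] = p["b2"] - lr * err.mean()
    return p

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_features = 5  # stand-in for molecular descriptors (assumption)

# Synthetic "broad library": many compounds, responses bounded in (0, 1).
X_src = rng.normal(size=(500, n_features))
w_true = rng.normal(size=n_features)
y_src = sigmoid(X_src @ w_true)

# Synthetic "small family": few compounds with a shifted relationship.
X_tgt = rng.normal(loc=0.5, size=(20, n_features))
y_tgt = sigmoid(X_tgt @ w_true - 1.0)

# Step 1: pretrain all layers on the broad library.
pretrained = train(init_params(n_features, 8, rng), X_src, y_src,
                   lr=0.1, steps=500)

# Step 2: fine-tune only the output layer on the small family.
finetuned = train(pretrained, X_tgt, y_tgt,
                  lr=0.05, steps=200, freeze_hidden=True)

print("target MSE before fine-tuning:", mse(pretrained, X_tgt, y_tgt))
print("target MSE after fine-tuning: ", mse(finetuned, X_tgt, y_tgt))
```

Freezing the hidden layer is one of several possible fine-tuning choices; it keeps the number of trainable parameters small, which matters when the target family has only a handful of measured compounds. A fair assessment would compare both models on held-out target compounds rather than on the fine-tuning set itself, as printed here.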