Predicting Drug-Induced Liver Injury Using Machine Learning on a Diverse Set of Predictors.

Author information

Adeluwa Temidayo, McGregor Brett A, Guo Kai, Hur Junguk

Affiliations

Department of Biomedical Sciences, University of North Dakota, Grand Forks, ND, United States.

Department of Neurology, University of Michigan, Ann Arbor, MI, United States.

Publication information

Front Pharmacol. 2021 Aug 18;12:648805. doi: 10.3389/fphar.2021.648805. eCollection 2021.

Abstract

A major challenge in drug development is safety and toxicity concerns due to drug side effects. One such side effect, drug-induced liver injury (DILI), is considered a primary factor in regulatory clearance. The goal of the Critical Assessment of Massive Data Analysis (CAMDA) 2020 CMap Drug Safety Challenge was to develop prediction models based on gene perturbation of six preselected cell lines (CMap L1000), extended structural information (MOLD2), toxicity data (TOX21), and FDA reporting of adverse events (FAERS). Four types of DILI classes were targeted, including two clinically relevant scores and two control classifications, designed by the CAMDA organizers. The L1000 gene expression data had variable drug coverage across cell lines, with only 247 of the 617 drugs in the study measured in all six cell types. We addressed this coverage issue by using Kru-Bor ranked merging to generate a single drug expression signature across all six cell lines. These merged signatures were then narrowed down to the top and bottom 100, 250, 500, or 1,000 genes most perturbed by drug treatment. These signatures were subjected to feature selection using Fisher's exact test to identify genes predictive of DILI status. Models based solely on expression signatures had varying results for clinical DILI subtypes, with accuracy ranging from 0.49 to 0.67 and Matthews Correlation Coefficient (MCC) values ranging from -0.03 to 0.1. Models built using FAERS, MOLD2, and TOX21 had similar results in predicting clinical DILI scores, with accuracy ranging from 0.56 to 0.67 and MCC scores ranging from 0.12 to 0.36. To incorporate these various data types with expression-based models, we utilized soft, hard, and weighted ensemble voting methods using the top three performing models for each DILI classification. These voting models achieved balanced accuracies of up to 0.54 and 0.60 for the two clinically relevant DILI subtypes. Overall, our experiments suggest that traditional machine learning approaches may not be optimal classifiers for the current data.
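
The abstract names hard, soft, and weighted ensemble voting and reports MCC and balanced accuracy, but it does not describe how these were implemented. The following is a minimal sketch of the general techniques only, not the authors' code. The three models' probabilities, the labels, and the weights are invented toy values, and all function names are hypothetical.

```python
from math import sqrt

def hard_vote(prob_lists, threshold=0.5):
    """Majority vote: each model casts one vote per sample after thresholding."""
    n_models = len(prob_lists)
    votes = [sum(p >= threshold for p in sample) for sample in zip(*prob_lists)]
    return [int(2 * v > n_models) for v in votes]

def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Average the predicted probabilities (optionally weighted), then threshold."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    preds = []
    for sample in zip(*prob_lists):
        avg = sum(w * p for w, p in zip(weights, sample)) / total
        preds.append(int(avg >= threshold))
    return preds

def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient; defined as 0 when the denominator is 0."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2

# Toy data (invented): 1 = DILI-positive, 0 = DILI-negative, six drugs.
y_true = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.6, 0.4, 0.3, 0.7, 0.2]
model_b = [0.8, 0.4, 0.7, 0.1, 0.3, 0.6]
model_c = [0.7, 0.3, 0.3, 0.4, 0.6, 0.1]
models = [model_a, model_b, model_c]

hard = hard_vote(models)
soft = soft_vote(models)
weighted = soft_vote(models, weights=[1, 3, 1])  # hypothetical weights

for name, pred in [("hard", hard), ("soft", soft), ("weighted", weighted)]:
    print(name, pred, round(mcc(y_true, pred), 3),
          round(balanced_accuracy(y_true, pred), 3))
```

On this toy data the hard and unweighted soft votes agree, while up-weighting one model changes two predictions. This shows that the choice of weights can matter as much as the choice of voting rule. An MCC near 0 paired with a balanced accuracy near 0.5, as in the hard-vote case here, indicates performance close to chance.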

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf9f/8416433/e08159d16c5e/fphar-12-648805-g001.jpg