A comparison of methods for interpreting random forest models of genetic association in the presence of non-additive interactions.

Authors

Alena Orlenko, Jason H. Moore

Affiliation

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA.

Publication information

BioData Min. 2021 Jan 29;14(1):9. doi: 10.1186/s13040-021-00243-0.

Abstract

BACKGROUND

Non-additive interactions among genes are frequently associated with a number of phenotypes, including known complex diseases such as Alzheimer's disease, diabetes, and cardiovascular disease. Detecting such interactions requires careful selection of analytical methods, because some machine learning algorithms are unable to detect or model feature interactions that exhibit non-additivity, or are underpowered to do so. The Random Forest method is often used in these efforts because it can detect and model non-additive interactions. In addition, Random Forest has a built-in ability to estimate feature importance scores, which allows the model to be interpreted in terms of the rank order and effect size of each feature's association with the outcome. This is very important for epidemiological and clinical studies, where the results of predictive modeling may be used to set the future direction of research. Alternative ways to interpret the model include the permutation feature importance metric, which permutes a feature's values and expresses its contribution as the resulting decrease in the model's performance, and SHapley Additive exPlanations (SHAP), which take a cooperative game theory approach. Currently, it is unclear which Random Forest feature importance metric gives the better estimate of the true informative contribution of features in genetic association analysis.
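To make the cooperative game theory idea concrete, here is a minimal brute-force sketch of exact Shapley values in Python. It is not the shap library and not TreeSHAP. It assumes a hypothetical toy model, `xor_model`, whose output is the XOR of two binary features (a pure non-additive interaction) while a third feature is ignored, and it defines the value of a coalition of features as the mean model output when those features are fixed to the explained instance and the remaining ones are taken from a background set (an independence assumption). Enumerating every subset costs exponentially many model evaluations, so this only suits a handful of features.

```python
from itertools import combinations, product
from math import factorial
from typing import Callable, List, Sequence, Set

Row = List[int]


def shapley_values(model: Callable[[Row], float], x: Row,
                   background: Sequence[Row]) -> List[float]:
    """Exact Shapley value of each feature for a single instance x."""
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition take x's values; the others take each
        # background row's values. The payoff is the mean model output.
        total = 0.0
        for b in background:
            z = [x[j] if j in coalition else b[j] for j in range(n)]
            total += model(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition S.
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def xor_model(row: Row) -> float:
    """Hypothetical toy model: outcome is x0 XOR x1; x2 is ignored."""
    return float(row[0] ^ row[1])


# Background set: all 8 combinations of three binary features.
background = [list(r) for r in product([0, 1], repeat=3)]
phi = shapley_values(xor_model, [1, 0, 1], background)
print([round(p, 6) for p in phi])  # [0.25, 0.25, 0.0]
```

The two interacting features split the credit equally, the ignored feature gets exactly zero, and the values sum to the prediction minus the mean background prediction (1 - 0.5 = 0.5). That efficiency property is what makes Shapley-based explanations additive.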

RESULTS

To address this issue, and to improve the interpretability of Random Forest predictions, we compared different methods for estimating feature importance in real and simulated datasets with non-additive interactions. We detected a discrepancy between the metrics for the real-world datasets and further established that the permutation feature importance metric provides more precise feature importance rank estimates for the simulated datasets with non-additive interactions.
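The evaluation idea, recovering known informative features from simulated data with a non-additive interaction, can be sketched in plain Python. Everything here is a hypothetical illustration, not the study's protocol: `simulate_parity` draws genotypes coded 0/1/2 and assigns case status by a pure two-locus parity (XOR-like) rule on SNPs 0 and 1, so at allele frequency 0.5 neither SNP has a marginal effect; `oracle_model` stands in for a fitted Random Forest; and precision at k is just one possible way to score an importance ranking against the known functional SNPs.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[int]


def simulate_parity(n_samples: int, n_snps: int, maf: float = 0.5,
                    noise: float = 0.1, seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Genotypes 0/1/2; case when g0 + g1 is odd; labels flipped at rate `noise`."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        # Each genotype counts minor alleles on two chromosomes.
        genotype = [int(rng.random() < maf) + int(rng.random() < maf)
                    for _ in range(n_snps)]
        label = (genotype[0] + genotype[1]) % 2
        if rng.random() < noise:
            label = 1 - label
        X.append(genotype)
        y.append(label)
    return X, y


def accuracy(model: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows the model classifies correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Callable[[Row], int], X: List[Row], y: List[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def precision_at_k(scores: Sequence[float], true_features: Sequence[int]) -> float:
    """Share of the top-k ranked features that are truly functional (k = #true)."""
    k = len(true_features)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return len(set(ranked[:k]) & set(true_features)) / k


def oracle_model(row: Row) -> int:
    """Hypothetical stand-in for a fitted Random Forest: the true parity rule."""
    return (row[0] + row[1]) % 2


X, y = simulate_parity(n_samples=500, n_snps=10)
scores = permutation_importance(oracle_model, X, y)
print(precision_at_k(scores, true_features=[0, 1]))  # 1.0
```

Because the parity rule gives SNPs 0 and 1 no marginal effect, a purely one-SNP-at-a-time test would find nothing in expectation, whereas permutation importance on a model that has captured the interaction ranks both at the top. With a real forest you would wrap the fitted model's predict method in place of `oracle_model`; scikit-learn also provides `sklearn.inspection.permutation_importance` and exposes the built-in impurity-based scores as `feature_importances_`.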

CONCLUSIONS

By analyzing both real and simulated data, we established that the permutation feature importance metric provides more precise feature importance rank estimation in the presence of non-additive interactions.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f35/7847145/cc7ce8fc6665/13040_2021_243_Fig1_HTML.jpg