Biomarker interaction selection and disease detection based on multivariate gain ratio.

Affiliations

Academy of Mathematics and Systems Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China.

Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.

Publication information

BMC Bioinformatics. 2022 May 12;23(1):176. doi: 10.1186/s12859-022-04699-7.

Abstract

BACKGROUND

Disease detection is an important aspect of biotherapy. With advances in biotechnology and computing, many methods now detect disease from a single biomarker. In some cases, however, no biomarker influences disease on its own; the interaction between biomarkers determines disease status. The existing influence measure, the I-score, evaluates how important an interaction is in determining disease status, but it is biased with respect to the number of variables in the interaction. To address this problem, we propose a new influence measure, the Multivariate Gain Ratio (MGR), which extends the single-variable Gain Ratio (GR) to multivariate combinations of variables, called interactions.
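The abstract does not state the MGR formula, so the following is only a minimal sketch under stated assumptions. It computes the standard single-variable gain ratio (information gain divided by split information) and then, as one plausible multivariate extension, treats each joint value of several discrete variables as a single category. The function names `entropy`, `gain_ratio`, and `joint_gain_ratio` and the XOR toy data are hypothetical and are not taken from the paper; the authors' actual definition may differ.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete (hashable) values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Standard gain ratio: information gain divided by split information."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


def joint_gain_ratio(features, labels):
    """ASSUMPTION: score an interaction by applying the gain ratio to the
    joint value of its variables. This is an illustrative stand-in, not
    the paper's MGR definition."""
    joint = list(zip(*features))  # one tuple per sample
    return gain_ratio(joint, labels)


# Toy data: the label is the XOR of two binary markers, so neither marker
# is informative alone but the pair determines the label completely.
x1 = [0, 0, 1, 1, 0, 0, 1, 1]
x2 = [0, 1, 0, 1, 0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]

single_score = gain_ratio(x1, y)
pair_score = joint_gain_ratio([x1, x2], y)
print(single_score, pair_score)
```

On this toy data the single marker scores zero while the pair scores above zero, which illustrates the case described above where an interaction, rather than any one biomarker, carries the signal.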

RESULTS

We propose a preprocessing verification algorithm based on partial predictor variables to select an appropriate preprocessing method. We also provide an algorithm that selects key biomarker interactions and uses them to construct a disease detection model. MGR is more credible than the I-score when interactions contain a small number of variables. On the Breast Cancer Wisconsin (Diagnostic) Dataset, our method achieves an average accuracy of [Formula: see text], better than the [Formula: see text] obtained with the I-score. Compared with the classification results of [Formula: see text] based on all predictor variables, MGR identifies the true main biomarkers and achieves dimension reduction. On the Leukemia Dataset, the experimental results show the effectiveness of MGR, with an accuracy of [Formula: see text] compared with [Formula: see text] for the I-score. These results can be explained by the properties of MGR and the I-score noted above, because every key interaction in the Leukemia Dataset contains a small number of variables.

CONCLUSIONS

MGR is effective for selecting important biomarkers and biomarker interactions, even in high-dimensional feature spaces where an interaction may contain more than two biomarkers. Interactions selected by MGR predict better than those selected by the I-score when the interactions contain a small number of variables. MGR is generally applicable to various types of biomarker datasets, including cell nuclei, gene, SNP, and protein datasets.


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5062/9103137/18e4e21e83bb/12859_2022_4699_Fig1_HTML.jpg
