Variable selection for disease progression models: methods for oncogenetic trees and application to cancer and HIV.

Authors

Hainke Katrin, Szugat Sebastian, Fried Roland, Rahnenführer Jörg

Affiliation

Department of Statistics, TU Dortmund University, Dortmund, 44221, Germany.

Publication

BMC Bioinformatics. 2017 Aug 1;18(1):358. doi: 10.1186/s12859-017-1762-1.

DOI:10.1186/s12859-017-1762-1

PMID:28764644

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5539896/

Abstract

BACKGROUND

Disease progression models are important for understanding the critical steps during the development of diseases. The models are embedded in a statistical framework to deal with random variations due to biology and the sampling process when observing only a finite population. Conditional probabilities are used to describe dependencies between events that characterise the critical steps in the disease process. Many different model classes have been proposed in the literature, from simple path models to complex Bayesian networks. Oncogenetic trees are a popular model class that is easy to understand yet flexible. They have been applied to describe the accumulation of genetic aberrations in cancer and HIV data. However, the number of potentially relevant aberrations is often far larger than the maximal number of events that can be used for reliably estimating the progression models. Still, there are only a few approaches to variable selection, and these have not yet been investigated in detail.
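As an illustration of how conditional probabilities can encode dependencies between events, the following minimal sketch evaluates the probability of an observed event pattern under a small tree. It assumes the common formulation in which each edge carries the probability that the child event occurs given that its parent has occurred, and edges act independently. The tree, the event names, the numbers, and the function `pattern_probability` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical toy tree: child event -> (parent, P(child occurs | parent occurred)).
# "root" is the event-free starting state; event names and numbers are invented.
TOY_TREE: Dict[str, Tuple[str, float]] = {
    "A": ("root", 0.8),
    "B": ("A", 0.5),
    "C": ("root", 0.3),
}


def pattern_probability(tree: Dict[str, Tuple[str, float]],
                        observed: Iterable[str]) -> float:
    """Probability that exactly the events in `observed` have occurred."""
    present = set(observed) | {"root"}
    # An event cannot occur unless its parent in the tree has occurred.
    for event in present - {"root"}:
        if tree[event][0] not in present:
            return 0.0
    prob = 1.0
    for child, (parent, p) in tree.items():
        if parent not in present:
            continue  # child is unreachable, so its absence is certain
        # Edge is "used" if the child occurred, otherwise it was not used.
        prob *= p if child in present else 1.0 - p
    return prob


if __name__ == "__main__":
    print(pattern_probability(TOY_TREE, {"A"}))  # A only, neither B nor C
    print(pattern_probability(TOY_TREE, {"B"}))  # B without its parent A
```

With these toy numbers, the pattern {A} has probability 0.8 × 0.5 × 0.7 = 0.28 (up to floating-point rounding), while {B} alone has probability 0 because B can only occur after A.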

RESULTS

We fill this gap and propose ten variable selection methods specifically for oncogenetic trees, some of which are completely new. We compare them in an extensive simulation study and on real data from cancer and HIV. It turns out that the preselection of events by clique identification algorithms performs best. Here, events are selected if they belong to the largest or the maximum weight clique, that is, a subgraph in which all pairs of vertices are connected.
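The clique-based preselection described above can be sketched as follows. The abstract does not state how the event graph or the vertex weights are constructed, so this sketch makes two loudly labeled assumptions: two events are joined by an edge when they co-occur in at least `min_cooccurrence` samples, and each event is weighted by its relative frequency. The helper names and the toy data are hypothetical; maximal cliques are enumerated with the standard Bron-Kerbosch algorithm (without pivoting), which is only practical for small graphs.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Optional, Set


def build_graph(samples: List[Set[str]], min_cooccurrence: int) -> Dict[str, Set[str]]:
    """ASSUMPTION: connect two events if they co-occur in enough samples."""
    events = sorted(set().union(*samples))
    adj: Dict[str, Set[str]] = {e: set() for e in events}
    for a, b in combinations(events, 2):
        n_both = sum(1 for s in samples if a in s and b in s)
        if n_both >= min_cooccurrence:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj: Dict[str, Set[str]],
                    r: frozenset = frozenset(),
                    p: Optional[Set[str]] = None,
                    x: Optional[Set[str]] = None) -> Iterator[Set[str]]:
    """Bron-Kerbosch enumeration of all maximal cliques (no pivoting)."""
    if p is None:
        p, x = set(adj), set()
    if not p and not x:
        yield set(r)
        return
    for v in list(p):
        yield from maximal_cliques(adj, r | {v}, p & adj[v], x & adj[v])
        p.remove(v)
        x.add(v)


def select_events(adj: Dict[str, Set[str]],
                  weights: Optional[Dict[str, float]] = None) -> Set[str]:
    """Return the largest clique, or the maximum weight clique if weights are given."""
    cliques = list(maximal_cliques(adj))
    if weights is None:
        return max(cliques, key=len)
    return max(cliques, key=lambda c: sum(weights[e] for e in c))


# Hypothetical toy data: each sample is the set of events observed in one patient.
samples = [
    {"A", "B", "C"}, {"A", "B", "C"}, {"A", "B"},
    {"D", "E"}, {"D", "E"}, {"D", "E"}, {"D", "E"}, {"D", "E"},
]
adj = build_graph(samples, min_cooccurrence=2)
# ASSUMPTION: vertex weight = relative frequency of the event.
weights = {e: sum(e in s for s in samples) / len(samples) for e in adj}

largest = select_events(adj)             # selection by clique size
heaviest = select_events(adj, weights)   # selection by total vertex weight
```

On this toy data the two criteria disagree: the largest clique is {A, B, C}, whereas the maximum weight clique is {D, E}, because D and E are individually more frequent. Only the selected events would then be passed on to tree estimation.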

CONCLUSIONS

The variable selection method of identifying cliques finds both the important frequent events and those related to disease pathways.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/78b8/5539896/d2f6f2680e79/12859_2017_1762_Fig1_HTML.jpg

Similar articles

Cumulative disease progression models for cross-sectional data: a review and comparison.

Biom J. 2012 Sep;54(5):617-40. doi: 10.1002/bimj.201100186. Epub 2012 Aug 8.

Timed hazard networks: Incorporating temporal difference for oncogenetic analysis.

PLoS One. 2023 Mar 16;18(3):e0283004. doi: 10.1371/journal.pone.0283004. eCollection 2023.

New probabilistic network models and algorithms for oncogenesis.

J Comput Biol. 2006 May;13(4):853-65. doi: 10.1089/cmb.2006.13.853.

k-Partite cliques of protein interactions: A novel subgraph topology for functional coherence analysis on PPI networks.

J Theor Biol. 2014 Jan 7;340:146-54. doi: 10.1016/j.jtbi.2013.09.013. Epub 2013 Sep 19.

Estimating cancer survival and clinical outcome based on genetic tumor progression scores.

Bioinformatics. 2005 May 15;21(10):2438-46. doi: 10.1093/bioinformatics/bti312. Epub 2005 Feb 10.

Learning oncogenetic networks by reducing to mixed integer linear programming.

PLoS One. 2013 Jun 14;8(6):e65773. doi: 10.1371/journal.pone.0065773. Print 2013.

Bias in random forest variable importance measures: illustrations, sources and a solution.

BMC Bioinformatics. 2007 Jan 25;8:25. doi: 10.1186/1471-2105-8-25.

Model selection for mixtures of mutagenetic trees.

Stat Appl Genet Mol Biol. 2006;5:Article17. doi: 10.2202/1544-6115.1164. Epub 2006 Jun 23.

Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling.

BMC Bioinformatics. 2015 Feb 12;16:41. doi: 10.1186/s12859-015-0466-7.

Cited by

Modelling cancer progression using Mutual Hazard Networks.

Bioinformatics. 2020 Jan 1;36(1):241-249. doi: 10.1093/bioinformatics/btz513.

SNP variable selection by generalized graph domination.

PLoS One. 2019 Jan 24;14(1):e0203242. doi: 10.1371/journal.pone.0203242. eCollection 2019.
