Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences (NIEHS), Research Triangle Park, NC 27709, United States.
J Biomed Inform. 2022 Oct;134:104197. doi: 10.1016/j.jbi.2022.104197. Epub 2022 Sep 6.
An important aspect of cancer progression concerns the way in which gene mutations accumulate in cellular lineages. Comprehensive efforts to catalog cancer genes have revealed that tumors vary in which genes accumulate mutations, and that this variation depends on the presence or absence of other mutations. However, understanding the stochastic process by which mutations arise across the genome remains an important open problem in biology, because modeling discrete-variate time series is the most challenging and, as yet, least well-developed area of time-series research. In this paper, a DEGBOE framework is proposed to model the mutation time series given sequence data of the gene mutations. The method relates the discrete-time, nonlinear, and nonstationary series of gene mutations to the time-varying autoregressive moving average model. It represents each observation as a nonlinear function of two variables: the gene mutation itself, and the gene-gene interactions that characterize how the varying presence or absence of other gene mutations affects a mutation's occurrence and evolution. DEGBOE is applied to model the dynamics of frequently mutated genes in lung cancer, including EGFR, KRAS, and TP53. The results of the model are analyzed and compared to the original simulated data of the DNA walk and to experimental lung cancer mutation data. The model identifies the driver role of TP53 mutations in lung cancer progression.
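As a rough illustration of the model class the abstract refers to, the sketch below simulates a generic time-varying ARMA(1,1) series, x_t = a_t x_{t-1} + e_t + b_t e_{t-1}. It is not the DEGBOE framework: the function name `simulate_tv_arma`, the linear coefficient schedules, the Gaussian noise, and the real-valued output are all assumptions made for the example, and the gene-gene interaction term and the discrete observation function described above are omitted.

```python
import random


def simulate_tv_arma(a, b, sigma=1.0, seed=0):
    """Simulate x_t = a[t] * x_{t-1} + e_t + b[t] * e_{t-1}.

    a, b  : per-step AR and MA coefficients (hypothetical schedules).
    sigma : standard deviation of the Gaussian innovations e_t.
    """
    rng = random.Random(seed)
    x_prev, e_prev = 0.0, 0.0  # assumed zero initial conditions
    series = []
    for a_t, b_t in zip(a, b):
        e_t = rng.gauss(0.0, sigma)
        x_t = a_t * x_prev + e_t + b_t * e_prev
        series.append(x_t)
        x_prev, e_prev = x_t, e_t
    return series


# Coefficients that drift linearly over time make the series nonstationary.
n = 200
a = [0.2 + 0.6 * t / (n - 1) for t in range(n)]  # AR weight rises 0.2 -> 0.8
b = [0.5 - 0.4 * t / (n - 1) for t in range(n)]  # MA weight falls 0.5 -> 0.1
series = simulate_tv_arma(a, b)
```

Because the coefficients change at every step, the variance and autocorrelation of `series` change over time, which is the property that distinguishes a time-varying ARMA model from a stationary one.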