Flexible model-based non-negative matrix factorization with application to mutational signatures.

Author information

Department of Mathematics, Aarhus University, Aarhus, Denmark.

Department of Clinical Medicine and Bioinformatics Research Center, Aarhus University, Aarhus, Denmark.

Publication information

Stat Appl Genet Mol Biol. 2024 May 16;23(1). doi: 10.1515/sagmb-2023-0034. eCollection 2024 Jan 1.

Abstract

Somatic mutations in cancer can be viewed as a mixture distribution of several mutational signatures, which can be inferred using non-negative matrix factorization (NMF). Mutational signatures have previously been parametrized using either simple mono-nucleotide interaction models or general tri-nucleotide interaction models. We describe a flexible and novel framework for identifying biologically plausible parametrizations of mutational signatures, and in particular for estimating di-nucleotide interaction models. Our novel estimation procedure is based on the expectation-maximization (EM) algorithm and regression in the log-linear quasi-Poisson model. We show that di-nucleotide interaction signatures are statistically stable and sufficiently complex to fit the mutational patterns. Di-nucleotide interaction signatures often strike the right balance between appropriately fitting the data and avoiding over-fitting. They provide a better fit to data and are biologically more plausible than mono-nucleotide interaction signatures, and the parametrization is more stable than the parameter-rich tri-nucleotide interaction signatures. We illustrate our framework in a large simulation study in which we compare it with state-of-the-art methods, and we show results for three data sets of somatic mutation counts from patients with cancer in the breast, liver and urinary tract.
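For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of plain Poisson NMF fitted with multiplicative updates, which for the Poisson model coincide with EM updates. It is not the authors' method: it leaves every signature entry free (comparable to the parameter-rich tri-nucleotide setting) and omits the log-linear quasi-Poisson regression step that would impose a di-nucleotide parametrization. The function names `poisson_nmf` and `gkl`, the matrix layout (patients in rows, mutation types in columns), and all default values are assumptions made for illustration.

```python
import numpy as np


def poisson_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize a count matrix V (patients x mutation types) as W @ H.

    W holds non-negative exposures (patients x rank) and H holds signatures
    (rank x mutation types). Names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Update exposures given the current signatures.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1) + eps)
        # Update signatures given the new exposures.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    # Rescale so each signature is a probability vector; W @ H is unchanged.
    scale = H.sum(axis=1)
    H = H / scale[:, None]
    W = W * scale
    return W, H


def gkl(V, WH, eps=1e-12):
    """Generalized Kullback-Leibler divergence between V and its fit WH."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))
```

In this sketch, `poisson_nmf(V, rank=3)` on a patients-by-96 count matrix would return exposures and three signatures over the 96 mutation types. A parametrized variant in the spirit of the abstract would replace the free update of `H` with a constrained regression fit.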
