mSigHdp: hierarchical Dirichlet process mixture modeling for mutational signature discovery.

Authors

Liu Mo, Wu Yang, Jiang Nanhai, Boot Arnoud, Rozen Steven G

Affiliations

Programme in Cancer & Stem Cell Biology, Duke-NUS Medical School, 169857 Singapore.

Centre for Computational Biology, Duke-NUS Medical School, 169857 Singapore.

Publication information

NAR Genom Bioinform. 2023 Jan 23;5(1):lqad005. doi: 10.1093/nargab/lqad005. eCollection 2023 Mar.

Abstract

Mutational signatures are characteristic patterns of mutations caused by endogenous or exogenous mutational processes. These signatures can be discovered by analyzing mutations in large sets of samples, usually somatic mutations in tumor samples. Most programs for discovering mutational signatures are based on non-negative matrix factorization (NMF). Alternatively, signatures can be discovered using hierarchical Dirichlet process (HDP) mixture models, an approach that has been less explored. These models assign mutations to clusters and view each cluster as being generated from the signature of a particular mutational process. Here, we describe mSigHdp, an improved approach to using HDP mixture models to discover mutational signatures. We benchmarked mSigHdp and state-of-the-art NMF-based approaches on four realistic synthetic data sets. These data sets encompassed 18 cancer types. In total, they contained 3.5 × 10 single-base-substitution mutations representing 32 signatures and 6.1 × 10 small insertion and deletion mutations representing 13 signatures. For three of the four data sets, mSigHdp had the best positive predictive value for discovering mutational signatures, and for all four data sets, it had the best true positive rate. Its CPU usage was similar to that of the NMF-based approaches. Thus, mSigHdp is an important and practical addition to the set of tools available for discovering mutational signatures.
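
The two benchmark metrics named above, positive predictive value (PPV) and true positive rate (TPR), can be illustrated with a minimal sketch. The abstract does not say how discovered signatures were matched to ground-truth signatures, so everything specific here is an assumption made for illustration: the cosine-similarity threshold of 0.9, the greedy one-to-one matching, the function names, and the three-channel toy profiles (real single-base-substitution signatures have many more mutation channels).

```python
import math


def cosine(a, b):
    """Cosine similarity between two signature profiles (equal-length lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_discovery(truth, found, threshold=0.9):
    """Return (PPV, TPR) for discovered signatures against ground truth.

    Assumption: a discovered signature counts as a true positive when it can
    be paired one-to-one with a ground-truth signature at cosine similarity
    >= threshold. Pairs are taken greedily, most similar first.
    """
    pairs = sorted(
        ((cosine(t, f), i, j)
         for i, t in enumerate(truth)
         for j, f in enumerate(found)),
        reverse=True,
    )
    matched_truth, matched_found = set(), set()
    for sim, i, j in pairs:
        if sim < threshold:
            break  # remaining pairs are even less similar
        if i in matched_truth or j in matched_found:
            continue  # keep the matching one-to-one
        matched_truth.add(i)
        matched_found.add(j)

    true_positives = len(matched_truth)
    ppv = true_positives / len(found) if found else 0.0
    tpr = true_positives / len(truth) if truth else 0.0
    return ppv, tpr


# Toy data: two ground-truth signatures, three discovered ones.
truth = [[0.70, 0.20, 0.10], [0.10, 0.10, 0.80]]
found = [[0.68, 0.22, 0.10], [0.30, 0.40, 0.30], [0.12, 0.08, 0.80]]

ppv, tpr = score_discovery(truth, found)
print(f"PPV = {ppv:.3f}, TPR = {tpr:.3f}")
```

In this toy case both ground-truth signatures are recovered, so TPR is 1, while one of the three discovered signatures matches nothing, so PPV is 2/3. A method can therefore have a perfect TPR and still score poorly on PPV if it also reports spurious signatures.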

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e43/9869330/f81830b90530/lqad005fig1.jpg
