Inference of chromosome selection parameters and missegregation rate in cancer from DNA-sequencing data.

Author affiliations

Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA.

Publication information

Sci Rep. 2024 Jul 31;14(1):17699. doi: 10.1038/s41598-024-67842-9.

Abstract

Aneuploidy is frequently observed in cancers and has been linked to poor patient outcome. Analysis of aneuploidy in DNA-sequencing (DNA-seq) data necessitates untangling the effects of the Copy Number Aberration (CNA) occurrence rates and the selection coefficients that act upon the resulting karyotypes. We introduce a parameter inference algorithm that takes advantage of both bulk and single-cell DNA-seq cohorts. The method is based on Approximate Bayesian Computation (ABC) and utilizes CINner, our recently introduced simulation algorithm of chromosomal instability in cancer. We examine three groups of statistics to summarize the data in the ABC routine: (A) Copy Number-based measures, (B) phylogeny tip statistics, and (C) phylogeny balance indices. Using these statistics, our method can recover both the CNA probabilities and selection parameters from ground truth data, and performs well even for data cohorts of relatively small sizes. We find that only statistics in groups A and C are well-suited for identifying CNA probabilities, and only group A carries the signals for estimating selection parameters. Moreover, the low number of CNA events at large scale compared to cell counts in single-cell samples means that statistics in group B cannot be estimated accurately using phylogeny reconstruction algorithms at the chromosome level. As data from both bulk and single-cell DNA-sequencing techniques becomes increasingly available, our inference framework promises to facilitate the analysis of distinct cancer types, differentiation between selection and neutral drift, and prediction of cancer clonal dynamics.
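
To make the ABC idea concrete, the following is a minimal rejection-ABC sketch in Python. It is not the authors' implementation and does not use CINner: the simulator `simulate_copy_numbers`, the two copy number-based summaries, the uniform prior on the missegregation probability, and all numeric settings are hypothetical stand-ins chosen only for illustration. The sketch draws a parameter from the prior, simulates a small single-cell cohort, compares summary statistics with those of the observed data, and keeps the draws whose summaries are closest.

```python
import random
import statistics


def simulate_copy_numbers(p_misseg, n_cells, n_divisions, rng):
    """Toy simulator (a stand-in, NOT CINner).

    Each cell starts with copy number 2 for one chromosome. At every
    division, with probability p_misseg, it gains or loses one copy
    (copy number is floored at 1).
    """
    cells = []
    for _ in range(n_cells):
        cn = 2
        for _ in range(n_divisions):
            if rng.random() < p_misseg:
                cn = max(1, cn + rng.choice((-1, 1)))
        cells.append(cn)
    return cells


def summary(cells):
    """Copy number-based summaries: aneuploid fraction and CN variance."""
    frac_aneuploid = sum(c != 2 for c in cells) / len(cells)
    return (frac_aneuploid, statistics.pvariance(cells))


def distance(s1, s2):
    """Euclidean distance between two summary-statistic vectors."""
    return sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5


def abc_rejection(observed, n_draws, keep_fraction, rng,
                  n_cells=200, n_divisions=10):
    """Keep the prior draws whose simulated summaries best match the data."""
    obs_summary = summary(observed)
    trials = []
    for _ in range(n_draws):
        p = rng.uniform(0.0, 0.2)  # assumed prior, for illustration only
        sim = simulate_copy_numbers(p, n_cells, n_divisions, rng)
        trials.append((distance(summary(sim), obs_summary), p))
    trials.sort()
    n_keep = max(1, int(keep_fraction * n_draws))
    return [p for _, p in trials[:n_keep]]


rng = random.Random(0)
true_p = 0.05  # "ground truth" used to generate the synthetic cohort
observed = simulate_copy_numbers(true_p, 200, 10, rng)
posterior = abc_rejection(observed, n_draws=1000, keep_fraction=0.1, rng=rng)
print("posterior mean of p_misseg:", statistics.fmean(posterior))
```

The accepted draws approximate the posterior of the toy parameter. In the framework described in the abstract, the simulator would be CINner, the parameters would include both CNA probabilities and selection parameters, and the summaries would come from the three statistic groups (A, B, C) rather than the two toy summaries used here.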

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c291/11291923/7a1b786587f0/41598_2024_67842_Fig1_HTML.jpg