A Pipeline for Reconstructing Somatic Copy Number Alternation's Subclonal Population-Based Next-Generation Sequencing Data.

Authors

Chu Yanshuo, Nie Chenxi, Wang Yadong

Affiliation

Center of Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.

Publication information

Front Genet. 2020 Feb 27;10:1374. doi: 10.3389/fgene.2019.01374. eCollection 2019.

DOI:10.3389/fgene.2019.01374

PMID:32180789

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7058119/

Abstract

State-of-the-art next-generation sequencing (NGS)-based subclonal reconstruction methods perform poorly on somatic copy number alterations (SCNAs), not only because they must simultaneously estimate the subclonal population frequency and the absolute copy number of each SCNA, but also because the tumor and its paired normal sequencing data contain complex bias and noise. Both existing NGS-based SCNA detection methods and tools for inferring the subclonal population frequency of SCNAs use the read count ratio (RCR) of the tumor to its paired normal sample as the key feature of tumor sequencing data. However, sequencing error and bias strongly affect the RCR, producing a large number of redundant SCNA segments that make the subsequent inference of SCNA subclonal population frequencies and subclonal reconstruction time-consuming and inaccurate. We perform a mathematical analysis of the number of solutions for the subclonal frequency of an SCNA and, based on it, propose a computational algorithm that reduces the impact of false breakpoints. We construct a new probability model that incorporates the RCR bias correction algorithm and, by chaining it with the false-breakpoint filtering algorithm, build a complete pipeline for reconstructing the subclonal populations of SCNAs. Experimental results show that our pipeline outperforms existing subclonal reconstruction programs on both simulated data and TCGA data. Source code is publicly available as a Python package at https://github.com/dustincys/msphy-SCNAClonal.
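
The abstract notes that the subclonal frequency and the absolute copy number of an SCNA have to be estimated together, and that the number of solutions matters. A minimal sketch of why this is ambiguous follows. It is not the authors' algorithm. It assumes a simple two-population model in which a fraction phi of cells carries copy number c and the rest are diploid, and an RCR already normalized so that a copy-neutral segment has ratio 1, giving rcr = (phi * c + (1 - phi) * 2) / 2. The function name `candidate_solutions` and its parameters are hypothetical.

```python
def candidate_solutions(rcr, max_copy_number=6, baseline=2):
    """Enumerate (copy_number, frequency) pairs consistent with one observed ratio.

    Assumed model (illustrative only): rcr = (phi * c + (1 - phi) * baseline) / baseline,
    so phi = baseline * (rcr - 1) / (c - baseline) for every c != baseline.
    """
    solutions = []
    for c in range(0, max_copy_number + 1):
        if c == baseline:
            continue  # a copy-neutral state cannot explain a shifted ratio
        phi = baseline * (rcr - 1) / (c - baseline)
        if 0 < phi <= 1:  # keep only valid population frequencies
            solutions.append((c, phi))
    return solutions


if __name__ == "__main__":
    # A single gained segment admits several (copy number, frequency) explanations.
    for c, phi in candidate_solutions(1.25):
        print(f"copy number {c}: frequency {phi:.3f}")
```

Under these assumptions, one observed ratio maps to several candidate pairs, so spurious segments created by false breakpoints multiply the number of candidates that downstream inference has to consider.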

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/89a7/7058119/33db45f20f19/fgene-10-01374-g001.jpg

Similar articles

Modeling and correct the GC bias of tumor and normal WGS data for SCNA based tumor subclonal population inferring.

BMC Bioinformatics. 2018 Apr 11;19(Suppl 5):112. doi: 10.1186/s12859-018-2099-0.

MixClone: a mixture model for inferring tumor subclonal populations.

BMC Genomics. 2015;16 Suppl 2(Suppl 2):S1. doi: 10.1186/1471-2164-16-S2-S1. Epub 2015 Jan 21.

Comprehensive analysis of intratumoural heterogeneity of somatic copy number alterations in diffuse glioma reveals clonality-dependent prognostic patterns.

Neuropathol Appl Neurobiol. 2022 Oct;48(6):e12831. doi: 10.1111/nan.12831. Epub 2022 Jul 6.

CLImAT-HET: detecting subclonal copy number alterations and loss of heterozygosity in heterogeneous tumor samples from whole-genome sequencing data.

BMC Med Genomics. 2017 Mar 15;10(1):15. doi: 10.1186/s12920-017-0255-4.

PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors.

Genome Biol. 2015 Feb 13;16(1):35. doi: 10.1186/s13059-015-0602-8.

CloneCNA: detecting subclonal somatic copy number alterations in heterogeneous tumor samples from whole-exome sequencing data.

BMC Bioinformatics. 2016 Aug 19;17:310. doi: 10.1186/s12859-016-1174-7.

Decomposing the subclonal structure of tumors with two-way mixture models on copy number aberrations.

PLoS One. 2018 Dec 12;13(12):e0206579. doi: 10.1371/journal.pone.0206579. eCollection 2018.

Crowd-sourced benchmarking of single-sample tumor subclonal reconstruction.

Nat Biotechnol. 2025 Apr;43(4):581-592. doi: 10.1038/s41587-024-02250-y. Epub 2024 Jun 11.

TargetClone: A multi-sample approach for reconstructing subclonal evolution of tumors.

PLoS One. 2018 Nov 29;13(11):e0208002. doi: 10.1371/journal.pone.0208002. eCollection 2018.
