Population clustering based on copy number variations detected from next generation sequencing data.

Authors

Duan Junbo, Zhang Ji-Gang, Wan Mingxi, Deng Hong-Wen, Wang Yu-Ping

Affiliation

Department of Biomedical Engineering, Xi'an Jiaotong University, Xi'an, P. R. China.

Publication information

J Bioinform Comput Biol. 2014 Aug;12(4):1450021. doi: 10.1142/S0219720014500218. Epub 2014 Aug 19.

DOI:10.1142/S0219720014500218

PMID:25152046

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4504183/

Abstract

Copy number variations (CNVs) can be used as significant biomarkers, and next generation sequencing (NGS) provides high-resolution detection of these CNVs. However, how to extract features from CNVs and apply them to genomic studies such as population clustering has become a major challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs in each sample. Therefore, using NMF of CNVs, one can differentiate samples from different ethnic groups, i.e., perform population clustering. To validate the approach, we applied it to the analysis of both simulation data and two real data sets from the 1000 Genomes Project. The results on simulation data demonstrate that the proposed method can recover the true common CNVs with high quality. The results of the first real data analysis show that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the results of the second real data analysis show that the proposed method can be applied to the whole genome with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
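
The decomposition described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that the feature matrix stores one non-negative CNV magnitude per genomic bin (rows) and sample (columns), that NMF is fitted with standard multiplicative updates for the squared-error objective, and that each sample is assigned to the group whose source carries the largest weight. The toy data and the names `make_toy_features`, `nmf`, and `assign_groups` are hypothetical.

```python
import numpy as np


def make_toy_features(seed=0):
    """Build a hypothetical feature matrix: 40 genomic bins x 6 samples.

    Samples 0-2 share one CNV region and samples 3-5 share another,
    loosely mimicking two family trios with different ancestries.
    """
    rng = np.random.default_rng(seed)
    n_bins = 40
    sources = np.zeros((n_bins, 2))
    sources[5:11, 0] = 1.0    # CNV region common to group 0 (assumed)
    sources[25:31, 1] = 1.0   # CNV region common to group 1 (assumed)
    weights = np.zeros((2, 6))
    weights[0, :3] = rng.uniform(0.8, 1.2, size=3)
    weights[1, 3:] = rng.uniform(0.8, 1.2, size=3)
    noise = rng.uniform(0.0, 0.05, size=(n_bins, 6))
    return sources @ weights + noise


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V as W @ H with multiplicative updates.

    W (bins x rank) plays the role of the source matrix and
    H (rank x samples) the role of the weight matrix.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_samples = V.shape
    W = rng.random((n_bins, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def assign_groups(W, H):
    """Label each sample by the source with the largest weight.

    H is rescaled by the column sums of W so the comparison does not
    depend on the arbitrary scaling between the two factors.
    """
    scale = W.sum(axis=0)
    return np.argmax(H * scale[:, None], axis=0)


V = make_toy_features()
W, H = nmf(V, rank=2)
labels = assign_groups(W, H)
print(labels)
```

The group indices are arbitrary because NMF determines the sources only up to permutation and scaling, so only co-membership of samples is meaningful. On real data the rank (number of groups) would have to be chosen, which this sketch does not address.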
