

Estimating copy numbers of alleles from population-scale high-throughput sequencing data.

Author information

Mimori Takahiro, Nariai Naoki, Kojima Kaname, Sato Yukuto, Kawai Yosuke, Yamaguchi-Kabata Yumi, Nagasaki Masao

Publication information

BMC Bioinformatics. 2015;16 Suppl 1(Suppl 1):S4. doi: 10.1186/1471-2105-16-S1-S4. Epub 2015 Jan 21.

Abstract

BACKGROUND

With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes have been proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci.

RESULTS

We propose a novel approach that simultaneously infers copy unit alleles and their copy numbers in each sample from population-scale HTS data. The approach applies variational Bayesian inference to a generative probabilistic model inspired by latent Dirichlet allocation, a well-studied model for document classification problems. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy-number datasets; precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data from 1123 samples at the highly variable salivary amylase gene locus and at a pseudogene locus, and confirmed consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring.
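The abstract does not define how concordance was scored, so the following is only a minimal sketch of one plausible reading: treat the inferred and true copy unit alleles of a sample as multisets and compute precision and recall from their overlap. The function name `allele_precision_recall`, the string encoding of alleles, and the example data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def allele_precision_recall(inferred, true):
    """Precision and recall between two multisets of copy unit alleles.

    Assumption: an inferred allele is a match once per copy that also
    appears in the true set (multiset intersection).
    """
    inferred_counts = Counter(inferred)
    true_counts = Counter(true)
    # Counter & Counter keeps the minimum count of each shared allele.
    matched = sum((inferred_counts & true_counts).values())
    precision = matched / len(inferred) if inferred else 0.0
    recall = matched / len(true) if true else 0.0
    return precision, recall


# Hypothetical sample with four copy units; alleles are written as
# short strings of the bases observed at variant sites.
true_alleles = ["ACG", "ACG", "ATG", "GTG"]
inferred_alleles = ["ACG", "ACG", "ATG", "ATG"]
precision, recall = allele_precision_recall(inferred_alleles, true_alleles)
```

In this toy case three of the four inferred copies match a true copy, so both precision and recall are 0.75. Counting matches per copy rather than per distinct allele penalizes errors in the estimated copy number as well as in the allele sequence.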

CONCLUSIONS

Our proposed approach enables detailed analysis of copy number variations, such as association studies between copy unit alleles and phenotypes or biological features, including human diseases.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c297/4331703/2cdf3c8a2b55/1471-2105-16-S1-S4-1.jpg
