hg19K: addressing a significant lacuna in hg19-based variant calling.

Authors

Karthikeyan Savita, Bawa Pushpinder S, Srinivasan Subhashini

Affiliation

Institute of Bioinformatics and Applied Biotechnology, Biotech Park, Electronic City Phase I, Bangalore 560100, India.

Publication

Mol Genet Genomic Med. 2016 Nov 13;5(1):15-20. doi: 10.1002/mgg3.251. eCollection 2017 Jan.

Abstract

BACKGROUND

The hg19 assembly of the human genome is the most heavily annotated and most commonly used reference for calling variants in individual genomes. The phase 3 report of the 1000 Genomes Project (1000G) established that many positions in hg19 carry the minor allele. Because commonly used variant-calling methods assume that the hg19 reference carries the major allele at all ~3 billion positions, they mask the call whenever an individual is homozygous for the minor allele at such a position. It is therefore important to assess the extent and impact of these minor alleles in hg19 from the point of view of individual genomes.

METHOD

We created a reference genome, hg19K, in which every position where the hg19 reference carries a minor allele was replaced with the corresponding major allele from the phase 3 report of the 1000 Genomes Project. The genomes of five individuals, downloaded from a public repository, were analyzed against both hg19 and hg19K, and the results were compared.
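The substitution step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Snp` record, the function name `build_major_allele_reference`, the restriction to biallelic single-base SNPs, and the 0.5 frequency cutoff are all assumptions made for illustration, and the sequence and frequencies are toy values.

```python
from typing import Iterable, NamedTuple


class Snp(NamedTuple):
    """Hypothetical record for one biallelic SNP (not a real library type)."""
    pos: int        # 1-based position in the reference sequence
    ref: str        # base present in the original reference
    alt: str        # alternate base reported for the population
    alt_af: float   # population frequency of the alternate base


def build_major_allele_reference(ref_seq: str,
                                 snps: Iterable[Snp],
                                 threshold: float = 0.5) -> str:
    """Return a copy of ref_seq with minor reference bases swapped out.

    Assumption: a reference base counts as the minor allele when the
    alternate allele frequency exceeds `threshold`.
    """
    seq = list(ref_seq)
    for snp in snps:
        # Skip anything that is not a single-base substitution.
        if len(snp.ref) != 1 or len(snp.alt) != 1:
            continue
        idx = snp.pos - 1
        # Guard against coordinate mismatches before editing the sequence.
        if seq[idx].upper() != snp.ref.upper():
            raise ValueError(f"reference mismatch at position {snp.pos}")
        if snp.alt_af > threshold:
            seq[idx] = snp.alt
    return "".join(seq)


# Toy data only; these are not values from the study.
toy_ref = "ACGTACGT"
toy_snps = [
    Snp(pos=2, ref="C", alt="T", alt_af=0.93),  # reference base is minor
    Snp(pos=5, ref="A", alt="G", alt_af=0.12),  # reference base is major
    Snp(pos=8, ref="T", alt="A", alt_af=0.61),  # reference base is minor
]
toy_major = build_major_allele_reference(toy_ref, toy_snps)
print(toy_major)
```

Only positions 2 and 8 are changed in this toy input, because those are the only sites where the alternate allele is the more frequent one.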

RESULTS

Of the 81 million SNPs in the phase 3 report of the 1000 Genomes Project, 1.9 million are positions where the major allele differs from the base in hg19, many with an allele frequency of >0.9. We observed that ~30% of the SNVs found in individual genomes are confined to these 1.9 million positions. In addition, the hg19K-based approach predicts ~8% unique SNVs, which are also confined to the 1.9 million positions.
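The kind of comparison behind these percentages can be sketched with plain set arithmetic. This is only an illustration under stated assumptions: call sets are modeled as sets of hypothetical `(chromosome, position)` tuples, the helper `fraction_at_sites` is invented for this example, and the coordinates are toy values that do not reproduce the reported ~30% and ~8%.

```python
from typing import Set, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def fraction_at_sites(calls: Set[Site], sites: Set[Site]) -> float:
    """Fraction of `calls` that fall on one of the given `sites`."""
    if not calls:
        return 0.0
    return len(calls & sites) / len(calls)


# Toy data: positions where the reference base was swapped to the major allele.
swapped_sites = {("chr1", 100), ("chr1", 250), ("chr2", 75)}

# Toy SNV calls for one individual against each reference.
hg19_calls = {("chr1", 100), ("chr1", 180), ("chr2", 40), ("chr2", 75)}
hg19k_calls = {("chr1", 180), ("chr1", 250), ("chr2", 40)}

# Share of hg19-based calls that sit on swapped positions.
frac_hg19_at_swapped = fraction_at_sites(hg19_calls, swapped_sites)

# Calls made only against hg19K, and whether they all sit on swapped positions.
unique_to_hg19k = hg19k_calls - hg19_calls
all_unique_at_swapped = unique_to_hg19k <= swapped_sites

print(frac_hg19_at_swapped, unique_to_hg19k, all_unique_at_swapped)
```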

CONCLUSION

We report that the presence of minor alleles in hg19 alone results in ~8% false negatives and ~30% false positives during variant calling. Moreover, among the variant calls unique to the hg19K-based method, which hg19-based prediction misses in individuals homozygous for the hg19 minor allele, some are deleterious missense mutations at sites conserved across diverse species.
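Why a minor allele in the reference produces both kinds of error can be seen in a toy model of a single site. The sketch below assumes a deliberately simplified caller, `is_called`, that reports a variant whenever either allele of the genotype differs from the reference base; real variant callers are far more involved, and the bases used here are illustrative.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]


def is_called(genotype: Genotype, ref_base: str) -> bool:
    """Simplified caller: a variant is reported if any allele differs from the reference."""
    return any(allele != ref_base for allele in genotype)


# Toy site: hg19 carries the minor allele "A"; hg19K carries the major allele "G".
hg19_base, hg19k_base = "A", "G"

genotypes: Dict[str, Genotype] = {
    "homozygous_minor": ("A", "A"),
    "heterozygous": ("A", "G"),
    "homozygous_major": ("G", "G"),
}

# For each genotype: (called against hg19, called against hg19K).
outcome = {
    name: (is_called(gt, hg19_base), is_called(gt, hg19k_base))
    for name, gt in genotypes.items()
}

for name, (vs_hg19, vs_hg19k) in outcome.items():
    print(f"{name}: hg19={vs_hg19}, hg19K={vs_hg19k}")
```

In this model the individual homozygous for the minor allele is invisible against hg19 but called against hg19K, while the individual homozygous for the major allele is called against hg19 even though they carry the common base.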

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84e8/5241214/ad651a60f219/MGG3-5-15-g001.jpg
