
hg19K: addressing a significant lacuna in hg19-based variant calling.

Authors

Karthikeyan Savita, Bawa Pushpinder S, Srinivasan Subhashini

Affiliation

Institute of Bioinformatics and Applied Biotechnology, Biotech Park, Electronic City Phase I, Bangalore 560100, India.

Publication

Mol Genet Genomic Med. 2016 Nov 13;5(1):15-20. doi: 10.1002/mgg3.251. eCollection 2017 Jan.

DOI: 10.1002/mgg3.251
PMID: 28116326
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5241214/
Abstract

BACKGROUND

The hg19 assembly of the human genome is the most heavily annotated reference and the one most commonly used to call variants in individual genomes. The phase 3 report of the 1000 Genomes Project (1000G) established that many positions in hg19 carry the minor allele. Because commonly used variant-calling methods assume that the hg19 reference carries the major allele at all ~3 billion positions, they mask the call whenever an individual is homozygous for the minor allele that hg19 carries at such a position. It is therefore important to assess the extent and impact of these minor alleles in hg19 from the point of view of individual genomes.

METHOD

We created a reference genome, hg19K, in which every hg19 position carrying a minor allele was replaced by the corresponding major allele from the phase 3 report of the 1000 Genomes Project. The genomes of five individuals, downloaded from a public repository, were analyzed against both hg19 and hg19K, and the results were compared.
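The substitution step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Snp` record, the function name `build_major_allele_reference`, the 0.5 frequency cutoff, and the restriction to biallelic single-base records are all assumptions made for illustration.

```python
from typing import Iterable, NamedTuple


class Snp(NamedTuple):
    """Hypothetical biallelic SNP record (not a real VCF parser type)."""
    pos: int        # 1-based position on the chromosome, as in VCF
    ref: str        # base carried by the original reference
    alt: str        # alternate base reported by the population panel
    alt_af: float   # population frequency of the alternate base


def build_major_allele_reference(ref_seq: str, snps: Iterable[Snp]) -> str:
    """Return a copy of ref_seq in which the reference base is replaced by
    the alternate base wherever the alternate is the major allele.

    Assumption: "major" means alt_af > 0.5 at a biallelic site.
    """
    seq = list(ref_seq)
    for snp in snps:
        # Skip indels and multi-base records; only SNPs are handled here.
        if len(snp.ref) != 1 or len(snp.alt) != 1:
            continue
        idx = snp.pos - 1  # convert 1-based position to 0-based index
        if seq[idx].upper() != snp.ref.upper():
            raise ValueError(f"reference mismatch at position {snp.pos}")
        if snp.alt_af > 0.5:
            seq[idx] = snp.alt
    return "".join(seq)


# Toy data, invented for this example only.
toy_reference = "ACGTAC"
toy_snps = [
    Snp(pos=2, ref="C", alt="T", alt_af=0.93),  # reference holds the minor allele
    Snp(pos=5, ref="A", alt="G", alt_af=0.10),  # reference already holds the major allele
]
print(build_major_allele_reference(toy_reference, toy_snps))
```

In the toy input only position 2 is swapped, because it is the only site where the alternate base is more frequent than the reference base.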

RESULTS

Of the 81 million SNPs in the phase 3 report of the 1000 Genomes Project, 1.9 million were at positions where the major allele differs from the hg19 base, many with an allele frequency >0.9. About 30% of the SNVs found in individual genomes fall within these 1.9 million positions. In addition, ~8% of SNVs are called only with the hg19K-based approach, and these are also confined to the 1.9 million positions.
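The two proportions reported above can be thought of as set operations on per-individual call sets. The sketch below is one possible way to compute such figures, not the authors' method: the function name `summarize_calls`, the representation of a call as a `(chromosome, position)` pair, and the choice of denominators are assumptions, since the abstract does not state how the percentages were normalized.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position); hypothetical representation


def summarize_calls(
    hg19_calls: Set[Site],
    hg19k_calls: Set[Site],
    swapped_sites: Set[Site],
) -> Dict[str, float]:
    """Summarize how many SNV calls fall on positions whose base was swapped.

    Assumed denominators: all hg19-based calls for the first fraction,
    all hg19K-based calls for the second.
    """
    # Calls against hg19 that sit on a swapped position.
    hg19_at_swapped = hg19_calls & swapped_sites
    # Calls seen only when the major-allele reference is used.
    hg19k_only = hg19k_calls - hg19_calls
    hg19k_only_at_swapped = hg19k_only & swapped_sites
    return {
        "frac_hg19_calls_at_swapped": (
            len(hg19_at_swapped) / len(hg19_calls) if hg19_calls else 0.0
        ),
        "frac_hg19k_calls_unique_at_swapped": (
            len(hg19k_only_at_swapped) / len(hg19k_calls) if hg19k_calls else 0.0
        ),
    }


# Toy call sets, invented for this example only.
toy_hg19 = {("chr1", 10), ("chr1", 20), ("chr1", 30), ("chr2", 5)}
toy_hg19k = {("chr1", 30), ("chr2", 5), ("chr2", 7)}
toy_swapped = {("chr1", 10), ("chr1", 20), ("chr2", 7)}
print(summarize_calls(toy_hg19, toy_hg19k, toy_swapped))
```

Keeping the calls as plain sets makes the comparison independent of any particular variant caller or file format; a real analysis would also need to compare genotypes, not just positions.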

CONCLUSION

The presence of minor alleles in hg19 alone results in ~8% false negatives and ~30% false positives during variant calling. Moreover, among the variant calls unique to the hg19K-based method, which hg19-based prediction misses in individuals homozygous for the hg19 minor allele, some are deleterious missense mutations at sites conserved across diverse species.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84e8/5241214/0bc994f29093/MGG3-5-15-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84e8/5241214/ad651a60f219/MGG3-5-15-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84e8/5241214/52cd86f6abb2/MGG3-5-15-g002.jpg


