Effective filtering strategies to improve data quality from population-based whole exome sequencing studies.

Affiliation

Department of Pediatrics and Rady Children's Hospital, University of California San Diego, San Diego, USA.

Publication information

BMC Bioinformatics. 2014 May 2;15:125. doi: 10.1186/1471-2105-15-125.

Abstract

BACKGROUND

Genotypes generated in next generation sequencing studies contain errors which can significantly impact the power to detect signals in common and rare variant association tests. These genotyping errors are not explicitly filtered by the standard GATK Variant Quality Score Recalibration (VQSR) tool and thus remain a source of errors in whole exome sequencing (WES) projects that follow GATK's recommended best practices. Therefore, additional data filtering methods are required to effectively remove these errors before performing association analyses with complex phenotypes. Here we empirically derive thresholds for genotype and variant filters that, when used in conjunction with the VQSR tool, achieve higher data quality than when using VQSR alone.

RESULTS

The detailed filtering strategies improve the concordance of sequenced genotypes with array genotypes from 99.33% to 99.77%; improve the percent of discordant genotypes removed from 10.5% to 69.5%; and improve the Ti/Tv ratio from 2.63 to 2.75. We also demonstrate that managing batch effects by separating samples based on different target capture and sequencing chemistry protocols results in a final data set containing 40.9% more high-quality variants. In addition, imputation is an important component of WES studies and is used to estimate common variant genotypes to generate additional markers for association analyses. As such, we demonstrate filtering methods for imputed data that improve genotype concordance from 79.3% to 99.8% while removing 99.5% of discordant genotypes.
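The three quality metrics reported above (genotype concordance, share of discordant genotypes removed, and Ti/Tv ratio) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the encoding of genotypes as alternate-allele counts (0, 1, 2, or `None` for a missing or filtered call), and the toy data are assumptions made only for this example.

```python
from typing import Optional, Sequence, Tuple

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

Genotype = Optional[int]  # assumed encoding: alt-allele count, None = missing


def concordance(seq: Sequence[Genotype], array: Sequence[Genotype]) -> float:
    """Fraction of genotypes called on both platforms that agree."""
    pairs = [(s, a) for s, a in zip(seq, array) if s is not None and a is not None]
    if not pairs:
        raise ValueError("no genotypes called on both platforms")
    return sum(s == a for s, a in pairs) / len(pairs)


def discordant_removed(
    seq: Sequence[Genotype], array: Sequence[Genotype], kept: Sequence[bool]
) -> float:
    """Fraction of discordant genotypes that a filter (kept=False) removes."""
    discordant = [
        k
        for s, a, k in zip(seq, array, kept)
        if s is not None and a is not None and s != a
    ]
    if not discordant:
        raise ValueError("no discordant genotypes to evaluate")
    return sum(not k for k in discordant) / len(discordant)


def titv_ratio(variants: Sequence[Tuple[str, str]]) -> float:
    """Transition/transversion ratio over (ref, alt) single-nucleotide variants."""
    ti = sum((ref, alt) in TRANSITIONS for ref, alt in variants)
    tv = len(variants) - ti
    if tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return ti / tv


# Toy data (hypothetical): one discordant call, which the filter removes.
seq_calls = [0, 1, 2, 1, None, 0]
array_calls = [0, 1, 2, 2, 1, 0]
keep_mask = [True, True, True, False, True, True]
filtered_calls = [s if k else None for s, k in zip(seq_calls, keep_mask)]
```

On this toy input, concordance is computed only over positions called on both platforms, so removing the single discordant call raises it from 4 of 5 to 4 of 4. Real data would be read from VCF and array files, which this sketch deliberately leaves out.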

CONCLUSIONS

The described filtering methods are advantageous for large population-based WES studies designed to identify common and rare variation associated with complex diseases. Compared to data processed through standard practices, these strategies result in substantially higher quality data for common and rare association analyses.
