Inflated expectations: Rare-variant association analysis using public controls.

Affiliations

Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America.

Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America.

Publication information

PLoS One. 2023 Jan 25;18(1):e0280951. doi: 10.1371/journal.pone.0280951. eCollection 2023.

DOI:10.1371/journal.pone.0280951
PMID:36696392
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9876209/
Abstract

The use of publicly available sequencing datasets as controls (hereafter, "public controls") in studies of rare variant disease associations has great promise but can increase the risk of false-positive discovery. The specific factors that could contribute to inflated distribution of test statistics have not been systematically examined. Here, we leveraged both public controls, gnomAD v2.1 and several datasets sequenced in our laboratory to systematically investigate factors that could contribute to the false-positive discovery, as measured by λΔ95, a measure to quantify the degree of inflation in statistical significance. Analyses of datasets in this investigation found that 1) the significantly inflated distribution of test statistics decreased substantially when the same variant caller and filtering pipelines were employed, 2) differences in library prep kits and sequencers did not affect the false-positive discovery rate and, 3) joint vs. separate variant-calling of cases and controls did not contribute to the inflation of test statistics. Currently available methods do not adequately adjust for the high false-positive discovery. These results, especially if replicated, emphasize the risks of using public controls for rare-variant association tests in which individual-level data and the computational pipeline are not readily accessible, which prevents the use of the same variant-calling and filtering pipelines on both cases and controls. A plausible solution exists with the emergence of cloud-based computing, which can make it possible to bring containerized analytical pipelines to the data (rather than the data to the pipeline) and could avert or minimize these issues. It is suggested that future reports account for this issue and provide this as a limitation in reporting new findings based on studies that cannot practically analyze all data on a single pipeline.
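
The abstract names λΔ95 as its inflation measure but does not define it. As a minimal sketch, the Python below assumes one plausible reading: the slope, through the origin, of observed versus expected -log10(p) over the lower 95% of QQ-plot points, so that the most significant 5% of tests do not drive the estimate. The function name `lambda_delta95`, the `lower_fraction` parameter, and the i/(n+1) expected quantiles are illustrative assumptions, not the authors' implementation.

```python
import math


def lambda_delta95(p_values, lower_fraction=0.95):
    """Illustrative inflation estimate (assumed definition, not from the paper).

    Returns the least-squares slope, constrained through the origin, of
    observed vs. expected -log10(p) over the lower `lower_fraction` of
    QQ-plot points.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")

    # Sort ascending so the smallest observed p-value pairs with the
    # smallest expected quantile.
    observed = sorted(p_values)

    # Expected p-values under the null hypothesis (uniform): i / (n + 1).
    expected = [(i + 1) / (n + 1) for i in range(n)]

    # Skip the most significant points (the smallest p-values), keeping
    # only the lower `lower_fraction` of the QQ plot.
    skip = n - int(math.floor(lower_fraction * n))
    xs = [-math.log10(e) for e in expected[skip:]]
    ys = [-math.log10(max(o, 1e-300)) for o in observed[skip:]]

    # Slope of the best-fit line through the origin: sum(xy) / sum(xx).
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


if __name__ == "__main__":
    n = 1000
    null_p = [(i + 1) / (n + 1) for i in range(n)]   # perfectly uniform p-values
    inflated_p = [p ** 2 for p in null_p]            # every -log10(p) doubled
    print(lambda_delta95(null_p))
    print(lambda_delta95(inflated_p))
```

Under this assumed definition, a value near 1 indicates test statistics that follow the null distribution, and values well above 1 indicate the kind of systematic inflation the abstract attributes to mismatched variant-calling and filtering pipelines.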

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c808/9876209/3d0c9a686c6a/pone.0280951.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c808/9876209/5a1696f64c37/pone.0280951.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c808/9876209/bbc70ea77934/pone.0280951.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c808/9876209/8b4279c8a03d/pone.0280951.g004.jpg
