Genotype error due to low-coverage sequencing induces uncertainty in polygenic scoring.

Affiliations

Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA.

Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA.

Publication information

Am J Hum Genet. 2023 Aug 3;110(8):1319-1329. doi: 10.1016/j.ajhg.2023.06.015. Epub 2023 Jul 24.

DOI:10.1016/j.ajhg.2023.06.015

PMID:37490908

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10432141/

Abstract

Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% as compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implication on PGS-based risk-stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.
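
The idea of carrying genotype uncertainty through to the score can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each SNP comes with posterior probabilities for carrying 0, 1, or 2 effect alleles (as an imputation tool might report), treats SNPs as independent, and takes effect sizes as fixed. The function name `pgs_credible_interval` and all of its parameters are hypothetical. It returns the dosage-based expected PGS together with a Monte Carlo credible interval obtained by sampling genotypes from those posteriors.

```python
import random


def pgs_credible_interval(genotype_probs, betas, n_samples=2000, level=0.90, seed=0):
    """Expected PGS and a Monte Carlo credible interval.

    genotype_probs: one (p0, p1, p2) tuple per SNP, the assumed posterior
                    probabilities of carrying 0, 1, or 2 effect alleles.
    betas:          one effect size per SNP, treated here as error-free.
    SNPs are sampled independently, which is a simplifying assumption.
    """
    rng = random.Random(seed)

    # Point estimate: weight each SNP by its expected dosage E[g] = p1 + 2*p2.
    expected = sum(b * (p[1] + 2 * p[2]) for p, b in zip(genotype_probs, betas))

    # Sample hard genotypes from the posteriors and score each sampled genome.
    draws = []
    for _ in range(n_samples):
        score = 0.0
        for p, b in zip(genotype_probs, betas):
            g = rng.choices((0, 1, 2), weights=p)[0]
            score += b * g
        draws.append(score)

    # Equal-tailed interval from the empirical quantiles of the sampled scores.
    draws.sort()
    lo_idx = int((1 - level) / 2 * n_samples)
    hi_idx = int((1 + level) / 2 * n_samples) - 1
    return expected, draws[lo_idx], draws[hi_idx]


if __name__ == "__main__":
    # Toy input: 50 SNPs with uncertain genotypes and equal effect sizes.
    probs = [(0.3, 0.4, 0.3)] * 50
    effects = [0.1] * 50
    point, lower, upper = pgs_credible_interval(probs, effects)
    print(f"expected PGS = {point:.3f}, 90% interval = [{lower:.3f}, {upper:.3f}]")
```

When every genotype is known with certainty the interval collapses onto the point estimate. Flatter posteriors, as would be expected at lower sequencing depth, widen it. A traditional PGS computed from hard genotype calls reports no such interval.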

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/550b/10432141/a816614b2f60/fx1.jpg
