A machine learning model to determine the accuracy of variant calls in capture-based next generation sequencing.

Affiliation

Color Genomics, 831 Mitten Road, Burlingame, CA, 94010, USA.

Publication information

BMC Genomics. 2018 Apr 17;19(1):263. doi: 10.1186/s12864-018-4659-0.

DOI:10.1186/s12864-018-4659-0
PMID:29665779
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5904977/
Abstract

BACKGROUND

Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features like reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS alone are in fact accurate and reliable. However, a small subset of difficult-to-call variants still requires orthogonal confirmation. For this reason, many clinical laboratories confirm NGS results using orthogonal technologies such as Sanger sequencing. Here, we report the development of a deterministic machine-learning-based model to differentiate between two types of variant calls: those that do not require confirmation using an orthogonal technology (high confidence), and those that require additional quality testing (low confidence). This approach allows reliable NGS-based calling in a clinical setting by identifying the few important variant calls that require orthogonal confirmation.
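
The high/low confidence triage described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that some model already assigns each call a score between 0 and 1 and that a cutoff is chosen on orthogonally confirmed validation data so that no unconfirmed call lands in the high-confidence group. The `Call` type, the `score` field, the function names, and the toy values are all hypothetical; the abstract does not describe the authors' model at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    score: float     # hypothetical model output in [0, 1]; higher = more likely real
    confirmed: bool  # orthogonal (e.g. Sanger) result, known for validation data only


def strictest_safe_threshold(validation: list[Call]) -> float:
    """Return the lowest observed score that lies above every unconfirmed call."""
    artifact_scores = [c.score for c in validation if not c.confirmed]
    if not artifact_scores:
        return 0.0  # nothing to exclude, so every call can be high confidence
    highest_artifact = max(artifact_scores)
    above = [c.score for c in validation if c.score > highest_artifact]
    # If no call outscores the worst artifact, no call can be high confidence.
    return min(above) if above else float("inf")


def triage(score: float, threshold: float) -> str:
    """Route a call: report directly, or send for orthogonal confirmation."""
    return "high_confidence" if score >= threshold else "low_confidence"


# Toy validation set (made-up numbers, not data from the paper).
validation = [
    Call(0.99, True), Call(0.95, True), Call(0.80, True), Call(0.70, True),
    Call(0.60, False), Call(0.50, True), Call(0.20, False),
]
threshold = strictest_safe_threshold(validation)
labels = [triage(c.score, threshold) for c in validation]
```

In this toy set the cutoff settles at 0.70, so the confirmed call scoring 0.50 is still routed to confirmation. That is the expected cost of a zero-false-positive cutoff: some real variants end up in the low-confidence group.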

RESULTS

We developed and tested the model using a set of 7179 variants identified by a targeted NGS panel and re-tested by Sanger sequencing. The model incorporated several signals of sequence characteristics and call quality to determine if a variant was identified at high or low confidence. The model was tuned to eliminate false positives, defined as variants that were called by NGS but not confirmed by Sanger sequencing. The model achieved very high accuracy: 99.4% (95% confidence interval: +/- 0.03%). It categorized 92.2% (6622/7179) of the variants as high confidence, and 100% of these were confirmed to be present by Sanger sequencing. Among the variants that were categorized as low confidence, defined as NGS calls of low quality that are likely to be artifacts, 92.1% (513/557) were found to be not present by Sanger sequencing.
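
The reported percentages can be rechecked from the counts in the paragraph above. The sketch below assumes that "accuracy" means the share of calls whose confidence label agrees with the Sanger result (high confidence and confirmed, or low confidence and absent); the abstract does not spell out the formula, so that reading and the variable names are assumptions.

```python
# Counts taken from the abstract.
total = 7179
high_conf = 6622            # categorized as high confidence
high_conf_confirmed = 6622  # 100% of these were confirmed by Sanger
low_conf = 557              # categorized as low confidence
low_conf_absent = 513       # of these, not present by Sanger

assert high_conf + low_conf == total

# Confusion-matrix view, treating "high confidence" as the positive label.
tp = high_conf_confirmed              # high confidence, confirmed
fp = high_conf - high_conf_confirmed  # high confidence, not confirmed
tn = low_conf_absent                  # low confidence, absent
fn = low_conf - low_conf_absent       # low confidence, but actually present

accuracy = (tp + tn) / total
share_high_conf = high_conf / total
share_low_conf_absent = low_conf_absent / low_conf

print(f"high confidence: {share_high_conf:.1%}")                   # 92.2%
print(f"low confidence, not present: {share_low_conf_absent:.1%}")  # 92.1%
print(f"accuracy: {accuracy:.1%}")                                  # 99.4%
print(f"false positives: {fp}, real variants sent to confirmation: {fn}")
```

Under this reading the counts reproduce the reported 99.4% accuracy. The 44 low-confidence calls that Sanger did confirm are the only disagreements.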

CONCLUSIONS

This work shows that NGS data contains sufficient characteristics for a machine-learning-based model to differentiate low from high confidence variants. Additionally, it reveals the importance of incorporating site-specific features as well as variant call features in such a model.

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a324/5904977/b0265f056c67/12864_2018_4659_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a324/5904977/38a53bbf725a/12864_2018_4659_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a324/5904977/6dc2cce22834/12864_2018_4659_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a324/5904977/df71d190bc44/12864_2018_4659_Fig4_HTML.jpg
