Effective primer design for genotype and subtype detection of highly divergent viruses in large scale genome datasets.

Author information

Demiralay Burak, Can Tolga

Affiliations

Department of Health Informatics, Informatics Institute, Middle East Technical University, Dumlupınar Bulvarı No 1, 06800, Çankaya, Ankara, Turkey.

Department of Computer Science, Colorado School of Mines, 1501 Illinois St, Golden, 80401, CO, USA.

Publication information

BMC Bioinformatics. 2025 Sep 1;26(1):223. doi: 10.1186/s12859-025-06251-9.

DOI:10.1186/s12859-025-06251-9

PMID:40890622

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12400757/

Abstract

Identifying the microorganisms in a biological sample is a crucial step in diagnostics, pathogen screening, biomedical research, evolutionary studies, agriculture, and biological threat assessment. While progress has been made for larger organisms, there is still a need for an efficient and scalable method that can handle thousands of whole genomes from organisms with high mutation rates and genetic diversity, such as single-stranded viruses. In this study, we developed a novel method that identifies subsequences for detecting a given species or subspecies in a (meta)genomic sample by Polymerase Chain Reaction (PCR). Species detection depends heavily on the measurement method, and because thermodynamic interactions are critical in PCR, thermodynamics is the main driving force of the proposed methodology. The method is parallelized in multiple steps. It first extracts all oligonucleotides from the target genomes, then locates the target sites of each oligonucleotide using a suffix array and local alignment, and finally assesses the thermodynamic interactions at those sites. An important requirement for subspecies identification is to avoid amplifying a non-target set of genomes, and our method addresses this. We applied the method to three highly divergent viruses: (1) Hepatitis C virus (HCV), whose subtypes differ at 31-33% of nucleotide sites on average; (2) Human immunodeficiency virus (HIV), which shows 25-35% between-subtype and 15-20% within-subtype variation; and (3) Dengue virus, whose genomes (DENV 1-4 only) share 60% sequence identity with each other. Using our method, we selected oligonucleotides that identify in silico 99.9% of 1657 HCV genomes, 99.7% of 11,838 HIV genomes, and 95.4% of 4016 Dengue genomes. We also demonstrate subspecies identification on genotypes 1-6 of HCV and genotypes 1-4 of Dengue virus, with an average true positive rate above 99.5% and an average false positive rate below 0.05%. None of the state-of-the-art methods can produce oligonucleotides with this specificity and sensitivity on highly divergent viral genomes like the ones studied in this article.
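
The candidate-extraction and site-location steps described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it handles exact matches only and leaves out the local alignment and thermodynamic assessment entirely, the suffix array is built naively, and every function name, parameter, and sequence below is a hypothetical chosen for illustration.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of `text` in lexicographic order.

    Naive construction, adequate only for toy-sized sequences.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Sorted start positions of exact matches of `pattern` in `text`."""
    m = len(pattern)
    lo, hi = 0, len(suffix_array)
    # Binary search for the first suffix whose length-m prefix is >= pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # Matching suffixes are contiguous in the suffix array.
    while lo < len(suffix_array):
        start = suffix_array[lo]
        if text[start:start + m] != pattern:
            break
        hits.append(start)
        lo += 1
    return sorted(hits)


def hit_fraction(oligo, indexed_genomes):
    """Fraction of genomes that contain `oligo` at least once."""
    if not indexed_genomes:
        return 0.0
    hits = sum(1 for text, sa in indexed_genomes
               if find_occurrences(text, sa, oligo))
    return hits / len(indexed_genomes)


def select_oligos(targets, non_targets, k,
                  min_sensitivity=1.0, max_false_positive=0.0):
    """Keep k-mers found in enough targets and in few enough non-targets.

    The two thresholds are illustrative assumptions, not values from the paper.
    """
    target_index = [(g, build_suffix_array(g)) for g in targets]
    non_target_index = [(g, build_suffix_array(g)) for g in non_targets]
    # Candidate oligonucleotides: every k-mer of every target genome.
    candidates = {g[i:i + k] for g in targets for i in range(len(g) - k + 1)}
    selected = []
    for oligo in sorted(candidates):
        sensitivity = hit_fraction(oligo, target_index)
        false_positive = hit_fraction(oligo, non_target_index)
        if sensitivity >= min_sensitivity and false_positive <= max_false_positive:
            selected.append((oligo, sensitivity, false_positive))
    return selected


# Made-up toy sequences, not real viral genomes.
TARGETS = ["ACGTACGTTG", "TTACGTTGCA", "GGACGTTGAC"]
NON_TARGETS = ["ACGTACGAAA", "CCCCGGGGTT"]

if __name__ == "__main__":
    for oligo, sens, fp in select_oligos(TARGETS, NON_TARGETS, k=6):
        print(oligo, sens, fp)
```

In this sketch, sensitivity is the fraction of target genomes containing an oligonucleotide and the false positive rate is the fraction of non-target genomes containing it. A realistic pipeline would also have to tolerate mismatches and score binding thermodynamically, as the abstract indicates.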

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d24/12400757/925ceb59cbc3/12859_2025_6251_Fig1_HTML.jpg
