Profile hidden Markov models for the detection of viruses within metagenomic sequence data.

Author information

Skewes-Cox Peter, Sharpton Thomas J, Pollard Katherine S, DeRisi Joseph L

Affiliations

Biological and Medical Informatics Graduate Program, University of California San Francisco, San Francisco, California, United States of America; Departments of Medicine, Biochemistry and Biophysics, and Microbiology, University of California San Francisco, San Francisco, California, United States of America; Howard Hughes Medical Institute, Bethesda, Maryland, United States of America.

The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America.

Publication information

PLoS One. 2014 Aug 20;9(8):e105067. doi: 10.1371/journal.pone.0105067. eCollection 2014.

DOI:10.1371/journal.pone.0105067

PMID:25140992

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4139300/

Abstract

Rapid, sensitive, and specific virus detection is an important component of clinical diagnostics. Massively parallel sequencing enables new diagnostic opportunities that complement traditional serological and PCR-based techniques. While massively parallel sequencing promises the benefits of being more comprehensive and less biased than traditional approaches, it presents new analytical challenges, especially with respect to detection of pathogen sequences in metagenomic contexts. To a first approximation, the initial detection of viruses can be achieved simply through alignment of sequence reads or assembled contigs to a reference database of pathogen genomes with tools such as BLAST. However, recognition of highly divergent viral sequences is problematic, and may be further complicated by the inherently high mutation rates of some viral types, especially RNA viruses. In these cases, increased sensitivity may be achieved by leveraging position-specific information during the alignment process. Here, we constructed HMMER3-compatible profile hidden Markov models (profile HMMs) from all the virally annotated proteins in RefSeq in an automated fashion using a custom-built bioinformatic pipeline. We then tested the ability of these viral profile HMMs ("vFams") to accurately classify sequences as viral or non-viral. Cross-validation experiments with full-length gene sequences showed that the vFams were able to recall 91% of left-out viral test sequences without erroneously classifying any non-viral sequences into viral protein clusters. Thorough reanalysis of previously published metagenomic datasets with a set of the best-performing vFams showed that they were more sensitive than BLAST for detecting sequences originating from more distant relatives of known viruses. To facilitate the use of the vFams for rapid detection of remote viral homologs in metagenomic data, we provide two sets of vFams, comprising more than 4,000 vFams each, in the HMMER3 format. We also provide the software necessary to build custom profile HMMs or update the vFams as more viruses are discovered (http://derisilab.ucsf.edu/software/vFam).
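
The classification and recall steps described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline. It assumes that a search of protein sequences against a profile HMM set has already been run with HMMER3's `hmmsearch --tblout`, so that each non-comment row starts with the target sequence name and carries the full-sequence E-value in the fifth whitespace-separated column. The function names, the sequence and model names in the usage note, and the `1e-5` E-value cutoff are illustrative assumptions, not values taken from the paper.

```python
from typing import Dict, Iterable, Set


def best_evalues(tblout_lines: Iterable[str]) -> Dict[str, float]:
    """Map each target sequence name to its lowest full-sequence E-value.

    Assumes HMMER3 --tblout rows: target name, target accession,
    query name, query accession, full-sequence E-value, score, bias, ...
    """
    best: Dict[str, float] = {}
    for line in tblout_lines:
        # Skip blank lines and the '#' comment/header lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if target not in best or evalue < best[target]:
            best[target] = evalue
    return best


def classify_viral(tblout_lines: Iterable[str],
                   evalue_cutoff: float = 1e-5) -> Set[str]:
    """Return sequences whose best hit meets the (assumed) E-value cutoff."""
    return {name for name, evalue in best_evalues(tblout_lines).items()
            if evalue <= evalue_cutoff}


def recall(true_viral: Set[str], predicted_viral: Set[str]) -> float:
    """Fraction of known viral sequences that were recovered."""
    if not true_viral:
        raise ValueError("true_viral must not be empty")
    return len(true_viral & predicted_viral) / len(true_viral)
```

As a usage example, given the hypothetical rows `seqA - vFam_1 - 1e-30 100.0 0.1` and `seqB - vFam_2 - 0.5 5.0 0.0`, `classify_viral` would report only `seqA` as viral; with a held-out truth set of `{"seqA", "seqC"}`, `recall` would then return 0.5.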
