A method for automatically extracting infectious disease-related primers and probes from the literature.

Affiliation

Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain.

Publication information

BMC Bioinformatics. 2010 Aug 3;11:410. doi: 10.1186/1471-2105-11-410.

PMID:20682041

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2923139/

Abstract

BACKGROUND

Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and treatment of infectious diseases. The biological literature is the main source of empirically validated primer and probe sequences, so it is becoming increasingly important for researchers to be able to navigate this information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. The phases are: (1) convert each document into a tree of paper sections, (2) detect candidate sequences using a set of finite state machine-based recognizers, (3) refine problematic sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information.
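To make phase (2) more concrete, the following is a minimal sketch of a finite-state recognizer for candidate sequences. It is not the authors' implementation: the two states, the accepted alphabet (IUPAC nucleotide codes), the tolerated separators, the minimum length, and the function name `find_candidate_sequences` are all assumptions chosen for illustration.

```python
from typing import List

# Assumption: candidates are written with upper-case IUPAC nucleotide codes.
NUCLEOTIDES = set("ACGTURYKMSWBDHVN")
# Hypothetical threshold so that short upper-case words (e.g. "DNA") are ignored.
MIN_LENGTH = 15


def find_candidate_sequences(text: str, min_length: int = MIN_LENGTH) -> List[str]:
    """Scan text character by character with a two-state machine.

    OUTSIDE:     waiting for the first nucleotide letter.
    IN_SEQUENCE: collecting nucleotide letters; spaces and hyphens are
                 skipped so that grouped sequences ("ATG CGT ...") are joined.
    """
    candidates: List[str] = []
    buffer: List[str] = []
    state = "OUTSIDE"

    def flush() -> None:
        # Keep the collected letters only if they are long enough.
        if len(buffer) >= min_length:
            candidates.append("".join(buffer))
        buffer.clear()

    for ch in text:
        if state == "OUTSIDE":
            if ch in NUCLEOTIDES:
                buffer.append(ch)
                state = "IN_SEQUENCE"
        else:  # state == "IN_SEQUENCE"
            if ch in NUCLEOTIDES:
                buffer.append(ch)
            elif ch in " -":
                continue  # separator inside a sequence: stay in this state
            else:
                flush()
                state = "OUTSIDE"

    flush()  # handle a sequence that runs to the end of the text
    return candidates


sample = "Forward primer 5'-ATG CGT ACG TTA GCC TAG GA-3' and probe GGTCA."
print(find_candidate_sequences(sample))
```

A recognizer this permissive will also join a sequence with an upper-case word that follows it after a space, which is the kind of problematic candidate that a later rule-based refinement step, such as phase (3), would need to clean up.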

RESULTS

We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name.
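For readers unfamiliar with the two metrics, the sketch below shows how precision and recall are conventionally computed from counts of correct, spurious, and missed extractions. The counts used here are made up for illustration and are not taken from the paper, which reports only the resulting rates; the function name `precision_recall` is likewise hypothetical.

```python
from typing import Tuple


def precision_recall(true_positives: int,
                     false_positives: int,
                     false_negatives: int) -> Tuple[float, float]:
    """Return (precision, recall) for an extraction task.

    precision = correct extractions / all extractions returned by the system
    recall    = correct extractions / all sequences actually in the documents
    """
    extracted = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / extracted if extracted else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Illustrative counts only (NOT from the paper): 90 correct extractions,
# 10 spurious extractions, 30 sequences missed by the system.
p, r = precision_recall(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={p:.2%} recall={r:.2%}")
```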

CONCLUSIONS

We believe that the proposed method can facilitate routine tasks for biomedical researchers who use molecular methods to diagnose and treat infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch.

Similar articles

A knowledge engineering approach to recognizing and extracting sequences of nucleic acids from scientific literature.

Annu Int Conf IEEE Eng Med Biol Soc. 2010;2010:1081-4. doi: 10.1109/IEMBS.2010.5627316.

Integrated minimum-set primers and unique probe design algorithms for differential detection on symptom-related pathogens.

Bioinformatics. 2005 Dec 15;21(24):4330-7. doi: 10.1093/bioinformatics/bti730. Epub 2005 Oct 25.

SeqState: primer design and sequence statistics for phylogenetic DNA datasets.

Appl Bioinformatics. 2005;4(1):65-9. doi: 10.2165/00822942-200504010-00008.

PCR primers and probes for the 16S rRNA gene of most species of pathogenic bacteria, including bacteria found in cerebrospinal fluid.

J Clin Microbiol. 1994 Feb;32(2):335-51. doi: 10.1128/jcm.32.2.335-351.1994.

Primer and probe sets for group-specific quantification of the genera Nitrosomonas and Nitrosospira using real-time PCR.

Biotechnol Bioeng. 2008 Apr 15;99(6):1374-83. doi: 10.1002/bit.21715.

OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents.

Bioinformatics. 2011 Oct 1;27(19):2721-9. doi: 10.1093/bioinformatics/btr452. Epub 2011 Aug 9.

RExPrimer: an integrated primer designing tool increases PCR effectiveness by avoiding 3' SNP-in-primer and mis-priming from structural variation.

BMC Genomics. 2009 Dec 3;10 Suppl 3(Suppl 3):S4. doi: 10.1186/1471-2164-10-S3-S4.

Group-specific primer and probe sets to detect methanogenic communities using quantitative real-time polymerase chain reaction.

Biotechnol Bioeng. 2005 Mar 20;89(6):670-9. doi: 10.1002/bit.20347.

Detection of Mycobacterium tuberculosis complex with Real Time PCR: comparison of different primer-probe sets based on the IS6110 element.

J Microbiol Methods. 2006 Jul;66(1):177-80. doi: 10.1016/j.mimet.2005.12.003. Epub 2006 Jan 19.

Cited by

MiPRIME: an integrated and intelligent platform for mining primer and probe sequences of microbial species.

Bioinformatics. 2024 Jul 1;40(7). doi: 10.1093/bioinformatics/btae429.

Using nanoinformatics methods for automatically identifying relevant nanotoxicology entities from the literature.

Biomed Res Int. 2013;2013:410294. doi: 10.1155/2013/410294. Epub 2012 Dec 27.

e-MIR2: a public online inventory of medical informatics resources.

BMC Med Inform Decis Mak. 2012 Aug 2;12:82. doi: 10.1186/1472-6947-12-82.

Annotating genes and genomes with DNA sequences extracted from biomedical articles.

Bioinformatics. 2011 Apr 1;27(7):980-6. doi: 10.1093/bioinformatics/btr043. Epub 2011 Feb 16.
