Microbial phenomics information extractor (MicroPIE): a natural language processing tool for the automated acquisition of prokaryotic phenotypic characters from text sources.

Authors

Mao Jin, Moore Lisa R, Blank Carrine E, Wu Elvis Hsin-Hui, Ackerman Marcia, Ranade Sonali, Cui Hong

Affiliations

School of Information, University of Arizona, Tucson, 85721, AZ, USA.

Department of Biological Sciences, University of Southern Maine, Portland, 04103, ME, USA.

Publication

BMC Bioinformatics. 2016 Dec 13;17(1):528. doi: 10.1186/s12859-016-1396-8.

DOI:10.1186/s12859-016-1396-8

PMID:27955641

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5153691/

Abstract

BACKGROUND

The large-scale analysis of phenomic data (i.e., full phenotypic traits of an organism, such as shape, metabolic substrates, and growth conditions) in microbial bioinformatics has been hampered by the lack of tools to rapidly and accurately extract phenotypic data from existing legacy text in the field of microbiology. To quickly obtain knowledge on the distribution and evolution of microbial traits, an information extraction system needed to be developed to extract phenotypic characters from large numbers of taxonomic descriptions so they can be used as input to existing phylogenetic analysis software packages.

RESULTS

We report the development and evaluation of Microbial Phenomics Information Extractor (MicroPIE, version 0.1.0). MicroPIE is a natural language processing application that uses a robust supervised classification algorithm (Support Vector Machine) to identify characters from sentences in prokaryotic taxonomic descriptions, followed by a combination of algorithms applying linguistic rules with groups of known terms to extract characters as well as character states. The input to MicroPIE is a set of taxonomic descriptions (clean text). The output is a taxon-by-character matrix, with taxa in the rows and a set of 42 pre-defined characters (e.g., optimum growth temperature) in the columns. The performance of MicroPIE was evaluated against a gold standard matrix and another student-made matrix. Results show that, compared to the gold standard, MicroPIE extracted 21 characters (50%) with a Relaxed F1 score > 0.80 and 16 characters (38%) with Relaxed F1 scores ranging between 0.50 and 0.80. Inclusion of a character prediction component (SVM) improved the overall performance of MicroPIE, notably the precision. Evaluated against the same gold standard, MicroPIE performed significantly better than the undergraduate students.
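The two-stage design described above (predict which characters a sentence describes, then apply rules and term groups to pull out the character states) can be illustrated with a minimal Python sketch. This is not MicroPIE's code. A simple keyword lookup stands in for the SVM sentence classifier, only two of the 42 characters are covered, and every name here (`CHARACTER_CUES`, `EXTRACTION_RULES`, `predict_characters`, `extract_states`, `build_matrix`) and every rule is a hypothetical chosen for illustration.

```python
import re

# Hypothetical term groups. A keyword lookup stands in for the SVM
# sentence classifier described in the abstract (assumption, not MicroPIE code).
CHARACTER_CUES = {
    "optimum growth temperature": ("optimum", "optimal"),
    "cell shape": ("rod", "cocci", "coccoid", "spiral"),
}

# Hypothetical linguistic rules: one regular expression per character.
EXTRACTION_RULES = {
    "optimum growth temperature": re.compile(r"(\d+(?:\.\d+)?)\s*°\s*C"),
    "cell shape": re.compile(r"\b(rods?|cocci|coccoid|spiral)\b", re.IGNORECASE),
}


def predict_characters(sentence):
    """Stage 1: guess which characters a sentence describes."""
    lowered = sentence.lower()
    return [name for name, cues in CHARACTER_CUES.items()
            if any(cue in lowered for cue in cues)]


def extract_states(sentence, character):
    """Stage 2: apply the rule for one character and collect its states."""
    rule = EXTRACTION_RULES[character]
    return [match.group(1).lower() for match in rule.finditer(sentence)]


def build_matrix(descriptions):
    """Build a taxon-by-character matrix: one row per taxon, one column per character."""
    matrix = {}
    for taxon, text in descriptions.items():
        row = {name: [] for name in CHARACTER_CUES}
        # Naive sentence split on whitespace that follows a period or semicolon.
        for sentence in re.split(r"(?<=[.;])\s+", text):
            for character in predict_characters(sentence):
                row[character].extend(extract_states(sentence, character))
        matrix[taxon] = row
    return matrix


# Made-up descriptions, used only to exercise the sketch.
descriptions = {
    "Taxon A": "Cells are rods. Optimum growth occurs at 37 °C.",
    "Taxon B": "Cells are coccoid. Growth occurs at 10-45 °C.",
}
matrix = build_matrix(descriptions)
for taxon, row in matrix.items():
    print(taxon, row)
```

In this toy input, the second description gives a growth range rather than an optimum. Because the prediction stage does not assign that sentence to the optimum growth temperature character, the temperature rule is never applied to it and the cell stays empty. That is one plausible way a prediction step could raise precision, in line with the effect the abstract reports for the SVM component.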

CONCLUSION

MicroPIE is a promising new tool for the rapid and efficient extraction of phenotypic character information from prokaryotic taxonomic descriptions. However, further development, including incorporation of ontologies, will be necessary to improve the performance of the extraction for some character types.
