Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore; Computational Biology Program, Faculty of Science.
Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore; Department of Computer Science, National University of Singapore, Singapore, Singapore.
Bioinformatics. 2016 Oct 1;32(19):2981-7. doi: 10.1093/bioinformatics/btw357. Epub 2016 Jun 16.
Microbial consortia are frequently defined by numerous interactions within the community that are key to understanding their function. While microbial interactions have been extensively studied experimentally, information about them is scattered across the scientific literature. As manual collation is infeasible, automated data processing tools are needed to make this information easily accessible.
We present @MInter, an automated information extraction system based on Support Vector Machines that analyzes paper abstracts to infer microbial interactions. @MInter was trained and tested on a manually curated gold standard dataset of 735 species interactions and 3917 annotated abstracts, constructed as part of this study. Cross-validation analysis showed that @MInter was able to detect abstracts pertaining to one or more microbial interactions with high specificity (specificity = 95%, AUC = 0.97). Despite challenges in identifying specific microbial interactions in an abstract (interaction-level recall = 95%, precision = 25%), @MInter was shown to reduce annotator workload 13-fold compared to alternative approaches. Applying @MInter to 175 bacterial species abundant on human skin, we identified a network of 357 literature-reported microbial interactions, demonstrating its utility for the study of microbial communities.
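To make the interaction-level figures concrete, the following is a minimal sketch of how precision and recall are computed from counts of true positives, false positives, and false negatives. The counts are hypothetical, chosen only so that the ratios match the reported recall of 95% and precision of 25%; they are not taken from the study, and the function names are illustrative rather than part of @MInter.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted interactions that are real."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of real interactions that were predicted."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def candidates_per_true_hit(tp: int, fp: int) -> float:
    """Average number of predictions an annotator reviews per real interaction."""
    return (tp + fp) / tp if tp else float("inf")


# Hypothetical counts (assumption): 100 real interactions, 95 of them found,
# alongside 285 false predictions.
tp, fp, fn = 95, 285, 5

p = precision(tp, fp)
r = recall(tp, fn)
review_cost = candidates_per_true_hit(tp, fp)

print(f"precision = {p:.2f}, recall = {r:.2f}")
print(f"predictions reviewed per true interaction = {review_cost:.1f}")
```

Under these assumed counts, an annotator would check about four predictions for each real interaction while missing few real ones. This illustrates the trade-off between high recall and low precision; it does not derive the 13-fold workload reduction reported above.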
@MInter is freely available at https://github.com/CSB5/atminter
Supplementary data are available at Bioinformatics online.