Jones David E, Igo Sean, Hurdle John, Facelli Julio C
Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America.
Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States of America; Center for High Performance Computing, University of Utah, Salt Lake City, Utah, United States of America.
PLoS One. 2014 Jan 2;9(1):e83932. doi: 10.1371/journal.pone.0083932. eCollection 2014.
In this study, we demonstrate the use of natural language processing methods to extract numeric values of biomedical property terms of poly(amidoamine) dendrimers from the nanomedicine literature. We developed a method for extracting these values for properties taken from the NanoParticle Ontology, using the General Architecture for Text Engineering (GATE) and A Nearly-New Information Extraction System (ANNIE). We also created a method, called NanoSifter, for associating the identified numeric values with their corresponding dendrimer properties. We demonstrate that our system can correctly extract numeric values of dendrimer properties reported in the cancer treatment literature with high recall, precision, and F-measure. The micro-averaged recall was 0.99, precision was 0.84, and F-measure was 0.91. Similarly, the macro-averaged recall was 0.99, precision was 0.87, and F-measure was 0.92. To our knowledge, this is the first application of text mining to extract and associate dendrimer property terms and their corresponding numeric values.
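The abstract reports both micro-averaged and macro-averaged scores without defining them. The sketch below illustrates how the two usually differ: micro-averaging pools the counts over all properties before computing the scores, while macro-averaging computes the scores per property and then takes their unweighted mean. The `Counts` class, the function names, and the example counts are hypothetical and are not taken from the paper. The macro F-measure here is the mean of the per-property F-measures, which is one common convention; the abstract does not state which convention the authors used.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Extraction outcomes for one property (hypothetical structure)."""
    tp: int  # values correctly extracted and associated
    fp: int  # values extracted but wrong
    fn: int  # values present in the text but missed


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f_measure(p: float, r: float) -> float:
    # Balanced F-measure: harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if (p + r) else 0.0


def micro_average(per_property: dict[str, Counts]) -> tuple[float, float, float]:
    # Pool the counts over all properties, then score once.
    total = Counts(
        tp=sum(c.tp for c in per_property.values()),
        fp=sum(c.fp for c in per_property.values()),
        fn=sum(c.fn for c in per_property.values()),
    )
    p, r = precision(total), recall(total)
    return p, r, f_measure(p, r)


def macro_average(per_property: dict[str, Counts]) -> tuple[float, float, float]:
    # Score each property separately, then take the unweighted mean.
    ps = [precision(c) for c in per_property.values()]
    rs = [recall(c) for c in per_property.values()]
    fs = [f_measure(p, r) for p, r in zip(ps, rs)]
    n = len(per_property)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n


# Made-up counts for two properties, for illustration only.
example = {
    "molecular weight": Counts(tp=8, fp=2, fn=0),
    "diameter": Counts(tp=1, fp=1, fn=1),
}
micro = micro_average(example)
macro = macro_average(example)
```

With these made-up counts, the frequent property dominates the micro-average, while the macro-average weights both properties equally. As a consistency check on the reported figures, the harmonic mean of the micro-averaged precision (0.84) and recall (0.99) rounds to the reported micro-averaged F-measure of 0.91.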