French Leon, Liu Po, Marais Olivia, Koreman Tianna, Tseng Lucia, Lai Artemis, Pavlidis Paul
Rotman Research Institute, University of Toronto, Toronto, ON, Canada.
Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada.
Front Neuroinform. 2015 May 21;9:13. doi: 10.3389/fninf.2015.00013. eCollection 2015.
We describe the WhiteText project and its progress toward automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomical nomenclature, and connectivity statement extraction. We further describe a new version of our manually curated corpus that adds 2,111 connectivity statements from 1,828 additional abstracts. Cross-validation classification within the new corpus replicates the results obtained on our original corpus, recalling 67% of connectivity statements at 51% precision. The resulting merged corpus provides 5,208 connectivity statements that can be used to seed species-specific connectivity matrices and to better train automated techniques. Finally, we present a new web application that allows fast interactive browsing of the more than 70,000 sentences indexed by the system, as a tool for accessing the data and assisting in further curation. Software and data are freely available at http://www.chibi.ubc.ca/WhiteText/.