Department of Computer Science, University of New Mexico, Albuquerque, NM, USA; Mind Research Network, Albuquerque, NM, USA; Conjectural Systems, Atlanta, GA, USA.
Department of Computer Science, University of New Mexico, Albuquerque, NM, USA.
Front Neurosci. 2013 Dec 16;7:240. doi: 10.3389/fnins.2013.00240. eCollection 2013.
Identifying the experimental methods used in human neuroimaging papers is important for grouping meaningfully similar experiments for meta-analyses. Currently, this can only be done by human readers. We present the performance of common machine learning (text mining) methods applied to the problem of automatically classifying or labeling this literature. The labeling terms come from the Cognitive Paradigm Ontology (CogPO), the text corpora are abstracts of published functional neuroimaging papers, and the methods use a human expert's annotations as training data. We aim to replicate the expert's annotation of multiple labels per abstract, which identify the experimental stimuli, cognitive paradigms, response types, and other relevant dimensions of the experiments. We use several standard machine learning methods: naive Bayes (NB), k-nearest neighbor, and support vector machines (specifically SMO, or sequential minimal optimization). Exact match performance ranged from only 15% in the worst cases to 78% in the best cases. NB methods combined with binary relevance transformations performed strongly and were robust to overfitting. This collection of results demonstrates what can be achieved with off-the-shelf software components and little to no pre-processing of raw text.
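To make the binary relevance transformation and the exact match measure concrete, the following is a minimal sketch in plain Python. It is not the authors' code or the software they used: the class names `BinaryNB` and `BinaryRelevance`, the whitespace tokenizer, the Laplace smoothing, and the four toy training sentences with their labels (`Visual`, `Auditory`, `ButtonPress`) are all assumptions made for illustration and are not taken from the paper's corpus or from CogPO. Binary relevance trains one independent yes/no NB classifier per label; exact match counts a prediction as correct only when the whole predicted label set equals the expert's label set.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace: deliberately minimal pre-processing.
    return text.lower().split()


class BinaryNB:
    """Multinomial naive Bayes for a single yes/no label (Laplace smoothing)."""

    def fit(self, docs, flags):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for tokens, flag in zip(docs, flags):
            self.counts[flag].update(tokens)
            self.n_docs[flag] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, tokens, flag):
        total = sum(self.counts[flag].values())
        n_all = self.n_docs[True] + self.n_docs[False]
        # Smoothed class prior plus smoothed per-token log-likelihoods.
        score = math.log((self.n_docs[flag] + 1) / (n_all + 2))
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log(
                    (self.counts[flag][tok] + 1) / (total + len(self.vocab))
                )
        return score

    def predict(self, tokens):
        return self._log_score(tokens, True) > self._log_score(tokens, False)


class BinaryRelevance:
    """Binary relevance: one independent binary classifier per label."""

    def fit(self, texts, label_sets):
        docs = [tokenize(t) for t in texts]
        self.labels = sorted(set().union(*label_sets))
        self.models = {
            lab: BinaryNB().fit(docs, [lab in s for s in label_sets])
            for lab in self.labels
        }
        return self

    def predict(self, text):
        tokens = tokenize(text)
        return {lab for lab, model in self.models.items() if model.predict(tokens)}


def exact_match(predicted, actual):
    # Fraction of documents whose predicted label set equals the true set.
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# Hypothetical toy data, invented for this sketch only.
train_texts = [
    "subjects viewed faces and pressed a button",
    "subjects heard tones and pressed a button",
    "subjects viewed words silently",
    "subjects heard speech silently",
]
train_labels = [
    {"Visual", "ButtonPress"},
    {"Auditory", "ButtonPress"},
    {"Visual"},
    {"Auditory"},
]

model = BinaryRelevance().fit(train_texts, train_labels)

test_texts = [
    "subjects viewed pictures and pressed a button",
    "subjects heard music silently",
]
test_labels = [{"Visual", "ButtonPress"}, {"Auditory"}]

predictions = [model.predict(t) for t in test_texts]
print(predictions)
print("exact match:", exact_match(predictions, test_labels))
```

Because exact match gives no partial credit, a prediction that gets two of three labels right scores the same as one that gets none right. This strictness is one reason exact match figures for multi-label tasks can look low compared with per-label accuracy.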