Lin John M, Bohland Jason W, Andrews Peter, Burns Gully A P C, Allen Cara B, Mitra Partha P
Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America.
PLoS One. 2008 Apr 30;3(4):e2052. doi: 10.1371/journal.pone.0002052.
Annual meeting abstracts published by scientific societies often contain rich arrays of information that can be computationally mined and distilled to elucidate the state and dynamics of the subject field. We extracted and processed abstract data from the Society for Neuroscience (SFN) annual meeting abstracts for the period 2001-2006 in order to gain an objective view of contemporary neuroscience. An important first step was to apply data cleaning and disambiguation methods to construct a unified database, since the data were too noisy to be of full utility in the raw form initially available. Using natural language processing, text mining, and other data analysis techniques, we then examined the demographics and structure of the scientific collaboration network, the dynamics of the field over time, major research trends, and the structure of the sources of research funding. Notable findings include a high geographical concentration of neuroscience research in the northeastern United States, a surprisingly large transient population (66% of the authors appear in only one of the six years studied), the central role played by the study of neurodegenerative disorders in the neuroscience community, and an apparent growth of behavioral/systems neuroscience with a corresponding shrinkage of cellular/molecular neuroscience over the six-year period. The results from this work will prove useful for scientists, policy makers, and funding agencies seeking to gain a complete and unbiased picture of the community structure and body of knowledge encapsulated by a specific scientific domain.
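The transient-population figure quoted above (the share of authors who appear in only one of the studied years) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes author names have already been disambiguated into stable identifiers, and the function name `transient_fraction`, the `(author_id, year)` record layout, and the toy data are all hypothetical.

```python
from collections import defaultdict


def transient_fraction(records):
    """Return the fraction of authors who appear in exactly one distinct year.

    `records` is an iterable of (author_id, year) pairs, one per authorship.
    The author_id values are assumed to be already disambiguated.
    """
    years_by_author = defaultdict(set)
    for author_id, year in records:
        # A set collapses multiple abstracts by one author in the same year.
        years_by_author[author_id].add(year)
    if not years_by_author:
        return 0.0
    single_year = sum(1 for years in years_by_author.values() if len(years) == 1)
    return single_year / len(years_by_author)


# Toy data invented for illustration; not taken from the study.
toy_records = [
    ("a1", 2001), ("a1", 2002),
    ("a2", 2003),
    ("a3", 2004), ("a3", 2004),  # two abstracts in the same year
    ("a4", 2001), ("a4", 2006),
]
toy_fraction = transient_fraction(toy_records)
```

Counting distinct years per author, rather than abstracts per author, keeps a prolific single-year author in the transient group, which matches the "appear in only one year" wording of the abstract.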