Turner Anne M, Liddy Elizabeth D, Bradley Jana, Wheatley Joyce A
School of Public Health and Community Medicine, University of Washington, Seattle, Washington 98195, USA.
J Med Libr Assoc. 2005 Oct;93(4):487-94.
Much of the useful information in public health (PH) is considered gray literature, literature that is not available through traditional, commercial pathways. The diversity and nontraditional format of this information make it difficult to locate. The aim of this Robert Wood Johnson Foundation-funded project is to improve access to PH gray literature reports through established natural language processing (NLP) techniques. This paper summarizes the development of a model for representing gray literature documents concerning PH interventions.
The authors established a model-based approach for automatically analyzing and representing the PH gray literature by evaluating a corpus of PH gray literature collected from seven PH Websites. Input from fifteen PH professionals informed the development of the model and the prioritization of elements for NLP extraction.
Of the 365 documents collected, 320 were analyzed to develop a model of the key text elements of gray literature documents relating to PH interventions. Survey input from a group of potential users directed the selection of key elements to include in the document summaries.
A model of key elements relating to PH interventions in the gray literature can be developed from the ground up through document analysis and input from members of the PH workforce. The model provides a framework for developing a method to identify and store key elements from documents (metadata) as document surrogates that can be used for indexing, abstracting, and determining the shape of the PH gray literature.
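The abstract does not enumerate the model's key elements, so the following is only a minimal sketch of the general idea of storing extracted elements as document surrogates and indexing them. It assumes a surrogate is a record mapping element names to extracted values, and that "indexing" means a simple inverted index over (element, value) pairs. The class `DocumentSurrogate`, the functions `build_index` and `search`, and the element names `intervention` and `population` are hypothetical illustrations, not the authors' model.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DocumentSurrogate:
    """Hypothetical surrogate: key elements extracted from one gray literature document."""
    doc_id: str
    source_site: str
    # Hypothetical element names mapped to extracted values; the real model's elements are not listed here.
    elements: dict[str, list[str]] = field(default_factory=dict)


def build_index(surrogates):
    """Map each (element, lowercased value) pair to the set of document IDs containing it."""
    index = defaultdict(set)
    for s in surrogates:
        for element, values in s.elements.items():
            for value in values:
                index[(element, value.lower())].add(s.doc_id)
    return index


def search(index, element, value):
    """Return the sorted IDs of documents whose surrogate lists `value` under `element`."""
    return sorted(index.get((element, value.lower()), set()))


# Usage with made-up surrogates (illustrative data only).
docs = [
    DocumentSurrogate("d1", "site-a", {"intervention": ["smoking cessation"], "population": ["adolescents"]}),
    DocumentSurrogate("d2", "site-b", {"intervention": ["Smoking cessation"], "population": ["pregnant women"]}),
]
idx = build_index(docs)
print(search(idx, "intervention", "smoking cessation"))  # ['d1', 'd2']
```

Lowercasing values at index time is one simple design choice so that differently capitalized extractions of the same element value match; a fuller system would likely need controlled vocabularies or normalization beyond this.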