Mollah Shamim Ara, Cimino Christopher
Albert Einstein College of Medicine.
AMIA Annu Symp Proc. 2007 Oct 11:1053.
At Albert Einstein College of Medicine, a large part of the online lecture materials contains PostScript files. As the collection grows, it becomes essential to create a digital library that gives easy access to relevant sections of the lecture material through a full-text index. Creating this index requires extracting all the text from the document files that constitute the originals of the lectures. In this study we present a semi-automatic indexing method that uses a robust technique for extracting text from PostScript files and the National Library of Medicine's Medical Text Indexer (MTI) program for indexing the text. This model can be applied at other medical schools for indexing purposes.
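The abstract does not say which extraction technique was used. As a minimal sketch of the text-extraction step only, the following assumes Ghostscript is installed and uses its `txtwrite` output device; the function names `build_gs_command` and `extract_text` are hypothetical and are not taken from the study. Submission of the extracted text to MTI is not shown, because the abstract does not describe that interface.

```python
import shutil
import subprocess


def build_gs_command(ps_path: str, gs_binary: str = "gs") -> list[str]:
    """Build a Ghostscript command that writes a document's text to stdout.

    Assumption: Ghostscript stands in for the (unspecified) extraction
    technique mentioned in the abstract.
    """
    return [
        gs_binary,
        "-q",                 # suppress the startup banner
        "-dNOPAUSE",          # do not pause between pages
        "-dBATCH",            # exit after the last input file
        "-dSAFER",            # restrict file access while interpreting PostScript
        "-sDEVICE=txtwrite",  # text-extraction output device
        "-sOutputFile=-",     # "-" sends the extracted text to stdout
        ps_path,
    ]


def extract_text(ps_path: str) -> str:
    """Run Ghostscript on one PostScript file and return the extracted text."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) was not found on PATH")
    result = subprocess.run(
        build_gs_command(ps_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Ghostscript fails
    )
    return result.stdout
```

Calling `extract_text("lecture01.ps")` on a hypothetical lecture file would return its text as a single string, which could then be passed to an indexing step.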