Quade G, Püschel N, Far F
Institut für Medizinische Statistik, Dokumentation und Datenverarbeitung der Universität Bonn.
Proc AMIA Annu Fall Symp. 1996:403-7.
CancerNet from the National Cancer Institute contains nearly 500 ASCII files, updated monthly, with current information about cancer and the "gold standard" of tumor therapy. Perl scripts convert these files into HTML documents. A complex algorithm, using regular-expression matching and extensive exception handling, detects headlines, listings, and other constructs in the original ASCII text and converts them into their HTML counterparts. A table of contents is also generated during the conversion. The resulting files are indexed for full-text search via WAIS. Building the complete CancerNet WWW redistribution takes less than two hours with a minimum of manual work. With 26,000 information requests to our service per month, the average cost of delivering one document worldwide is about 19 cents.
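The abstract does not state the actual detection rules, and the original scripts were written in Perl. The following is only a minimal Python sketch of the kind of regex-driven ASCII-to-HTML conversion described, under hypothetical layout rules that are assumptions, not the CancerNet file format: a headline is a short line written entirely in capitals, a list item starts with "-", "*", or a number such as "1." or "1)", and blank lines separate paragraphs. One "exception" is handled as an illustration: an indented line after a list item is treated as that item's continuation. The table of contents is built in the same pass. WAIS indexing is not covered.

```python
import html
import re

# Hypothetical rules (assumptions, not the real CancerNet layout):
# headline = short all-caps line; list item = "-", "*", "1." or "1)" prefix.
HEADLINE = re.compile(r"^[A-Z][A-Z0-9 ,&/()'-]{2,60}$")
BULLET = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+(.*)$")


def ascii_to_html(text: str) -> str:
    """Convert plain ASCII text into an HTML page with a table of contents."""
    body, toc, para, items = [], [], [], []

    def flush_para() -> None:
        # Emit the pending paragraph lines as one <p> element.
        if para:
            body.append("<p>" + html.escape(" ".join(para)) + "</p>")
            para.clear()

    def flush_list() -> None:
        # Emit the pending list items as one <ul> element.
        if items:
            lis = "".join(f"<li>{html.escape(i)}</li>" for i in items)
            body.append("<ul>" + lis + "</ul>")
            items.clear()

    for raw in text.splitlines():
        line = raw.strip()
        bullet = BULLET.match(raw)
        if not line:
            # A blank line ends the current paragraph and list.
            flush_para()
            flush_list()
        elif HEADLINE.match(line):
            flush_para()
            flush_list()
            anchor = f"sec{len(toc) + 1}"
            title = html.escape(line.title())
            toc.append(f'<li><a href="#{anchor}">{title}</a></li>')
            body.append(f'<h2 id="{anchor}">{title}</h2>')
        elif bullet:
            flush_para()
            items.append(bullet.group(1))
        elif items:
            # Exception case: a non-bullet line right after a list item
            # is treated as that item's continuation.
            items[-1] += " " + line
        else:
            para.append(line)

    flush_para()
    flush_list()
    return ("<html><body>\n<ul>" + "".join(toc) + "</ul>\n"
            + "\n".join(body) + "\n</body></html>")
```

For example, `ascii_to_html("TREATMENT OPTIONS\n- surgery\n- radiation\n  therapy\n")` yields a page whose table of contents links to a "Treatment Options" heading followed by a two-item list, the second item being "radiation therapy". A real converter would need many more such exception rules; this sketch only shows the overall structure of a single-pass, pattern-matching conversion.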