Aphinyanaphongs Yindalon, Statnikov Alexander, Aliferis Constantin F
Department of Biomedical Informatics, Eskind Biomedical Library, room 412, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232, USA.
J Am Med Inform Assoc. 2006 Jul-Aug;13(4):446-55. doi: 10.1197/jamia.M2031. Epub 2006 Apr 18.
The present study explores the discriminatory performance of existing and novel gold-standard-specific machine learning (GSS-ML) focused filter models (i.e., models built specifically for a retrieval task and the gold standard against which they are evaluated) and compares their performance to that of citation counts, impact factors, and non-specific machine learning (NS-ML) models (i.e., models built for a different task and/or a different gold standard).
Three gold standard corpora were constructed using the SSOAB bibliography, the ACPJ-cited treatment articles, and the ACPJ-cited etiology articles. Citation counts and impact factors were obtained for each article. Support vector machine models were used to classify the articles using combinations of content, impact factors, and citation counts as predictors.
Discriminatory performance was estimated using the area under the receiver operating characteristic curve and n-fold cross-validation.
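The evaluation scheme described above can be sketched in a few lines. This is an illustrative example only: the study's actual corpora, feature extraction, number of folds, and SVM parameters are not given in this abstract, so synthetic data stand in for article content features, citation counts, and impact factors.

```python
# Hedged sketch of SVM classification evaluated by AUC under
# n-fold cross-validation; all data below are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
content = rng.normal(size=(n, 50))          # stand-in for article content features
citations = rng.poisson(5, size=(n, 1))     # stand-in citation counts
impact = rng.gamma(2.0, 2.0, size=(n, 1))   # stand-in impact factors
y = (content[:, 0] + 0.1 * citations[:, 0] > 0).astype(int)  # synthetic labels

# Combine content with citation metrics as predictors, as in the study design
X = np.hstack([content, citations, impact])
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Area under the ROC curve, estimated fold by fold
aucs = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"mean AUC over {len(aucs)} folds: {aucs.mean():.3f}")
```

Comparing feature combinations (content alone, content plus citation count, content plus impact factor) then amounts to rebuilding `X` with different column subsets and repeating the same cross-validated AUC estimate.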
For all three gold standards and tasks, GSS-ML filters outperformed citation counts, impact factors, and NS-ML filters. Combining content with impact factor or citation count produced no improvement, or only a negligible one, over the content-based GSS-ML filters.
These experiments provide evidence that information retrieval filters focused on a retrieval task and a corresponding gold standard must be built specifically for that task and gold standard. Under those conditions, machine learning filters outperform standard citation metrics. Furthermore, citation counts and impact factors add only marginal value to discriminatory performance. A previous claim that citation metrics outperform machine learning on one of the corpora examined here is attributable to the use of machine learning filters built for a different gold standard and task.