Gurney Thomas, Horlings Edwin, van den Besselaar Peter
Scientometrics. 2012 May;91(2):435-449. doi: 10.1007/s11192-011-0589-1. Epub 2011 Dec 30.
Key to accurate bibliometric analyses is the ability to correctly link individuals to their corpus of work, with an optimal balance between precision and recall. We have developed an algorithm that performs this disambiguation task with very high recall and precision. The method addresses the problem of records being discarded because of null data fields, and the resulting effect on recall, precision and F-measure. We have implemented a dynamic approach to similarity calculations based on all available data fields. We also account for differences in author contribution and for the age difference between publications, both of which have meaningful effects on overall similarity measurements, resulting in significantly higher recall and precision of returned records. Results are presented for a test dataset of heterogeneous catalysis publications. They show high average F-measure scores and substantial improvements over previous and stand-alone techniques.
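The idea of computing similarity dynamically over whatever data fields are available, rather than discarding records with null fields, can be illustrated with a minimal sketch. This is not the authors' implementation: the field names, the equal weights, the use of Jaccard overlap, and the function names `dynamic_similarity` and `f_measure` are all assumptions made for illustration, and the adjustments for author contribution and publication age mentioned in the abstract are not modelled here.

```python
from typing import Dict, Mapping, Optional, Set

# Hypothetical fields and weights; the abstract does not specify any.
FIELD_WEIGHTS: Dict[str, float] = {
    "coauthors": 1.0,
    "keywords": 1.0,
    "references": 1.0,
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dynamic_similarity(
    rec_a: Mapping[str, Optional[Set[str]]],
    rec_b: Mapping[str, Optional[Set[str]]],
    weights: Mapping[str, float] = FIELD_WEIGHTS,
) -> Optional[float]:
    """Weighted mean similarity over the fields present in BOTH records.

    A field that is null or empty in either record is skipped and the
    remaining weights are renormalised, so the pair is still scored
    instead of being discarded. Returns None if no field is comparable.
    """
    total = 0.0
    weight_sum = 0.0
    for field, weight in weights.items():
        value_a, value_b = rec_a.get(field), rec_b.get(field)
        if not value_a or not value_b:
            continue  # null field: skip it, keep the record pair
        total += weight * jaccard(set(value_a), set(value_b))
        weight_sum += weight
    return total / weight_sum if weight_sum else None


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, two records that share both co-authors and one of three distinct references, with keywords missing on one side, are scored on the two comparable fields only: `(1 + 1/3) / 2`, roughly 0.67. Renormalising the weights is one simple way to keep scores comparable across record pairs with different numbers of populated fields; other schemes are possible.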