Torvik Vetle I, Weeber Marc, Swanson Don R, Smalheiser Neil R
Department of Psychiatry, University of Illinois at Chicago, IL, USA.
AMIA Annu Symp Proc. 2003;2003:1033.
We present a model that automatically generates training sets and estimates the probability that two Medline records sharing the same author last name and first initial were written by the same individual. The estimate is based on shared title words, journal name, co-authors, medical subject headings, language, and affiliation, as well as distinctive features of the name itself (i.e., presence of a middle initial or suffix, and the name's prevalence in Medline).
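As an illustration only, the pairwise comparison of two records might be sketched as follows. The abstract does not give the model's functional form, so this sketch stops at building a similarity profile over the listed features and does not attempt the probability estimate. The `Record` class, the `similarity_profile` function, the stopword list, and both example records are hypothetical and are not taken from the paper or from Medline.

```python
from dataclasses import dataclass

# Hypothetical, minimal stand-in for a Medline record (not the real schema).
@dataclass(frozen=True)
class Record:
    title: str
    journal: str
    authors: frozenset      # all author names on the paper, e.g. "Smith J"
    mesh: frozenset         # medical subject headings
    language: str
    affiliation: str

# Assumed stopword list; the abstract does not say how title words are filtered.
STOPWORDS = frozenset({"a", "an", "the", "of", "in", "and", "for", "on", "to", "with"})

def title_words(title):
    """Lowercase the title, strip simple punctuation, and drop stopwords."""
    words = (w.strip(".,;:()").lower() for w in title.split())
    return {w for w in words if w and w not in STOPWORDS}

def similarity_profile(a, b, shared_name):
    """Compare two records that both list `shared_name` as an author."""
    # Exclude the name under study so it does not count as its own co-author.
    coauthors_a = a.authors - {shared_name}
    coauthors_b = b.authors - {shared_name}
    aff_a = a.affiliation.strip().lower()
    aff_b = b.affiliation.strip().lower()
    return {
        "shared_title_words": len(title_words(a.title) & title_words(b.title)),
        "same_journal": a.journal == b.journal,
        "shared_coauthors": len(coauthors_a & coauthors_b),
        "shared_mesh": len(a.mesh & b.mesh),
        "same_language": a.language == b.language,
        # Treat a missing affiliation as "no evidence" rather than a match.
        "same_affiliation": bool(aff_a) and aff_a == aff_b,
    }

# Made-up records for demonstration only.
rec_a = Record(
    title="Probabilistic model of author identity in Medline",
    journal="Journal A",
    authors=frozenset({"Smith J", "Lee K", "Garcia M"}),
    mesh=frozenset({"Models, Statistical", "MEDLINE"}),
    language="eng",
    affiliation="Example University",
)
rec_b = Record(
    title="A model for Medline author disambiguation",
    journal="Journal B",
    authors=frozenset({"Smith J", "Lee K"}),
    mesh=frozenset({"MEDLINE", "Authorship"}),
    language="eng",
    affiliation="",
)

profile = similarity_profile(rec_a, rec_b, "Smith J")
print(profile)
```

A profile of this kind would then be the input to whatever probability model is fitted on the automatically generated training sets; name-level features such as the presence of a middle initial or suffix, or the name's prevalence, would have to be added separately.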