Coldman A J, Braun T, Gallagher R P
Cancer Control Agency of British Columbia, Vancouver, Canada.
J Epidemiol Community Health. 1988 Dec;42(4):390-5. doi: 10.1136/jech.42.4.390.
A methodology is developed to classify ethnic status by name using a simple probabilistic model. The method considers four rules that may be used to classify individuals from three name components (first, middle and last names). To do this, conditional probabilities of ethnic status are estimated from a sample in which ethnic status is known. Using a split sample technique, the sensitivity and specificity of the methodology were examined in a data set of death registrations. Each classification rule performed well on the data from which it was constructed but was less efficient when applied to another population. Nevertheless, a linear model, in which the conditional probabilities for each name component are summed, achieved a sensitivity and specificity of 97% and 100% respectively in males and 89% and 100% in females.
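The linear rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the decision threshold, the treatment of missing or previously unseen names, or any actual names, so the threshold of 1.0, the zero score for unseen names, the skipping of empty components, the function names and the toy records below are all assumptions made for illustration only.

```python
from collections import defaultdict


def estimate_conditionals(records):
    """Estimate P(member | name) separately for first, middle and last names.

    records: iterable of (first, middle, last, is_member) with known status.
    Returns a list of three dicts, one per name position.
    """
    counts = [defaultdict(lambda: [0, 0]) for _ in range(3)]
    for first, middle, last, is_member in records:
        for pos, name in enumerate((first, middle, last)):
            if name:  # assumption: empty components contribute nothing
                tally = counts[pos][name.lower()]
                tally[0] += int(is_member)  # members carrying this name
                tally[1] += 1               # everyone carrying this name
    return [{n: m / t for n, (m, t) in c.items()} for c in counts]


def linear_score(probs, first, middle, last, unseen=0.0):
    """Sum the conditional probabilities of the name components present."""
    return sum(
        probs[pos].get(name.lower(), unseen)  # assumption: unseen names score 0
        for pos, name in enumerate((first, middle, last))
        if name
    )


def classify(probs, first, middle, last, threshold=1.0):
    """Hypothetical cut-off: classify as a member when the sum reaches it."""
    return linear_score(probs, first, middle, last) >= threshold


def sensitivity_specificity(probs, records, threshold=1.0):
    """Evaluate on records with known status (both classes must be present)."""
    tp = fn = tn = fp = 0
    for first, middle, last, is_member in records:
        predicted = classify(probs, first, middle, last, threshold)
        if is_member:
            tp += predicted
            fn += not predicted
        else:
            fp += predicted
            tn += not predicted
    return tp / (tp + fn), tn / (tn + fp)


# Toy records, invented for illustration; not data from the study.
TRAIN = [
    ("Wei", "", "Chan", True),
    ("Mei", "Ling", "Wong", True),
    ("John", "", "Chan", True),
    ("Kim", "", "Lee", True),
    ("John", "Paul", "Smith", False),
    ("Mary", "Ann", "Lee", False),
]
HELD_OUT = [
    ("Wei", "", "Wong", True),
    ("Mary", "", "Chan", True),
    ("Anna", "", "Lee", True),
    ("John", "", "Smith", False),
]

PROBS = estimate_conditionals(TRAIN)
SENS, SPEC = sensitivity_specificity(PROBS, HELD_OUT)
```

Estimating the probabilities on one subset and evaluating on another mirrors the split sample idea: scores computed on the records used for estimation would overstate how well the rule transfers to new names.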