Carriço J A, Pinto F R, Simas C, Nunes S, Sousa N G, Frazão N, de Lencastre H, Almeida J S
Biomathematics Group, Universidade Nova de Lisboa, Rua da Quinta Grande 6, 2780-156 Oeiras, Portugal.
J Clin Microbiol. 2005 Nov;43(11):5483-90. doi: 10.1128/JCM.43.11.5483-5490.2005.
Pulsed-field gel electrophoresis (PFGE) has been the typing method of choice for strain identification in epidemiological studies of several bacterial species of medical importance. The usual procedure for the comparison of strains and assignment of strain type and subtype relies on visual assessment of band difference number, followed by an incremental assignment to the group hosting the most similar type previously seen. Band-based similarity coefficients, such as the Dice or the Jaccard coefficient, are then used for dendrogram construction, which provides a quantitative assessment of strain similarity. PFGE type assignment is based on the definition of a threshold linkage value, below which strains are assigned to the same group. This is typically performed empirically by inspecting the hierarchical cluster analysis dendrogram containing the strains of interest. This approach has the problem that the threshold value selected is dependent on the linkage method used for dendrogram construction. Furthermore, the use of a linkage method skews the original similarity values between strains. In this paper we assess the goodness of classification of several band-based similarity coefficients by comparing it with the band difference number for PFGE type and subtype classification using receiver operating characteristic curves. The procedure described was applied to a collection of PFGE results for 1,798 isolates of Streptococcus pneumoniae, which documented 96 types and 396 subtypes. The band-based similarity coefficients were found to perform equally well for type classification, but with different proportions of false-positive and false-negative classifications in their minimal false discovery rate when they were used for subtype classification.
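As an illustration of the quantities the abstract compares, the following is a minimal sketch of the Dice and Jaccard coefficients and the band difference number for two band patterns. It assumes each pattern has already been reduced to a set of matched band position classes (real PFGE analysis matches bands within a position tolerance, which is not modeled here). The function names and the example band sets are hypothetical and do not come from the paper.

```python
def dice(a: set, b: set) -> float:
    """Dice coefficient: 2 * shared bands / (bands in a + bands in b)."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty patterns count as identical
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: shared bands / bands present in either pattern."""
    if not a and not b:
        return 1.0  # same convention as above
    return len(a & b) / len(a | b)


def band_difference(a: set, b: set) -> int:
    """Band difference number: bands present in exactly one of the two patterns."""
    return len(a ^ b)


# Hypothetical band position classes for two isolates (illustrative only).
strain_x = {1, 2, 3, 4, 5, 6}
strain_y = {1, 2, 3, 4, 7, 8}

print(dice(strain_x, strain_y))             # 4 shared bands, 12 bands in total
print(jaccard(strain_x, strain_y))          # 4 shared bands, 8 distinct bands
print(band_difference(strain_x, strain_y))  # bands 5, 6, 7, 8 differ
```

Both coefficients are computed directly from the pair of patterns, so they do not depend on any linkage method; a threshold on either one could then be evaluated against the band difference number, for example with a receiver operating characteristic curve as the abstract describes.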