A confidence interval for the Wallace coefficient of concordance and its application to microbial typing methods.

Authors

Pinto Francisco R, Melo-Cristino José, Ramirez Mário

Affiliation

Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina de Lisboa, Lisboa, Portugal.

Publication

PLoS One. 2008;3(11):e3696. doi: 10.1371/journal.pone.0003696. Epub 2008 Nov 11.

Abstract

Many research fields routinely analyse multiple clustering results, which calls for an objective way to detect overlaps and divergences between the resulting groupings. The congruence between such results can be quantified by clustering comparison measures such as the Wallace coefficient (W). Because the measured congruence depends on the particular sample taken from the population, the estimated values vary relative to the true population values. In the present work we propose the use of a confidence interval (CI) to account for this variability when W is used. The analytical formula for the CI is derived by assuming a Gaussian sampling distribution and using the algebraic relationship between W and Simpson's index of diversity. This relationship also allows the expected Wallace value to be estimated under the assumption that the classifications are independent. We evaluated the performance of the CI using simulated and published microbial typing data sets. The simulations showed that the CI has the desired 95% coverage when W is greater than 0.5. This behaviour is robust to changes in cluster number, cluster size distribution and sample size. The analysis of the published data sets demonstrated the usefulness of the new CI by objectively validating some of the previous interpretations, while showing that other conclusions lacked statistical support.
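
As an illustration of the quantities named in the abstract, the following Python sketch computes the Wallace coefficient for two classifications of the same items, together with the value expected if the classifications were independent. It uses the standard pair-counting definition of W and takes the expected value to be one minus Simpson's index of diversity of the second classification. The function names and the example labels are hypothetical, and the sketch does not reproduce the paper's analytical CI formula, which the abstract does not state.

```python
from collections import Counter
from math import comb


def wallace(a_labels, b_labels):
    """W(A->B): fraction of item pairs grouped together by A that B also groups together."""
    if len(a_labels) != len(b_labels):
        raise ValueError("both classifications must label the same items")
    # Pairs of items that share a cluster in A.
    pairs_in_a = sum(comb(n, 2) for n in Counter(a_labels).values())
    # Pairs of items that share a cluster in both A and B.
    pairs_in_both = sum(comb(n, 2) for n in Counter(zip(a_labels, b_labels)).values())
    if pairs_in_a == 0:
        raise ValueError("A groups no two items together, so W(A->B) is undefined")
    return pairs_in_both / pairs_in_a


def simpson_diversity(labels):
    """Simpson's index of diversity: probability that two distinct items fall in different clusters."""
    if len(labels) < 2:
        raise ValueError("at least two items are needed")
    same_cluster_pairs = sum(comb(n, 2) for n in Counter(labels).values())
    return 1 - same_cluster_pairs / comb(len(labels), 2)


def expected_wallace(b_labels):
    """Expected W(A->B) if B were independent of A (assumed here to equal 1 - SID of B)."""
    return 1 - simpson_diversity(b_labels)


# Hypothetical typing results for six isolates (illustrative labels only).
method_a = [1, 1, 1, 1, 2, 2]
method_b = ["x", "x", "y", "y", "z", "z"]

print("W(A->B)          =", wallace(method_a, method_b))
print("W(B->A)          =", wallace(method_b, method_a))
print("expected W(A->B) =", expected_wallace(method_b))
```

In this toy example W(A->B) is 3/7 while W(B->A) is 1, which shows that the coefficient is directional: every pair grouped together by method B is also grouped together by method A, but not the reverse. Comparing W with its expected value under independence gives a baseline for judging whether the observed agreement exceeds what chance alone would produce.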

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a3a/2577298/9f0b2ad9720a/pone.0003696.g001.jpg
