Lussier Y A, Li J
Department of Biomedical Informatics, Columbia University College of Physicians and Surgeons, New York, NY 10032, USA.
Pac Symp Biocomput. 2004:202-13. doi: 10.1142/9789812704856_0020.
Comparative biological studies have led to remarkable biomedical discoveries. While genomic science and technologies are advancing rapidly, precisely specifying a phenotype and comparing it to related phenotypes of other organisms remains challenging. This study examined the systematic use of terminology- and knowledge-based technologies to enable high-throughput comparative phenomics. More specifically, we measured the accuracy of a multi-strategy automated classification method for bridging the phenotype gap between a phenotypic terminology (MGD: Phenoslim) and a broad-coverage clinical terminology (SNOMED CT). Furthermore, we qualitatively evaluated the additional emerging properties of the combined terminological network for comparative biology and discovery science. Against the gold standard (n = 100), the precision / recall of the composite automated methods was 67% / 97% for mapping identical concepts and 85% / 98% for classification. Quantitatively, only 2% of the phenotypic concepts were missing from the clinical terminology; qualitatively, however, the gap was larger: problems of conceptual scope, granularity, and subtle yet significant homonymy were observed. These results suggest that, as observed in other domains, additional strategies are required for combining terminologies.
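The precision and recall figures quoted in the abstract compare automatically proposed concept pairs against a gold standard. As a rough illustration of how such figures are usually computed, the sketch below scores a set of predicted mappings against a gold set. The function name `precision_recall`, the concept identifiers, and the toy pairs are all hypothetical and are not data or code from the study; the abstract does not describe the authors' actual evaluation procedure in this detail.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of predicted concept pairs against gold pairs.

    precision = correct predictions / all predictions
    recall    = correct predictions / all gold-standard pairs
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up (phenotype concept, clinical concept) pairs for illustration only.
gold = {("P1", "S1"), ("P2", "S2"), ("P3", "S3"), ("P4", "S4")}
predicted = {("P1", "S1"), ("P2", "S2"), ("P3", "S9"), ("P4", "S4"), ("P5", "S5")}

precision, recall = precision_recall(predicted, gold)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this toy case three of the five predicted pairs are correct and three of the four gold pairs are recovered. A method with high recall but lower precision, like the identical-concept mapping reported above, proposes nearly all correct pairs along with a number of incorrect ones.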