Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa-shi, Chiba-ken 277-0871, Japan; Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa-shi, Chiba-ken 277-8561, Japan.
Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa-shi, Chiba-ken 277-0871, Japan.
Am J Hum Genet. 2018 Sep 6;103(3):389-399. doi: 10.1016/j.ajhg.2018.08.003. Epub 2018 Aug 30.
Phenotype-driven differential-diagnosis systems have recently been developed and implemented to speed up the differential diagnosis of rare diseases from the symptoms and signs observed in an affected individual. The performance of these systems depends on the quantity and quality of the underlying databases of disease-phenotype associations (DPAs). Although such databases are often developed by manual curation, they inherently suffer from limited coverage. To address this problem, we propose a text-mining approach that increases the coverage of DPA databases and consequently improves the performance of differential-diagnosis systems. Our analysis showed that text mining of one million case reports obtained from PubMed could increase the coverage of manually curated DPAs in Orphanet by 125.6%. We also present PubCaseFinder (see Web Resources), a new phenotype-driven differential-diagnosis system provided as a freely available web application. By using DPAs automatically extracted from case reports in addition to manually curated DPAs, PubCaseFinder improves the performance of automated differential diagnosis. Moreover, PubCaseFinder helps clinicians search for relevant case reports through phenotype-based comparisons and confirm the results with detailed contextual information.
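To illustrate why broader DPA coverage can help a phenotype-driven system, the following minimal sketch merges a curated DPA set with a text-mined one and ranks candidate diseases against a patient's phenotypes. It is not the PubCaseFinder implementation: the Jaccard similarity is a simple stand-in for whatever scoring the system actually uses, and the function names (`merge_dpas`, `rank_diseases`) and all disease and phenotype labels are hypothetical placeholders.

```python
from collections import defaultdict


def merge_dpas(*sources):
    """Union disease -> phenotype-set mappings from several DPA sources."""
    merged = defaultdict(set)
    for source in sources:
        for disease, phenotypes in source.items():
            merged[disease] |= set(phenotypes)
    return dict(merged)


def jaccard(a, b):
    """Jaccard similarity of two phenotype sets (assumed stand-in score)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_diseases(patient_phenotypes, dpas):
    """Return (disease, score) pairs, best match first, ties by name."""
    scores = [(d, jaccard(patient_phenotypes, p)) for d, p in dpas.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical placeholder data, not taken from Orphanet or PubMed.
curated = {"disease_1": {"p_a", "p_b"}, "disease_2": {"p_c"}}
mined = {"disease_2": {"p_d", "p_e"}, "disease_3": {"p_a"}}
patient = {"p_c", "p_d"}

curated_ranking = rank_diseases(patient, curated)
merged_ranking = rank_diseases(patient, merge_dpas(curated, mined))
```

In this toy data, the mined associations add a phenotype for `disease_2` that the patient also shows, so its score against the merged set is higher than against the curated set alone.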