Dept. of Biomedical Informatics, Vanderbilt University, 2525 West End Ave. Suite 1475, Nashville, TN 37203, USA.
Dept. of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
J Biomed Inform. 2018 Aug;84:75-81. doi: 10.1016/j.jbi.2018.06.014. Epub 2018 Jun 22.
To evaluate the potential of data mining auditing techniques to identify hidden concepts in diagnostic knowledge bases (KBs). Improving completeness enhances KB applications such as differential diagnosis and patient case simulation.
The authors used three unsupervised methods (Pearson's correlation, PC; Kendall's correlation, KC; and a heuristic algorithm, HA) to identify existing and discover new finding-finding interrelationships ("properties") in the INTERNIST-1/QMR KB. They also estimated the KB maintenance efficiency gains (effort reduction) offered by each approach.
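As an illustration of the correlation-based screening described above, the following is a minimal sketch, not the authors' implementation. It assumes each finding is represented by a vector of frequency scores across the same ordered list of diseases; the finding names, the scores, the threshold, and the helper `candidate_pairs` are hypothetical. Kendall's correlation is computed here as the simple tau-a variant, and the abstract does not say which variant the authors used. The heuristic algorithm (HA) is not sketched because the abstract does not describe it.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: finding -> frequency score in each disease
# (same disease order for every finding). Not taken from INTERNIST-1/QMR.
profiles = {
    "finding_a": [1, 2, 3, 4, 5],
    "finding_b": [2, 4, 6, 8, 10],
    "finding_c": [5, 3, 4, 1, 2],
}

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs."""
    n_pairs = len(x) * (len(x) - 1) // 2
    if n_pairs == 0:
        return 0.0
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / n_pairs

def candidate_pairs(profiles, corr, threshold):
    """Return finding pairs whose |correlation| meets the threshold.

    The pairs are only candidates for expert review, not confirmed properties.
    """
    pairs = []
    for f, g in combinations(sorted(profiles), 2):
        r = corr(profiles[f], profiles[g])
        if abs(r) >= threshold:
            pairs.append((f, g, r))
    return pairs

if __name__ == "__main__":
    for name, corr in (("PC", pearson), ("KC", kendall_tau_a)):
        print(name, candidate_pairs(profiles, corr, threshold=0.9))
```

In a workflow of this kind, the flagged pairs would be handed to a domain expert, who decides whether each pair reflects a genuine finding-finding property.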
The methods discovered new properties at rates with 95% confidence intervals (CIs) of [0.1%, 5.4%] (PC), [2.8%, 12.5%] (KC), and [5.6%, 18.8%] (HA). The estimated reduction in manual effort for HA-assisted determination of new properties was approximately 50-fold.
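For readers unfamiliar with interval estimates of a discovery rate, the sketch below shows one common way to compute a 95% CI for a proportion, the Wilson score interval. The abstract does not state which interval method the authors used, so the choice of Wilson is an assumption, and the counts in the example are hypothetical rather than taken from the study.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)

if __name__ == "__main__":
    # Hypothetical counts: 10 reviewed candidate pairs judged to be new
    # properties out of 100 reviewed. Not figures from the study.
    low, high = wilson_interval(10, 100)
    print(f"95% CI: [{low:.1%}, {high:.1%}]")
```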
Data mining can efficiently supplement efforts to ensure the completeness of finding-finding interdependencies in diagnostic knowledge bases. The authors' findings should be applicable to other diagnostic systems that record finding frequencies within diseases (e.g., DXplain, ISABEL).