Min Hua, Mobahi Hedyeh, Irvin Katherine, Avramovic Sanja, Wojtusiak Janusz
Department of Health Administration and Policy, College of Health and Human Services, George Mason University, MS: 1J3, 4400 University Drive, Fairfax, VA, 22030-4444, USA.
J Biomed Semantics. 2017 Sep 16;8(1):39. doi: 10.1186/s13326-017-0149-6.
Bio-ontologies are becoming increasingly important in knowledge representation and machine learning (ML). This paper presents an ML approach that incorporates bio-ontologies and applies it to the SEER-MHOS dataset to discover patterns of patient characteristics that affect the ability to perform activities of daily living (ADLs). Bio-ontologies supply computable knowledge that lets ML methods "understand" biomedical data.
This retrospective study included 723 cancer patients from the SEER-MHOS dataset. Two ML methods were applied to create predictive models of ADL disabilities in the first year after a patient's cancer diagnosis. The first method was a standard rule learning algorithm; the second was the same algorithm additionally equipped with methods for reasoning with ontologies. The models showed that a patient's race, ethnicity, smoking preference, treatment plan, and tumor characteristics (histology, staging, cancer site, and morphology) were predictors of ADL performance levels one year after cancer diagnosis. The ontology-guided ML method was more accurate at predicting ADL performance levels (P < 0.1) than the method without ontologies.
This study demonstrated that bio-ontologies can be harnessed to provide medical knowledge for ML algorithms. The presented method shows that specific types of hierarchical relationships can be encoded to guide rule learning, and it can be extended to other types of semantic relationships present in biomedical ontologies. The ontology-guided ML method achieved better performance than the method without ontologies. The presented method can also be used to improve the effectiveness and efficiency of ML in healthcare, where the use of background knowledge and consistency with existing clinical expertise are critical.
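The idea of letting a hierarchical (is-a) relationship guide rule learning can be illustrated with a minimal sketch. This is not the authors' algorithm: the `IS_A` hierarchy, the term names, and the helper functions `ancestors`, `covers`, and `generalize` are all hypothetical, chosen only to show how a rule condition might be generalized to a parent concept when an ontology is available.

```python
from typing import Dict, List, Optional

# Hypothetical is-a hierarchy (child -> parent). Illustrative only; it is
# not taken from the paper or from any real bio-ontology.
IS_A: Dict[str, str] = {
    "adenocarcinoma": "carcinoma",
    "squamous_cell_carcinoma": "carcinoma",
    "carcinoma": "malignant_neoplasm",
    "osteosarcoma": "sarcoma",
    "sarcoma": "malignant_neoplasm",
}


def ancestors(term: str) -> List[str]:
    """Return the term followed by its ancestors, most specific first."""
    chain = [term]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def covers(concept: str, value: str) -> bool:
    """A condition 'attribute is <concept>' matches any value the concept subsumes."""
    return concept in ancestors(value)


def generalize(positives: List[str], negatives: List[str]) -> Optional[str]:
    """Return the most general concept that covers every positive value
    and no negative value, or None if no such concept exists."""
    best = None
    # Walk from the most specific candidate up to the root of the hierarchy.
    for concept in ancestors(positives[0]):
        covers_all_pos = all(covers(concept, p) for p in positives)
        covers_any_neg = any(covers(concept, n) for n in negatives)
        if covers_all_pos and not covers_any_neg:
            best = concept
    return best


if __name__ == "__main__":
    condition = generalize(
        positives=["adenocarcinoma", "squamous_cell_carcinoma"],
        negatives=["osteosarcoma"],
    )
    print(f"IF histology is {condition} THEN ...")
```

Without the hierarchy, a rule learner would have to list each histology value separately; with it, a single condition on the parent concept can cover both positive values while still excluding the negative one.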