Kim Hyeoneui, Mentzer Jessica, Taira Ricky
School of Nursing, Duke University, Durham, NC, United States.
Department of Radiological Science, University of California, Los Angeles, Los Angeles, CA, United States.
J Med Internet Res. 2019 Apr 23;21(4):e12776. doi: 10.2196/12776.
Physical activity data provide important information on disease onset, progression, and treatment outcomes. Analyzing physical activity data together with other clinical and microbiological data promises new insights crucial to improving human health, but such integrative analysis has been hampered, in part, by large variations in how the data are collected and presented.
The aim of this study was to develop a Physical Activity Ontology (PACO) to support structuring and standardizing heterogeneous descriptions of physical activities.
We prepared a corpus of 1140 unique sentences collected from various physical activity questionnaires and scales, as well as from existing standardized terminologies and ontologies. We extracted concepts relevant to physical activity from the corpus using a natural language processing toolkit, the Multipurpose Text Processing Tool. The target concepts were then formalized into an ontology using Protégé (version 4). PACO was evaluated for logical and structural consistency and for adherence to best-practice principles of ontology building. As a use case, we structured and standardized 36 exercise habit statements with PACO and then automatically classified each into one of two defined classes, sufficiently active or insufficiently active, using FaCT++, an ontology reasoner available in Protégé.
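The use case pairs a structured representation of each statement with a defined class whose membership a reasoner can infer. The Python sketch that follows is only a closed-world analogue of that idea, not FaCT++ or OWL: the field names (activity, intensity, minutes_per_session, sessions_per_week), the class names, and the rule of at least 150 minutes per week of moderate or vigorous activity are hypothetical assumptions, not PACO's actual properties or axioms.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative rule for the defined class, not PACO's actual axiom.
WEEKLY_MINUTES_THRESHOLD = 150
QUALIFYING_INTENSITIES = {"moderate", "vigorous"}


@dataclass(frozen=True)
class ExerciseStatement:
    """Structured form of one free-text exercise habit statement.

    Field names are hypothetical stand-ins for ontology properties.
    """
    text: str
    activity: str
    intensity: str
    minutes_per_session: int
    sessions_per_week: int

    def weekly_minutes(self) -> int:
        # Total minutes per week implied by the structured statement.
        return self.minutes_per_session * self.sessions_per_week


def classify(stmt: ExerciseStatement) -> str:
    """Closed-world stand-in for reasoner classification into a defined class."""
    if (stmt.intensity in QUALIFYING_INTENSITIES
            and stmt.weekly_minutes() >= WEEKLY_MINUTES_THRESHOLD):
        return "SufficientlyActive"
    return "InsufficientlyActive"


# Heterogeneous wordings, already normalized to one structure (hypothetical data).
statements = [
    ExerciseStatement("Brisk walk 30 minutes, five days a week", "walking", "moderate", 30, 5),
    ExerciseStatement("I jog twice weekly for 20 min", "jogging", "vigorous", 20, 2),
    ExerciseStatement("Stretching for an hour every day", "stretching", "light", 60, 7),
]

for s in statements:
    print(f"{classify(s):<22}<- {s.text}")
```

Unlike this predicate, an OWL reasoner works under the open-world assumption: a statement missing, say, its frequency would not be classified as insufficiently active unless the ontology's axioms entail it.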
PACO was constructed from 268 unique concepts extracted from the questionnaires and assessment scales. It contains 225 classes (including 9 defined classes), 20 object properties, 1 data property, and 23 instances (excluding the 36 exercise statements). The maximum class depth is 4, and the maximum number of siblings is 38. Evaluation with ontology auditing tools confirmed that PACO is structurally and logically consistent and satisfies most best-practice rules of ontology authoring. In a small sample of 36 exercise habit statements, we showed that the statements could be formally represented using PACO concepts and object properties. The FaCT++ reasoner then used these formal representations to infer each patient's activity status category: sufficiently active or insufficiently active.
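Maximum depth and maximum sibling count are simple structural metrics over the subclass hierarchy. As a hedged illustration, the Python sketch that follows computes both for a hypothetical toy hierarchy (class names invented for illustration, not taken from PACO). It assumes depth counts subclass edges from the top class Thing and that the hierarchy is acyclic; ontology auditing tools may count depth differently.

```python
from collections import defaultdict

# Hypothetical toy hierarchy as (subclass, superclass) pairs; not PACO's classes.
SUBCLASS_OF = [
    ("PhysicalActivity", "Thing"),
    ("Exercise", "PhysicalActivity"),
    ("AerobicExercise", "Exercise"),
    ("Jogging", "AerobicExercise"),
    ("Swimming", "AerobicExercise"),
    ("Cycling", "AerobicExercise"),
    ("StrengthExercise", "Exercise"),
    ("HouseholdActivity", "PhysicalActivity"),
]


def hierarchy_metrics(pairs, root="Thing"):
    """Return (max depth, max siblings) for an acyclic subclass hierarchy."""
    children = defaultdict(list)
    for child, parent in pairs:
        children[parent].append(child)

    # Depth-first walk from the root, tracking edges traversed so far.
    max_depth = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        for c in children[node]:
            stack.append((c, depth + 1))

    # Siblings are the direct subclasses sharing one parent.
    max_siblings = max(len(kids) for kids in children.values())
    return max_depth, max_siblings


print(hierarchy_metrics(SUBCLASS_OF))  # → (4, 3)
```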
As a first step toward standardizing and structuring heterogeneous descriptions of physical activity for integrative data analyses, we constructed PACO from concepts collected from physical activity questionnaires and assessment scales. Evaluation showed PACO to be structurally consistent and compliant with ontology authoring principles. PACO also showed potential for standardizing heterogeneous physical activity descriptions and classifying them into clinically meaningful categories that reflect the adequacy of exercise.