College of Nursing, Gachon University, Incheon, Republic of Korea.
College of Nursing, Seoul National University, Seoul, Republic of Korea.
J Med Internet Res. 2020 Dec 7;22(12):e18767. doi: 10.2196/18767.
Analysis of posts on social media is effective in investigating health information needs for disease management and identifying people's emotional status related to disease. An ontology is needed for semantic analysis of social media data.
This study was performed to develop a cancer ontology with terminology containing consumer terms and to analyze social media data to identify health information needs and emotions related to cancer.
A cancer ontology was developed using social media data collected with a crawler from online communities and blogs in South Korea between January 1, 2014, and June 30, 2017. The relative frequencies of posts containing ontology concepts were counted and compared by cancer type.
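The counting step can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy ontology, the English consumer terms, and the function names `concepts_in` and `relative_frequencies` are all assumptions made for illustration, and simple whole-word matching stands in for the study's ontology-driven natural language processing of Korean text.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy ontology: class concept -> consumer-term synonyms.
# The study's ontology is far larger; these entries are invented for illustration.
ONTOLOGY = {
    "symptom": ["pain", "lump", "fatigue"],
    "treatment": ["chemo", "chemotherapy", "surgery"],
    "emotion": ["afraid", "worried", "hope"],
}


def concepts_in(text):
    """Return the set of concepts with at least one synonym appearing as a whole word."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {
        concept
        for concept, synonyms in ONTOLOGY.items()
        if any(synonym in tokens for synonym in synonyms)
    }


def relative_frequencies(posts):
    """Map each cancer type to the share of its posts that mention each concept.

    posts is an iterable of (cancer_type, text) pairs.
    """
    totals = Counter()           # number of posts per cancer type
    hits = defaultdict(Counter)  # posts per cancer type that mention a concept
    for cancer_type, text in posts:
        totals[cancer_type] += 1
        for concept in concepts_in(text):
            hits[cancer_type][concept] += 1
    return {
        cancer_type: {concept: hits[cancer_type][concept] / n for concept in ONTOLOGY}
        for cancer_type, n in totals.items()
    }
```

Each post counts at most once per concept, so the resulting values are shares of posts rather than counts of term occurrences, which makes them comparable across cancer types with different post volumes.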
The ontology had 9 superclasses, 213 class concepts, and 4061 synonyms. Ontology-driven natural language processing was performed on the text from 754,744 cancer-related posts. Colon, breast, stomach, cervical, lung, liver, pancreatic, and prostate cancer; brain tumors; and leukemia appeared most in these posts. At the superclass level, risk factor was the most frequent, followed by emotions, symptoms, treatments, and dealing with cancer.
Information needs and emotions differed according to cancer type. The observations of this study could be used to provide tailored information to consumers according to cancer type and care process. Cancer-related information should be provided not only to patients but also to their families and to members of the general public seeking information on cancer.