Jeong Wonjeong, Song Eunkyoung, Jeong Eunzi, Oh Kyoung Hee, Lee Hye-Sun, Jun Jae Kwan
Cancer Knowledge & Information Center, National Cancer Control Institute, National Cancer Center, Goyang, Korea.
Healthc Inform Res. 2024 Oct;30(4):398-408. doi: 10.4258/hir.2024.30.4.398. Epub 2024 Oct 31.
With the growing importance of monitoring cancer patients' internet usage, there is an increasing need for technology that expands access to relevant information through text mining. This study analyzed internet articles from portal sites in 2023 to identify trends in the information available to cancer patients and to derive meaningful insights.
This study analyzed 19,578 news articles published on Naver, a major Korean portal site, from January 1, 2023, to December 31, 2023. Natural language processing, text mining, network analysis, and word cloud analysis were employed. The search term "암" (romanized "am", Korean for "cancer") was used to identify keywords related to cancer.
In 2023, an average of 1,631 cancer-related articles were published monthly, with a peak of 1,946 in September and a low of 1,371 in February. A total of 132,456 keywords were extracted, with "cure" (2,218 occurrences), "lung cancer" (1,652), and "breast cancer" (1,235) being the most frequent. Term frequency-inverse document frequency (TF-IDF) analysis ranked "struggle" (1,064.172) as the most significant keyword, followed by "lung cancer" (839.988) and "breast cancer" (744.840). Network analysis revealed four distinct clusters focusing on treatment, celebrity-related issues, major cancer types, and cancer-causing factors.
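To illustrate why the frequency ranking and the TF-IDF ranking can differ, the following is a minimal sketch, not the authors' pipeline. It assumes a corpus-level score equal to a term's total count multiplied by log(N / document frequency); the abstract does not state which weighting was used. The function name `tfidf_scores` and the toy corpus are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Corpus-level TF-IDF (assumed form): total count * log(N / document frequency)."""
    n_docs = len(docs)
    term_freq = Counter()  # total occurrences of each term across all documents
    doc_freq = Counter()   # number of documents in which each term appears
    for tokens in docs:
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    return {t: term_freq[t] * math.log(n_docs / doc_freq[t]) for t in term_freq}


# Hypothetical, pre-tokenized toy articles (illustration only).
docs = [
    ["cure", "lung cancer"],
    ["cure", "breast cancer"],
    ["cure", "struggle", "struggle", "struggle"],
    ["cure", "lung cancer"],
    ["cure"],
]

scores = tfidf_scores(docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy corpus, "cure" is the most frequent term but appears in every document, so its inverse document frequency is zero and it falls to the bottom of the ranking, while a term concentrated in few documents, such as "struggle", rises to the top.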
The analysis of cancer-related keywords in 2023 indicates that news articles often prioritize gossip over essential information. These findings provide foundational data for future policy directions and strategies to address misinformation. This study underscores the importance of understanding the nature of cancer-related information consumed by the public and offers insights to guide official policies and healthcare practices.