Xiang Kun, Shi Danxi
Research Center of Machine Learning and Public Health, China Three Gorges University, Yichang, China.
Front Public Health. 2025 May 9;13:1467117. doi: 10.3389/fpubh.2025.1467117. eCollection 2025.
Liver diseases pose a significant global health burden and are complex to manage. Online health consultation platforms are a rich source of unstructured patient-physician interactions. This study applies an integrated text mining framework to extract insights from these data, aiming to inform liver disease research and care strategies.
We analyzed 8,149 liver disease-related online consultation records from a leading Chinese health platform. The analytical framework combined KeyBERT-enhanced keyword extraction with traditional approaches (TF-IDF, TextRank), BERT-CRF medical entity recognition, topic modeling (LDA), and association rule mining. Hepatology specialists clinically validated the extracted patterns. Analyses stratified by demographic factors and disease type identified subgroup-specific patterns.
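As an illustration of the traditional keyword-extraction baseline named above, the following is a minimal TF-IDF sketch in plain Python. It is not the study's pipeline: the function name `tfidf_keywords`, the scoring variant (term frequency times the log of inverse document frequency), and the toy pre-tokenized records are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Hypothetical, pre-tokenized consultation snippets (not from the study data).
records = [
    ["hepatitis", "b", "antiviral", "treatment", "liver"],
    ["fatty", "liver", "diet", "exercise", "liver"],
    ["cirrhosis", "ascites", "liver", "treatment"],
]
keywords = tfidf_keywords(records, top_k=2)
```

Because "liver" occurs in every toy record, its inverse document frequency is zero and it never ranks as a keyword, which is the behavior that makes TF-IDF useful for separating disease-specific terms from corpus-wide vocabulary.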
The text mining pipeline performed well in medical terminology extraction (KeyBERT F1-score: 0.87) and medical entity recognition (F1-scores: 0.89-0.91), identified key topic patterns in liver disease consultations, and revealed strong clinical associations through rule mining (lift: 2.2-4.5). Stratified analyses further showed notable demographic variation in disease patterns and progression pathways.
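For readers less familiar with the two metrics reported here, the sketch below computes an F1-score from raw counts and the lift of an association rule from a set of transactions. It uses the standard definitions (F1 as the harmonic mean of precision and recall; lift as the rule's joint support divided by the product of the individual supports). The counts and the entity sets are hypothetical and chosen only to keep the arithmetic easy to follow; they are not taken from the study.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent over a list of item sets."""
    n = len(transactions)
    support_a = sum(1 for t in transactions if antecedent <= t) / n
    support_c = sum(1 for t in transactions if consequent <= t) / n
    support_both = sum(1 for t in transactions if (antecedent | consequent) <= t) / n
    return support_both / (support_a * support_c)


# Hypothetical entity sets, one per consultation record (not study data).
consultations = [
    {"hepatitis b", "antiviral"},
    {"hepatitis b", "antiviral"},
    {"fatty liver", "diet"},
    {"fatty liver", "diet"},
    {"hepatitis b", "diet"},
    {"cirrhosis", "ascites"},
]

example_f1 = f1_score(tp=8, fp=2, fn=2)
example_lift = lift(consultations, {"hepatitis b"}, {"antiviral"})
```

A lift of 1 means the two items co-occur no more often than independence would predict, so values in the reported range of 2.2-4.5 indicate pairs that co-occur roughly two to four and a half times more often than chance.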
This study validates the effectiveness of integrated text mining approaches in uncovering clinically relevant patterns from online consultation data, with particular strength in medical entity recognition and association detection. The robust methodological framework provides empirical support for differentiated approaches in liver disease management, while demographic variations in disease patterns underscore the necessity for personalized clinical strategies. However, translation of these findings into clinical practice requires longitudinal validation studies integrating multiple data sources.