College of Business Administration, Henan Finance University, Zhengzhou 451464, China.
Business School, Henan University, Kaifeng 475004, China.
Int J Environ Res Public Health. 2022 Dec 9;19(24):16590. doi: 10.3390/ijerph192416590.
Knowledge extraction from rich text in online health communities can supplement and improve the existing knowledge base, supporting evidence-based medicine and clinical decision making. The time series health management data extracted from users' posts can help other users with similar conditions manage their health. This study annotated four relationship types and constructed a deep learning model, BERT-BiGRU-ATT, to extract disease-medication relationships. A Chinese-pretrained BERT model was used to generate word embeddings for question-and-answer data from online health communities in China. A bidirectional gated recurrent unit (BiGRU), combined with an attention mechanism, was then employed to capture sequence context features; a softmax classifier was used to classify text related to diseases and drugs, and the time series data provided by users were obtained. Experiments with various word embeddings and comparisons with classical models verified the superiority of our model in relation extraction. Based on the extracted knowledge, the evolution of a user's disease progression was analyzed according to the time series data provided by users. BERT word embedding, GRU, and attention mechanisms play major roles in knowledge extraction in our research. The knowledge extraction results are expected to supplement and improve the existing knowledge base, assist doctors' diagnosis, and help users with dynamic lifecycle health management, such as disease treatment management. In future studies, co-reference resolution can be introduced to further improve the extraction of relationships among diseases, drugs, and drug effects.
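The attention and softmax stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring form (a learned vector applied to tanh of each hidden state) is an assumption borrowed from common attention-based recurrent classifiers, the hidden states stand in for BiGRU outputs over BERT embeddings, and every number, weight, and function name below is hypothetical. The four output classes mirror the four annotated relationships, whose names the abstract does not give.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, w):
    """Weight each time step and sum them into one sentence vector.

    Assumed scoring form: score_t = w . tanh(h_t).
    """
    scores = [
        sum(wi * math.tanh(hi) for wi, hi in zip(w, h))
        for h in hidden_states
    ]
    alphas = softmax(scores)  # attention weights over time steps
    dim = len(hidden_states[0])
    pooled = [
        sum(a * h[d] for a, h in zip(alphas, hidden_states))
        for d in range(dim)
    ]
    return alphas, pooled


def classify(pooled, weight_rows, biases):
    """Linear layer followed by softmax over the relation classes."""
    logits = [
        sum(wi * xi for wi, xi in zip(row, pooled)) + b
        for row, b in zip(weight_rows, biases)
    ]
    return softmax(logits)


# Hypothetical BiGRU outputs: 3 tokens, 4 dimensions each.
hidden_states = [
    [0.2, -0.1, 0.5, 0.0],
    [0.9, 0.3, -0.4, 0.1],
    [-0.3, 0.8, 0.2, 0.6],
]
w = [0.5, -0.2, 0.1, 0.3]  # hypothetical attention vector
weight_rows = [            # hypothetical 4-class output layer
    [0.1, 0.2, 0.3, 0.4],
    [-0.2, 0.1, 0.0, 0.3],
    [0.4, -0.1, 0.2, -0.3],
    [0.0, 0.5, -0.2, 0.1],
]
biases = [0.0, 0.0, 0.0, 0.0]

alphas, pooled = attention_pool(hidden_states, w)
probs = classify(pooled, weight_rows, biases)
predicted = max(range(len(probs)), key=probs.__getitem__)  # class index
```

In a real pipeline the hidden states would come from a trained BiGRU and the weights would be learned; the sketch only shows how attention weights collapse a token sequence into one vector before classification.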