Fan Yadan, He Lu, Pakhomov Serguei V S, Melton Genevieve B, Zhang Rui
Institute for Health Informatics, Minneapolis, MN.
Department of Computer Science, Minneapolis, MN.
AMIA Jt Summits Transl Sci Proc. 2017 Jul 26;2017:493-501. eCollection 2017.
Clinical notes contain rich information about supplement use that is critical for detecting adverse interactions between supplements and prescribed medications. To correctly identify patients who currently take a supplement or did so in the past, it is important to know the context in which the supplement is mentioned in the note. We applied text mining methods to automatically classify supplement use into four status categories: Continuing (C), Discontinued (D), Started (S), and Unclassified (U). We manually classified 1,300 sentences into these categories and split them into training (1,000 sentences) and testing (300 sentences) sets. We evaluated 7 types of feature sets and 5 algorithms. The best model (an SVM using unigrams, bigrams, and indicator words within a certain distance) achieved F-measures of 0.906, 0.913, 0.914, and 0.715 for statuses C, D, S, and U, respectively, on the testing set. This study demonstrates the feasibility of using text mining methods to classify supplement use status from clinical notes.
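The feature types named for the best model (unigrams, bigrams, and indicator words within a certain distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The indicator lexicon, the window size of 3 tokens, the feature-name prefixes, and the function names `tokenize` and `extract_features` are all assumptions made for illustration, and the supplement is assumed to be a single token.

```python
import re

# Hypothetical indicator words per status. The study's actual lexicon is not
# given in the abstract, so these lists are illustrative assumptions.
INDICATORS = {
    "continuing": {"continue", "continues", "taking", "takes"},
    "discontinued": {"stop", "stopped", "discontinue", "discontinued", "hold"},
    "started": {"start", "started", "begin", "began", "initiate"},
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def extract_features(sentence, supplement, window=3):
    """Return binary features: unigrams, bigrams, and indicator-word flags.

    An indicator flag is set when a word from the (assumed) lexicon occurs
    within `window` tokens of a mention of `supplement`.
    """
    tokens = tokenize(sentence)
    features = {}

    # Unigram features: one per token.
    for tok in tokens:
        features["uni=" + tok] = 1

    # Bigram features: one per adjacent token pair.
    for left, right in zip(tokens, tokens[1:]):
        features["bi=" + left + "_" + right] = 1

    # Indicator features: lexicon words near each supplement mention.
    target = supplement.lower()
    positions = [i for i, tok in enumerate(tokens) if tok == target]
    for pos in positions:
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for i in range(lo, hi):
            for status, words in INDICATORS.items():
                if tokens[i] in words:
                    features["ind=" + status] = 1

    return features


example = extract_features("Patient stopped taking ginkgo last week.", "ginkgo")
```

A feature dictionary of this shape could then be vectorized and passed to an SVM or any of the other classifiers under comparison. The window size is the main design choice: a narrow window misses indicators that sit farther from the mention, while a wide one picks up words that describe a different medication in the same sentence.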