Zheng Jiaping, Yarzebski Jorge, Ramesh Balaji Polepalli, Goldberg Robert J, Yu Hong
University of Massachusetts, Amherst, MA.
University of Massachusetts Medical School, Worcester, MA.
AMIA Annu Symp Proc. 2014 Nov 14;2014:1286-93. eCollection 2014.
The Worcester Heart Attack Study (WHAS) is a population-based surveillance project examining trends in the incidence of acute myocardial infarction (AMI), and in in-hospital and long-term survival after AMI, among residents of central Massachusetts. It provides insights into various aspects of AMI. Much of the data has been abstracted manually, and we are developing supervised machine learning approaches to automate this process. Because the existing WHAS data cannot be used directly to train an automated system, we first annotated the AMI information in electronic health records (EHRs). We annotated 105 EHR discharge summaries (135k tokens), with inter-annotator agreement (Cohen's κ) above 0.74 under strict matching and above 0.9 under relaxed matching. We then applied a state-of-the-art supervised machine learning model, Conditional Random Fields (CRFs), to AMI detection. We explored different approaches to overcoming the data sparseness challenge, and our results showed that cluster-based word features achieved the highest performance.
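For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's κ can be computed from two annotators' labels. The function name `cohen_kappa`, the label set, and the toy token labels are illustrative assumptions; they are not the study's annotation scheme or data, and the abstract does not specify how strict and relaxed matching were defined.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal frequencies, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical token-level labels from two annotators (not study data).
annotator_1 = ["O", "AMI", "AMI", "O", "O", "AMI", "O", "O"]
annotator_2 = ["O", "AMI", "O", "O", "O", "AMI", "O", "O"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

κ corrects raw agreement for the agreement expected by chance, which matters here because most tokens in a discharge summary carry no AMI annotation and raw agreement would otherwise look inflated.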