Esteban Santiago, Rodríguez Tablado Manuel, Ricci Ricardo Ignacio, Terrasa Sergio, Kopitowski Karin
Family and Community Medicine Division, Hospital Italiano de Buenos Aires, Tte. J. D. Peron, 4272, Buenos Aires, Argentina.
Research Department, Instituto Universitario del Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.
BMC Res Notes. 2017 Jul 14;10(1):281. doi: 10.1186/s13104-017-2600-2.
The implementation of electronic medical records (EMRs) is becoming increasingly common. EMRs help reduce errors and data loss, increase the efficiency of patient care, assist decision-making, and facilitate event surveillance, among many other processes. They also show considerable promise as a data source for observational epidemiological studies, and their use for this purpose has increased significantly in recent years. Although EMRs clearly improve the quantity and availability of data, the problem of data quality remains. This is especially important when attempting to determine whether an event has actually occurred. We sought to assess the sensitivity, specificity, and agreement level of a code-based algorithm for detecting clinically relevant cardiovascular disease (CaVD) and cerebrovascular disease (CeVD) cases using EMR data.
Three family physicians from the research group selected clinically relevant CaVD and CeVD terms from the International Classification of Primary Care, Second Edition (ICPC-2), ICD-10 version 2015, and SNOMED-CT 2015 Edition. These terms included signs, symptoms, diagnoses, and procedures associated with CaVD and CeVD. Terms unrelated to symptoms, signs, diagnoses, or procedures of CaVD or CeVD, and terms describing incidental findings without clinical relevance, were excluded. The algorithm yielded a positive result if the patient had at least one of the selected terms in their medical record, as long as the term was not recorded as an error. Otherwise, if no terms were found, the patient was classified as negative. The algorithm was applied to a randomly selected sample of patients who, as of 1 January 2005, were active members of the hospital's HMO, were 40-79 years old, had been HMO members for at least one year, and had at least one clinical encounter. Patients were thus classified into four groups: (1) negative patients; (2) patients with CaVD but without CeVD; (3) patients with CeVD but without CaVD; (4) patients with both diseases. To facilitate the validation process, a stratified sample was taken so that each group represented approximately 25% of the sample. Manual chart review was used as the gold standard for assessing the algorithm's performance. One-third of the patients were randomly assigned to each reviewer (Cohen's kappa 0.91). Both coded and uncoded (free-text) sections of the EMR were reviewed, from the first clinical note present in the patient's chart to the last one registered before 1 January 2005.
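The classification rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study's actual term lists are not given here, so the code sets below are small placeholders, and the names `CodedEntry`, `CAVD_CODES`, `CEVD_CODES`, and `classify` are hypothetical.

```python
from dataclasses import dataclass

# Placeholder code sets (assumption): the real lists were curated by three
# family physicians from ICPC-2, ICD-10 and SNOMED-CT and are not shown here.
CAVD_CODES = {"K75", "I21"}  # e.g. ICPC-2 K75 / ICD-10 I21, acute myocardial infarction
CEVD_CODES = {"K90", "I63"}  # e.g. ICPC-2 K90 stroke / ICD-10 I63 cerebral infarction


@dataclass(frozen=True)
class CodedEntry:
    """One coded term in a patient's record (hypothetical structure)."""
    code: str
    recorded_as_error: bool = False


def classify(entries):
    """Assign a patient to one of the four groups described in the text."""
    # Terms flagged as recorded in error never count towards a positive result.
    valid_codes = {e.code for e in entries if not e.recorded_as_error}
    has_cavd = bool(valid_codes & CAVD_CODES)
    has_cevd = bool(valid_codes & CEVD_CODES)
    if has_cavd and has_cevd:
        return "both"
    if has_cavd:
        return "CaVD only"
    if has_cevd:
        return "CeVD only"
    return "negative"


# Usage: a record whose only CaVD term was entered in error, plus a valid CeVD term.
record = [CodedEntry("K75", recorded_as_error=True), CodedEntry("I63")]
group = classify(record)
```

Because a single valid term is enough for a positive result, a rule of this kind favours sensitivity over specificity.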
The performance of the algorithm was compared against manual chart review. The algorithm yielded high sensitivity (0.99, 95% CI 0.938-0.9971) and acceptable specificity (0.86, 95% CI 0.818-0.895) for detecting cases of CaVD and CeVD combined. A qualitative analysis of the false positives and false negatives was also performed.
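For readers who want to see how such figures are derived, the sketch below computes sensitivity and specificity from a 2x2 table of algorithm results against chart review. The counts are invented purely for illustration and are not the study's data. The interval method is also an assumption: the Wilson score interval is used here, but the method behind the reported intervals is not stated.

```python
import math


def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts, NOT taken from the study.
tp, fp, fn, tn = 90, 10, 10, 90
sens, spec = sensitivity_specificity(tp, fp, fn, tn)
sens_ci = wilson_interval(tp, tp + fn)   # interval for sensitivity
spec_ci = wilson_interval(tn, tn + fp)   # interval for specificity
```

Note that the stratified sampling described earlier fixes the share of each group in the validation sample, so predictive values computed from such a table would not reflect the prevalence in the source population.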
We developed a simple algorithm that uses only standardized and non-standardized coded terms within an EMR and can properly detect clinically relevant events and symptoms of CaVD and CeVD. We believe that combining it with an analysis of the free text using a natural language processing (NLP) approach would yield even better results.