Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Center for Pharmacoepidemiology Research and Training, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Clin Transl Sci. 2018 Jan;11(1):85-92. doi: 10.1111/cts.12514.
Precision medicine is at the forefront of biomedical research. Cancer registries provide rich perspectives and electronic health records (EHRs) are commonly utilized to gather additional clinical data elements needed for translational research. However, manual annotation is resource-intense and not readily scalable. Informatics-based phenotyping presents an ideal solution, but perspectives obtained can be impacted by both data source and algorithm selection. We derived breast cancer (BC) receptor status phenotypes from structured and unstructured EHR data using rule-based algorithms, including natural language processing (NLP). Overall, the use of NLP increased BC receptor status coverage by 39.2% from 69.1% with structured medication information alone. Using all available EHR data, estrogen receptor-positive BC cases were ascertained with high precision (P = 0.976) and recall (R = 0.987) compared with gold standard chart-reviewed patients. However, status negation (R = 0.591) decreased 40.2% when relying on structured medications alone. Using multiple EHR data types (and thorough understanding of the perspectives offered) are necessary to derive robust EHR-based precision medicine phenotypes.
精准医学处于生物医学研究的前沿。癌症登记处提供了丰富的视角,电子健康记录(EHR)通常被用于收集转化研究所需的其他临床数据元素。然而,手动注释需要大量资源,且难以扩展。基于信息学的表型分析提供了一个理想的解决方案,但所获得的视角可能会受到数据源和算法选择的影响。我们使用基于规则的算法(包括自然语言处理(NLP))从结构化和非结构化的 EHR 数据中推导出乳腺癌(BC)受体状态表型。总的来说,与仅使用结构化药物信息相比,NLP 的使用将 BC 受体状态的覆盖率从 69.1%提高了 39.2%。使用所有可用的 EHR 数据,与金标准的图表审查患者相比,雌激素受体阳性的 BC 病例具有很高的精确性(P=0.976)和召回率(R=0.987)。然而,仅依靠结构化药物时,状态否定率(R=0.591)降低了 40.2%。为了推导出基于 EHR 的精准医学表型,需要使用多种 EHR 数据类型(并充分了解所提供的视角)。