Mowery Danielle L, Kawamoto Kensaku, Bradshaw Rick, Kohlmann Wendy, Schiffman Joshua D, Weir Charlene, Borbolla Damian, Chapman Wendy W, Del Fiol Guilherme
Biomedical Informatics.
Informatics, Decision-Enhancement, and Analytic Sciences (IDEAS) Center, Veterans Affairs Salt Lake City Health Care System, Salt Lake City, UT.
AMIA Jt Summits Transl Sci Proc. 2019 May 6;2019:173-181. eCollection 2019.
Family health history (FHH) can be used to identify individuals at elevated risk for familial cancers. Risk criteria for common cancers rely on age of onset, which electronic health records (EHRs) document inconsistently, as both structured and unstructured data. We investigated a natural language processing (NLP) approach to extract age of onset and age of death from free-text EHR fields. Using 474,651 FHH entries from 89,814 patients, we compared two methods: a frequent-patterns baseline and an NLP classifier. For age of onset, the NLP classifier outperformed the baseline in precision (96%, 95% CI [94, 97], vs. 83%, 95% CI [80, 86]) with equivalent recall (both 93%; 95% CI [91, 95]). When applied to the full dataset, the NLP approach increased the percentage of FHH entries to which cancer risk criteria could be applied from 10% to 15%. Combining NLP with structured data may improve the computation of familial cancer risk criteria for various use cases.
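The abstract does not list the baseline's patterns or the classifier's features. The sketch below is therefore only an illustration, built on assumptions. It uses hypothetical regular expressions (`ONSET_PATTERNS`, `DEATH_PATTERNS`) of the kind a frequent-patterns baseline might apply to free-text FHH comments. It also shows entry-level precision/recall arithmetic and the "structured or extracted" coverage idea behind the 10% to 15% figure. The names, patterns, and evaluation definition here are not the authors' implementation, and the NLP classifier itself is not reproduced.

```python
import re
from typing import Dict, List, Optional

# HYPOTHETICAL patterns: the abstract does not give the baseline's actual
# frequent patterns, so these are illustrative only.
ONSET_PATTERNS = [
    re.compile(r"\b(?:dx|diagnosed)\s*(?:at|@)?\s*(?:age\s*)?(\d{1,3})\b", re.IGNORECASE),
    re.compile(r"\bonset\s*(?:at|@)?\s*(?:age\s*)?(\d{1,3})\b", re.IGNORECASE),
]
DEATH_PATTERNS = [
    re.compile(r"\b(?:died|deceased)\s*(?:at|@)?\s*(?:age\s*)?(\d{1,3})\b", re.IGNORECASE),
]


def first_age(text: str, patterns: List[re.Pattern]) -> Optional[int]:
    """Return the first plausible age matched by any pattern, else None."""
    for pattern in patterns:
        match = pattern.search(text)
        if match:
            age = int(match.group(1))
            if 0 <= age <= 120:  # skip implausible numbers, try the next pattern
                return age
    return None


def extract_ages(text: str) -> Dict[str, Optional[int]]:
    """Pattern-based extraction of age of onset and age of death from one entry."""
    return {
        "age_of_onset": first_age(text, ONSET_PATTERNS),
        "age_of_death": first_age(text, DEATH_PATTERNS),
    }


def precision_recall(predicted: List[Optional[int]], gold: List[Optional[int]]):
    """Entry-level precision/recall; a prediction counts only if it equals the gold age.
    (Assumed evaluation definition; the paper's exact scoring is not stated here.)"""
    tp = sum(1 for p, g in zip(predicted, gold) if p is not None and p == g)
    n_pred = sum(1 for p in predicted if p is not None)
    n_gold = sum(1 for g in gold if g is not None)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall


def coverage(structured: List[Optional[int]], extracted: List[Optional[int]]) -> float:
    """Fraction of entries with an age of onset from either the structured field
    or free-text extraction, i.e. entries to which age-based criteria could apply."""
    if not structured:
        return 0.0
    usable = sum(1 for s, e in zip(structured, extracted) if s is not None or e is not None)
    return usable / len(structured)


if __name__ == "__main__":
    notes = [
        "Breast cancer, dx at 45, died at 67",
        "colon ca diagnosed age 52",
        "prostate cancer",
        "ovarian cancer onset 38",
    ]
    onsets = [extract_ages(n)["age_of_onset"] for n in notes]
    print(onsets)
    print(precision_recall(onsets, [45, 52, None, 40]))
```

Pattern lists like these are cheap and transparent. They also fail on phrasings their authors did not anticipate, which is one plausible reason a learned classifier can gain precision over a frequent-patterns baseline.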