Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN.
Division of Urology, University of Pennsylvania Health System, Philadelphia, PA; Leonard Davis Institute of Health Economics, Philadelphia, PA.
Urology. 2019 Oct;132:56-62. doi: 10.1016/j.urology.2019.07.007. Epub 2019 Jul 13.
To demonstrate the utility of a natural language processing (NLP) algorithm for mining kidney stone composition in a large-scale electronic health records (EHR) repository.
We developed StoneX, a pattern-matching method for extracting kidney stone composition information from clinical notes. We trained the extraction algorithm on manually annotated text mentions of calcium oxalate monohydrate, calcium oxalate dihydrate, hydroxyapatite, brushite, uric acid, and struvite stones. We then applied StoneX to >125 million notes from our institutional EHR to identify patients with kidney stone composition data. Analyses performed on the extracted patients included stone type conversions over time, survival analysis from a second stone surgery, and disease associations by stone composition to validate the phenotyping method against known associations.
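The pattern-matching step described above can be illustrated with a minimal sketch. The abstract does not give the actual StoneX patterns, so the regular expressions, the `PATTERNS` table, and the `extract_mentions` helper below are all assumptions made for illustration; only the six stone types and their standard mineral synonyms come from the text or from general mineralogy.

```python
import re

# Hypothetical surface forms for the six stone types named in the abstract.
# These are illustrative assumptions, not the published StoneX patterns.
PATTERNS = {
    "calcium oxalate monohydrate": r"calcium\s+oxalate\s+monohydrate|whewellite",
    "calcium oxalate dihydrate": r"calcium\s+oxalate\s+dihydrate|weddellite",
    "hydroxyapatite": r"hydroxy[\s-]?apatite",
    "brushite": r"brushite|calcium\s+hydrogen\s+phosphate\s+dihydrate",
    "uric acid": r"uric\s+acid",
    "struvite": r"struvite|magnesium\s+ammonium\s+phosphate",
}

# Compile once, case-insensitively, with word boundaries around each alternative set.
COMPILED = {
    name: re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)
    for name, pattern in PATTERNS.items()
}


def extract_mentions(note: str) -> list[tuple[str, int, str]]:
    """Return (stone_type, start_offset, matched_text) for each mention, in text order."""
    mentions = []
    for name, regex in COMPILED.items():
        for match in regex.finditer(note):
            mentions.append((name, match.start(), match.group(0)))
    return sorted(mentions, key=lambda item: item[1])


# Made-up example note, not taken from any real record.
note = "Stone analysis: 80% calcium oxalate monohydrate, 20% Hydroxyapatite."
for stone_type, offset, text in extract_mentions(note):
    print(stone_type, offset, text)
```

A real system would also need to handle negation, abbreviations, and percentages, none of which this sketch attempts.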
The NLP algorithm identified 45,235 text mentions corresponding to 11,585 patients. Overall, the system achieved a positive predictive value (PPV) >90% for calcium oxalate monohydrate, calcium oxalate dihydrate, hydroxyapatite, brushite, and struvite; for uric acid, PPV was 87.5%. Survival analysis from a second stone surgery showed statistically significant differences among stone types (P = .03). Several phenotype associations were found: uric acid-type 2 diabetes (odds ratio, OR = 2.69; 95% confidence interval, CI = 1.91-3.79), struvite-neurogenic bladder (OR = 12.27, 95% CI = 4.33-34.79), struvite-urinary tract infection (OR = 7.36, 95% CI = 3.01-17.99), hydroxyapatite-pulmonary collapse (OR = 3.67, 95% CI = 2.10-6.42), hydroxyapatite-neurogenic bladder (OR = 5.23, 95% CI = 2.05-13.36), brushite-calcium metabolism disorder (OR = 4.59, 95% CI = 2.14-9.81), and brushite-hypercalcemia (OR = 4.09, 95% CI = 1.90-8.80).
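For readers unfamiliar with how an odds ratio and its 95% confidence interval relate, the sketch below computes an unadjusted odds ratio with a Wald interval from a 2x2 table. The abstract does not say how the reported odds ratios were estimated (they may well come from an adjusted model), so this is only a generic illustration; the function name `odds_ratio_ci` and the counts used are hypothetical and are not data from the study.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: stone type present, disease present
    b: stone type present, disease absent
    c: stone type absent,  disease present
    d: stone type absent,  disease absent
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts chosen only to exercise the function.
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1, as all of the intervals reported above do, corresponds to an association that is statistically significant at the 5% level under this approximation.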
NLP extraction of kidney stone composition from large-scale EHRs is feasible with high precision, enabling high-throughput epidemiological studies of kidney stone disease. These tools will enable high-fidelity kidney stone research from the EHR.