Yale School of Medicine, New Haven, CT, USA.
VA Connicticut Healthcare System, West Haven, CT, USA.
Health Informatics J. 2024 Oct-Dec;30(4):14604582241295930. doi: 10.1177/14604582241295930.
To develop and test an NLP algorithm that accurately detects the presence of information reported from DXA scans containing femoral neck T-scores of the patients scanned. A rule-based NLP algorithm that iteratively built a collection of regular expressions in testing data consisting of 889 snippets of text pulled from DXA reports. This was manually checked by clinical experts to determine the proportion of manually verified annotations that contained T-score information detected by this algorithm called 'BoneScore'. Testing of 30- and 50-word lengths on each side of the key term 'femoral' were pursued until achievement of adequate accuracy. A separate clinical validation regressed the extracted T-score values on five risk factors with established associations. BoneScore built a set of 20 regular expressions that in concert with a width of 50 words on each side of the key term yielded an accuracy of 98% in the testing data. The extracted T-scores, when modeled with multivariable linear regression, consistently exhibited associations supported by the literature. BoneScore uses regular expressions to accurately extract annotations of T-score values of bone mineral density with a width of 50 words on each side of the key term. The extracted T-scores exhibit clinical face validity.
开发并测试一种自然语言处理算法,以准确检测从包含扫描患者股骨颈 T 值的 DXA 扫描报告中提取的信息。一种基于规则的自然语言处理算法,在包含 889 个从 DXA 报告中提取的文本片段的测试数据中迭代构建一组正则表达式。这些正则表达式由临床专家手动检查,以确定由算法“BoneScore”检测到的手动验证注释中包含 T 值信息的比例。对“股骨”关键词两侧的 30 字和 50 字长度进行了测试,直到达到足够的准确性。一个单独的临床验证将提取的 T 值与五个具有明确关联的风险因素进行了回归分析。BoneScore 构建了一组 20 个正则表达式,与关键词两侧各 50 个字的宽度相结合,在测试数据中的准确率达到 98%。当使用多变量线性回归对提取的 T 值进行建模时,始终表现出与文献一致的关联。BoneScore 使用正则表达式从包含关键字两侧各 50 个字的宽度的骨密度 T 值注释中准确提取 T 值注释。提取的 T 值具有临床表面有效性。