Yu Shun, Le Anh, Feld Emily, Schriver Emily, Gabriel Peter, Doucette Abigail, Narayan Vivek, Feldman Michael, Schwartz Lauren, Maxwell Kara, Mowery Danielle
University of Pennsylvania Health System, Philadelphia, PA, United States.
Perelman School of Medicine, Philadelphia, PA, United States.
JMIR Cancer. 2021 Jul 2;7(3):e27970. doi: 10.2196/27970.
Natural language processing (NLP) extracts variables significantly faster than traditional human extraction but cannot interpret complicated notes as well as humans can. We therefore hypothesized that an "NLP-assisted" extraction system, which routes complicated notes to humans and uncomplicated notes to NLP, could speed up extraction without compromising accuracy.
The aim of this study was to develop and pilot an NLP-assisted extraction system to leverage the strengths of both human and NLP extraction of prostate cancer Gleason scores.
We collected all available clinical and pathology notes for patients with prostate cancer in an unselected academic biobank cohort. We developed an NLP system to extract prostate cancer Gleason scores from both clinical and pathology notes. Next, we designed and implemented the NLP-assisted extraction algorithm, which categorizes notes as "uncomplicated" or "complicated." Uncomplicated notes were assigned to NLP extraction and complicated notes were assigned to human extraction. We reviewed a random sample of 200 patients to assess the accuracy and speed of our NLP-assisted extraction system and compared it with NLP extraction alone and human extraction alone.
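The routing step described above can be illustrated with a minimal sketch. The abstract does not state how notes were classified or how scores were parsed, so everything here is an assumption made for illustration: the regular expression `GLEASON_RE`, the helper names `extract_gleason` and `triage`, and the rule that a note is "uncomplicated" only when it contains exactly one distinct Gleason pattern pair.

```python
import re
from typing import Optional

# Hypothetical pattern (not from the paper): matches mentions such as
# "Gleason score 3+4=7", "Gleason 4 + 3", or "Gleason score: 4+5".
GLEASON_RE = re.compile(
    r"gleason\s*(?:score|grade|pattern)?\s*:?\s*([1-5])\s*\+\s*([1-5])"
    r"(?:\s*=\s*(\d{1,2}))?",
    re.IGNORECASE,
)


def extract_gleason(note: str) -> list[tuple[int, int]]:
    """Return every (primary, secondary) pattern pair mentioned in the note."""
    return [(int(m.group(1)), int(m.group(2))) for m in GLEASON_RE.finditer(note)]


def triage(note: str) -> tuple[str, Optional[tuple[int, int]]]:
    """Route a note to NLP or human extraction.

    Assumed rule: exactly one distinct Gleason pair means the note is
    uncomplicated and the NLP result is accepted; zero or several distinct
    pairs means the note is complicated and goes to a human extractor.
    """
    distinct = set(extract_gleason(note))
    if len(distinct) == 1:
        return "nlp", next(iter(distinct))
    return "human", None


if __name__ == "__main__":
    print(triage("Prostate biopsy: Gleason score 3+4=7 in 2 of 12 cores."))
    print(triage("Biopsy Gleason 3+3; prostatectomy Gleason 4+3=7."))
```

Under this assumed rule, a note reporting both a biopsy and a surgical score is sent to a human, which mirrors the idea of reserving human effort for notes the NLP system cannot interpret reliably.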
Of the 2051 patients in our cohort, the NLP system extracted a prostate surgery Gleason score from 1147 (55.92%) patients and a prostate biopsy Gleason score from 1624 (79.18%) patients. Our NLP-assisted extraction system had an overall accuracy rate of 98.7%, which was similar to the accuracy of human extraction alone (97.5%; P=.17) and significantly higher than the accuracy of NLP extraction alone (95.3%; P<.001). Moreover, our NLP-assisted extraction system reduced the workload of human extractors by approximately 95%, resulting in an average extraction time of 12.7 seconds per patient (vs 256.1 seconds per patient for human extraction alone).
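As a quick arithmetic check on the figures above, the reported percentages and the relative time saving can be recomputed from the counts and per-patient times given in the abstract. The variable names are illustrative, and the abstract does not say whether the approximately 95% workload reduction was derived from extraction time, so the time-based ratio below is only a consistency check.

```python
# Counts and times taken directly from the abstract.
cohort = 2051
surgery_extracted = 1147
biopsy_extracted = 1624
assisted_seconds = 12.7   # NLP-assisted extraction, per patient
human_seconds = 256.1     # human extraction alone, per patient

pct_surgery = 100 * surgery_extracted / cohort
pct_biopsy = 100 * biopsy_extracted / cohort

# Relative reduction in per-patient extraction time, and the speedup factor.
time_reduction = 1 - assisted_seconds / human_seconds
speedup = human_seconds / assisted_seconds

print(f"surgery: {pct_surgery:.2f}%, biopsy: {pct_biopsy:.2f}%")
print(f"time reduction: {100 * time_reduction:.1f}%, speedup: {speedup:.1f}x")
```

The time-based reduction comes out at about 95%, in line with the reported workload reduction, and corresponds to roughly a 20-fold speedup per patient.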
We demonstrated that an NLP-assisted extraction system achieved much faster Gleason score extraction than traditional human extraction without sacrificing accuracy.