BACKGROUND

Coronavirus disease 2019 (COVID-19) has spread quickly throughout the United States (US), causing significant disruption to healthcare and society. Tools to identify hot spots are important for public health planning. The goal of our study was to determine whether natural language processing (NLP) algorithm assessment of thoracic computed tomography (CT) imaging reports correlates with the incidence of official COVID-19 cases in the US.

METHODS

Using de-identified, HIPAA-compliant patient data from our common imaging platform, which is interconnected with over 2,100 facilities covering all 50 states, we developed three NLP algorithms to track positive CT imaging features of respiratory illness typical of SARS-CoV-2 viral infection. We compared our findings against the number of official COVID-19 cases on a daily, weekly, and state-wide basis.
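The abstract does not describe how the three NLP algorithms work. As a rough illustration of the general idea of flagging reports and counting positives per day, the sketch below uses a simple keyword rule with a crude negation check. It is a stand-in, not the authors' method: the phrase list, the negation rule, and the names `is_positive` and `daily_positive_counts` are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical phrase list: findings commonly reported in viral pneumonia.
FINDING = re.compile(r"ground[- ]glass|consolidation|viral pneumonia", re.IGNORECASE)
# Hypothetical negation rule: a negating word anywhere before the finding
# within the same sentence.
NEGATION = re.compile(r"\b(no|without|negative for)\b[^.]*$", re.IGNORECASE)


def is_positive(report):
    """Return True if any sentence mentions a finding that is not negated."""
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        match = FINDING.search(sentence)
        if match and not NEGATION.search(sentence[:match.start()]):
            return True
    return False


def daily_positive_counts(reports):
    """Count flagged reports per day from (date_string, report_text) pairs."""
    counts = Counter()
    for day, text in reports:
        if is_positive(text):
            counts[day] += 1
    return counts


# Synthetic example reports, not taken from the study data.
example_reports = [
    ("2020-03-01", "Bilateral ground-glass opacities."),
    ("2020-03-01", "No consolidation or ground-glass opacity."),
    ("2020-03-02", "Findings suggest viral pneumonia."),
]
example_counts = daily_positive_counts(example_reports)
```

A rule like this misses mixed sentences (a negated finding followed by a positive one), which is one reason a trained model may be preferred in practice.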

RESULTS

The NLP algorithms were applied to 450,114 comprehensive chest CT reports gathered from January 1 to October 3, 2020. The best-performing NLP model exhibited strong correlation with daily official COVID-19 cases (r=0.82, p<0.005). The NLP models showed an early rise in cases that preceded the increase in official cases, suggesting the possibility of an early predictive marker, and correlated strongly with official cases on a weekly basis (r=0.91, p<0.005). There was also substantial correlation between NLP-derived and official COVID-19 incidence by state (r=0.92, p<0.005).
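The r values reported above are correlation coefficients between two count series. Assuming they are Pearson coefficients (the abstract does not say which statistic was used), a minimal sketch of the calculation, plus a lagged variant for probing whether one series leads the other, might look like the following. The function names and the data are hypothetical and synthetic; nothing here reproduces the study's numbers.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_r(nlp_counts, official_counts, lag):
    """Correlate NLP counts on day t with official counts on day t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson_r(nlp_counts, official_counts)
    return pearson_r(nlp_counts[:-lag], official_counts[lag:])


# Synthetic series: the "official" curve is the NLP curve delayed by two days.
nlp_positive = [2, 3, 5, 8, 13, 21, 34, 30, 25, 20]
official = [0, 0, 2, 3, 5, 8, 13, 21, 34, 30]

# Lag (in days) at which the two series line up best.
best_lag = max(range(5), key=lambda k: lagged_r(nlp_positive, official, k))
```

On real data, a peak at a positive lag would be consistent with the NLP signal leading official counts, though it would not by itself establish predictive value.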

CONCLUSION

Using big data, we developed a novel machine-learning-based NLP algorithm that tracks imaging findings of respiratory illness in chest CT imaging reports and correlates strongly with the progression of the COVID-19 pandemic in the US.
