Evaluation of the Feasibility of Screening Patients for Early Signs of Lung Carcinoma in Web Search Logs.

Affiliation

Microsoft Research, Redmond, Washington.

Publication

JAMA Oncol. 2017 Mar 1;3(3):398-401. doi: 10.1001/jamaoncol.2016.4911.

Abstract

IMPORTANCE

We present a statistical model that predicts the appearance of strong evidence of a lung carcinoma diagnosis by analyzing large-scale, anonymized logs of web search queries from millions of people across the United States.

OBJECTIVE

To evaluate the feasibility of screening patients at risk of lung carcinoma via analysis of signals from online search activity.

DESIGN, SETTING, AND PARTICIPANTS

We identified people who issued specific queries that provide strong evidence of a recent diagnosis of lung carcinoma. We then examined patterns of symptoms, expressed as searches about concerning symptoms, over the several months before these landmark queries appeared. We built statistical classifiers that predict the future appearance of landmark queries from search log signals. This was a retrospective analysis of the logged online activity of millions of web searchers seeking health-related information. Of the web searchers who queried for symptoms related to lung carcinoma, some (n = 5443 of 4 813 985) later issued queries that provide strong evidence of a recent clinical diagnosis of lung carcinoma; these are regarded as positive cases in our analysis. Further evidence that these queries reliably represent clinical diagnoses comes from the significant increase in follow-on searches for treatments and medications among these searchers and from the correlation between lung carcinoma incidence rates and our log-based statistics. The remaining symptom searchers (n = 4 808 542) are regarded as negative cases.
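The cohort sizes above imply a very low base rate of positive cases, which matters when reading the false-positive rates reported in the Results. The sketch below checks the cohort arithmetic and, purely as an illustration, converts the two reported operating points into expected counts. It assumes those rates apply to the whole symptom-searcher cohort, which the abstract does not state; the function name expected_counts is hypothetical.

```python
# Cohort sizes reported in the abstract.
TOTAL_SYMPTOM_SEARCHERS = 4_813_985
POSITIVES = 5_443  # searchers who later issued landmark (diagnosis) queries
NEGATIVES = TOTAL_SYMPTOM_SEARCHERS - POSITIVES
PREVALENCE = POSITIVES / TOTAL_SYMPTOM_SEARCHERS


def expected_counts(tpr, fpr, positives=POSITIVES, negatives=NEGATIVES):
    """Hypothetical helper. Illustrative only: assumes the given rates
    apply to the whole cohort.

    Returns (expected true positives, expected false positives,
    positive predictive value).
    """
    tp = tpr * positives
    fp = fpr * negatives
    return tp, fp, tp / (tp + fp)


if __name__ == "__main__":
    print(f"negatives  = {NEGATIVES:,}")     # 4,808,542
    print(f"prevalence = {PREVALENCE:.5f}")  # 0.00113
    # (TPR, FPR) pairs taken from the Results section.
    for tpr, fpr in [(0.03, 0.00001), (0.57, 0.001)]:
        tp, fp, ppv = expected_counts(tpr, fpr)
        print(f"TPR={tpr:.2f} FPR={fpr:g}: TP~{tp:.0f}, FP~{fp:.0f}, PPV~{ppv:.2f}")
```

Under that assumption, the stricter operating point flags roughly 163 true versus 48 false positives (PPV about 0.77), while the looser one flags roughly 3,103 true versus 4,809 false positives (PPV about 0.39).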

MAIN OUTCOMES AND MEASURES

Performance of the statistical model for early detection from online search behavior, for different lead times, different sets of signals, and different cohorts of searchers stratified by potential risk.
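The abstract does not describe the classifier's internals; it reports performance as a true-positive rate at a capped false-positive rate. The sketch below shows one generic way to read such an operating point off a set of risk scores: pick the most permissive threshold whose false-positive rate stays at or below the cap. The function name tpr_at_fpr and the toy scores are hypothetical, not taken from the study.

```python
import math


def tpr_at_fpr(scores, labels, max_fpr):
    """Return (threshold, tpr, fpr) for the rule `score > threshold`,
    using the most permissive threshold whose FPR is <= max_fpr.

    labels: truthy for positive cases, falsy for negative cases.
    """
    neg = sorted((s for s, y in zip(scores, labels) if not y), reverse=True)
    pos = [s for s, y in zip(scores, labels) if y]
    if not neg or not pos:
        raise ValueError("need at least one positive and one negative case")
    # Maximum number of negatives allowed above the threshold; the small
    # epsilon guards against float error (e.g. 0.57 * 100 == 56.99999999999999).
    k = math.floor(max_fpr * len(neg) + 1e-9)
    # With a strict '>' rule, thresholding at the (k+1)-th highest negative
    # score lets at most k negatives through.
    threshold = neg[k] if k < len(neg) else -math.inf
    tp = sum(s > threshold for s in pos)
    fp = sum(s > threshold for s in neg)
    return threshold, tp / len(pos), fp / len(neg)


if __name__ == "__main__":
    # Toy risk scores and labels (1 = later issued a landmark query).
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    for cap in (0.0, 0.3, 1.0):
        print(cap, tpr_at_fpr(scores, labels, cap))
```

Sweeping the cap over several values, as the Results do from 0.00001 to 0.001, traces out the low-FPR end of an ROC curve.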

RESULTS

The statistical classifier predicting the future appearance of landmark web queries from search log signals identified searchers who later entered queries consistent with a lung carcinoma diagnosis, with true-positive rates ranging from 3% to 57% at false-positive rates ranging from 0.00001 to 0.001, respectively. The methods can identify people at highest risk up to a year before the inferred time of diagnosis. The 5 factors associated with the highest relative risk (RR) were evidence of family history (RR = 7.548; 95% CI, 3.937-14.470), age (RR = 3.558; 95% CI, 3.357-3.772), radon (RR = 2.529; 95% CI, 1.137-5.624), primary location (RR = 2.463; 95% CI, 1.364-4.446), and occupation (RR = 1.969; 95% CI, 1.143-3.391). Evidence of smoking (RR = 1.646; 95% CI, 1.032-2.260) was important but not top-ranked, because smoking history is difficult to identify from search terms.
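Each risk factor above is summarized as a relative risk with a 95% CI. The abstract does not say how these intervals were computed, so the sketch below uses one standard textbook method (the Katz log interval for a 2 x 2 table) as an assumed illustration. The function name and the example counts are hypothetical and do not come from the study.

```python
import math


def relative_risk(a, b, c, d, z=1.96):
    """Relative risk with a Katz log-scale confidence interval.

    a: exposed cases        b: exposed non-cases
    c: unexposed cases      d: unexposed non-cases
    z: normal quantile (1.96 for a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for this interval")
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR).
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 20/100 cases among exposed, 10/100 among unexposed.
    rr, lo, hi = relative_risk(20, 80, 10, 90)
    print(f"RR = {rr:.3f}; 95% CI, {lo:.3f}-{hi:.3f}")
```

Intervals from this method are symmetric on the log scale (lower x upper = RR squared), so a factor with few exposed cases, such as radon or family history here, gets a wide interval even when its point estimate is large.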

CONCLUSIONS AND RELEVANCE

Pattern recognition based on data drawn from large-scale web search queries offers opportunities for identifying risk factors and opens new directions for the early detection of lung carcinoma.
