Tracy Onega, Dharmanshu Kamra, Jennifer Alford-Teaster, and Saeed Hassanpour, Geisel School of Medicine, Dartmouth College; and Saeed Hassanpour, Dartmouth College, Hanover, NH.
JCO Clin Cancer Inform. 2018 Dec;2:1-10. doi: 10.1200/CCI.17.00150.
To our knowledge, integration of Web content mining of publicly available addresses with a geographic information system (GIS) has not been applied to the timely monitoring of medical technology adoption. Here, we explore the diffusion of a new breast imaging technology, digital breast tomosynthesis (DBT).
We used natural language processing and machine learning to extract DBT facility location information from a set of potential sites in the New England region of the United States, retrieved via a Google search application programming interface (API). We assessed the accuracy of the algorithm against a validated set of publicly available addresses of DBT-providing locations supplied by the DBT technology vendor, Hologic. We quantified precision, recall, and F1 score, with an F1 score of ≥ 95% set as the target performance. By reverse geocoding on the basis of the results of the Google Maps API, we derived a spatial data set for use in an ArcGIS environment, within which a host of spatiotemporal analyses and geovisualization techniques are possible.
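The evaluation described above compares extracted addresses with a validated reference list and summarizes the match with precision, recall, and F1. The abstract does not say how addresses were matched or counted, so the sketch below is only one plausible setup under stated assumptions: addresses are compared after a simple hypothetical normalization (lowercasing, stripping punctuation, expanding a few street suffixes), and each distinct site counts once. The names `normalize`, `f1_score`, `evaluate`, and `F1_TARGET` are illustrative, and the example addresses are made up.

```python
import re

F1_TARGET = 0.95  # the desired performance stated in the abstract

# Hypothetical suffix table used only for this illustration.
SUFFIXES = {"st": "street", "ave": "avenue", "rd": "road", "dr": "drive", "blvd": "boulevard"}


def normalize(addr: str) -> str:
    """Lowercase, replace punctuation with spaces, and expand a few street
    suffixes so that '12 Harbor Rd.' and '12 harbor road' compare equal."""
    tokens = re.sub(r"[^\w\s]", " ", addr.lower()).split()
    return " ".join(SUFFIXES.get(t, t) for t in tokens)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def evaluate(extracted, reference):
    """Score extracted addresses against a validated reference list.

    Duplicates collapse after normalization, so each site counts once.
    Returns (precision, recall, f1)."""
    found = {normalize(a) for a in extracted}
    truth = {normalize(a) for a in reference}
    tp = len(found & truth)                         # extracted and validated
    precision = tp / len(found) if found else 0.0   # tp / (tp + fp)
    recall = tp / len(truth) if truth else 0.0      # tp / (tp + fn)
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    extracted = ["12 Harbor Rd., Portland, ME 04101",
                 "12 Harbor Road, Portland, ME 04101",   # same site, different spelling
                 "7 Mill St, Keene, NH 03431"]           # not in the reference list
    reference = ["12 Harbor Road, Portland, ME 04101",
                 "40 Elm Ave, Burlington, VT 05401"]     # missed by the extractor
    print(evaluate(extracted, reference))          # (0.5, 0.5, 0.5)
    # Reported figures: 35%/44% and 92%/96% precision/recall.
    print(round(f1_score(0.35, 0.44), 2))          # 0.39
    print(round(f1_score(0.92, 0.96), 2))          # 0.94
    print(f1_score(0.92, 0.96) >= F1_TARGET)       # False
```

Applying the harmonic mean to the reported precision and recall reproduces the reported F1 scores of 39% and 94%, and shows that the final system came close to, but did not reach, the 95% target.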
We developed a semiautomated system that integrated DBT location information into a GIS; the system was feasible and of reasonable quality. Initial accuracy of the algorithm was poor when only a search term list was used for information retrieval (precision, 35%; recall, 44%; F1 score, 39%), but performance improved dramatically when natural language processing and simple machine learning techniques were used to isolate single, valid instances of DBT location information (precision, 92%; recall, 96%; F1 score, 94%). Reverse geocoding yielded reliable geographic coordinates that could be readily imported into a GIS for mapping and planned monitoring.
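The key improvement reported above came from isolating single, valid instances of DBT location information rather than accepting every search hit. The abstract does not describe the authors' natural language processing features or classifier, so the sketch below substitutes a deliberately simple rule-based stand-in: keep a page only if it mentions a DBT-related term, pull out New England street addresses with a regular expression, and keep one instance per normalized address. It then packages geocoded points as GeoJSON, a common interchange format that GIS software, including ArcGIS, can generally import. The term list, address pattern, function names, and example addresses are all hypothetical, and the coordinates shown are placeholders; in practice they would come from a geocoding service such as the Google Maps API, which this sketch does not call.

```python
import json
import re

# Hypothetical DBT-related terms; the abstract does not list the authors' search terms.
DBT_TERMS = re.compile(r"\b(?:tomosynthesis|3D mammography|DBT)\b", re.IGNORECASE)

# Rough pattern for "<number> <street>, <city>, <state> <ZIP>" in the six New England states.
ADDRESS = re.compile(
    r"\b\d{1,5}\s+[A-Za-z0-9. ]+?,\s*[A-Za-z. ]+?,\s*(?:CT|ME|MA|NH|RI|VT)\s+\d{5}\b"
)

SUFFIXES = {"st": "street", "ave": "avenue", "rd": "road", "dr": "drive", "blvd": "boulevard"}


def normalize(addr: str) -> str:
    """Canonical form used to decide whether two address strings are the same site."""
    tokens = re.sub(r"[^\w\s]", " ", addr.lower()).split()
    return " ".join(SUFFIXES.get(t, t) for t in tokens)


def extract_dbt_addresses(page_text: str) -> list:
    """Return one instance of each distinct address on a page that mentions DBT."""
    if not DBT_TERMS.search(page_text):
        return []  # page never mentions DBT: skip it entirely
    unique = {}
    for match in ADDRESS.finditer(page_text):
        # Keep the first spelling seen for each normalized address.
        unique.setdefault(normalize(match.group(0)), match.group(0))
    return list(unique.values())


def to_feature_collection(records):
    """Build a GeoJSON FeatureCollection from (address, latitude, longitude) tuples."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"address": addr},
            }
            for addr, lat, lon in records
        ],
    }


if __name__ == "__main__":
    page = ("Our center now offers 3D mammography (tomosynthesis). "
            "Visit us at 12 Harbor Rd, Portland, ME 04101. "
            "Mailing address: 12 Harbor Road, Portland, ME 04101.")
    addresses = extract_dbt_addresses(page)
    print(addresses)  # ['12 Harbor Rd, Portland, ME 04101']
    # Placeholder coordinates standing in for a geocoder's output.
    print(json.dumps(to_feature_collection([(addresses[0], 43.0, -70.0)]), indent=2))
```

A regular expression like this is brittle (suite numbers, PO boxes, and line-broken addresses will slip through), which is one reason a learned model, as used in the study, can outperform keyword and pattern rules.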
Our novel approach may be applicable to technologies beyond DBT and could help inform equitable access to them over time and space.