Center for Quantitative Medicine, University of Connecticut Health Center, Farmington, CT, 06030, USA.
Office of the Vice President for Research, University of Connecticut, Storrs, CT, 06269, USA.
BMC Med Inform Decis Mak. 2019 Apr 4;19(Suppl 3):80. doi: 10.1186/s12911-019-0788-x.
Accurate information in provider directories is vital in health care, including for health information exchange, health benefits exchange, quality reporting, and the reimbursement and delivery of care. Maintaining provider directory data and keeping it up to date is challenging. The objective of this study is to determine the feasibility of using natural language processing (NLP) techniques to combine disparate resources and acquire accurate information on health providers.
Publicly available state licensure lists in Connecticut were obtained along with National Plan and Provider Enumeration System (NPPES) public use files. The Connecticut licensure lists contain textual information on each health professional who is licensed to practice within the state. An NLP-based system was developed that uses healthcare provider taxonomy code, location, name, and address information to identify textual data within the state and federal records. Qualitative and quantitative evaluations were performed, and recall and precision were calculated.
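The matching step described above can be illustrated with a minimal sketch. This is not the authors' system: the field names (`taxonomy`, `state`, `name`, `address`), the use of `difflib` string similarity, the 0.85 threshold, and the sample records are all assumptions made for illustration only.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Uppercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^A-Z0-9 ]", " ", text.upper())
    return re.sub(r"\s+", " ", text).strip()


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(license_rec, nppes_rec, threshold=0.85):
    """Decide whether a state licensure record and an NPPES record
    plausibly describe the same provider (hypothetical rule)."""
    # Taxonomy code and state are required to agree exactly.
    if license_rec["taxonomy"] != nppes_rec["taxonomy"]:
        return False
    if license_rec["state"] != nppes_rec["state"]:
        return False
    # Name and address are compared approximately after normalization.
    name_score = similarity(license_rec["name"], nppes_rec["name"])
    addr_score = similarity(license_rec["address"], nppes_rec["address"])
    return name_score >= threshold and addr_score >= threshold


# Hypothetical records, not taken from the study data.
state_record = {
    "taxonomy": "367A00000X",  # example taxonomy code value
    "state": "CT",
    "name": "Jane A. Doe",
    "address": "123 Main St., Farmington",
}
federal_record = {
    "taxonomy": "367A00000X",
    "state": "CT",
    "name": "JANE A DOE",
    "address": "123 MAIN ST FARMINGTON",
}

matched = is_match(state_record, federal_record)
```

Requiring exact agreement on taxonomy code and state before comparing names and addresses keeps the approximate comparison restricted to candidates of the same provider type and location; a production system would likely need more careful address parsing than this sketch provides.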
We identified nurse midwives, nurse practitioners, and dentists in the State of Connecticut. The recall and precision were 0.95 and 0.93, respectively. Using the system, we were able to accurately acquire 6849 of the 7177 records of health provider directory information.
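For readers unfamiliar with the two metrics, the sketch below shows the standard definitions of precision and recall. The true positive, false positive, and false negative counts used in the example are hypothetical, since the abstract does not report them. The only figure taken from the text is the ratio 6849/7177, which rounds to 0.95 and is therefore consistent with the reported recall if the 7177 records are assumed to be the reference set.

```python
def precision(true_pos, false_pos):
    """Fraction of retrieved records that are correct."""
    retrieved = true_pos + false_pos
    return true_pos / retrieved if retrieved else 0.0


def recall(true_pos, false_neg):
    """Fraction of reference records that were retrieved correctly."""
    relevant = true_pos + false_neg
    return true_pos / relevant if relevant else 0.0


# Hypothetical counts, for illustration only.
example_precision = precision(true_pos=90, false_pos=10)
example_recall = recall(true_pos=90, false_neg=5)

# Ratio of the record counts given in the abstract.
acquired_ratio = 6849 / 7177
```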
The authors demonstrated that the NLP-based approach was effective at acquiring health provider information. Furthermore, the NLP-based system can be reapplied to update the information as the data change, further reducing the processing burden.