Gong Mengchun, Yu Yue, Ouyang Zihao, Shi Wenzhao, Liu Chao, Wang Qilin, Nan Jiale, Cai Endi, Ding Fen, Nie Sheng
School of Biomedical Engineering, Guangdong Medical University, Dongguan, China.
Digital Health China Technologies Co., Ltd., Beijing, China.
Sci Rep. 2025 Jan 8;15(1):1296. doi: 10.1038/s41598-024-84658-9.
The widespread adoption of electronic medical records (EMRs) offers numerous benefits but also introduces risks of privacy leakage, particularly for patients with sexually transmitted infections (STIs), who need protection from secondary social harm. Despite advances in privacy protection research, the effectiveness of existing strategies on real-world data remains debatable. This study aimed to develop effective information extraction and privacy protection strategies to safeguard STI patients in the Chinese healthcare environment and to prevent unnecessary privacy leakage when EMR data are shared. The research was conducted at a national healthcare data center, where a committee of experts designed rule-based protocols that use natural language processing techniques to extract STI information. The resulting Extraction Protocol of Sexually Transmitted Infections Information (EPSTII), designed specifically for the Chinese EMR system, enables accurate and complete identification and extraction of STI-related information, ensuring high protection performance. The protocol was refined multiple times based on the calculated precision and recall. The final protocol was then applied to 5,000 randomly selected EMRs to calculate the success rate of privacy protection. A total of 3,233,174 patients were selected based on the inclusion criteria and a 50% entry ratio. Of these, 148,856 patients with sensitive STI information were identified from disease history. The identification frequency varied across sub-datasets, with the diagnosis sub-dataset the highest at 4.8%. Precision and recall both exceeded 95%, demonstrating the effectiveness of our method. The success rate of privacy protection was 98.25%, indicating strong privacy protection for patients with STIs. Finding an effective method to protect private information in EMRs is important. We demonstrated the feasibility of applying the EPSTII method to EMRs. Our protocol yields more comprehensive results than traditional methods of identifying STI information.
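The abstract describes a rule-based extraction step that is tuned against precision and recall and then used to protect flagged content. The following is a minimal Python sketch of that general idea, not the EPSTII protocol itself: the keyword list, the function names (`find_sti_mentions`, `mask_sti_mentions`, `precision_recall`), and the sample records are all hypothetical, since the abstract does not give the actual rules.

```python
import re

# Hypothetical keyword rules (syphilis, gonorrhea, genital warts, genital
# herpes, HIV, AIDS). The real EPSTII rules are not listed in the abstract.
STI_PATTERN = re.compile(r"梅毒|淋病|尖锐湿疣|生殖器疱疹|HIV|艾滋病", re.IGNORECASE)


def find_sti_mentions(text):
    """Return (start, end, term) for every rule match in a free-text field."""
    return [(m.start(), m.end(), m.group()) for m in STI_PATTERN.finditer(text)]


def mask_sti_mentions(text, placeholder="[REDACTED]"):
    """Replace every matched term with a placeholder before data sharing."""
    return STI_PATTERN.sub(placeholder, text)


def precision_recall(predicted, gold):
    """Compare record IDs flagged by the rules with reviewer-labelled IDs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up example records, for illustration only.
records = {
    1: "既往梅毒病史",    # history of syphilis
    2: "高血压病史",      # history of hypertension
    3: "HIV抗体阳性",     # HIV antibody positive
}
flagged = {rid for rid, text in records.items() if find_sti_mentions(text)}
gold = {1, 3, 4}  # hypothetical reviewer labels; record 4 was missed by the rules
precision, recall = precision_recall(flagged, gold)
```

A keyword rule this simple would also flag negated statements such as "denies history of syphilis", which is one reason a protocol of this kind would need iterative refinement against measured precision and recall.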