
Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System.

Affiliations

School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China.

National Engineering Lab for Big Data Analytics, Xi'an Jiaotong University, Xi'an, China.

Publication information

J Med Internet Res. 2021 Aug 4;23(8):e25670. doi: 10.2196/25670.

DOI: 10.2196/25670
PMID: 34346903
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8374669/
Abstract

BACKGROUND

Genealogical information, such as that found in family trees, is imperative for biomedical research such as disease heritability and risk prediction. Researchers have used information on policyholders and their dependents in medical claims data, as well as emergency contacts in electronic health records (EHRs), to infer family relationships at a large scale. We have previously demonstrated that online obituaries can be a novel data source for building more complete and accurate family trees.

OBJECTIVE

To supplement EHR data with family relationships for biomedical research, we built an end-to-end information extraction system that uses a multitask-based artificial neural network model to construct genealogical knowledge graphs (GKGs) from online obituaries. GKGs are enriched family trees with detailed information including age, gender, death and birth dates, and residence.

METHODS

Building on a predefined family relationship map consisting of 4 types of entities (eg, people's name, residence, birth date, and death date) and 71 types of relationships, we curated a corpus containing 1700 online obituaries from the metropolitan area of Minneapolis and St Paul in Minnesota. We also used data augmentation to generate additional synthetic data and alleviate data scarcity for rare family relationships. A multitask-based artificial neural network model was then built to simultaneously detect names, extract relationships between them, and assign attributes (eg, birth dates and death dates, residence, age, and gender) to each individual. Finally, we assembled related GKGs into larger ones by identifying people appearing in multiple obituaries.
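The final assembly step, joining GKGs that share a person, can be illustrated with a small union-find sketch. This is not the authors' implementation: the function name `merge_graphs`, the representation of a graph as a set of person keys, and the (name, birth date) identity key are all assumptions made for illustration. The abstract does not say how the same person is recognized across obituaries.

```python
from collections import defaultdict


def merge_graphs(graphs):
    """Group graphs that share at least one person key (union-find).

    `graphs` is a list of sets; each element of a set is a hashable
    person key. Exact key equality stands in for identity matching,
    which is an assumption of this sketch.
    """
    parent = list(range(len(graphs)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # person key -> index of the first graph containing it
    for idx, people in enumerate(graphs):
        for person in people:
            if person in first_seen:
                # Same person in two graphs: join their components.
                parent[find(idx)] = find(first_seen[person])
            else:
                first_seen[person] = idx

    merged = defaultdict(set)
    for idx, people in enumerate(graphs):
        merged[find(idx)] |= people
    return list(merged.values())


# Hypothetical toy data: two obituaries mention "Bob Lee".
graphs = [
    {("Ann Lee", "1930-01-02"), ("Bob Lee", "1955-03-04")},
    {("Bob Lee", "1955-03-04"), ("Cara Lee", "1980-05-06")},
    {("Dan Roe", "1941-07-08")},
]
merged = merge_graphs(graphs)
```

In this toy input the first two graphs collapse into one three-person graph and the third stays separate. A real system would need fuzzier matching than exact key equality, since names and dates vary between obituaries.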

RESULTS

Our system achieved a precision of 94.79%, a recall of 91.45%, and an F1 score of 93.09% in 10-fold cross-validation. We also constructed 12,407 GKGs, the largest of which spanned 4 generations and 30 people.
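As a consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall. The sketch below applies that standard formula to the reported precision and recall. The helper name `f1_score` is illustrative, and because the published values come from 10-fold cross-validation, the authors' figure may have been averaged across folds and not computed this way.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall, in percent.
f1 = f1_score(94.79, 91.45)
print(round(f1, 2))  # 93.09
```

The result agrees with the reported 93.09% to two decimal places.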

CONCLUSIONS

In this work, we discussed the meaning of GKGs for biomedical research, presented a new version of a corpus with a predefined family relationship map and augmented training data, and proposed a multitask deep neural system to construct and assemble GKGs. The results show that our system can extract GKGs from obituaries and demonstrate the potential of enriching EHR data for further genetic research. We share the source code and system with the scientific community on GitHub; the corpus is withheld to protect privacy.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6fa/8374669/6f26ed4d45eb/jmir_v23i8e25670_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6fa/8374669/1dc0a3fc3759/jmir_v23i8e25670_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6fa/8374669/b0d1446c48ac/jmir_v23i8e25670_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6fa/8374669/9509e7b90de0/jmir_v23i8e25670_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6fa/8374669/ffd8313fb6db/jmir_v23i8e25670_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6fa/8374669/36e1801bf99d/jmir_v23i8e25670_fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6fa/8374669/0c9171e977e7/jmir_v23i8e25670_fig7.jpg

Similar articles

1
Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System.
J Med Internet Res. 2021 Aug 4;23(8):e25670. doi: 10.2196/25670.
2
Deep Denoising of Raw Biomedical Knowledge Graph From COVID-19 Literature, LitCovid, and Pubtator: Framework Development and Validation.
J Med Internet Res. 2022 Jul 6;24(7):e38584. doi: 10.2196/38584.
3
Construction of a knowledge graph for breast cancer diagnosis based on Chinese electronic medical records: development and usability study.
BMC Med Inform Decis Mak. 2023 Oct 10;23(1):210. doi: 10.1186/s12911-023-02322-0.
4
Graph Neural Network-Based Diagnosis Prediction.
Big Data. 2020 Oct;8(5):379-390. doi: 10.1089/big.2020.0070. Epub 2020 Aug 12.
5
A multitask bi-directional RNN model for named entity recognition on Chinese electronic medical records.
BMC Bioinformatics. 2018 Dec 28;19(Suppl 17):499. doi: 10.1186/s12859-018-2467-9.
6
Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks.
J Am Med Inform Assoc. 2020 Jan 1;27(1):89-98. doi: 10.1093/jamia/ocz153.
7
JCBIE: a joint continual learning neural network for biomedical information extraction.
BMC Bioinformatics. 2022 Dec 19;23(1):549. doi: 10.1186/s12859-022-05096-w.
8
Towards electronic health record-based medical knowledge graph construction, completion, and applications: A literature study.
J Biomed Inform. 2023 Jul;143:104403. doi: 10.1016/j.jbi.2023.104403. Epub 2023 May 24.
9
Adverse drug events and medication relation extraction in electronic health records with ensemble deep learning methods.
J Am Med Inform Assoc. 2020 Jan 1;27(1):39-46. doi: 10.1093/jamia/ocz101.
10
Constructing a Chinese electronic medical record corpus for named entity recognition on resident admit notes.
BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):56. doi: 10.1186/s12911-019-0759-2.

Cited by

1
Automated Extraction of Mortality Information From Publicly Available Sources Using Large Language Models: Development and Evaluation Study.
J Med Internet Res. 2025 Aug 18;27:e71113. doi: 10.2196/71113.
2
Health Care Language Models and Their Fine-Tuning for Information Extraction: Scoping Review.
JMIR Med Inform. 2024 Oct 21;12:e60164. doi: 10.2196/60164.
3
Sublinear information bottleneck based two-stage deep learning approach to genealogy layout recognition.
Front Neurosci. 2023 Jun 30;17:1230786. doi: 10.3389/fnins.2023.1230786. eCollection 2023.
4
JCBIE: a joint continual learning neural network for biomedical information extraction.
BMC Bioinformatics. 2022 Dec 19;23(1):549. doi: 10.1186/s12859-022-05096-w.
5
Construction and application of COVID-19 infectors activity information knowledge graph.
Comput Biol Med. 2022 Sep;148:105908. doi: 10.1016/j.compbiomed.2022.105908. Epub 2022 Jul 19.

References

1
Disease Heritability Inferred from Familial Relationships Reported in Medical Records.
Cell. 2018 Jun 14;173(7):1692-1704.e11. doi: 10.1016/j.cell.2018.04.032. Epub 2018 May 17.
2
Classification of common human diseases derived from shared genetic and environmental determinants.
Nat Genet. 2017 Sep;49(9):1319-1325. doi: 10.1038/ng.3931. Epub 2017 Aug 7.
3
The Medicare Access And CHIP Reauthorization Act And The Corporate Transformation Of American Medicine.
Health Aff (Millwood). 2017 May 1;36(5):865-869. doi: 10.1377/hlthaff.2016.1536.
4
Ethics and Privacy Implications of Using the Internet and Social Media to Recruit Participants for Health Research: A Privacy-by-Design Framework for Online Recruitment.
J Med Internet Res. 2017 Apr 6;19(4):e104. doi: 10.2196/jmir.7029.
5
A neural joint model for entity and relation extraction from biomedical text.
BMC Bioinformatics. 2017 Mar 31;18(1):198. doi: 10.1186/s12859-017-1609-9.
6
A novel web informatics approach for automated surveillance of cancer mortality trends.
J Biomed Inform. 2016 Jun;61:110-8. doi: 10.1016/j.jbi.2016.03.027. Epub 2016 Apr 1.
7
Residential Mobility and Lung Cancer Risk: Data-Driven Exploration Using Internet Sources.
Soc Comput Behav Cult Model Predict (2015). 2015 Mar-Apr;9021:464-469. doi: 10.1007/978-3-319-16268-3_60. Epub 2015 Mar 17.
8
Ethical Issues of Social Media Usage in Healthcare.
Yearb Med Inform. 2015 Aug 13;10(1):137-47. doi: 10.15265/IY-2015-001.
9
Use of an electronic medical record to create the marshfield clinic twin/multiple birth cohort.
Genet Epidemiol. 2014 Dec;38(8):692-8. doi: 10.1002/gepi.21855. Epub 2014 Sep 22.
10
Use of a medical records linkage system to enumerate a dynamic population over time: the Rochester epidemiology project.
Am J Epidemiol. 2011 May 1;173(9):1059-68. doi: 10.1093/aje/kwq482. Epub 2011 Mar 23.