
Creation and evaluation of full-text literature-derived, feature-weighted disease models of genetically determined developmental disorders.

Affiliations

MRC Human Genetics Unit, Western General Hospital, Institute of Genetics and Cancer, The University of Edinburgh, Crewe Road South, Edinburgh EH4 2XU, UK.

Transforming Genetic Medicine Initiative, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Publication details

Database (Oxford). 2022 Jun 7;2022. doi: 10.1093/database/baac038.

DOI: 10.1093/database/baac038
PMID: 35670729
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9216525/
Abstract

There are >2500 different genetically determined developmental disorders (DD), which, as a group, show very high levels of both locus and allelic heterogeneity. This has led to the wide-spread use of evidence-based filtering of genome-wide sequence data as a diagnostic tool in DD. Determining whether the association of a filtered variant at a specific locus is a plausible explanation of the phenotype in the proband is crucial and commonly requires extensive manual literature review by both clinical scientists and clinicians. Access to a database of weighted clinical features extracted from rigorously curated literature would increase the efficiency of this process and facilitate the development of robust phenotypic similarity metrics. However, given the large and rapidly increasing volume of published information, conventional biocuration approaches are becoming impractical. Here, we present a scalable, automated method for the extraction of categorical phenotypic descriptors from the full-text literature. Papers identified through literature review were downloaded and parsed using the Cadmus custom retrieval package. Human Phenotype Ontology terms were extracted using MetaMap, with 76-84% precision and 65-73% recall. Mean terms per paper increased from 9 in title + abstract, to 68 using full text. We demonstrate that these literature-derived disease models plausibly reflect true disease expressivity more accurately than widely used manually curated models, through comparison with prospectively gathered data from the Deciphering Developmental Disorders study. The area under the curve for receiver operating characteristic (ROC) curves increased by 5-10% through the use of literature-derived models. This work shows that scalable automated literature curation increases performance and adds weight to the need for this strategy to be integrated into informatic variant analysis pipelines. Database URL: https://doi.org/10.1093/database/baac038.
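The abstract relies on three quantities: precision and recall of automatically extracted HPO terms, a feature-weighted disease model built from many papers, and ROC AUC for comparing models. The Python sketch below illustrates them under stated assumptions. It assumes each paper has already been reduced to a set of HPO IDs (for example by MetaMap, which is not shown). The frequency-based weighting in `build_weighted_model` and the additive `score_patient` are hypothetical illustrations, not the authors' published scheme. The HPO IDs are sample inputs only, not data from the study.

```python
from __future__ import annotations

from collections import Counter


def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision and recall of extracted HPO IDs against a manually curated gold set."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


def build_weighted_model(papers: list[set[str]]) -> dict[str, float]:
    """Hypothetical weighting: each HPO term gets the fraction of papers that mention it."""
    counts = Counter(term for terms in papers for term in terms)
    return {term: count / len(papers) for term, count in counts.items()}


def score_patient(patient_terms: set[str], model: dict[str, float]) -> float:
    """Hypothetical additive score: sum of model weights for exactly matching patient terms."""
    return sum(model.get(term, 0.0) for term in patient_terms)


def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """ROC AUC as P(random positive outscores random negative), counting ties as 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy per-paper term sets for one disease (sample HPO IDs, not data from the paper).
    papers = [
        {"HP:0001249", "HP:0001250"},  # intellectual disability, seizure
        {"HP:0001249", "HP:0000252"},  # intellectual disability, microcephaly
        {"HP:0001249"},
    ]
    model = build_weighted_model(papers)
    print(model["HP:0001249"])  # 1.0 (mentioned in all three papers)
    print(score_patient({"HP:0001249", "HP:0000252"}, model))  # about 1.333
    print(precision_recall({"HP:0001249", "HP:0000252"}, {"HP:0001249", "HP:0000316"}))  # (0.5, 0.5)
    print(roc_auc([0.9, 0.8], [0.1, 0.8]))  # 0.875
```

Exact ID matching keeps the sketch short. A real phenotypic similarity metric would usually also credit ancestor or descendant terms in the HPO hierarchy.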

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caff/9216525/8efaeb60e331/baac038f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caff/9216525/9bdf8cc25666/baac038f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caff/9216525/424dd2476a36/baac038f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caff/9216525/f91b50520260/baac038f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caff/9216525/48025e64b3ac/baac038f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caff/9216525/658a979df723/baac038f6.jpg

Similar articles

1. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
   Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
2. Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature.
   BMC Bioinformatics. 2018 Mar 9;19(1):94. doi: 10.1186/s12859-018-2103-8.
3. miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases.
   J Biomed Semantics. 2016 Apr 29;7(1):9. doi: 10.1186/s13326-015-0044-y.
4. Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature.
   BMC Bioinformatics. 2015 Jun 6;16:185. doi: 10.1186/s12859-015-0609-x.
5. IMPROVE-DD: Integrating multiple phenotype resources optimizes variant evaluation in genetically determined developmental disorders.
   HGG Adv. 2022 Nov 24;4(1):100162. doi: 10.1016/j.xhgg.2022.100162. eCollection 2023 Jan 12.
6. Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the comparative toxicogenomics database.
   PLoS One. 2013 Apr 17;8(4):e58201. doi: 10.1371/journal.pone.0058201. Print 2013.
7. Integrating image caption information into biomedical document classification in support of biocuration.
   Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa024.
8. Machine learning approach to literature mining for the genetics of complex diseases.
   Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz124.
9. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research.
   BMC Bioinformatics. 2015 Feb 21;16:55. doi: 10.1186/s12859-015-0472-9.

Cited by

1. Phenotypic spectrum of dual diagnoses in developmental disorders.
   Am J Hum Genet. 2024 Nov 7;111(11):2382-2391. doi: 10.1016/j.ajhg.2024.08.025. Epub 2024 Sep 30.
