Automated comparative auditing of NCIT genomic roles using NCBI.

Authors

Cohen Barry, Oren Marc, Min Hua, Perl Yehoshua, Halper Michael

Affiliation

Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA.

Publication information

J Biomed Inform. 2008 Dec;41(6):904-13. doi: 10.1016/j.jbi.2008.03.010. Epub 2008 Mar 28.

Abstract

Biomedical research has identified many human genes and various knowledge about them. The National Cancer Institute Thesaurus (NCIT) represents such knowledge as concepts and roles (relationships). Due to the rapid advances in this field, it is to be expected that the NCIT's Gene hierarchy will contain role errors. A comparative methodology to audit the Gene hierarchy with the use of the National Center for Biotechnology Information's (NCBI's) Entrez Gene database is presented. The two knowledge sources are accessed via a pair of Web crawlers to ensure up-to-date data. Our algorithms then compare the knowledge gathered from each, identify discrepancies that represent probable errors, and suggest corrective actions. The primary focus is on two kinds of gene-roles: (1) the chromosomal locations of genes, and (2) the biological processes in which genes play a role. Regarding chromosomal locations, the discrepancies revealed are striking and systematic, suggesting a structurally common origin. In regard to the biological processes, difficulties arise because genes frequently play roles in multiple processes, and processes may have many designations (such as synonymous terms). Our algorithms make use of the roles defined in the NCIT Biological Process hierarchy to uncover many probable gene-role errors in the NCIT. These results show that automated comparative auditing is a promising technique that can identify a large number of probable errors and corrections for them in a terminological genomic knowledge repository, thus facilitating its overall maintenance.
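
The comparison step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `Discrepancy` record, the placeholder gene symbols, and the flat synonym table are all assumptions made for illustration. In particular, the paper resolves process designations through roles in the NCIT Biological Process hierarchy, whereas this sketch substitutes a simple synonym lookup. Both knowledge sources are assumed to have already been fetched into plain dictionaries.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Discrepancy:
    """One probable role error plus a suggested correction (hypothetical record)."""
    gene: str
    role: str
    audited_value: Optional[str]  # value in the terminology being audited
    reference_value: str          # value in the reference database
    suggestion: str


def normalize_location(location: str) -> str:
    """Drop whitespace and case so '17q21' and ' 17Q21 ' compare equal."""
    return "".join(location.split()).casefold()


def audit_locations(audited: Dict[str, str],
                    reference: Dict[str, str]) -> List[Discrepancy]:
    """Compare gene -> chromosomal location in both sources."""
    found = []
    for gene in sorted(reference):
        ref_loc = reference[gene].strip()
        aud_loc = audited.get(gene)
        if aud_loc is None:
            found.append(Discrepancy(gene, "chromosomal_location", None,
                                     ref_loc, f"add location {ref_loc}"))
        elif normalize_location(aud_loc) != normalize_location(ref_loc):
            found.append(Discrepancy(gene, "chromosomal_location", aud_loc,
                                     ref_loc,
                                     f"replace {aud_loc} with {ref_loc}"))
    return found


def canonical_process(name: str, synonyms: Dict[str, str]) -> str:
    """Map a process name to its preferred term.

    `synonyms` is an assumed lookup table whose keys are already casefolded.
    """
    key = name.strip().casefold()
    return synonyms.get(key, key)


def audit_processes(audited: Dict[str, Iterable[str]],
                    reference: Dict[str, Iterable[str]],
                    synonyms: Dict[str, str]) -> List[Discrepancy]:
    """Report processes listed in the reference but absent from the audited source."""
    found = []
    for gene in sorted(reference):
        ref = {canonical_process(p, synonyms) for p in reference[gene]}
        aud = {canonical_process(p, synonyms) for p in audited.get(gene, ())}
        missing = sorted(ref - aud)
        if missing:
            found.append(Discrepancy(
                gene, "biological_process",
                ", ".join(sorted(aud)) or None,
                ", ".join(sorted(ref)),
                "add process role(s): " + ", ".join(missing)))
    return found


if __name__ == "__main__":
    # Placeholder data only; these are not real NCIT or Entrez Gene records.
    audited = {"GENE_A": "17q21", "GENE_B": "1p36"}
    reference = {"GENE_A": " 17Q21 ", "GENE_B": "1q36", "GENE_C": "Xp11.2"}
    for d in audit_locations(audited, reference):
        print(d.gene, "->", d.suggestion)
```

Normalizing both sides before comparison keeps purely cosmetic differences from being reported, and comparing sets of canonical process names allows a gene to play roles in several processes without the order or the choice of synonym mattering.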
