Automated comparative auditing of NCIT genomic roles using NCBI.

Authors

Cohen Barry, Oren Marc, Min Hua, Perl Yehoshua, Halper Michael

Affiliation

Computer Science Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA.

Publication information

J Biomed Inform. 2008 Dec;41(6):904-13. doi: 10.1016/j.jbi.2008.03.010. Epub 2008 Mar 28.

DOI:10.1016/j.jbi.2008.03.010

PMID:18486558

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2630966/

Abstract

Biomedical research has identified many human genes and various knowledge about them. The National Cancer Institute Thesaurus (NCIT) represents such knowledge as concepts and roles (relationships). Due to the rapid advances in this field, it is to be expected that the NCIT's Gene hierarchy will contain role errors. A comparative methodology to audit the Gene hierarchy with the use of the National Center for Biotechnology Information's (NCBI's) Entrez Gene database is presented. The two knowledge sources are accessed via a pair of Web crawlers to ensure up-to-date data. Our algorithms then compare the knowledge gathered from each, identify discrepancies that represent probable errors, and suggest corrective actions. The primary focus is on two kinds of gene-roles: (1) the chromosomal locations of genes, and (2) the biological processes in which genes play a role. Regarding chromosomal locations, the discrepancies revealed are striking and systematic, suggesting a structurally common origin. In regard to the biological processes, difficulties arise because genes frequently play roles in multiple processes, and processes may have many designations (such as synonymous terms). Our algorithms make use of the roles defined in the NCIT Biological Process hierarchy to uncover many probable gene-role errors in the NCIT. These results show that automated comparative auditing is a promising technique that can identify a large number of probable errors and corrections for them in a terminological genomic knowledge repository, thus facilitating its overall maintenance.
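The comparison step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `Discrepancy` record, the synonym table, and all gene symbols and values below are hypothetical, invented for illustration. In particular, the abstract says the algorithms use roles from the NCIT Biological Process hierarchy to reconcile process names; the plain synonym lookup here is a simplified stand-in for that step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Discrepancy:
    """One probable role error plus a suggested corrective action (hypothetical record)."""
    gene: str
    role: str
    audited_value: Optional[str]
    reference_value: Optional[str]
    suggestion: str


def normalize(term):
    """Lower-case and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(term.lower().split())


def audit_locations(audited, reference):
    """Compare a single-valued role (chromosomal location) for genes present in both sources."""
    found = []
    for gene in sorted(audited.keys() & reference.keys()):
        a, r = audited[gene], reference[gene]
        if normalize(a) != normalize(r):
            found.append(Discrepancy(gene, "chromosomal_location", a, r,
                                     f"replace '{a}' with '{r}'"))
    return found


def audit_processes(audited, reference, synonyms):
    """Compare a multi-valued role (biological processes) after mapping synonyms
    to one canonical name. The synonym table is an assumption of this sketch."""
    def canonical(term):
        key = normalize(term)
        return synonyms.get(key, key)

    found = []
    for gene in sorted(audited.keys() & reference.keys()):
        a = {canonical(t) for t in audited[gene]}
        r = {canonical(t) for t in reference[gene]}
        for term in sorted(r - a):  # in the reference source only
            found.append(Discrepancy(gene, "biological_process", None, term,
                                     f"consider adding '{term}'"))
        for term in sorted(a - r):  # in the audited source only
            found.append(Discrepancy(gene, "biological_process", term, None,
                                     f"review '{term}'"))
    return found


# Invented sample data; these are not real gene records.
audited_loc = {"GENE_A": "1p36", "GENE_B": "17q21"}
reference_loc = {"GENE_A": "1p36", "GENE_B": "17q21.31", "GENE_C": "2q11"}
audited_proc = {"GENE_A": {"Apoptosis"}}
reference_proc = {"GENE_A": {"programmed cell death", "cell cycle"}}
synonyms = {"programmed cell death": "apoptosis"}

location_issues = audit_locations(audited_loc, reference_loc)
process_issues = audit_processes(audited_proc, reference_proc, synonyms)

for issue in location_issues + process_issues:
    print(issue.gene, issue.role, issue.suggestion)
```

Treating locations as single values and processes as sets mirrors the distinction the abstract draws: a gene has one chromosomal location, but it may take part in several processes, each of which may go by more than one name.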

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/742c/2630966/cb8313f75647/nihms81774f1.jpg
