A formalized description of the standard human variant nomenclature in Extended Backus-Naur Form.

Affiliation

Department of Human Genetics, Center for Human and Clinical Genetics, Leiden University Medical Center, Leiden, the Netherlands.

Publication information

BMC Bioinformatics. 2011;12 Suppl 4(Suppl 4):S5. doi: 10.1186/1471-2105-12-S4-S5. Epub 2011 Jul 5.

DOI:10.1186/1471-2105-12-S4-S5

PMID:21992071

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3194197/

Abstract

BACKGROUND

The use of a standard human sequence variant nomenclature is advocated by the Human Genome Variation Society in order to unambiguously describe genetic variants in databases and literature. There is a clear need for tools that allow the mining of data about human sequence variants and their functional consequences from databases and literature. Existing text mining focuses on the recognition of protein variants and their effects. The recognition of variants at the DNA and RNA levels is essential for dissemination of variant data for diagnostic purposes. Development of new tools is hampered by the complexity of the current nomenclature, which requires processing at the character level to recognize the specific syntactic constructs used in variant descriptions.

RESULTS

We approached the gene variant nomenclature as a scientific sublanguage and created two formal descriptions of the syntax in Extended Backus-Naur Form: one at the DNA-RNA level and one at the protein level. To ensure compatibility with older versions of the human sequence variant nomenclature, previously recommended variant description formats have been included. The first grammar versions were designed to help build variant description handling in the Alamut mutation interpretation software. The DNA and RNA level descriptions were then updated and used to construct the context-free parser of the Mutalyzer 2 sequence variant nomenclature checker, which has already been used to check more than one million variant descriptions.

CONCLUSIONS

The Extended Backus-Naur Form provided an overview of the full complexity of the syntax of the sequence variant nomenclature, which remained hidden in the textual format and the division of the recommendations across the DNA, RNA and protein sections of the Human Genome Variation Society nomenclature website (http://www.hgvs.org/mutnomen/). This insight into the syntax of the nomenclature could be used to design detailed and clear rules for software development. The Mutalyzer 2 parser demonstrated that it facilitated decomposition of complex variant descriptions into their individual parts. The Extended Backus-Naur Form or parts of it can be used or modified by adding rules, allowing the development of specific sequence variant text mining tools and other programs, which can generate or handle sequence variant descriptions.
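To illustrate how a grammar of this kind can drive the decomposition of a variant description, the following is a minimal sketch in Python. It is not the published grammar and not the Mutalyzer 2 parser: the toy EBNF in the comments, the `Variant` and `ToyParser` names, and the example description strings are all assumptions made for illustration, covering only simple DNA-level substitutions, deletions, duplications and insertions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Toy grammar (EBNF). A deliberately tiny, invented subset, NOT the published one:
#   variant  = reference, ":", kind, ".", location, change ;
#   kind     = "c" | "g" | "n" ;
#   location = integer, [ "_", integer ] ;
#   change   = nt, ">", nt | "del" | "dup" | "ins", nt, { nt } ;
#   nt       = "A" | "C" | "G" | "T" ;


@dataclass
class Variant:
    reference: str
    kind: str
    start: int
    end: Optional[int]
    change: str  # one of "subst", "del", "dup", "ins"
    ref_base: Optional[str] = None
    alt: Optional[str] = None


class ParseError(ValueError):
    pass


class ToyParser:
    """Hand-written recursive-descent parser, one method per grammar rule."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def _match(self, pattern: str) -> Optional[str]:
        # Try to consume `pattern` at the current offset; return None on failure.
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        return m.group(0)

    def _expect(self, pattern: str, what: str) -> str:
        token = self._match(pattern)
        if token is None:
            raise ParseError(f"expected {what} at offset {self.pos} in {self.text!r}")
        return token

    def parse(self) -> Variant:
        reference = self._expect(r"[^:]+", "reference identifier")
        self._expect(r":", "':'")
        kind = self._expect(r"[cgn]", "coordinate type")
        self._expect(r"\.", "'.'")
        start = int(self._expect(r"\d+", "position"))
        end = None
        if self._match(r"_") is not None:
            end = int(self._expect(r"\d+", "end position"))
        variant = self._change(reference, kind, start, end)
        if self.pos != len(self.text):
            raise ParseError(f"unexpected trailing text at offset {self.pos}")
        return variant

    def _change(self, reference, kind, start, end) -> Variant:
        subst = self._match(r"[ACGT]>[ACGT]")
        if subst is not None:
            return Variant(reference, kind, start, end, "subst", subst[0], subst[2])
        if self._match(r"del") is not None:
            return Variant(reference, kind, start, end, "del")
        if self._match(r"dup") is not None:
            return Variant(reference, kind, start, end, "dup")
        if self._match(r"ins") is not None:
            alt = self._expect(r"[ACGT]+", "inserted bases")
            return Variant(reference, kind, start, end, "ins", alt=alt)
        raise ParseError(f"expected a change at offset {self.pos} in {self.text!r}")


def parse_variant(text: str) -> Variant:
    return ToyParser(text).parse()


if __name__ == "__main__":
    # Illustrative description strings only.
    for description in ("NM_004006.2:c.76A>T", "NG_012232.1:g.19_21del"):
        print(parse_variant(description))
```

Each method corresponds to one rule of the toy grammar, so supporting a further construct would mean adding a rule and a matching branch. The real nomenclature is far richer (intronic offsets, uncertain positions, alleles, protein-level descriptions), none of which this sketch attempts.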
