Neural network detects errors in the assignment of mRNA splice sites.

Authors

Brunak S, Engelbrecht J, Knudsen S

Affiliation

Department of Structural Properties of Materials, Technical University of Denmark, Lyngby.

Publication information

Nucleic Acids Res. 1990 Aug 25;18(16):4797-801. doi: 10.1093/nar/18.16.4797.

DOI:10.1093/nar/18.16.4797

PMID:2395643

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC331948/

Abstract

The use of databanks in genetic research assumes that the information they contain is reliable. Currently, error detection in the manually or electronically entered data contained in the nucleotide sequence databanks at EMBL, Heidelberg, and GenBank at Los Alamos is limited. We have used a subset of sequences from these databanks to train neural networks to recognize pre-mRNA splicing signals in human genes. During training on 33 human genes from the EMBL databank, seven genes appeared to disturb the learning process. Subsequent investigation revealed discrepancies from the original published papers for three genes. In four genes, we found wrongly assigned splicing frames of introns. We believe this reflects the fact that splicing frames cannot always be unambiguously assigned on the basis of experimental data. Thus incorrect assignments arise both from mere typographical misprints and from erroneous interpretation of experiments. Training on 241 human sequences from GenBank revealed nine new errors. We propose that such errors could be detected by computer algorithms designed to check the consistency of data prior to their incorporation in databanks.
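The closing proposal, a consistency check run before data enter a databank, can be illustrated with a much simpler rule than the neural networks described above. The sketch below is not the authors' method: it only tests whether each annotated intron begins with GT and ends with AG, the canonical boundary dinucleotides. When an intron fails the test, it also looks for a small shift of both boundaries that would restore them, which is one plausible way to picture a wrongly assigned splicing frame. The function name `check_introns`, the 0-based end-exclusive coordinate convention, the `max_shift` parameter, and the toy sequence are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Intron = Tuple[int, int]  # hypothetical convention: 0-based start, end-exclusive


def check_introns(seq: str, introns: List[Intron], max_shift: int = 2) -> List[Dict]:
    """Flag annotated introns that lack canonical GT...AG boundaries.

    For each flagged intron, also list nearby annotations (both boundaries
    shifted by the same offset, up to max_shift bases) that would be canonical.
    """
    seq = seq.upper()

    def canonical(start: int, end: int) -> bool:
        # An intron is treated as canonical if it lies inside the sequence,
        # starts with GT and ends with AG.
        return (
            0 <= start
            and end <= len(seq)
            and end - start >= 4
            and seq[start:start + 2] == "GT"
            and seq[end - 2:end] == "AG"
        )

    report = []
    for start, end in introns:
        if canonical(start, end):
            continue  # annotation is consistent with the GT...AG rule
        suggested = [
            (start + d, end + d)
            for d in range(-max_shift, max_shift + 1)
            if d != 0 and canonical(start + d, end + d)
        ]
        report.append({"intron": (start, end), "suggested": suggested})
    return report


# Toy example (made-up sequence): two exons flanking a 15-base intron.
exon1, intron, exon2 = "ATGGCCAAG", "GTAAGTCCTTTTCAG", "GCTTAA"
toy_seq = exon1 + intron + exon2

print(check_introns(toy_seq, [(9, 24)]))   # correctly annotated intron
print(check_introns(toy_seq, [(10, 25)]))  # annotation shifted by one base
```

A flag from a rule like this is a prompt for manual review rather than proof of an error, since a minority of genuine introns use non-canonical boundaries.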

Similar articles

Cleaning the GenBank Arabidopsis thaliana data set.

Nucleic Acids Res. 1996 Jan 15;24(2):316-20. doi: 10.1093/nar/24.2.316.

Identification of sites of pre-mRNA/spliceosome association.

SAAS Bull Biochem Biotechnol. 1991 Jan;4:76-80.

Quantification analysis of splice signal sequences: mutation of 3'-splice signal sequence and mechanism of unsplicing in a beta-thalassemia pre-mRNA.

Nucleic Acids Symp Ser. 1999(42):63-4.

Trans-splicing of pre-mRNA is predicted to occur in a wide range of organisms including vertebrates.

Nucleic Acids Res. 1990 Aug 25;18(16):4719-25. doi: 10.1093/nar/18.16.4719.

Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information.

Nucleic Acids Res. 1996 Sep 1;24(17):3439-52. doi: 10.1093/nar/24.17.3439.

Regulation of splicing: the importance of being translatable.

RNA. 2004 Jan;10(1):1-4. doi: 10.1261/rna.5112704.

Interaction of the yeast DExH-box RNA helicase prp22p with the 3' splice site during the second step of nuclear pre-mRNA splicing.

Nucleic Acids Res. 2000 Mar 15;28(6):1313-21. doi: 10.1093/nar/28.6.1313.

Nucleotide sequence composition adjacent to intronic splice sites improves splicing efficiency via its effect on pre-mRNA local folding in fungi.

RNA. 2015 Oct;21(10):1704-18. doi: 10.1261/rna.051268.115. Epub 2015 Aug 5.

Temperature-dependent splicing of beta-globin pre-mRNA.

Nucleic Acids Res. 2002 Nov 1;30(21):4592-8. doi: 10.1093/nar/gkf607.

Cited by

SignalP: The Evolution of a Web Server.

Methods Mol Biol. 2024;2836:331-367. doi: 10.1007/978-1-0716-4007-4_17.

ACDC, a global database of amphibian cytochrome-b sequences using reproducible curation for GenBank records.

Sci Data. 2020 Aug 13;7(1):268. doi: 10.1038/s41597-020-00598-9.

Method of predicting splice sites based on signal interactions.

Biol Direct. 2006 Apr 3;1:10. doi: 10.1186/1745-6150-1-10.

Analysis of missense variants in the PKHD1-gene in patients with autosomal recessive polycystic kidney disease (ARPKD).

Hum Genet. 2005 Nov;118(2):185-206. doi: 10.1007/s00439-005-0027-7. Epub 2005 Nov 15.

Analysis of donor splice sites in different eukaryotic organisms.

J Mol Evol. 1997 Jul;45(1):50-9. doi: 10.1007/pl00006200.

O-GLYCBASE version 2.0: a revised database of O-glycosylated proteins.

Nucleic Acids Res. 1997 Jan 1;25(1):278-82. doi: 10.1093/nar/25.1.278.

Cleaning the GenBank Arabidopsis thaliana data set.

Nucleic Acids Res. 1996 Jan 15;24(2):316-20. doi: 10.1093/nar/24.2.316.

O-GLYCBASE: a revised database of O-glycosylated proteins.

Nucleic Acids Res. 1996 Jan 1;24(1):248-52. doi: 10.1093/nar/24.1.248.

Quantitative sequence-activity models (QSAM)--tools for sequence design.

Nucleic Acids Res. 1993 Feb 11;21(3):733-9. doi: 10.1093/nar/21.3.733.

Self-organized neural maps of human protein sequences.

Protein Sci. 1994 Mar;3(3):507-21. doi: 10.1002/pro.5560030316.
