Improved ontology for eukaryotic single-exon coding sequences in biological databases.

Affiliations

Center for Bioinformatics and Genome Biology, Fundacion Ciencia & Vida, Avenida Zañartu 1482, Ñuñoa, Santiago, Chile.

Facultad de Ciencias Biologicas, Universidad Andres Bello, Santiago, Chile.

Publication information

Database (Oxford). 2018 Jan 1;2018:1-6. doi: 10.1093/database/bay089.

Abstract

Efficient extraction of knowledge from biological data requires the development of structured vocabularies to unambiguously define biological terms. This paper proposes descriptions and definitions to disambiguate the term 'single-exon gene'. Eukaryotic Single-Exon Genes (SEGs) have been defined as genes that do not have introns in their protein-coding sequences. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancer and neurological/developmental disorders, and because many exhibit tissue-specific transcription. Unfortunately, the term 'SEGs' is rife with ambiguity, leading to biological misinterpretations. In the classic definition, no distinction is made between SEGs that harbor introns in their untranslated regions (UTRs) and those without. This distinction is important because the presence of introns in UTRs affects transcriptional regulation and post-transcriptional processing of the mRNA. In addition, recent whole-transcriptome shotgun sequencing has led to the discovery of many single-exon mRNAs that arise from alternative splicing of multi-exon genes; these single-exon isoforms are being confused with SEGs despite their clearly different origin. The rapid expansion of RNA-seq datasets makes it imperative to distinguish the different SEG types before annotation errors become indelibly propagated in biological databases. This paper develops a structured vocabulary for their disambiguation, which allows a major reassessment of their evolutionary trajectories, regulation, RNA processing and transport, and provides the opportunity to improve the detection of gene associations with disorders including cancers and neurological and developmental diseases.
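The three-way distinction drawn in the abstract (SEGs with no introns at all, SEGs whose introns lie only in the UTRs, and single-exon isoforms produced by multi-exon genes) can be made operational on annotated transcript models. The Python sketch below is only an illustration under stated assumptions, not the authors' vocabulary or method: the `Transcript` class, the `CodingClass` labels, the helper functions and the coordinate convention (0-based, end-exclusive genomic intervals, with the CDS given as a single genomic span similar to BED thickStart/thickEnd) are all hypothetical. It also assumes that a gene counts as multi-exon if any of its annotated isoforms has an intron inside the CDS.

```python
from dataclasses import dataclass
from enum import Enum


class CodingClass(Enum):
    # Hypothetical labels for the three cases distinguished in the abstract;
    # these are NOT the paper's official ontology terms.
    INTRONLESS_SEG = "SEG with no introns anywhere"
    UTR_INTRON_SEG = "SEG with introns only in UTRs"
    MULTI_EXON_CDS = "multi-exon coding gene"


@dataclass(frozen=True)
class Transcript:
    # Exons as (start, end) genomic intervals, end-exclusive, sorted, non-overlapping.
    exons: tuple[tuple[int, int], ...]
    # Genomic span of the CDS as (start, end), end-exclusive (assumed convention).
    cds: tuple[int, int]

    def introns(self) -> list[tuple[int, int]]:
        # An intron is the gap between the end of one exon and the start of the next.
        return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(self.exons, self.exons[1:])]

    def has_cds_intron(self) -> bool:
        cds_start, cds_end = self.cds
        # The intron interrupts the CDS only if coding sequence lies on both sides of it;
        # otherwise it sits in the 5' or 3' UTR.
        return any(cds_start < i_start and i_end < cds_end for i_start, i_end in self.introns())


def classify_gene(transcripts: list[Transcript]) -> CodingClass:
    # Assumption: one isoform with a CDS intron makes the whole gene multi-exon.
    if any(t.has_cds_intron() for t in transcripts):
        return CodingClass.MULTI_EXON_CDS
    # Introns exist, but none interrupts a CDS, so they are confined to UTRs.
    if any(t.introns() for t in transcripts):
        return CodingClass.UTR_INTRON_SEG
    return CodingClass.INTRONLESS_SEG


def is_single_exon_isoform_of_meg(t: Transcript, gene: list[Transcript]) -> bool:
    # A one-exon mRNA from a multi-exon gene is a splicing product, not a SEG.
    return len(t.exons) == 1 and classify_gene(gene) is CodingClass.MULTI_EXON_CDS


# Toy gene models with made-up coordinates, for illustration only.
seg = [Transcript(exons=((0, 1000),), cds=(100, 900))]
utr_intron_seg = [Transcript(exons=((0, 50), (200, 600)), cds=(250, 550))]
meg = [
    Transcript(exons=((100, 300), (500, 800)), cds=(150, 700)),  # CDS spans an intron
    Transcript(exons=((100, 800),), cds=(150, 260)),  # single-exon isoform
]

if __name__ == "__main__":
    for name, gene in [("seg", seg), ("utr_intron_seg", utr_intron_seg), ("meg", meg)]:
        print(name, classify_gene(gene).name)
    print(is_single_exon_isoform_of_meg(meg[1], meg))
```

Classifying at the gene level rather than per transcript is what keeps the single-exon isoform in `meg` from being reported as a SEG, which is the confusion the abstract warns about; a per-transcript check would see only one exon and no introns.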
