Genome-wide computational identification and manual annotation of human long noncoding RNA genes.

Affiliation

Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48202, USA.

Publication information

RNA. 2010 Aug;16(8):1478-87. doi: 10.1261/rna.1951310. Epub 2010 Jun 29.

DOI: 10.1261/rna.1951310

PMID: 20587619

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2905748/

Abstract

Experimental evidence suggests that half or more of the mammalian transcriptome consists of noncoding RNA. Noncoding RNAs are divided into short noncoding RNAs (including microRNAs) and long noncoding RNAs (lncRNAs). We defined complementary DNAs (cDNAs) lacking any positive-strand open reading frames (ORFs) longer than 30 amino acids, as well as cDNAs lacking any evidence of interspecies conservation of their longer-than-30-amino acid ORFs, as noncoding. We have identified 5446 lncRNA genes in the human genome from approximately 24,000 full-length cDNAs, using our new ORF-prediction pipeline. We combined them nonredundantly with lncRNAs from four published sources to derive 6736 lncRNA genes. In an effort to distinguish standalone and antisense lncRNA genes from database artifacts, we stratified our catalog of lncRNAs according to the distance between each lncRNA gene candidate and its nearest known protein-coding gene. We concurrently examined the protein-coding capacity of known genes overlapping with lncRNAs. Remarkably, 62% of known genes with "hypothetical protein" names actually lacked protein-coding capacity. This study has greatly expanded the known human lncRNA catalog, increased its accuracy through manual annotation of cDNA-to-genome alignments, and revealed that a large set of hypothetical-protein genes in GenBank lacks protein-coding capacity. In addition, we have developed, independently of existing NCBI tools, command-line programs with high-throughput ORF-finding and BLASTP-parsing functionality, suitable for future automated assessments of protein-coding capacity of novel transcripts.
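The first noncoding criterion in the abstract (no positive-strand ORF longer than 30 amino acids) can be illustrated with a minimal Python sketch. This is not the authors' command-line pipeline, whose details the abstract does not give. The function names, the ATG-to-stop definition of an ORF, and the choice to ignore ORFs that run off the end of the cDNA without a stop codon are all assumptions made for illustration. The second criterion (absence of interspecies conservation, assessed via BLASTP) is not implemented here.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_CODING_AA = 30  # threshold from the abstract: an ORF must be longer than 30 aa


def longest_forward_orf_aa(cdna: str) -> int:
    """Length in amino acids of the longest ATG-to-stop ORF in the three
    positive-strand reading frames of `cdna` (hypothetical helper).

    Assumptions: an ORF starts at ATG and must end at an in-frame stop
    codon; the stop codon itself is not counted; the reverse strand is
    never examined.
    """
    # Normalise: drop whitespace, upper-case, treat RNA input as DNA.
    seq = re.sub(r"\s+", "", cdna).upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def lacks_long_orf(cdna: str, min_aa: int = MIN_CODING_AA) -> bool:
    """True when no positive-strand ORF is longer than `min_aa` residues."""
    return longest_forward_orf_aa(cdna) <= min_aa
```

Under these assumptions, `lacks_long_orf(sequence)` flags a cDNA as a noncoding candidate by the length criterion alone. A cDNA that fails this test would still need the conservation check described in the abstract before being called coding or noncoding.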

Similar articles

[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].

Yi Chuan Xue Bao. 2004 May;31(5):431-43.

A Support Vector Machine based method to distinguish long non-coding RNAs from protein coding transcripts.

BMC Genomics. 2017 Oct 18;18(1):804. doi: 10.1186/s12864-017-4178-4.

Long noncoding RNA repertoire in chicken liver and adipose tissue.

Genet Sel Evol. 2017 Jan 10;49(1):6. doi: 10.1186/s12711-016-0275-0.

Genome-Wide Discovery of Long Non-Coding RNAs in Rainbow Trout.

PLoS One. 2016 Feb 19;11(2):e0148940. doi: 10.1371/journal.pone.0148940. eCollection 2016.

Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis.

Genome Res. 2012 Mar;22(3):577-91. doi: 10.1101/gr.133009.111. Epub 2011 Nov 22.

Global analysis of ribosome-associated noncoding RNAs unveils new modes of translational regulation.

Proc Natl Acad Sci U S A. 2017 Nov 14;114(46):E10018-E10027. doi: 10.1073/pnas.1708433114. Epub 2017 Oct 30.

The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression.

Genome Res. 2012 Sep;22(9):1775-89. doi: 10.1101/gr.132159.111.

Computational analysis of functional long noncoding RNAs reveals lack of peptide-coding capacity and parallels with 3' UTRs.

RNA. 2012 Apr;18(4):825-43. doi: 10.1261/rna.029520.111. Epub 2012 Feb 23.

Integrative classification of human coding and noncoding genes through RNA metabolism profiles.

Nat Struct Mol Biol. 2017 Jan;24(1):86-96. doi: 10.1038/nsmb.3325. Epub 2016 Nov 21.

Cited by

Transcriptomic Identification of Long Noncoding RNAs Modulating MPK3/MPK6-Centered Immune Networks in Arabidopsis.

Int J Mol Sci. 2025 Aug 28;26(17):8331. doi: 10.3390/ijms26178331.

Integrated omics reveal the mechanisms underlying softening and aroma changes in pear during postharvest storage and the role of melatonin.

BMC Plant Biol. 2025 May 22;25(1):679. doi: 10.1186/s12870-025-06714-4.

LncRNA evolution and DNA methylation variation participate in photosynthesis pathways of distinct lineages of .

For Res (Fayettev). 2023 Feb 6;3:3. doi: 10.48130/FR-2023-0003. eCollection 2023.

lncRNAs regulate cell stemness in physiology and pathology during differentiation and development.

Am J Stem Cells. 2024 Apr 25;13(2):59-74. doi: 10.62347/VHVU7361. eCollection 2024.

A long noncoding RNA functions in pumpkin fruit development through S-adenosyl-L-methionine synthetase.

Plant Physiol. 2024 May 31;195(2):940-957. doi: 10.1093/plphys/kiae099.

Long non-coding RNAs in cancer: multifaceted roles and potential targets for immunotherapy.

Mol Cell Biochem. 2024 Dec;479(12):3229-3254. doi: 10.1007/s11010-024-04933-1. Epub 2024 Feb 28.

Transcriptome reveals the roles and potential mechanisms of lncRNAs in the regulation of albendazole resistance in Haemonchus contortus.

BMC Genomics. 2024 Feb 17;25(1):188. doi: 10.1186/s12864-024-10096-6.

Genome-Wide Identification and Involvement in Response to Biotic and Abiotic Stresses of lncRNAs in Turbot ().

Int J Mol Sci. 2023 Nov 1;24(21):15870. doi: 10.3390/ijms242115870.

LncRNA MAFG-AS1 is involved in human cancer progression.

Eur J Med Res. 2023 Nov 8;28(1):497. doi: 10.1186/s40001-023-01486-9.

Linc2function: A Comprehensive Pipeline and Webserver for Long Non-Coding RNA (lncRNA) Identification and Functional Predictions Using Deep Learning Approaches.

Epigenomes. 2023 Sep 15;7(3):22. doi: 10.3390/epigenomes7030022.
