Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts.

Affiliations

Bioinformatics Research Group, Advanced Computing Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China; and Laboratory of Bioinformatics and Non-coding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.

Publication information

Nucleic Acids Res. 2013 Sep;41(17):e166. doi: 10.1093/nar/gkt646. Epub 2013 Jul 27.

DOI: 10.1093/nar/gkt646

PMID: 23892401

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3783192/

Abstract

Classifying transcripts as protein-coding or non-coding is a challenge, especially for transcripts reconstructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, the Coding-Non-Coding Index (CNCI), which profiles adjoining nucleotide triplets to distinguish protein-coding from non-coding sequences independently of known annotations. CNCI is effective for classifying incomplete transcripts and sense-antisense pairs. Applied across species, CNCI classified transcripts assembled from whole-transcriptome sequencing data with high accuracy, demonstrated gene evolutionary divergence between vertebrates and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at http://www.bioinfo.org/software/cnci.
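The abstract does not say how adjoining nucleotide triplets are profiled, so the following is only a minimal Python sketch of one plausible reading: count pairs of consecutive in-frame triplets, then score a sequence by the log-ratio of pair frequencies in a coding versus a non-coding reference. The function names, the pseudocount, the log-ratio scoring, and the toy reference frequencies are all illustrative assumptions and are not taken from the CNCI software.

```python
from collections import Counter
from math import log2


def adjoining_triplet_counts(seq, frame=0):
    """Count pairs of consecutive in-frame triplets in one reading frame."""
    seq = seq.upper().replace("U", "T")
    # Split into non-overlapping triplets, dropping any incomplete tail.
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    valid = set("ACGT")
    # Skip pairs that contain ambiguous bases such as N.
    return Counter(
        (a, b)
        for a, b in zip(triplets, triplets[1:])
        if set(a) <= valid and set(b) <= valid
    )


def log_ratio_score(seq, coding_freq, noncoding_freq, frame=0, pseudo=1e-6):
    """Mean log2 ratio of coding vs. non-coding pair frequencies (assumed scoring).

    coding_freq and noncoding_freq are hypothetical reference tables mapping
    (triplet, triplet) pairs to frequencies estimated from labelled sequences.
    """
    counts = adjoining_triplet_counts(seq, frame)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    score = sum(
        n * log2((coding_freq.get(pair, 0.0) + pseudo)
                 / (noncoding_freq.get(pair, 0.0) + pseudo))
        for pair, n in counts.items()
    )
    return score / total


# Toy reference tables, invented purely for illustration.
coding = {("ATG", "GCC"): 0.8, ("GCC", "ATG"): 0.2}
noncoding = {("ATG", "GCC"): 0.2, ("GCC", "ATG"): 0.8}
print(adjoining_triplet_counts("ATGGCCATGGCC"))
print(log_ratio_score("ATGGCCATGGCC", coding, noncoding))
```

Under these assumptions, a positive score means the sequence's triplet pairs look more like the coding reference in that frame. Comparing scores across the three frames of each strand is one way such a profile could be applied to incomplete transcripts or sense-antisense pairs.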
