Bioinformatics Research Group, Advanced Computing Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, College of Computer Science and Technology, Jilin University, Changchun 130012, China and Laboratory of Bioinformatics and Non-coding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.
Nucleic Acids Res. 2013 Sep;41(17):e166. doi: 10.1093/nar/gkt646. Epub 2013 Jul 27.
Classifying transcripts as protein-coding or non-coding is a challenge, especially for transcripts reconstructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, the Coding-Non-Coding Index (CNCI), which profiles adjoining nucleotide triplets to distinguish protein-coding from non-coding sequences independently of known annotations. CNCI is effective for classifying incomplete transcripts and sense-antisense pairs. Applied in a cross-species manner, CNCI gave highly accurate classification of transcripts assembled from whole-transcriptome sequencing data, demonstrated gene evolutionary divergence between vertebrates and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at http://www.bioinfo.org/software/cnci.
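To make the phrase "profiling adjoining nucleotide triplets" concrete, the following is a minimal sketch of how pairs of adjacent, non-overlapping triplets might be counted in one reading frame. It is not the CNCI implementation or its scoring scheme; the function name `adjoining_triplet_counts`, the `frame` parameter, and the handling of ambiguous bases are assumptions made for illustration only.

```python
from collections import Counter


def adjoining_triplet_counts(seq: str, frame: int = 0) -> Counter:
    """Count pairs of adjacent, non-overlapping triplets in one reading frame.

    Hypothetical helper for illustration; not part of the CNCI software.
    """
    # Normalise case and treat RNA input (U) as DNA (T).
    seq = seq.upper().replace("U", "T")
    # Split the sequence into complete triplets starting at the given frame offset.
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # Assumption: pairs containing ambiguous bases (e.g. N) are skipped.
    valid = set("ACGT")
    return Counter(
        (a, b)
        for a, b in zip(triplets, triplets[1:])
        if set(a) <= valid and set(b) <= valid
    )


counts = adjoining_triplet_counts("ATGGCCATGGCCTAA")
for pair, n in counts.most_common():
    print(pair, n)
```

Counts like these, gathered separately from known coding and non-coding sequences, could be turned into per-pair usage frequencies and compared; how CNCI itself scores and classifies transcripts is described in the paper, not here.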