PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme.

Author information

Li Aimin, Zhang Junying, Zhou Zhongyin

Affiliation

School of Computer Science and Technology, Xidian University, Xi'an, PR China.

Publication information

BMC Bioinformatics. 2014 Sep 19;15(1):311. doi: 10.1186/1471-2105-15-311.

Abstract

BACKGROUND

High-throughput transcriptome sequencing (RNA-seq) technology promises to discover novel protein-coding and non-coding transcripts, particularly the identification of long non-coding RNAs (lncRNAs) from de novo sequencing data. This requires tools that are not restricted by prior gene annotations, genomic sequences and high-quality sequencing.

RESULTS

We present PLEK (predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme), an alignment-free tool that combines an improved k-mer scheme with a support vector machine (SVM) algorithm to distinguish lncRNAs from messenger RNAs (mRNAs) in the absence of genomic sequences or annotations. We evaluated PLEK on well-annotated mRNA and lncRNA transcripts. In 10-fold cross-validation tests on human RefSeq mRNAs and GENCODE lncRNAs, the tool achieved an accuracy of up to 95.6%. Using the model built from human datasets, PLEK attained >90% accuracy on most datasets of transcripts from other vertebrates. PLEK also performed well on a simulated dataset and on two real de novo assembled transcriptome datasets (sequenced on PacBio and 454 platforms) with relatively high rates of indel sequencing errors. In addition, when run single-threaded, PLEK is approximately eightfold faster than a newly developed alignment-free tool, Coding-Non-Coding Index (CNCI), and 244 times faster than the most popular alignment-based tool, Coding Potential Calculator (CPC).
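The abstract does not spell out what makes the k-mer scheme "improved", so the following is only a minimal sketch of plain sliding-window k-mer frequency features, the general kind of alignment-free input an SVM could be trained on. The function name `kmer_features`, the choice of k = 1 to 5, and the normalization by window count are illustrative assumptions, not PLEK's actual implementation.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_features(seq, k_max=5):
    """Return a flat list of k-mer frequencies for k = 1..k_max.

    Hypothetical helper for illustration; k_max=5 is an assumption.
    """
    # Treat RNA and DNA spellings alike.
    seq = seq.upper().replace("U", "T")
    features = []
    for k in range(1, k_max + 1):
        # Slide a window of width k along the sequence, one base at a time.
        n_windows = max(len(seq) - k + 1, 0)
        counts = Counter(seq[i:i + k] for i in range(n_windows))
        # Emit frequencies in a fixed lexicographic order so every
        # transcript maps to a vector of the same length and layout.
        # Windows containing other symbols (e.g. N) count toward
        # n_windows but match no feature.
        for kmer in map("".join, product(ALPHABET, repeat=k)):
            features.append(counts[kmer] / n_windows if n_windows else 0.0)
    return features


vec = kmer_features("ACGUACGU")
print(len(vec))  # 4 + 16 + 64 + 256 + 1024 features
```

Because the vector depends only on the transcript's own sequence, no reference genome or annotation is needed, and each transcript is processed in a single pass per k, which is in line with the speed advantage over alignment-based tools that the abstract reports.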

CONCLUSIONS

PLEK is an efficient alignment-free computational tool to distinguish lncRNAs from mRNAs in RNA-seq transcriptomes of species lacking reference genomes. PLEK is especially suitable for PacBio or 454 sequencing data and large-scale transcriptome data. Its open-source software can be freely downloaded from https://sourceforge.net/projects/plek/files/.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c3b/4177586/811a97a01b37/12859_2013_6586_Fig1_HTML.jpg
