Jeong Hyun-Hwan, Yalamanchili Hari Krishna, Guo Caiwei, Shulman Joshua M, Liu Zhandong
¹Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA; ²Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, Texas 77030, USA.
Pac Symp Biocomput. 2018;23:168-179.
Transposable elements (TEs) are DNA sequences that can move from one genomic location to another, and they make up a large proportion (45%) of the human genome. TEs have functional roles in a variety of biological phenomena, including cancer, neurodegenerative disease, and aging. Rapid development in RNA-sequencing technology has enabled us, for the first time, to study TE activity at the systems level. However, efficient TE analysis tools have not yet been developed. In this work, we developed SalmonTE, a fast and reliable pipeline for quantifying TEs from RNA-seq data. We benchmarked our tool against TEtranscripts, a widely used TE quantification method, and three other quantification methods using several RNA-seq datasets from Drosophila melanogaster and human cell lines. SalmonTE ran 20 times faster without compromising accuracy. This pipeline will enable the biomedical research community to quantify and analyze TEs from large amounts of data and lead to novel TE-centric discoveries.