Coxe Tallon, Burks David J, Singh Utkarsh, Mittler Ron, Azad Rajeev K
Department of Biological Sciences and BioDiscovery Institute, College of Science, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203-5017, USA.
Texas Academy of Mathematics and Science, University of North Texas, Denton, TX 76203, USA.
Plants (Basel). 2024 Feb 21;13(5):582. doi: 10.3390/plants13050582.
The ultimate goal in selecting RNA-Seq alignment software is to obtain accurate alignments from a robust algorithm that can handle the various intricacies of read mapping. Most alignment tools are pre-tuned on human or prokaryotic data and may therefore be unsuitable for other organisms, such as plants. The rapidly growing plant RNA-Seq databases call for an assessment of alignment tools on curated plant data, which will aid the calibration of these tools for plant transcriptomic data. We therefore benchmarked RNA-Seq read alignment tools using simulated data derived from the model organism Arabidopsis thaliana. We assessed the performance of five popular, currently available RNA-Seq alignment tools, selected on the basis of their usage (citation count). By introducing annotated single nucleotide polymorphisms (SNPs) from The Arabidopsis Information Resource (TAIR), we recorded alignment accuracy at both base-level and junction base-level resolutions for each alignment tool. In addition to assessing the tools at their default settings, we recorded accuracies while varying the values of numerous parameters, including the confidence threshold and the level of SNP introduction. At the base level, the aligners performed consistently across the testing conditions; the junction base-level assessment, however, produced varying results depending on the algorithm applied. At the read base-level assessment, the overall performance of the aligner STAR was superior to that of the other aligners, with overall accuracy exceeding 90% under different test conditions. At the junction base-level assessment, on the other hand, SubRead emerged as the most promising aligner, with overall accuracy over 80% under most test conditions.
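The two resolutions named above can be illustrated with a minimal Python sketch. The abstract does not give the exact scoring definitions, so everything here is an assumption made for illustration: each simulated read is represented as a list of true genomic coordinates (one per base), each alignment as a list of reported coordinates (`None` for an unaligned or clipped base), and a read counts as junction-spanning when its true coordinates contain a gap. The function names are hypothetical and do not come from the paper or from any aligner.

```python
from typing import List, Optional, Sequence

Positions = Sequence[Optional[int]]


def base_level_accuracy(truth: Sequence[Sequence[int]],
                        aligned: Sequence[Positions]) -> float:
    """Fraction of read bases placed at their true genomic coordinate.

    Assumed definition: a base is correct only if the aligner reports
    exactly the coordinate the simulator drew it from.
    """
    total = 0
    correct = 0
    for true_pos, aln_pos in zip(truth, aligned):
        if len(true_pos) != len(aln_pos):
            raise ValueError("each read needs one aligned entry per base")
        total += len(true_pos)
        correct += sum(1 for t, a in zip(true_pos, aln_pos) if a == t)
    return correct / total if total else 0.0


def spans_junction(true_pos: Sequence[int]) -> bool:
    """True if consecutive bases are non-adjacent on the genome.

    Assumes forward-strand, ascending coordinates; a gap is taken to
    mark a splice junction in the simulated read.
    """
    return any(b - a > 1 for a, b in zip(true_pos, true_pos[1:]))


def junction_base_level_accuracy(truth: Sequence[Sequence[int]],
                                 aligned: Sequence[Positions]) -> float:
    """Base-level accuracy restricted to junction-spanning reads."""
    kept_truth: List[Sequence[int]] = []
    kept_aligned: List[Positions] = []
    for true_pos, aln_pos in zip(truth, aligned):
        if spans_junction(true_pos):
            kept_truth.append(true_pos)
            kept_aligned.append(aln_pos)
    return base_level_accuracy(kept_truth, kept_aligned)


# Toy data: one unspliced read aligned perfectly, and one spliced read
# whose second exon segment the (hypothetical) aligner failed to place.
truth = [[100, 101, 102, 103], [200, 201, 500, 501]]
aligned = [[100, 101, 102, 103], [200, 201, None, None]]

base_acc = base_level_accuracy(truth, aligned)
junction_acc = junction_base_level_accuracy(truth, aligned)
print(base_acc, junction_acc)
```

On this toy input, six of eight bases are placed correctly overall, while only two of the four bases in the junction-spanning read are, which mirrors how an aligner can look strong at the base level yet weaker at junctions.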