Dept. of Computer Science, College of SW Convergence, Dankook Univ, Yongin-si, 16890, Korea.
Center for Bio-Medical Engineering Core Facility, Dankook Univ, Cheonan, 31116, Korea.
Genes Genomics. 2023 Dec;45(12):1599-1609. doi: 10.1007/s13258-023-01458-7. Epub 2023 Oct 14.
Reconstructing amino acid sequences from an assembled transcriptome is of interest in personalized medicine, for example, to predict drug-target (or protein-protein) interactions while accounting for an individual's genomic variations. Most existing transcriptome assemblers, however, seem not well suited for this purpose.
In this work, we present StringFix, an annotation-guided transcriptome assembly and protein sequence reconstruction tool that takes genome-aligned reads and the annotations associated with the reference genome as input. The tool 'fixes' the pre-annotated transcript sequences by taking small variations into account, and finally produces possible amino acid sequences that are likely to exist in the test tissue.
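The idea of 'fixing' an annotated transcript and then deriving its amino acid sequence can be illustrated with a minimal sketch. This is not StringFix's implementation: the function names `apply_snvs` and `translate`, the restriction to single-nucleotide substitutions, and the toy sequence are all assumptions made for illustration. Only the standard genetic code is taken as given.

```python
from itertools import product

# Standard genetic code, laid out in the canonical T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def apply_snvs(seq, snvs):
    """Return seq with substitutions applied.

    snvs maps a 0-based position to a (ref, alt) pair (hypothetical format).
    """
    bases = list(seq)
    for pos, (ref, alt) in snvs.items():
        if bases[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        bases[pos] = alt
    return "".join(bases)


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy annotated coding sequence: ATG GCC GAA TAA
annotated = "ATGGCCGAATAA"
# A single substitution, as might be observed in the aligned reads (assumed).
fixed = apply_snvs(annotated, {7: ("A", "T")})

print(translate(annotated))  # MAE
print(translate(fixed))      # MAV
```

Here one substitution in the third codon changes the encoded residue from glutamate to valine, which is the kind of individual-specific difference that a reference annotation alone would miss.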
The results show that, using outputs from existing reference-based assemblers as the input GTF guide, StringFix could reconstruct amino acid sequences more precisely and with higher sensitivity than direct generation from the recovered transcripts, for all the assemblers we tested.
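The abstract does not define how precision and sensitivity were measured. As a rough sketch under the assumption that a reconstructed protein counts as correct only when it exactly matches a known true sequence, the two metrics could be computed as follows (the function name `precision_sensitivity` and the toy sequences are hypothetical):

```python
def precision_sensitivity(predicted, truth):
    """Exact-match precision and sensitivity over sets of protein sequences."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Precision: fraction of reconstructed sequences that are real.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Sensitivity (recall): fraction of real sequences that were reconstructed.
    sensitivity = true_positives / len(truth) if truth else 0.0
    return precision, sensitivity


reconstructed = ["MAE", "MAV", "MKL"]
expressed = ["MAE", "MAV", "MTT", "MQQ"]
precision, sensitivity = precision_sensitivity(reconstructed, expressed)
print(round(precision, 3), sensitivity)  # 0.667 0.5
```

A real evaluation might instead use alignment-based similarity thresholds, so the exact-match rule here is only one possible choice.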
By using StringFix with existing reference-based assemblers, one can recover not only novel transcripts and isoforms but also the possible amino acid sequences stemming from them.