Del Val Coral, Glatting Karl-Heinz, Suhai Sandor
Department of Molecular Biophysics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany.
BMC Bioinformatics. 2003 Sep 10;4:39. doi: 10.1186/1471-2105-4-39.
In recent years, several high-throughput cDNA sequencing projects have been funded worldwide with the aim of identifying and characterizing the structure of complete novel human transcripts. However, some of these cDNAs are error-prone, owing to frameshifts and stop codon errors caused by low sequence quality, to the cloning of truncated inserts, and to other causes. Accurate CDS prediction from these sequences therefore first requires the identification of potentially problematic cDNAs, in order to speed up the subsequent annotation process.
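As an illustration of the kind of check this implies, the following minimal sketch scans a cDNA for in-frame stop codons and reports the longest stop-free stretch over the three forward reading frames. It is not the method used by cDNA2Genome; the function names and the example sequence are hypothetical and chosen only for illustration.

```python
# Minimal sketch (not the cDNA2Genome method): locate in-frame stop codons,
# which can indicate sequencing errors or frameshifts in a cDNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq, frame=0):
    """Return 0-based start positions of stop codons in the given forward frame."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def longest_open_frame(seq):
    """Return (frame, codons) for the longest stop-free run over frames 0, 1 and 2."""
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                run = 0  # a stop codon ends the current open stretch
            else:
                run += 1
                if run > best[1]:
                    best = (frame, run)
    return best


if __name__ == "__main__":
    example = "ATGAAATAGCCC"  # hypothetical sequence with a stop codon in frame 0
    print(in_frame_stops(example, frame=0))
    print(longest_open_frame(example))
```

A cDNA whose longest open frame is much shorter than expected, or whose best frame changes along the sequence, would be a candidate for closer inspection before CDS prediction.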
cDNA2Genome is an application for the automatic high-throughput mapping and characterization of cDNAs. It uses current annotation data and the most up-to-date databases, especially for ESTs and mRNAs, in conjunction with a large number of gene prediction approaches, in order to perform a comprehensive assessment of the cDNA exon-intron structure. The final result of cDNA2Genome is an XML file containing all relevant information obtained in the process. This XML output can easily be used for further analysis, such as program pipelines, or for the integration of results into databases. The web interface to cDNA2Genome also presents these data in HTML, where the annotation is additionally shown in graphical form. cDNA2Genome has been implemented under the W3H task framework, which allows the combination of bioinformatics tools in tailor-made analysis task flows, as well as the sequential or parallel computation of many sequences for large-scale analysis.
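To suggest how such XML output might feed a downstream pipeline, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The actual cDNA2Genome schema is not described here, so the element and attribute names (`cdna2genome`, `cdna`, `exon`, `id`, `start`, `end`) and the sample document are assumptions made purely for illustration.

```python
# Minimal sketch: flatten exon coordinates from an XML report into rows
# suitable for a database or a later pipeline step.
# NOTE: the element and attribute names below are hypothetical, not the
# real cDNA2Genome output format.
import xml.etree.ElementTree as ET

SAMPLE = """<cdna2genome>
  <cdna id="example1">
    <exon start="100" end="250"/>
    <exon start="400" end="520"/>
  </cdna>
</cdna2genome>"""


def exon_table(xml_text):
    """Return a list of (cdna_id, exon_start, exon_end) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for cdna in root.iter("cdna"):
        for exon in cdna.iter("exon"):
            rows.append((cdna.get("id"), int(exon.get("start")), int(exon.get("end"))))
    return rows


if __name__ == "__main__":
    for row in exon_table(SAMPLE):
        print(row)
```

Flattening the report into tuples keeps the parsing step separate from whatever storage or analysis follows it.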
cDNA2Genome represents a new, versatile, and easily extensible approach to the automated mapping and annotation of human cDNAs. The underlying framework allows sequential or parallel computation of sequences for high-throughput analysis of cDNAs.