Paila Umadevi, Kondam Rohini, Ranjan Akash
Computational and Functional Genomics Group & Sun Centre of Excellence in Medical Bioinformatics, EMBnet India Node, Hyderabad, India.
Nucleic Acids Res. 2008 Dec;36(21):6664-75. doi: 10.1093/nar/gkn635. Epub 2008 Oct 23.
The genomic era has seen a remarkable increase in the number of genomes being sequenced and annotated. Nonetheless, annotation remains a serious challenge for compositionally biased genomes. For preliminary annotation, popular nucleotide and protein comparison methods such as BLAST are widely used. These methods score alignments with matrices such as amino acid substitution matrices. Because a nucleotide bias leads to an overall bias in the amino acid composition of proteins, a genome with a nucleotide bias may have introduced atypical amino acid substitutions into its proteome. Consequently, standard matrices fail to perform well in the sequence analysis of such genomes. To address this issue, we chose the AT-rich genome of Plasmodium falciparum as a reference, examined its amino acid substitutions, and reconstructed a substitution matrix in the context of this genome. Protein sequence alignments of parasite proteins generated with this matrix were improved across functional regions. We attribute this improvement to the consistency that may have been achieved between the target and background frequencies, both of which were calculated exclusively for this genome in our study. This study has important implications for the annotation of proteins that are of experimental interest but yield poor sequence alignments with standard matrices.
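The abstract rests on the idea that a substitution matrix scores residue pairs by comparing target frequencies (how often two residues are aligned) with background frequencies (how often they would co-occur by chance). The following is a minimal sketch of the generic BLOSUM-style log-odds construction, offered only as an illustration and not as the authors' actual procedure. The function name `log_odds_matrix`, the `scale` parameter, and the input format (gapless blocks of equal-length aligned sequences) are hypothetical. The sketch omits sequence clustering and weighting. It derives background frequencies from the same pair counts as the target frequencies, which is one way to keep the two consistent.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(blocks, scale=2.0):
    """Hypothetical sketch: BLOSUM-style log-odds scores from gapless aligned blocks.

    blocks: list of blocks; each block is a list of equal-length aligned sequences.
    scale:  multiplier on log2 (2.0 gives half-bit units, as in BLOSUM).
    Returns a dict mapping (residue_a, residue_b) -> integer score (symmetric).
    """
    # Count every unordered residue pair observed within each alignment column.
    pair_counts = Counter()
    for block in blocks:
        for col in range(len(block[0])):
            column = [seq[col] for seq in block]
            for a, b in combinations(column, 2):
                pair_counts[tuple(sorted((a, b)))] += 1

    # Target frequencies q_ij: observed pair frequencies.
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequencies p_i, derived from the same data as q:
    # p_i = q_ii + sum over j != i of q_ij / 2.
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    # Expected pair frequency: p_i * p_i for identical pairs, 2 * p_i * p_j otherwise.
    scores = {}
    for (a, b), f in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        s = round(scale * math.log2(f / expected))
        scores[(a, b)] = s
        scores[(b, a)] = s
    return scores


if __name__ == "__main__":
    # Toy block of three aligned two-residue sequences (illustrative data only).
    print(log_odds_matrix([["AA", "AL", "LL"]]))
```

In the toy block, A and L are aligned against each other more often than chance predicts, so the A/L pair receives a positive score and the identical pairs receive negative scores. A matrix built this way from the proteins of one biased genome reflects that genome's composition rather than that of a general-purpose training set.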