Hourai Yuichiro, Akutsu Tatsuya, Akiyama Yutaka
Department of Computer Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan.
Bioinformatics. 2004 Apr 12;20(6):863-73. doi: 10.1093/bioinformatics/btg494. Epub 2004 Jan 29.
Homology search is one of the most fundamental tools in bioinformatics. Typical alignment algorithms score alignments with a substitution matrix and gap costs, so improving the substitution matrix increases the accuracy of homology searches. Generally, substitution matrices are derived from aligned sequences whose relationships are known, and gap costs are determined by trial and error. To discriminate related from unrelated sequences more clearly, we optimize substitution matrices from a statistical viewpoint, using both positive and negative examples within the framework of Bayesian decision theory.
We optimized substitution matrices using the Clusters of Orthologous Groups (COG) database. On the COG database, the classification accuracy of the resulting matrix is better than that of conventional substitution matrices. The matrix also performs well when classifying sequences from other databases.
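As a rough illustration of how a substitution matrix and a gap cost enter an alignment score, and how such scores can separate positive (related) from negative (unrelated) pairs, consider the minimal Python sketch below. It is not the optimization procedure of the paper. The toy DNA alphabet, the match/mismatch values, the linear gap cost, the score threshold, and all function names are assumptions chosen only for illustration.

```python
def toy_matrix(alphabet, match, mismatch):
    """Build a toy substitution matrix as a dict keyed by residue pairs (illustrative only)."""
    return {(x, y): (match if x == y else mismatch)
            for x in alphabet for y in alphabet}


def alignment_score(a, b, subst, gap):
    """Global alignment score (Needleman-Wunsch) with a linear gap cost."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [-gap * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [-gap * i]
        for j in range(1, len(b) + 1):
            curr.append(max(
                prev[j - 1] + subst[(a[i - 1], b[j - 1])],  # substitution
                prev[j] - gap,                              # gap in b
                curr[j - 1] - gap,                          # gap in a
            ))
        prev = curr
    return prev[-1]


def accuracy(pairs, subst, gap, threshold):
    """Fraction of labelled pairs classified correctly by thresholding the score."""
    correct = 0
    for a, b, related in pairs:
        predicted = alignment_score(a, b, subst, gap) >= threshold
        correct += (predicted == related)
    return correct / len(pairs)


if __name__ == "__main__":
    subst = toy_matrix("ACGT", match=2, mismatch=-1)
    # Hypothetical labelled examples: (sequence, sequence, is_related).
    pairs = [("ACGT", "ACGT", True), ("ACGT", "TTTT", False)]
    print(accuracy(pairs, subst, gap=2, threshold=3))
```

In this sketch, changing the entries of `subst` shifts the scores of the positive and negative pairs, and with them the fraction that a fixed threshold classifies correctly. That quantity is the kind of classification accuracy the abstract refers to, here computed on made-up data.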