Shi Haihe, Zhang Xuchu
School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China.
Front Genet. 2020 Feb 27;11:105. doi: 10.3389/fgene.2020.00105. eCollection 2020.
In recent years, the volume of bioinformatics data has grown explosively, but data are not information. The purpose of bioinformatics research is to extract biologically meaningful information from large amounts of data. Multiple sequence alignment is widely used in sequence homology detection, protein secondary and tertiary structure prediction, phylogenetic tree analysis, and other fields. Existing research focuses mainly on specific steps of an algorithm or on specific problems, and high-level, abstract algorithm frameworks for the domain are lacking. As a result, multiple sequence alignment algorithms are complex, redundant, and difficult to understand, and users cannot easily select an appropriate algorithm, which may lead to computing errors. Here, through in-depth study and analysis of the heuristic multiple sequence alignment algorithm (HMSAA) domain, a domain feature model and an interaction model of HMSAA components are established following the generative programming method. With the support of the PAR (partition and recur) platform, the HMSAA algorithm component library is formalized and a specific alignment algorithm is assembled from it, improving the reliability of algorithm assembly. This work provides a valuable theoretical reference for the application of other biological sequence analysis algorithms.
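The idea of assembling an alignment algorithm from interchangeable components can be illustrated with a small sketch. The Python code below is not the PAR platform or the authors' component library. It is a hypothetical illustration that assumes three components (a pairwise aligner, a center-selection rule, and a merge step) and assembles them into a center-star heuristic, one classic heuristic for multiple sequence alignment. All function names and the scoring values (match +1, mismatch -1, gap -2) are assumptions made for this example.

```python
from typing import Callable, List, Tuple

GAP = "-"
Pairwise = Callable[[str, str], Tuple[str, str, int]]


def column_score(a: str, b: str) -> int:
    """Assumed scoring scheme: match +1, mismatch -1, gap -2."""
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def needleman_wunsch(s: str, t: str) -> Tuple[str, str, int]:
    """Pairwise component: global alignment by dynamic programming."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + column_score(s[i - 1], GAP)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + column_score(GAP, t[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + column_score(s[i - 1], t[j - 1]),
                dp[i - 1][j] + column_score(s[i - 1], GAP),
                dp[i][j - 1] + column_score(GAP, t[j - 1]),
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + column_score(s[i - 1], t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + column_score(s[i - 1], GAP):
            a.append(s[i - 1]); b.append(GAP); i -= 1
        else:
            a.append(GAP); b.append(t[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b)), dp[n][m]


def highest_total_score(seqs: List[str], pairwise: Pairwise) -> int:
    """Center-selection component: index with the best summed pairwise score."""
    totals = [
        sum(pairwise(s, t)[2] for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]
    return max(range(len(seqs)), key=lambda i: totals[i])


def merge_once_a_gap(rows: List[str], center_aln: str, seq_aln: str) -> List[str]:
    """Merge component: rows[0] is the gapped center; add one aligned sequence."""
    cur = rows[0]
    out = [[] for _ in rows]
    new_row = []
    i = j = 0
    while i < len(cur) or j < len(center_aln):
        if i < len(cur) and j < len(center_aln) and cur[i] == center_aln[j]:
            for o, row in zip(out, rows):
                o.append(row[i])
            new_row.append(seq_aln[j]); i += 1; j += 1
        elif i < len(cur) and cur[i] == GAP:
            # Column already gapped in the center: the new sequence gets a gap.
            for o, row in zip(out, rows):
                o.append(row[i])
            new_row.append(GAP); i += 1
        else:
            # New gap in the center: open a gap column in every existing row.
            for o in out:
                o.append(GAP)
            new_row.append(seq_aln[j]); j += 1
    return ["".join(o) for o in out] + ["".join(new_row)]


def assemble_msa(pairwise: Pairwise, choose_center, merge) -> Callable[[List[str]], List[str]]:
    """Assemble a concrete alignment algorithm from the three components."""
    def run(seqs: List[str]) -> List[str]:
        if not seqs:
            return []
        c = choose_center(seqs, pairwise)
        rows, order = [seqs[c]], [c]
        for k, s in enumerate(seqs):
            if k == c:
                continue
            center_aln, seq_aln, _ = pairwise(seqs[c], s)
            rows = merge(rows, center_aln, seq_aln)
            order.append(k)
        result = [""] * len(seqs)
        for row, k in zip(rows, order):
            result[k] = row  # restore the input order
        return result
    return run


center_star = assemble_msa(needleman_wunsch, highest_total_score, merge_once_a_gap)
```

Calling `center_star(["ACGT", "ACT", "AGT"])` returns one gapped row per input sequence, all of equal length. Because each component is passed in as a parameter, swapping the pairwise aligner or the center-selection rule yields a different assembled algorithm without touching the others, which is the kind of reuse a component library aims for.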