ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Universidade de Vigo, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain.
CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain.
Interdiscip Sci. 2018 Mar;10(1):24-32. doi: 10.1007/s12539-018-0282-7. Epub 2018 Jan 30.
When changes at a few amino acid sites are the target of selection, adaptive amino acid changes in protein sequences can be identified using maximum-likelihood methods based on models of codon substitution (such as those implemented in codeml). Although such methods have been applied many times to a variety of organisms, the time needed to collect the data and prepare the input files means that usually only tens or hundreds of coding regions are analyzed. The recent availability of flexible, easy-to-use computer applications that collect the relevant data (such as BDBM) and infer positively selected amino acid sites (such as ADOPS) makes the entire process easier and quicker than before. However, ADOPS has so far lacked a batch option, which precluded the analysis of hundreds or thousands of sequence files; such an option is reported here. Given the interest in, and the feasibility of, running such large-scale projects, we have also developed a database where ADOPS projects can be stored. This study therefore also presents the B+ database, which is both a data repository and a convenient interface for inspecting the information contained in ADOPS projects without the need to download and unzip the corresponding ADOPS project file. The ADOPS projects available at B+ can also be downloaded, unzipped, and opened using the ADOPS graphical interface. The availability of such a database ensures the repeatability of results, promotes data reuse with significant savings in the time needed to prepare datasets, and allows effortless further exploration of the data contained in ADOPS projects.