Pearson William R, Mackey Aaron J
Department of Biochemistry and Molecular Genetics, University of Virginia, School of Medicine, Charlottesville, Virginia.
Department of Public Health Sciences, University of Virginia, School of Medicine, Charlottesville, Virginia.
Curr Protoc Bioinformatics. 2017 Sep 13;59:9.4.1-9.4.22. doi: 10.1002/cpbi.32.
Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homolog matches. In addition, loading similarity search results into a relational database makes it possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It describes the installation and use of a simple protein sequence database, seqdb_demo, which serves as the basis for the other protocols, and introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used in a large-scale comparative genomic analysis to explore the evolutionary relationships between E. coli proteins and proteins in other organisms. © 2017 by John Wiley & Sons, Inc.
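As a rough illustration of the idea summarized above, the sketch below uses Python's built-in sqlite3 module. It stores a handful of toy similarity-search hits and then asks, for a set of E. coli query proteins, how many have a significant match in each kingdom. The schema (tables `protein` and `search_hit`), the accessions, the E-values, and the helper functions `build_demo_db`, `homologs_per_kingdom`, and `library_subset` are all hypothetical, invented only for illustration. They are not the schema or data of seqdb_demo or search_demo.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; it is NOT the actual
# seqdb_demo/search_demo schema described in the unit.
SCHEMA = """
CREATE TABLE protein (
    prot_id INTEGER PRIMARY KEY,
    acc     TEXT NOT NULL,
    kingdom TEXT NOT NULL        -- e.g. 'Bacteria', 'Archaea', 'Eukaryota'
);
CREATE TABLE search_hit (
    query_id INTEGER NOT NULL REFERENCES protein(prot_id),
    lib_id   INTEGER NOT NULL REFERENCES protein(prot_id),
    evalue   REAL NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up toy proteins and hits."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    proteins = [
        (1, "ECOLI_A", "Bacteria"),
        (2, "ECOLI_B", "Bacteria"),
        (3, "YEAST_X", "Eukaryota"),
        (4, "HUMAN_Y", "Eukaryota"),
        (5, "ARCH_Z", "Archaea"),
    ]
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?)", proteins)
    # (query protein, library protein, E-value): toy values, not real results.
    hits = [
        (1, 3, 1e-30), (1, 4, 1e-12), (1, 5, 0.5),
        (2, 5, 1e-8), (2, 3, 2.0),
    ]
    conn.executemany("INSERT INTO search_hit VALUES (?, ?, ?)", hits)
    return conn


def homologs_per_kingdom(conn, query_accs, threshold=1e-6):
    """Count distinct queries with at least one hit below `threshold`, per kingdom."""
    placeholders = ",".join("?" for _ in query_accs)
    sql = f"""
        SELECT lib.kingdom, COUNT(DISTINCT h.query_id)
        FROM search_hit AS h
        JOIN protein AS q   ON q.prot_id = h.query_id
        JOIN protein AS lib ON lib.prot_id = h.lib_id
        WHERE q.acc IN ({placeholders}) AND h.evalue < ?
        GROUP BY lib.kingdom
        ORDER BY lib.kingdom
    """
    return dict(conn.execute(sql, (*query_accs, threshold)).fetchall())


def library_subset(conn, kingdom):
    """Return accessions from one kingdom, i.e. a smaller taxonomic search library."""
    rows = conn.execute(
        "SELECT acc FROM protein WHERE kingdom = ? ORDER BY acc", (kingdom,)
    )
    return [acc for (acc,) in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(homologs_per_kingdom(db, ["ECOLI_A", "ECOLI_B"]))
    print(library_subset(db, "Eukaryota"))
```

Here `library_subset` stands in for selecting a taxonomic subset to build a smaller search library. Because FASTA and BLAST E-values scale with the size of the library searched, the same alignment score yields a smaller (more significant) E-value against a smaller subset. Keeping hits in a table means questions such as "which E. coli proteins have eukaryotic homologs?" become a single GROUP BY query rather than a pass over flat result files.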