Bansal Arvind K
Department of Computer Science, Kent State University, Kent, OH 44242, USA.
Microb Cell Fact. 2005 Jun 28;4:19. doi: 10.1186/1475-2859-4-19.
The revolutionary growth in computation speed and memory storage capacity has fueled a new era in the analysis of biological data. Hundreds of microbial genomes and many eukaryotic genomes, including a cleaner draft of the human genome, have been sequenced, raising expectations of better control of microorganisms. The goals are lofty: the development of rational drugs and antimicrobial agents, new enhanced bacterial strains for bioremediation and pollution control, better and easier-to-administer vaccines, protein biomarkers for various bacterial diseases, and a better understanding of host-bacteria interaction to prevent bacterial infections. In the last decade, the development of many new bioinformatics techniques and integrated databases has facilitated the realization of these goals. Current research in bioinformatics can be classified into: (i) genomics, the sequencing and comparative study of genomes to identify gene and genome functionality; (ii) proteomics, the identification and characterization of protein-related properties and the reconstruction of metabolic and regulatory pathways; (iii) cell visualization and simulation to study and model cell behavior; and (iv) application to the development of drugs and antimicrobial agents. In this article, we focus on the techniques and their limitations in genomics and proteomics. Bioinformatics research can be classified under three major approaches: (1) analysis based on available experimental wet-lab data, (2) the use of mathematical modeling to derive new information, and (3) an integrated approach that combines search techniques with mathematical modeling.
The major impacts of bioinformatics research have been: the automation of genome sequencing, of the development of integrated genomics and proteomics databases, of genome comparison to identify genome function, and of the derivation of metabolic pathways; gene expression analysis to derive regulatory pathways; the development of statistical, clustering, and data mining techniques to derive protein-protein and protein-DNA interactions; modeling of the 3D structure of proteins and 3D docking between proteins and biochemicals for rational drug design; difference analysis between pathogenic and non-pathogenic strains to identify candidate genes for vaccines and antimicrobial agents; and whole-genome comparison to understand microbial evolution. The development of bioinformatics techniques has accelerated biological discovery by enabling the automated analysis of large numbers of microbial genomes. We are on the verge of using all this knowledge to understand cellular mechanisms at the systems level. These techniques have the potential to facilitate (i) the discovery of the causes of diseases, (ii) vaccine and rational drug design, and (iii) improved, cost-effective agents for bioremediation by pruning out dead ends. Despite the fast-paced global effort, current analysis is limited by the lack of gene-function information from wet-lab data, the lack of computer algorithms to explore vast amounts of data of unknown functionality, the limited availability of protein-protein and protein-DNA interaction data, and the lack of knowledge about the temporal and transient behavior of genes and pathways.
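As a minimal sketch of the difference analysis between pathogenic and non-pathogenic strains mentioned above, assume each strain's genome has already been reduced to a set of gene-family (ortholog) identifiers. The function name `candidate_genes` and all strain and gene labels are hypothetical. Real pipelines must first assign orthologs (for example, by sequence similarity search) and usually tolerate partial presence; this filter only illustrates the core presence/absence logic.

```python
from typing import Dict, Set


def candidate_genes(
    pathogenic: Dict[str, Set[str]],
    non_pathogenic: Dict[str, Set[str]],
) -> Set[str]:
    """Return gene families present in every pathogenic strain and absent
    from every non-pathogenic strain (a simplified presence/absence filter)."""
    if not pathogenic:
        return set()
    # Gene families conserved across all pathogenic strains.
    core_pathogenic = set.intersection(*pathogenic.values())
    # Every gene family observed in any non-pathogenic strain.
    seen_in_non_pathogenic = set().union(*non_pathogenic.values())
    # Candidates: conserved in pathogens, never seen in non-pathogens.
    return core_pathogenic - seen_in_non_pathogenic


# Hypothetical toy data: strain names and gene labels are illustrative only.
pathogenic = {
    "strain_P1": {"adhesin_A", "toxin_B", "gyrA", "recA"},
    "strain_P2": {"adhesin_A", "toxin_B", "gyrA", "recA", "capsule_C"},
}
non_pathogenic = {
    "strain_N1": {"gyrA", "recA", "capsule_C"},
}

print(sorted(candidate_genes(pathogenic, non_pathogenic)))
# ['adhesin_A', 'toxin_B']
```

Requiring presence in all pathogenic strains favors genes tied to virulence over strain-specific accessories; a more permissive variant could use a frequency threshold instead of a strict intersection.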