Biological information: making it accessible and integrated (and trying to make sense of it).

Author information

Tim Hubbard

Affiliation

Sanger Institute, Cambridgeshire, UK.

Publication information

Bioinformatics. 2002;18 Suppl 2:S140. doi: 10.1093/bioinformatics/18.suppl_2.s140.

PMID:12385995

Abstract

The availability of the genome sequences of human and mouse, human sequence variation data and other large genetic data sets will lead to a revolution in our understanding of the human machine and in the treatment of its diseases. The success of the international genome sequencing consortia shows what can be achieved by well-coordinated, large-scale public-domain projects, and the benefits of giving everyone access to the data. It is already clear that the availability of this sequence is having a huge impact on research worldwide. Complete genome sequences provide a framework for pulling all biological data together, so that each piece has the potential to say something about biology as a whole. Biology is too complex for any one organisation to have a monopoly of ideas or data, so research institutes around the world can contribute to collecting, analysing and providing access to these data. However, although all of these data can be made accessible to everyone through the internet, the more organisations provide data or analysis separately, the harder it becomes for anyone to collect and integrate the results. To address these problems of data integration, open standards for biological data exchange, such as the 'Distributed Annotation System' (DAS) (Dowell et al., 2001), are being developed, and bioinformatics as a whole is now being strongly driven by the open source software (OSS) model of collaborative software development (Hubbard and Birney, 1999). The leading provider of human genome annotation, the Ensembl project (http://www.ensembl.org), is entirely an OSS project and has been widely adopted by academic and commercial organisations alike (Hubbard et al., 2002). Accurate automatic annotation of features such as genes in vertebrate genomes currently relies on supporting evidence in the form of homologies to mRNAs, ESTs or proteins.
However, it appears that sufficient high quality experimentally curated annotation now exists to be used as a substrate for machine learning algorithms to create effective models of biological signal sequences (Down and Hubbard, 2002). Is there hope for ab initio prediction methods after all?
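The closing claim, that curated annotation can serve as training data for models of biological signal sequences, can be illustrated with the simplest model of that kind. The Python sketch below learns a log-odds position weight matrix (PWM) from aligned example sites and then scans a sequence with it. This is a classic textbook technique chosen purely for illustration, not the method of Down and Hubbard (2002). The function names (`train_pwm`, `score`, `scan`), the pseudocount of 1, the uniform background and the toy TATA-like training sites are all hypothetical assumptions, not curated data.

```python
import math
from collections import Counter

BASES = "ACGT"


def train_pwm(sites, pseudocount=1.0, background=None):
    """Build a log2-odds PWM from equal-length, aligned example sites.

    Hypothetical helper for illustration; the pseudocount and the
    uniform background are assumptions, not values from the abstract.
    """
    if not sites:
        raise ValueError("need at least one training site")
    sites = [s.upper() for s in sites]
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    if background is None:
        # Assume a uniform background base composition.
        background = {b: 0.25 for b in BASES}
    pwm = []
    total = len(sites) + pseudocount * len(BASES)
    for i in range(width):
        counts = Counter(s[i] for s in sites)
        # Smoothed probability of each base at position i, as log2 odds
        # relative to the background probability of that base.
        pwm.append({b: math.log2((counts[b] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def score(pwm, window):
    """Sum per-position log-odds; assumes len(window) == len(pwm), ACGT only."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold=0.0):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) - set(BASES):
            continue  # skip windows containing N or other ambiguity codes
        s = score(pwm, window)
        if s >= threshold:
            hits.append((start, s))
    return hits


if __name__ == "__main__":
    # Toy, made-up TATA-like sites; not real curated annotation.
    toy_sites = ["TATAAA", "TATAAT", "TATATA", "TATAAA"]
    pwm = train_pwm(toy_sites)
    hits = scan(pwm, "GCGCTATAAAGGC")
    best_start, best_score = max(hits, key=lambda hit: hit[1])
    print(best_start, round(best_score, 2))
```

In this toy run the highest-scoring window starts at position 4, where the training consensus TATAAA occurs. A PWM treats each position independently, so models trained on large curated sets usually need to capture more structure than this sketch does.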
