Departamento de Estadística, Universidad Nacional de Colombia, Bogotá, Colombia.
Appl Microbiol Biotechnol. 2012 Mar;93(5):2091-8. doi: 10.1007/s00253-012-3917-3. Epub 2012 Feb 4.
Useful biological information, such as gene expression data from microarrays, can be extracted from publicly available genomic data. To narrow down the large number of possible wet-lab experiments, global high-throughput data and existing knowledge should be used first to infer biological knowledge and generate biological hypotheses. Here, based on microarray data, we propose using widely adopted clustering and classification methods, implemented in freely available software, to predict whether proteins encoded by genes of the pathogen Streptococcus pyogenes participate in virulence mechanisms. Confidence in the predictions is based on the classification errors for known genes and on repeated prediction by more than three methods. Special emphasis is placed on the nonlinear kernel classification methods used. We propose a list of candidates that could be virulence factors or that could participate in the virulence process of S. pyogenes. Biological validation should start with this list, since the candidates behave similarly to known virulence factors.
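The "repeated prediction by more than three methods" criterion can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `consensus_candidates`, the method labels, and the gene identifiers are all hypothetical, and each method is assumed to have already produced a set of genes it predicts to be virulence-related.

```python
from collections import Counter


def consensus_candidates(predictions, min_votes=4):
    """Return genes predicted as virulence-related by at least `min_votes` methods.

    `predictions` maps a method label to the set of gene identifiers that
    the method predicts as virulence-related. The default of 4 votes is an
    assumption that mirrors "more than three methods".
    """
    votes = Counter()
    for genes in predictions.values():
        # set() guards against a method listing the same gene twice
        votes.update(set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_votes)


# Hypothetical predictions from five unnamed methods (illustrative data only)
predictions = {
    "method_1": {"gene_A", "gene_B", "gene_C"},
    "method_2": {"gene_A", "gene_B"},
    "method_3": {"gene_A", "gene_C"},
    "method_4": {"gene_A", "gene_B"},
    "method_5": {"gene_B", "gene_D"},
}

candidates = consensus_candidates(predictions)
print(candidates)  # ['gene_A', 'gene_B']
```

In this toy input, only the genes flagged by four of the five methods are kept. In practice such a vote would be combined with each method's classification error on known genes, as the abstract describes.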