Hefei National Laboratory for Physical Sciences at Microscale and Department of Life Sciences, University of Science and Technology of China, Hefei 230027, China.
Nucleic Acids Res. 2013 Jan;41(Database issue):D1055-62. doi: 10.1093/nar/gks1186. Epub 2012 Nov 28.
Human infertility affects 10-15% of couples, and about half of these cases are attributed to the male partner. Abnormal spermatogenesis is a major cause of male infertility. Characterizing the genes involved in spermatogenesis is fundamental to understanding the mechanisms underlying this biological process and to developing treatments for male infertility. Although many genes have been implicated in spermatogenesis, no dedicated bioinformatic resource for spermatogenesis is available. We have developed such a database, SpermatogenesisOnline 1.0 (http://mcg.ustc.edu.cn/sdap1/spermgenes/), by manually curating 30 233 articles published before 1 May 2012. It provides detailed information for 1666 genes reported to participate in spermatogenesis in 37 organisms. Based on the analysis of these genes, we developed an algorithm, the Greed AUC Stepwise (GAS) model, which predicted 762 genes to participate in spermatogenesis (GAS probability >0.5) from genome-wide transcriptional data in Mus musculus testis obtained from the ArrayExpress database. Both the predicted and the experimentally verified genes were annotated, and several of the same spermatogenesis-related GO terms were enriched in both classes. Furthermore, protein-protein interaction analysis indicates direct interactions between predicted genes and experimentally verified ones, which supports the reliability of GAS. The strategy used to develop SpermatogenesisOnline 1.0 (manual curation and data mining) can be easily extended to other biological processes.
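The abstract does not describe the internals of the GAS model, so the following is only a generic sketch of what a greedy, AUC-driven stepwise selection can look like, not the authors' implementation. It assumes each gene has several numeric features (for example, expression values from different testis data sets) and a binary label marking experimentally verified genes. The function names, the additive scoring rule in `subset_score`, and the `min_gain` stopping threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subset_score(features: Dict[str, List[float]], chosen: List[str]) -> List[float]:
    """Hypothetical scoring rule: per-gene sum of the chosen feature columns."""
    n_genes = len(next(iter(features.values())))
    return [sum(features[name][i] for name in chosen) for i in range(n_genes)]


def greedy_auc_stepwise(
    features: Dict[str, List[float]],
    labels: Sequence[int],
    min_gain: float = 1e-9,
) -> Tuple[List[str], float]:
    """Forward selection: repeatedly add the feature that raises AUC the most.

    Stops when no remaining feature improves the AUC by more than min_gain.
    """
    chosen: List[str] = []
    best = 0.5  # AUC of a random ranking, used as the starting baseline
    remaining = sorted(features)
    while remaining:
        trials = [
            (auc(subset_score(features, chosen + [name]), labels), name)
            for name in remaining
        ]
        top_auc, top_name = max(trials)
        if top_auc - best <= min_gain:
            break
        chosen.append(top_name)
        remaining.remove(top_name)
        best = top_auc
    return chosen, best
```

In a real setting the additive score would normally be replaced by a fitted classifier whose output is a probability, which is what a cutoff such as "GAS probability >0.5" implies, and the AUC would be estimated on held-out genes rather than on the training set.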