Lu Yao, Lu Yulan, Deng Jingyuan, Peng Hai, Lu Hui, Lu Long Jason
Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, 24/1400 Beijing (W) Road, Shanghai 200040, People's Republic of China.
State Key Laboratory of Genetic Engineering, Institute of Biostatistics, School of Life Science, Fudan University, Shanghai 200433, People's Republic of China.
Bioinformatics. 2015 Sep 15;31(18):2921-9. doi: 10.1093/bioinformatics/btv312. Epub 2015 May 22.
Genes with indispensable functions are identified as essential; however, traditional gene-level studies of essentiality have several limitations. In this study, we characterized gene essentiality from the new perspective of protein domains, the independent structural or functional units of a polypeptide chain.
To identify such essential domains, we developed an Expectation-Maximization (EM) algorithm-based Essential Domain Prediction (EDP) model. On simulated datasets, the model converged to consistent results from different initial values and gave accurate predictions even in the presence of noise. We then applied the EDP model to six microbial species and predicted 1879 domains to be essential in at least one species, with the proportion of essential domains ranging from 10% to 23% per species. The predicted essential domains were more conserved than either non-essential domains or essential genes. Comparing essential domains in prokaryotes and eukaryotes revealed an evolutionary distance consistent with that inferred from ribosomal RNA. When we used these essential domains to reproduce the annotation of essential genes, we obtained accurate results, which suggests that protein domains are more basic units of gene essentiality. Furthermore, we presented several examples to illustrate how combinations of essential and non-essential domains can lead to genes with divergent essentiality. In summary, we have described the first systematic analysis of gene essentiality at the domain level.
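The idea of inferring domain essentiality from gene-level labels with EM can be illustrated with a small sketch. The abstract does not specify the EDP model's likelihood, so the code below is not the authors' implementation: it assumes a simple noise-free "OR" model in which a gene is essential exactly when at least one of its domain instances is essential, and each domain d is essential in any given instance with an unknown probability theta[d]. The function names, parameters, and toy data are all hypothetical.

```python
from collections import defaultdict


def em_essential_domains(genes, essential, n_iter=200, init=0.5, tol=1e-9):
    """Estimate a per-domain essentiality probability with EM.

    ASSUMED model (not taken from the paper): a gene is essential
    iff at least one of its domain instances is essential.
    genes:     dict mapping gene id -> set of domain ids
    essential: dict mapping gene id -> bool (observed gene label)
    """
    domains = sorted({d for ds in genes.values() for d in ds})
    theta = {d: init for d in domains}
    for _ in range(n_iter):
        expected = defaultdict(float)  # expected essential instances per domain
        count = defaultdict(int)       # number of genes containing each domain
        for g, ds in genes.items():
            # Probability that no domain in this gene is essential.
            p_none = 1.0
            for d in ds:
                p_none *= 1.0 - theta[d]
            p_any = 1.0 - p_none
            for d in ds:
                count[d] += 1
                # E-step: P(instance essential | gene label).
                # A non-essential gene forces every instance to be non-essential.
                if essential[g] and p_any > 0.0:
                    expected[d] += min(1.0, theta[d] / p_any)
        # M-step: average the expected indicators over each domain's genes.
        new_theta = {d: expected[d] / count[d] for d in domains}
        delta = max(abs(new_theta[d] - theta[d]) for d in domains)
        theta = new_theta
        if delta < tol:
            break
    return theta


def predict_gene_essential(gene_domains, theta, threshold=0.5):
    """Call a gene essential if any of its domains passes the threshold."""
    return any(theta.get(d, 0.0) >= threshold for d in gene_domains)


# Hypothetical toy data: domain A only ever occurs in essential genes.
toy_genes = {"g1": {"A"}, "g2": {"A", "B"}, "g3": {"B"}, "g4": {"B", "C"}}
toy_labels = {"g1": True, "g2": True, "g3": False, "g4": False}
toy_theta = em_essential_domains(toy_genes, toy_labels)
print({d: round(p, 3) for d, p in toy_theta.items()})
```

In this toy case the essentiality of g2 is attributed to domain A rather than B, because B also occurs in non-essential genes. The same mechanism is one way a gene's essentiality could be explained by its domain composition, as the abstract describes.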
Contact: huilu.bioinfo@gmail.com or Long.Lu@cchmc.org
Supplementary data are available at Bioinformatics online.