Lu Yulan, Lu Yao, Deng Jingyuan, Lu Hui, Lu Long Jason
State Key Laboratory of Genetic Engineering, Institute of Biostatistics, School of Life Science, Fudan University, 220 Handan Road, Shanghai, 200433, People's Republic of China.
Methods Mol Biol. 2015;1279:235-45. doi: 10.1007/978-1-4939-2398-4_15.
Genes with indispensable functions are identified as essential; however, the traditional gene-level perspective of essentiality has several limitations. We hypothesized that protein domains, the independent structural or functional units of a polypeptide chain, are responsible for gene essentiality. If the essentiality of domains were known, essential genes could be identified. To find such essential domains, we developed an Essential Domain Prediction (EDP) model based on the expectation-maximization (EM) algorithm. On simulated datasets, the model converged to consistent results from different initial values and gave accurate predictions even in the presence of noise. We then applied the EDP model to six microbes and predicted 3,450 domains to be essential in at least one species, with the proportion in each species ranging from 8% to 24%.
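The abstract does not give the EDP likelihood or update rules, so the following is only a minimal sketch of how an EM estimate of domain essentiality could look, not the authors' implementation. It assumes a simple noise-free "at least one essential domain" model: each domain `d` has an unknown probability `theta[d]` of being essential, and a gene is observed as essential exactly when at least one of its domains is. The function name `fit_domain_essentiality`, its parameters, and the toy data are all hypothetical.

```python
from collections import defaultdict


def fit_domain_essentiality(gene_domains, gene_essential,
                            init=0.5, max_iter=500, tol=1e-9):
    """EM sketch under an assumed noise-free 'any essential domain' model.

    gene_domains:   dict mapping gene id -> set of domain ids
    gene_essential: dict mapping gene id -> bool (observed essentiality)
    Returns a dict mapping domain id -> estimated essentiality probability.
    """
    domains = sorted({d for ds in gene_domains.values() for d in ds})
    theta = {d: init for d in domains}

    for _ in range(max_iter):
        expected = defaultdict(float)  # expected essential occurrences per domain
        count = defaultdict(int)       # number of genes containing each domain

        # E-step: posterior that each domain occurrence is essential,
        # given the observed essentiality of the gene that carries it.
        for gene, ds in gene_domains.items():
            p_none = 1.0
            for d in ds:
                p_none *= 1.0 - theta[d]
            # Probability the gene is essential; guarded against division by zero.
            p_gene = max(1.0 - p_none, 1e-12)
            for d in ds:
                count[d] += 1
                if gene_essential[gene]:
                    expected[d] += theta[d] / p_gene
                # A non-essential gene contributes 0: none of its domains
                # can be essential under this assumed model.

        # M-step: average the posteriors over all genes containing the domain.
        new_theta = {d: expected[d] / count[d] for d in domains}
        delta = max(abs(new_theta[d] - theta[d]) for d in domains)
        theta = new_theta
        if delta < tol:
            break

    return theta


# Toy data (hypothetical): domain A appears only in essential genes,
# domains B and C also appear in non-essential genes.
toy_domains = {
    "g1": {"A"},
    "g2": {"A", "B"},
    "g3": {"B"},
    "g4": {"B", "C"},
    "g5": {"C"},
}
toy_essential = {"g1": True, "g2": True, "g3": False, "g4": False, "g5": False}

if __name__ == "__main__":
    for domain, prob in fit_domain_essentiality(toy_domains, toy_essential).items():
        print(domain, round(prob, 4))
```

Running the fit from several values of `init` is a simple way to check the kind of convergence behavior the abstract reports for simulated data. A model that tolerates noisy essentiality labels would need additional error-rate parameters, which this sketch deliberately leaves out.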