Department of Genome Sciences, University of Washington, Seattle, WA, USA.
Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA.
Mol Biol Evol. 2022 May 3;39(5). doi: 10.1093/molbev/msac105.
The rapid evolution of fertilization proteins has generated remarkable diversity in molecular structure and function. Glycoproteins of vertebrate egg coats contain multiple zona pellucida (ZP)-N domains (1-6 copies) that facilitate several reproductive functions, including species-specific sperm recognition. In this report, we integrate phylogenetics and machine learning to investigate how ZP-N domains diversify in structure and function. The most C-terminal ZP-N domain of each paralog is associated with another domain type (ZP-C), and together the two form a "ZP module." All modular ZP-N domains are phylogenetically distinct from nonmodular, or free, ZP-N domains. Machine learning-based classification identifies eight residues that form a stabilizing network in modular ZP-N domains and that is absent in free domains. Positive selection is identified in some free ZP-N domains. Our findings support a model in which strong purifying selection has conserved an essential structural core in modular ZP-N domains, while relaxation of this structural constraint has allowed free N-terminal domains to diversify functionally.