NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, Georgia, United States of America.
School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, United States of America.
PLoS Comput Biol. 2021 Oct 29;17(10):e1009541. doi: 10.1371/journal.pcbi.1009541. eCollection 2021 Oct.
We have developed TwinCons, a program that detects noisy signals of deep ancestry in proteins or nucleic acids. As input, the program takes a composite alignment containing pre-defined groups and mathematically determines a 'cost' of transforming one group into the other at each position of the alignment. The output distinguishes conserved, variable and signature positions. A signature is conserved within groups but differs between groups. The method automatically detects continuous characteristic stretches (segments) within alignments. TwinCons represents conserved, variable and signature positions as a single score, which enables structural mapping and visualization of these characteristics. Because structure is more conserved than sequence, TwinCons can highlight alternative sequences of conserved structures. Using TwinCons, we detected highly similar segments between proteins from the translation and transcription systems. TwinCons detects conserved residues within regions of high functional importance in the ribosomal RNA (rRNA) and demonstrates that signatures are not confined to specific regions but are distributed across the rRNA structure. Because it can evaluate both nucleic acid and protein alignments, TwinCons can be used in combined sequence and structural analysis of signatures and conservation in rRNA and in ribosomal proteins (rProteins). TwinCons detects a strong sequence conservation signal between bacterial and archaeal rProteins related by circular permutation. This conserved sequence is structurally colocalized with conserved rRNA, as indicated by TwinCons scores of rRNA alignments of bacterial and archaeal groups. This combined analysis revealed deep co-evolution of rRNA and rProtein buried within the deepest branching points in the tree of life.
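The idea of a per-position 'cost' between two pre-defined groups, followed by segment detection, can be illustrated with a minimal sketch. This is not the TwinCons implementation: the scoring matrix (+1 for identical residues, -1 otherwise), the threshold, and all function names (`position_score`, `score_alignment`, `find_segments`) are hypothetical choices made only for illustration. In this toy scheme, positions conserved across both groups score high, signature positions score low, and variable positions fall in between.

```python
from collections import Counter


def column_frequencies(column):
    """Return residue frequencies for one alignment column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    total = len(residues)
    if total == 0:
        return {}
    return {r: n / total for r, n in Counter(residues).items()}


def toy_substitution(a, b):
    # Hypothetical toy matrix (an assumption, not the matrices TwinCons uses):
    # +1 for identical residues, -1 for any mismatch.
    return 1.0 if a == b else -1.0


def position_score(column_a, column_b, sub=toy_substitution):
    """Frequency-weighted average substitution score between two groups."""
    freq_a = column_frequencies(column_a)
    freq_b = column_frequencies(column_b)
    return sum(
        pa * pb * sub(a, b)
        for a, pa in freq_a.items()
        for b, pb in freq_b.items()
    )


def score_alignment(group_a, group_b):
    """Score every column of an alignment split into two groups of sequences."""
    length = len(group_a[0])
    return [
        position_score([s[i] for s in group_a], [s[i] for s in group_b])
        for i in range(length)
    ]


def find_segments(scores, threshold=0.5, min_length=2):
    """Return half-open (start, end) runs of consecutive scores >= threshold."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores)))
    return segments


# Toy alignment: columns 0-1 conserved, column 2 a signature, column 3 variable.
group_a = ["GGAC", "GGAU", "GGAG", "GGAA"]
group_b = ["GGCC", "GGCU", "GGCG", "GGCA"]

scores = score_alignment(group_a, group_b)
print(scores)                 # [1.0, 1.0, -1.0, -0.5]
print(find_segments(scores))  # [(0, 2)]
```

A realistic analysis would replace the toy matrix with an empirical substitution matrix and would read the groups from an alignment file; both steps are omitted here to keep the sketch self-contained.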