Bitard-Feildel Tristan, Heberlein Magdalena, Bornberg-Bauer Erich, Callebaut Isabelle
Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Huefferstrasse 1, D-48149, Germany.
Biochimie. 2015 Dec;119:244-53. doi: 10.1016/j.biochi.2015.02.019. Epub 2015 Feb 28.
Comparative genomics has become an important strategy in life science research. While many genes, and the proteins they encode, can be well characterized by assigning orthologs, a significant number of proteins or domains remain obscure "orphans". Some orphans are overlooked by current computational methods because they diverged rapidly; others emerged relatively recently (de novo). Recent research has demonstrated the importance of orphans, and of de novo proteins and domains, for the development of new phenotypic traits and for adaptation. New approaches for detecting novel domains are therefore of paramount importance.
The hydrophobic cluster analysis (HCA) method delineates globular-like domains from the information contained in a single protein sequence, and thereby bypasses some limitations of established methods that rely on conserved sequence similarity. In this study, HCA is tested for orphan domain detection on 12 Drosophila genomes. After detection, the orphan domains are classified into two categories, depending on their presence or absence in distantly related species. Both categories show significantly different physico-chemical properties when compared with previously characterized domains from the Pfam database. The newly detected domains have a higher degree of intrinsic disorder and a distinctive hydrophobic cluster composition. The older the domains are, the more closely their hydrophobic cluster content resembles that of Pfam domains. The results suggest that, over time, newly created domains acquire a canonical set of hydrophobic clusters but retain some features of intrinsically disordered regions.
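To make the notion of a "hydrophobic cluster" and of a domain's "cluster composition" more concrete, here is a deliberately simplified, one-dimensional Python sketch. It is not the HCA or HCA-based domain delineation used in this study: real HCA draws the sequence on a duplicated α-helical net and groups hydrophobic residues by their two-dimensional neighbourhood. The hydrophobic residue set (V, I, L, F, M, Y, W), treating proline as a cluster breaker, and the threshold of four consecutive non-hydrophobic residues are assumptions chosen for illustration. The function names `hydrophobic_clusters` and `cluster_composition` are hypothetical.

```python
from collections import Counter

# Assumption: hydrophobic residue set commonly associated with HCA.
HYDROPHOBIC = set("VILFMYW")
# Assumption (1D simplification): a cluster ends at a proline or after a run of
# at least MIN_GAP consecutive non-hydrophobic residues. Real HCA instead uses
# neighbourhoods on a 2D duplicated alpha-helical net.
MIN_GAP = 4


def _encode(seq: str, start: int, end: int) -> tuple[int, str]:
    """Binary code of seq[start:end + 1]: 1 = hydrophobic, 0 = other residue."""
    code = "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq[start:end + 1])
    return start, code


def hydrophobic_clusters(seq: str, min_gap: int = MIN_GAP) -> list[tuple[int, str]]:
    """Return (start index, binary code) for each simplified hydrophobic cluster."""
    seq = seq.upper()
    clusters = []
    start = None        # index of the first hydrophobic residue of the open cluster
    last_hydro = None   # index of the latest hydrophobic residue in the open cluster
    for i, aa in enumerate(seq):
        if aa in HYDROPHOBIC:
            if start is None:
                start = i
            last_hydro = i
        elif start is not None and (aa == "P" or i - last_hydro >= min_gap):
            # Close the cluster at its last hydrophobic residue.
            clusters.append(_encode(seq, start, last_hydro))
            start = None
    if start is not None:
        clusters.append(_encode(seq, start, last_hydro))
    return clusters


def cluster_composition(seq: str) -> Counter:
    """Count how often each binary cluster code occurs in seq."""
    return Counter(code for _, code in hydrophobic_clusters(seq))


if __name__ == "__main__":
    demo = "AAVLKAIPLLGGGGGFW"
    print(hydrophobic_clusters(demo))  # [(2, '11001'), (8, '11'), (15, '11')]
    print(cluster_composition(demo))   # Counter({'11': 2, '11001': 1})
```

Comparing such code-frequency profiles between sets of domains is one plausible way to think about "hydrophobic cluster composition"; the statistics actually used in the study are not reproduced here.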
Our results agree with previous findings on orphan domains and suggest that the physico-chemical properties of domains change over long evolutionary time scales. The presented HCA-based method can detect domains with unusual properties without relying on prior knowledge such as the availability of homologs. The method therefore has great potential to complement existing genome annotation strategies and to improve our understanding of how molecular features emerge.