Johnson Kim L, Cassin Andrew M, Lonsdale Andrew, Bacic Antony, Doblin Monika S, Schultz Carolyn J
Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia (K.L.J., A.M.C., A.L., A.B., M.S.D.); and
School of Agriculture, Food, and Wine, University of Adelaide, Waite Research Institute, Glen Osmond, South Australia 5064, Australia (C.J.S.).
Plant Physiol. 2017 Jun;174(2):886-903. doi: 10.1104/pp.17.00294. Epub 2017 Apr 26.
Intrinsically disordered proteins (IDPs) are functional proteins that lack a well-defined three-dimensional structure. The study of IDPs is a rapidly growing area as the crucial biological functions of more of these proteins are uncovered. In plants, IDPs are implicated in stress responses, signaling, and regulatory processes. A superfamily of cell wall proteins, the hydroxyproline-rich glycoproteins (HRGPs), has characteristic features of IDPs. Their protein backbones are rich in the disordering amino acid proline, they contain repeated sequence motifs and extensive posttranslational modifications (glycosylation), and they have been implicated in many biological functions. HRGPs are evolutionarily ancient, having been isolated from organisms ranging from chlorophyte algae, with protein-rich walls, to embryophytes, with cellulose-rich walls. Examination of HRGPs in a range of plant species should provide valuable insights into how they have evolved. Although HRGPs are commonly divided into arabinogalactan proteins, extensins, and proline-rich proteins, in reality a continuum of structures exists within this diverse and heterogeneous superfamily. An inability to accurately classify HRGPs leads to inconsistent gene ontologies, limiting the identification of HRGP classes in existing and emerging omics data sets. We present a novel and robust motif and amino acid bias (MAAB) bioinformatics pipeline to classify HRGPs into 23 descriptive subclasses. MAAB was validated using available genomic resources and then applied to the 1000 Plants transcriptome project (www.onekp.com) data set. Significant improvement in the detection of HRGPs was observed when a multiple-k-mer transcriptome assembly methodology was used. The MAAB pipeline is readily adaptable and can be modified to optimize the recovery of IDPs from other organisms.
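The abstract names two signals, amino acid bias and sequence motifs, without giving the pipeline's parameters. The following Python sketch illustrates only the general idea of combining them. Everything specific in it is an assumption made for illustration and is not taken from the MAAB pipeline: the residue set `PAST`, the 0.45 bias threshold, the two regular expressions, the decision order, and the three output labels (which are not the 23 subclasses described above).

```python
import re

# Illustrative motif patterns (assumptions, not the published MAAB definitions).
EXT_MOTIF = r"SP{3,5}"   # Ser followed by 3 to 5 Pro, an extensin-style repeat
AG_MOTIF = r"[AST]P"     # Ala/Ser/Thr followed by Pro, an AGP-style dipeptide


def amino_acid_bias(seq, residues="PAST"):
    """Return the fraction of `seq` made up of the given residues."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(seq.count(r) for r in residues) / len(seq)


def count_motif(seq, pattern):
    """Count non-overlapping regex matches of `pattern` in `seq`."""
    return len(re.findall(pattern, seq.upper()))


def toy_classify(seq, bias_threshold=0.45):
    """Toy two-step rule: filter on composition bias, then look at motifs.

    The threshold and the labels are hypothetical placeholders.
    """
    if amino_acid_bias(seq) < bias_threshold:
        return "not HRGP-like"
    if count_motif(seq, EXT_MOTIF) >= 2:
        return "extensin-like"
    if count_motif(seq, AG_MOTIF) >= 2:
        return "AGP-like"
    return "biased, unclassified"


if __name__ == "__main__":
    for s in ("SPPPPYSPPPPY", "APAPSPTPAG", "MKLVGGDE"):
        print(s, round(amino_acid_bias(s), 2), toy_classify(s))
```

A real classifier would presumably also need signal peptide handling, length filters, and many more motif classes. The sketch is meant only to show how a composition filter and motif counts can be chained, and how the thresholds could be adjusted when adapting such an approach to IDPs from other organisms.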