LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains.

Author information

Cascarina Sean M, King David C, Osborne Nishimura Erin, Ross Eric D

Affiliation

Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA.

Publication information

NAR Genom Bioinform. 2021 May 26;3(2):lqab048. doi: 10.1093/nargab/lqab048. eCollection 2021 Jun.

Abstract

Low-complexity domains (LCDs) in proteins are regions predominantly composed of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretical complexity thresholds, sequence alignment with repetitive regions, or statistical overrepresentation of amino acids relative to whole-proteome frequencies. While these methods have proven valuable, they all quantify amino acid composition only indirectly, even though composition is the fundamental and biologically relevant feature related to protein sequence complexity. Here, we present a new computational tool, LCD-Composer, that directly identifies LCDs based on amino acid composition and linear amino acid dispersion. Using LCD-Composer's default parameters, we identified simple LCDs across all organisms available through UniProt and provide the resulting data in an accessible form as a resource. Furthermore, we describe large-scale differences between organisms from different domains of life and explore organisms with extreme LCD content for different LCD classes. Finally, we illustrate the versatility and specificity achievable with LCD-Composer by identifying diverse classes of LCDs using both simple and multifaceted composition criteria. We demonstrate that the ability to dissect LCDs based on these multifaceted criteria enhances the functional mapping and classification of LCDs.
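
To make the idea of identifying domains directly from amino acid composition concrete, the following is a minimal sketch of a sliding-window composition scan. It is not LCD-Composer's actual implementation: the function name `find_composition_domains`, the `window` and `min_fraction` parameters, and their default values are assumptions chosen for illustration, and the linear dispersion criterion mentioned in the abstract is omitted entirely.

```python
def find_composition_domains(seq, residues, window=20, min_fraction=0.4):
    """Return merged half-open (start, end) spans in which every contributing
    window has at least `min_fraction` of its positions occupied by `residues`.

    Illustrative only: parameter names and defaults are hypothetical, and no
    dispersion filter is applied.
    """
    targets = set(residues)
    spans = []
    if len(seq) < window:
        return spans

    # Count target residues in the first window, then update incrementally.
    count = sum(1 for aa in seq[:window] if aa in targets)
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one position: drop the residue that left the window,
            # add the residue that entered it.
            count -= seq[start - 1] in targets
            count += seq[start + window - 1] in targets
        if count / window >= min_fraction:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlapping or adjacent passing windows form one domain.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    # A toy sequence with a glutamine-rich stretch in the middle.
    toy = "ACDEFGHIKL" + "Q" * 10 + "MNPRSTVWYA"
    print(find_composition_domains(toy, "Q", window=5, min_fraction=0.8))
```

Passing several residues (for example `"QN"`) treats them as one combined group, which is a simplified stand-in for the "multifaceted composition criteria" the abstract describes; separate thresholds per residue would need additional logic.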

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8e2/8153834/ce7558ca1155/lqab048fig1.jpg
