Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning.

Affiliations

Department of Computer Science, University of Toronto, Toronto, Canada.

Department of Cell and Systems Biology, University of Toronto, Toronto, Canada.

Publication information

PLoS Comput Biol. 2022 Jun 29;18(6):e1010238. doi: 10.1371/journal.pcbi.1010238. eCollection 2022 Jun.

DOI:10.1371/journal.pcbi.1010238

PMID:35767567

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9275697/

Abstract

A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome, but relatively poorly understood, is the identification of molecular features that mediate functions of these regions, such as short motifs, amino acid repeats and physicochemical properties. Here, we introduce a proteome-scale feature discovery approach for IDRs. Our approach, which we call "reverse homology", exploits the principle that important functional features are conserved over evolution. We use this as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network has to correctly choose a held-out homolog from another set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and standard interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, or bulk features like charge or amino acid propensities. We also show that our model can be used to produce visualizations of what residues and regions are most important to IDR function, generating hypotheses for uncharacterized IDRs. Our results suggest that feature discovery using unsupervised neural networks is a promising avenue to gain systematic insight into poorly understood protein sequences.
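The reverse homology task described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the amino acid composition `encode` function is a hypothetical stand-in for the neural network, the dot-product scoring and softmax cross-entropy loss are assumed choices for a contrastive objective, and all sequences are invented for illustration. Given embeddings of a set of homologous IDRs, the sketch scores a candidate list containing one held-out homolog and several random IDRs, and the loss is low when the held-out homolog scores highest.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode(seq):
    """Toy encoder (assumption): amino acid composition as a 20-dim vector.
    In the paper this role is played by a trained neural network."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(seq), 1)

def reverse_homology_loss(query_embeddings, candidate_embeddings, target_index=0):
    """Contrastive loss for one query set (assumed softmax cross-entropy form).

    query_embeddings:     (n_homologs, d) embeddings of the known homologs
    candidate_embeddings: (n_candidates, d) held-out homolog plus random IDRs
    target_index:         position of the held-out homolog among candidates
    Returns the loss and the index of the highest-scoring candidate.
    """
    query = query_embeddings.mean(axis=0)        # pool the homolog set
    scores = candidate_embeddings @ query        # dot-product similarity
    scores = scores - scores.max()               # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())
    return float(-log_probs[target_index]), int(np.argmax(scores))

# Invented example: an SR-rich IDR family versus unrelated IDRs.
homologs = ["SRSRSRSPSRS", "SRSRSPRSRSRS", "RSRSRSSRSP"]
held_out = "SRSPSRSRSR"
random_idrs = ["QQQNNQQGQQ", "DEDEEDDEEE", "PAPAPLPAPP"]

query_embeddings = np.stack([encode(s) for s in homologs])
candidates = [held_out] + random_idrs            # held-out homolog at index 0
candidate_embeddings = np.stack([encode(s) for s in candidates])

loss, predicted = reverse_homology_loss(query_embeddings, candidate_embeddings)
print(f"loss={loss:.3f}, predicted candidate index={predicted}")
```

With a trainable encoder in place of `encode`, minimizing this loss over many families would push the network toward features that are shared among homologs but absent from random IDRs, which is the intuition the abstract describes.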

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6976/9275697/229edc80b9e5/pcbi.1010238.g001.jpg
