Transcription factor prediction using protein 3D secondary structures.

Authors

Liebold Jeanine, Neuhaus Fabian, Geiser Janina, Kurtz Stefan, Baumbach Jan, Newaz Khalique

Affiliations

Institute for Computational Systems Biology, Universität Hamburg, Hamburg 22761, Germany.

Faculty of Mathematics, Informatics and Natural Sciences, ZBH-Center for Bioinformatics, Universität Hamburg, Hamburg 22761, Germany.

Publication information

Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btae762.

DOI:10.1093/bioinformatics/btae762

PMID:39786868

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11769678/

Abstract

MOTIVATION

Transcription factors (TFs) are DNA-binding proteins that regulate gene expression. Traditional methods predict a protein as a TF if the protein contains any DNA-binding domains (DBDs) of known TFs. However, this approach fails to identify a novel TF that does not contain any known DBDs. Recently proposed TF prediction methods do not rely on DBDs. Such methods use features of protein sequences to train a machine learning model, and then use the trained model to predict whether a protein is a TF or not. Because the 3-dimensional (3D) structure of a protein captures more information than its sequence, using 3D protein structures will likely allow for more accurate prediction of novel TFs.

RESULTS

We propose a deep learning-based TF prediction method (StrucTFactor), which is the first method to utilize 3D secondary structural information of proteins. We compare StrucTFactor with recent state-of-the-art TF prediction methods based on ∼525 000 proteins across 12 datasets, capturing different aspects of data bias (including sequence redundancy) possibly influencing a method's performance. We find that StrucTFactor significantly (P-value < 0.001) outperforms the existing TF prediction methods, improving the performance over its closest competitor by up to 17% based on Matthews correlation coefficient.
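For readers unfamiliar with the Matthews correlation coefficient (MCC) used as the comparison metric above, the following is a minimal sketch of its standard definition for binary classification (TF versus non-TF). It is not taken from the StrucTFactor code base; the function name `matthews_corrcoef_from_counts` and the example counts are hypothetical and serve only as illustration.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard binary MCC computed from confusion-matrix counts.

    tp/tn/fp/fn: true positives, true negatives, false positives,
    false negatives. Returns a value in [-1, 1]; by convention 0.0 is
    returned when the denominator is zero (an empty row or column).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts for an imbalanced TF / non-TF test set (illustration only).
example_mcc = matthews_corrcoef_from_counts(tp=80, tn=900, fp=20, fn=20)
print(f"MCC = {example_mcc:.3f}")
```

Because MCC uses all four cells of the confusion matrix, it stays informative on imbalanced data, where TFs are a small minority of all proteins and plain accuracy can look high for a model that rarely predicts the positive class.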

AVAILABILITY AND IMPLEMENTATION

Data and source code are available at https://github.com/lieboldj/StrucTFactor and on our website at https://apps.cosy.bio/StrucTFactor.
