Positive-Unlabelled learning for identifying new candidate Dietary Restriction-related genes among ageing-related genes.

Affiliations

LIDIA Group, CITIC, Universidade da Coruña, Campus de Elviña s/n, A Coruña 15071, Spain.

School of Computing, University of Kent, Canterbury CT2 7FS, United Kingdom.

Publication information

Comput Biol Med. 2024 Sep;180:108999. doi: 10.1016/j.compbiomed.2024.108999. Epub 2024 Aug 12.

DOI:10.1016/j.compbiomed.2024.108999

PMID:39137672

Abstract

Dietary Restriction (DR) is one of the most popular anti-ageing interventions; recently, Machine Learning (ML) has been explored to identify potential DR-related genes among ageing-related genes, aiming to minimize costly wet lab experiments needed to expand our knowledge on DR. However, to train a model from positive (DR-related) and negative (non-DR-related) examples, the existing ML approach naively labels genes without known DR relation as negative examples, assuming that lack of DR-related annotation for a gene represents evidence of absence of DR-relatedness, rather than absence of evidence. This hinders the reliability of the negative examples (non-DR-related genes) and the method's ability to identify novel DR-related genes. This work introduces a novel gene prioritization method based on the two-step Positive-Unlabelled (PU) Learning paradigm: using a similarity-based, KNN-inspired approach, our method first selects reliable negative examples among the genes without known DR associations. Then, these reliable negatives and all known positives are used to train a classifier that effectively differentiates DR-related and non-DR-related genes, which is finally employed to generate a more reliable ranking of promising genes for novel DR-relatedness. Our method significantly outperforms (p<0.05) the existing state-of-the-art approach in three predictive accuracy metrics with up to ∼40% lower computational cost in the best case, and we identify 4 new promising DR-related genes (PRKAB1, PRKAB2, IRS2, PRKAG1), all with evidence from the existing literature supporting their potential DR-related role.
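The two-step PU learning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors, gene identifiers (`G_POS1`, `G_UNL1`, ...), the Euclidean distance, the `max_pos_fraction` threshold, and the nearest-centroid scorer standing in for the trained classifier are all assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_reliable_negatives(features, positives, k=3, max_pos_fraction=0.0):
    """Step 1 (assumed rule): keep an unlabelled gene as a reliable negative
    when at most `max_pos_fraction` of its k nearest neighbours are known positives."""
    reliable = []
    for gene, vec in features.items():
        if gene in positives:
            continue
        neighbours = sorted(
            (other for other in features if other != gene),
            key=lambda other: euclidean(vec, features[other]),
        )[:k]
        pos_fraction = sum(n in positives for n in neighbours) / len(neighbours)
        if pos_fraction <= max_pos_fraction:
            reliable.append(gene)
    return reliable


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_candidates(features, positives, k=3):
    """Step 2 (stand-in classifier): score every gene without a known label by
    how much closer it lies to the positive centroid than to the centroid of
    the reliable negatives, then sort from most to least promising."""
    negatives = select_reliable_negatives(features, positives, k)
    pos_c = centroid([features[g] for g in positives])
    neg_c = centroid([features[g] for g in negatives])  # assumes at least one reliable negative
    unlabelled = [g for g in features if g not in positives]
    scores = {
        g: euclidean(features[g], neg_c) - euclidean(features[g], pos_c)
        for g in unlabelled
    }
    return sorted(unlabelled, key=scores.get, reverse=True), negatives


# Hypothetical toy data: three known positives, one unlabelled gene lying
# among them, and four unlabelled genes far away from every positive.
toy_features = {
    "G_POS1": [0.0, 0.0],
    "G_POS2": [0.0, 1.0],
    "G_POS3": [1.0, 0.0],
    "G_UNL1": [0.5, 0.5],
    "G_UNL2": [10.0, 10.0],
    "G_UNL3": [10.0, 11.0],
    "G_UNL4": [11.0, 10.0],
    "G_UNL5": [11.0, 11.0],
}
toy_positives = {"G_POS1", "G_POS2", "G_POS3"}

ranking, reliable_negatives = rank_candidates(toy_features, toy_positives, k=3)
print("Reliable negatives:", reliable_negatives)
print("Candidate ranking:", ranking)
```

In this toy setting the unlabelled gene surrounded by positives is excluded from the reliable negatives and ends up at the top of the ranking, which is the behaviour the abstract motivates: an unlabelled gene is not treated as a negative merely because it lacks an annotation.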
