piRNA in Machine-Learning-Based Diagnostics of Colorectal Cancer.

Affiliations

CureScience Institute, San Diego, CA 92121, USA.

San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA.

Publication information

Molecules. 2024 Sep 11;29(18):4311. doi: 10.3390/molecules29184311.

Abstract

Objective biomarkers are crucial for early diagnosis, which enables timely treatment and raises survival rates. Working with the smallest non-coding RNAs, piwi-RNAs (piRNAs), and their transcripts, we sought to determine whether these piRNAs could serve as biomarkers for colorectal cancer (CRC). Using previously published data from serum samples of patients with CRC, 13 differentially expressed piRNAs were selected as potential biomarkers. With these data, we developed a machine learning (ML) algorithm and created 1020 different piRNA sequence descriptors. With the Naïve Bayes Multinomial classifier, we were able to isolate the 27 most influential sequence descriptors and achieve an accuracy of 96.4%. To test the validity of our model, we used piRNAs from piRBase with known associations with CRC that were not used to train the ML model. We achieved an accuracy of 85.7% with these new independent data. To further validate our model, we also tested data from unrelated diseases, including piRNAs with a correlation to breast cancer and no proven correlation to CRC. The model scored 44.4% on these piRNAs, showing that it can distinguish between biomarkers of CRC and biomarkers of other diseases. The final results show that our model is an effective tool for diagnosing colorectal cancer. We believe that, in the future, this model will prove useful for diagnostics of colorectal cancer and other diseases.
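
The general idea of classifying sequences from count-based descriptors with a multinomial Naïve Bayes model can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how the 1020 descriptors were defined, so overlapping 2-mer counts stand in for them here; the sequences are made-up toy strings, not real piRNAs; and the names `kmer_counts`, `MultinomialNB`, and the labels `"crc"` and `"other"` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGU"


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in an RNA sequence (an assumed, simple descriptor)."""
    seq = seq.upper().replace("T", "U")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class MultinomialNB:
    """Multinomial Naive Bayes over k-mer counts with Laplace smoothing."""

    def __init__(self, k=2, alpha=1.0):
        self.k = k
        self.alpha = alpha
        # Fixed feature vocabulary: every possible k-mer over ACGU.
        self.vocab = ["".join(p) for p in product(ALPHABET, repeat=k)]

    def fit(self, seqs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_like = {}
        for c in self.classes:
            members = [s for s, y in zip(seqs, labels) if y == c]
            self.log_prior[c] = math.log(len(members) / len(labels))
            # Pool k-mer counts over all training sequences of this class.
            totals = Counter()
            for s in members:
                totals.update(kmer_counts(s, self.k))
            denom = sum(totals[f] for f in self.vocab) + self.alpha * len(self.vocab)
            self.log_like[c] = {
                f: math.log((totals[f] + self.alpha) / denom) for f in self.vocab
            }
        return self

    def predict(self, seq):
        counts = kmer_counts(seq, self.k)

        def score(c):
            # k-mers outside the vocabulary (e.g. containing N) are ignored.
            return self.log_prior[c] + sum(
                n * self.log_like[c].get(f, 0.0) for f, n in counts.items()
            )

        return max(self.classes, key=score)


# Toy, made-up sequences for illustration only (not real piRNAs).
train_seqs = ["GGGGCCGGGCGGCC", "GCGGGCCGCGGGCG", "AUAUUAAUAUUAUA", "UUAUAAUUAUAUAU"]
train_labels = ["crc", "crc", "other", "other"]

model = MultinomialNB(k=2).fit(train_seqs, train_labels)
print(model.predict("GGCCGGGCGC"))
print(model.predict("AUUAUAUAAU"))
```

In a real study, the descriptor set would first be reduced to the most informative features (the abstract reports 27), and accuracy would be estimated on held-out sequences, as the authors did with the piRBase data.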

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c03/11434383/cb23cc67b6df/molecules-29-04311-g001.jpg
