CureScience Institute, San Diego, CA 92121, USA.
San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA.
Molecules. 2024 Sep 11;29(18):4311. doi: 10.3390/molecules29184311.
Objective biomarkers are crucial for early diagnosis, which enables timely treatment and raises survival rates. We sought to determine whether the smallest non-coding RNAs, piwi-interacting RNAs (piRNAs), and their transcripts could serve as biomarkers for colorectal cancer (CRC). Using previously published data from serum samples of patients with CRC, we selected 13 differentially expressed piRNAs as potential biomarkers. With these data, we developed a machine learning (ML) algorithm and created 1020 different piRNA sequence descriptors. With the Naïve Bayes Multinomial classifier, we isolated the 27 most influential sequence descriptors and achieved an accuracy of 96.4%. To test the validity of our model, we used piRNAs from piRBase with known associations with CRC that were not used to train the ML model. The model achieved an accuracy of 85.7% on these new, independent data. To further validate our model, we also tested data from unrelated diseases, including piRNAs with a correlation to breast cancer and no proven correlation to CRC. The model scored 44.4% on these piRNAs, showing that it can distinguish biomarkers of CRC from biomarkers of other diseases. The final results show that our model is an effective tool for diagnosing colorectal cancer. We believe that in the future this model will prove useful in diagnostics for colorectal cancer and other diseases.
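As a rough illustration of the workflow the abstract describes (sequence descriptors fed to a multinomial Naïve Bayes classifier), the following is a minimal sketch, not the authors' implementation. The abstract does not say how the 1020 descriptors were defined, so simple k-mer counts are used here as an assumed stand-in. The function and class names (`kmer_descriptors`, `vocabulary`, `MultinomialNB`) and the toy sequences are hypothetical and are not real piRNA data.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGU"


def kmer_descriptors(seq, k_max=2):
    """Count every k-mer of length 1..k_max in an RNA sequence.

    ASSUMPTION: k-mer counts stand in for the paper's sequence
    descriptors, whose exact definition the abstract does not give.
    """
    seq = seq.upper().replace("T", "U")
    counts = Counter()
    for k in range(1, k_max + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def vocabulary(k_max=2):
    """All possible k-mers of length 1..k_max over the RNA alphabet."""
    return ["".join(p)
            for k in range(1, k_max + 1)
            for p in product(ALPHABET, repeat=k)]


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, k_max=2, alpha=1.0):
        self.k_max = k_max
        self.alpha = alpha
        self.vocab = vocabulary(k_max)

    def fit(self, seqs, labels):
        n = len(labels)
        class_counts = Counter(labels)
        # Log prior: fraction of training sequences in each class.
        self.log_prior = {c: math.log(m / n) for c, m in class_counts.items()}
        # Sum the descriptor counts over all sequences of each class.
        totals = {c: Counter() for c in class_counts}
        for seq, y in zip(seqs, labels):
            totals[y].update(kmer_descriptors(seq, self.k_max))
        # Smoothed log likelihood of each descriptor given the class.
        self.log_lik = {}
        for c, t in totals.items():
            denom = sum(t[f] for f in self.vocab) + self.alpha * len(self.vocab)
            self.log_lik[c] = {f: math.log((t[f] + self.alpha) / denom)
                               for f in self.vocab}
        return self

    def predict(self, seq):
        counts = kmer_descriptors(seq, self.k_max)
        scores = {}
        for c in self.log_prior:
            lik = self.log_lik[c]
            # k-mers outside the vocabulary (e.g. containing N) are ignored.
            scores[c] = self.log_prior[c] + sum(
                n * lik[f] for f, n in counts.items() if f in lik)
        return max(scores, key=scores.get)


def accuracy(model, seqs, labels):
    """Fraction of sequences whose predicted label matches the true label."""
    hits = sum(model.predict(s) == y for s, y in zip(seqs, labels))
    return hits / len(labels)


# Toy, made-up sequences (NOT real piRNAs): one class is GC-rich, one AU-rich.
train_seqs = ["GCGGCCGCGGCGCC", "CCGGCGCGGGCCGC", "GGCGCCGGCGCGGC",
              "AUUAUAAUAUUAUA", "UAUAAUUAUAUAAU", "AAUAUUAUAAUUAU"]
train_labels = ["CRC", "CRC", "CRC", "other", "other", "other"]

model = MultinomialNB(k_max=2).fit(train_seqs, train_labels)
print(model.predict("GGCCGCGGCC"), model.predict("AUAUUAAUUA"))
```

In a real study the held-out evaluation would use sequences never seen during training, as the abstract describes for the piRBase and breast cancer sets; `accuracy` above would then be called on those sequences and their labels rather than on the training data.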