An Efficient Mixed-Model for Screening Differentially Expressed Genes of Breast Cancer Based on LR-RF.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):124-130. doi: 10.1109/TCBB.2018.2829519. Epub 2018 Apr 23.

Abstract

To screen differentially expressed genes in breast cancer quickly and efficiently, two breast cancer gene microarray datasets, GSE15852 and GSE45255, were downloaded from GEO. By combining the Logistic Regression and Random Forest algorithms, this paper proposes a novel method named LR-RF that selects differentially expressed genes of breast cancer from microarray data using the Bonferroni test under the FWER (family-wise error rate) error measure. Compared with Logistic Regression and Random Forest, LR-RF proved highly effective at selecting differentially expressed genes. The average prediction accuracy of LR-RF over 10 replicated random tests reaches 93.11 percent, with a variance as low as 0.00045. The prediction accuracy reaches a maximum of 95.57 percent when the threshold value α = 0.2 is used in the random forest step that ranks genes by importance score, and the number of differentially expressed genes selected is relatively small. In addition, analysis of gene interaction networks shows that most of the top 20 selected genes are involved in the development of breast cancer. These results demonstrate the reliability and efficiency of LR-RF. LR-RF is expected to provide biologists, medical scientists, and cognitive computing researchers with new knowledge and a new method for identifying disease-related genes of breast cancer.
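
The abstract mentions a Bonferroni test under the FWER error measure but does not give the details of how LR-RF applies it. As a rough illustration of the general idea only, the sketch below shows a standard Bonferroni cutoff: with m genes tested and a target family-wise error rate, a gene is kept when its p-value is at most the target rate divided by m. The function name `bonferroni_select`, the placeholder gene labels, and the p-values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def bonferroni_select(p_values: Dict[str, float], fwer: float = 0.05) -> List[str]:
    """Return the genes that pass a Bonferroni-adjusted cutoff.

    p_values maps a gene label to its raw p-value (for example, from a
    per-gene test). This is a generic Bonferroni correction, not the
    paper's exact procedure, which the abstract does not specify.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Bonferroni: dividing the target FWER by the number of tests
    # bounds the probability of at least one false positive by fwer.
    cutoff = fwer / m
    passed = [gene for gene, p in p_values.items() if p <= cutoff]
    # Order the survivors from most to least significant.
    return sorted(passed, key=lambda gene: p_values[gene])


# Hypothetical p-values for four placeholder genes (illustrative only).
example_p_values = {
    "gene_A": 1e-6,
    "gene_B": 0.004,
    "gene_C": 0.02,
    "gene_D": 0.3,
}

# With 4 tests and fwer = 0.05, the per-gene cutoff is 0.05 / 4 = 0.0125.
selected = bonferroni_select(example_p_values, fwer=0.05)
print(selected)
```

In this toy input, gene_C has a raw p-value below 0.05 but above the adjusted cutoff of 0.0125, so it is dropped. This is the conservative behaviour that FWER control is meant to provide when many genes are tested at once.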
