Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau, 999078, China.
Institute for Information and System Sciences and Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an Shaan'xi, 710049, China.
Sci Rep. 2017 Oct 12;7(1):13053. doi: 10.1038/s41598-017-13133-5.
Gene selection is an attractive and important task in cancer survival analysis. Most existing supervised learning methods can use only the labeled biological data and ignore the censored (weakly labeled) data in model building, even though censored samples far outnumber labeled ones. To exploit the information in the censored data, a semi-supervised learning framework that combines the Cox proportional hazards (Cox) model with the accelerated failure time (AFT) model (the Cox-AFT model) has been used in cancer research, and it performs better than either the Cox or the AFT model alone. This method, however, is easily affected by noise. To alleviate this problem, in this paper we combine the Cox-AFT model with the self-paced learning (SPL) method so that the information in the censored data is used more effectively, in a self-learning way. SPL is a reliable and stable learning mechanism, recently proposed to simulate the human learning process; here it helps the AFT model automatically identify high-confidence samples and include them in training, minimizing interference from heavy noise. Using SPL yields two direct advantages: (1) the censored data are exploited more fully; and (2) the noise passed to the model is greatly reduced. The experimental results demonstrate the effectiveness of the proposed model compared with the traditional Cox-AFT model.
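To make the "include samples of high confidence into training" idea concrete, here is a minimal sketch of the generic hard-weighting form of self-paced learning, which minimizes Σ v_i L_i(w) − λ Σ v_i over model weights w and sample indicators v_i ∈ {0, 1}. For fixed w, the minimizer is v_i = 1 exactly when L_i < λ, and λ (the "age" parameter) is increased each round so that harder samples are admitted gradually. This is not the authors' Cox-AFT + SPL method: it assumes a plain linear least-squares model of log survival time (an AFT-style model with no censoring handling), and the function names `fit_weighted_ls` and `spl_fit`, the parameters `lam0`, `mu`, `n_iter`, `alpha`, and the synthetic data are all hypothetical illustrations.

```python
import numpy as np


def fit_weighted_ls(X, y, v, alpha=1e-3):
    """Ridge-regularized least squares on the samples with v == 1 (hypothetical helper)."""
    Xs, ys = X[v == 1], y[v == 1]
    d = X.shape[1]
    return np.linalg.solve(Xs.T @ Xs + alpha * np.eye(d), Xs.T @ ys)


def spl_fit(X, y, lam0=0.5, mu=1.3, n_iter=10, alpha=1e-3):
    """Generic hard-weighting self-paced learning (illustrative sketch).

    Alternates between fitting the model on the selected samples and
    re-selecting samples whose loss is below the age parameter lam;
    lam grows by a factor mu each round.
    """
    n, d = X.shape
    v = np.ones(n, dtype=int)            # start from all samples
    lam = lam0
    for _ in range(n_iter):
        w = fit_weighted_ls(X, y, v, alpha)
        losses = (y - X @ w) ** 2        # per-sample squared error
        v = (losses < lam).astype(int)   # closed-form SPL selection step
        if v.sum() < d:                  # keep enough samples to refit
            v[np.argsort(losses)[:d]] = 1
        lam *= mu                        # admit harder samples next round
    w = fit_weighted_ls(X, y, v, alpha)
    return w, v


# Synthetic example (assumed data): log survival times from a linear
# model, with the first 20 samples corrupted by large noise.
rng = np.random.default_rng(0)
n = 200
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, 3))
y = X @ w_true + 0.1 * rng.normal(size=n)
y[:20] += 8.0

w, v = spl_fit(X, y)
print("estimated weights:", np.round(w, 2))
print("noisy samples selected:", v[:20].sum())
```

In this toy setting the corrupted samples keep losses far above λ and stay excluded, so the final fit is driven by the clean samples; the growth factor `mu` controls how quickly harder samples enter, and letting λ grow too large would eventually readmit the noisy ones.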