Liu Xiangju, Zhang Yu, Fu Chunli, Zhang Ruochi, Zhou Fengfeng
Department of Geriatric Medicine & Shandong Key Laboratory Cardiovascular Proteomics, Qilu Hospital of Shandong University, Jinan, China.
College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China.
Front Genet. 2021 Apr 27;12:636429. doi: 10.3389/fgene.2021.636429. eCollection 2021.
Pulmonary hypertension (PH) is a common disease that affects the normal functioning of the human pulmonary arteries. Peripheral blood mononuclear cells (PBMCs) are an ideal source for minimally invasive disease diagnosis. This study hypothesized that transcriptional fluctuations in PBMCs exposed to the PH arteries may stably reflect the disease. However, the dimension of a human transcriptome is much higher than the number of samples in any existing dataset. An ensemble feature selection algorithm, EnRank, was therefore proposed to integrate the ranking information of four popular feature selection algorithms: T-test (Ttest), Chi-squared test (Chi2), ridge regression (Ridge), and Least Absolute Shrinkage and Selection Operator (Lasso). Our results suggest that the EnRank-detected biomarkers captured useful information from these four feature selection algorithms and achieved very good accuracy in predicting PH patients. Many of the EnRank-detected biomarkers are also supported by the literature.
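The abstract does not state how EnRank combines the four rankings, so the following is only a minimal sketch of one common way to integrate ranking information: averaging each feature's rank across methods. The function names, the gene names, and the toy scores are hypothetical and are not taken from the paper. Each score is assumed to be an importance value where larger means more relevant (for example, an absolute test statistic or an absolute regression coefficient).

```python
from statistics import mean


def rank_features(scores):
    """Return {feature: rank} for one method, where rank 1 is the highest score."""
    ordered = sorted(scores, key=lambda feature: scores[feature], reverse=True)
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def aggregate_ranks(scores_by_method):
    """Order features by their mean rank across all methods (best first).

    Assumption: mean-rank aggregation stands in for the integration step;
    the actual EnRank procedure may differ.
    """
    per_method = [rank_features(scores) for scores in scores_by_method.values()]
    features = list(per_method[0])
    mean_rank = {f: mean(ranks[f] for ranks in per_method) for f in features}
    return sorted(features, key=lambda f: mean_rank[f])


# Hypothetical importance scores for four made-up genes (illustration only).
toy_scores = {
    "Ttest": {"gene_a": 4.1, "gene_b": 0.3, "gene_c": 2.2, "gene_d": 1.0},
    "Chi2": {"gene_a": 9.0, "gene_b": 1.5, "gene_c": 3.0, "gene_d": 6.0},
    "Ridge": {"gene_a": 0.8, "gene_b": 0.1, "gene_c": 0.9, "gene_d": 0.2},
    "Lasso": {"gene_a": 0.5, "gene_b": 0.0, "gene_c": 0.7, "gene_d": 0.1},
}

ensemble_order = aggregate_ranks(toy_scores)
print(ensemble_order)
```

In this toy input, no single method decides the outcome: `gene_c` is ranked first by Ridge and Lasso but lower by Ttest and Chi2, and the mean rank balances those disagreements. With far more features than samples, as the abstract describes, the top of such an ensemble ordering would then be passed to a classifier.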