Identification of Type 2 Diabetes Biomarkers From Mixed Single-Cell Sequencing Data With Feature Selection Methods
Author information
Li Zhandong, Pan Xiaoyong, Cai Yu-Dong
Affiliations
College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China.
Key Laboratory of System Control and Information Processing, Institute of Image Processing and Pattern Recognition, Ministry of Education of China, Shanghai Jiao Tong University, Shanghai, China.
Publication information
Front Bioeng Biotechnol. 2022 Jun 2;10:890901. doi: 10.3389/fbioe.2022.890901. eCollection 2022.
Diabetes is one of the most common diseases and a major threat to human health, and type 2 diabetes (T2D) accounts for about 90% of all cases. High-throughput sequencing has revealed much of the fundamental pathogenesis of T2D at the genetic and transcriptomic levels, and recent single-cell sequencing can further resolve the cellular heterogeneity of complex diseases in unprecedented detail. To characterize the molecular basis of T2D across multiple cell types, we investigated the expression profiles of about 1,600 single cells (949 cells from T2D patients and 651 cells from normal controls) and identified transcriptomic features that distinguish the two groups at the single-cell level. The expression profiles were analyzed with several machine learning algorithms, including Monte Carlo feature selection, support vector machine, and repeated incremental pruning to produce error reduction (RIPPER). On the one hand, some T2D-associated genes (MTND4P24, MTND2P28, and LOC100128906) were discovered. On the other hand, we revealed novel potential pathogenic mechanisms in the form of classification rules; these mechanisms are driven by newly recognized genes and are missed by traditional bulk sequencing techniques. In particular, the newly identified T2D genes follow specific quantitative rules with potential for diabetes prediction, and these rules further suggest several potential functional crosstalks involved in T2D.
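The feature-ranking idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, all function names and parameters (`make_toy_data`, `fit_rule`, `monte_carlo_ranking`, `n_subsets`, `subset_size`) are hypothetical, and the decision trees normally used in Monte Carlo feature selection are swapped for simple one-gene threshold rules. The sketch repeatedly draws random gene subsets and random train/test splits, then credits the winning gene in each draw with its held-out accuracy. The final printed rule only resembles a RIPPER-style IF-THEN rule in form; no RIPPER or support vector machine training is performed here.

```python
import random

def make_toy_data(n_cells=200, n_genes=30, informative=(0, 1, 2), seed=0):
    """Synthetic expression matrix (assumption: toy data, not the study's cells).
    Label 1 stands for 'T2D', label 0 for 'control'."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_cells):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 2.0 * label  # shift informative genes in the 'T2D' class
        X.append(row)
        y.append(label)
    return X, y

def fit_rule(X, y, rows, gene):
    """One-gene rule: threshold at the midpoint between the two class means."""
    pos = [X[r][gene] for r in rows if y[r] == 1]
    neg = [X[r][gene] for r in rows if y[r] == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return (mean_pos + mean_neg) / 2.0, mean_pos > mean_neg

def rule_accuracy(X, y, rows, gene, threshold, high_means_t2d):
    """Fraction of the given cells that the one-gene rule classifies correctly."""
    hits = 0
    for r in rows:
        predicted = int((X[r][gene] > threshold) == high_means_t2d)
        hits += predicted == y[r]
    return hits / len(rows)

def monte_carlo_ranking(X, y, n_subsets=300, subset_size=5, train_frac=0.67, seed=1):
    """Rank genes by accumulated held-out accuracy over random subsets and splits."""
    rng = random.Random(seed)
    n_cells, n_genes = len(X), len(X[0])
    score = [0.0] * n_genes
    for _ in range(n_subsets):
        genes = rng.sample(range(n_genes), subset_size)
        rows = list(range(n_cells))
        rng.shuffle(rows)
        cut = int(train_frac * n_cells)
        train, test = rows[:cut], rows[cut:]
        # Choose the gene whose rule fits the training cells best.
        best_gene, best_rule, best_acc = None, None, -1.0
        for g in genes:
            rule = fit_rule(X, y, train, g)
            acc = rule_accuracy(X, y, train, g, *rule)
            if acc > best_acc:
                best_gene, best_rule, best_acc = g, rule, acc
        # Credit the winner with its accuracy on the held-out cells.
        score[best_gene] += rule_accuracy(X, y, test, best_gene, *best_rule)
    ranking = sorted(range(n_genes), key=lambda g: score[g], reverse=True)
    return ranking, score

X, y = make_toy_data()
ranking, score = monte_carlo_ranking(X, y)
top = ranking[0]
threshold, high_means_t2d = fit_rule(X, y, range(len(X)), top)
direction = ">" if high_means_t2d else "<="
print("Top-ranked genes:", ranking[:5])
print(f"IF gene_{top} {direction} {threshold:.2f} THEN T2D ELSE control")
```

In a fuller workflow, the ranked list would typically feed a classifier such as a support vector machine, and a dedicated rule learner would replace the single threshold rule printed at the end.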