Chen Wei, Zhou Haiyan, Zhang Mingyu, Shi Yafei, Li Taifeng, Qian Di, Yang Jun, Yu Feng, Li Guohui
School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China.
Pharmacy Department, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
Cancer Innov. 2024 May 12;3(4):e110. doi: 10.1002/cai2.110. eCollection 2024 Aug.
The rate at which the anticancer drug paclitaxel is cleared from the body markedly impacts its dosage and chemotherapy effectiveness. Importantly, paclitaxel clearance varies among individuals, primarily because of genetic polymorphisms. This metabolic variability arises from a nonlinear process that is influenced by multiple single nucleotide polymorphisms (SNPs). Conventional bioinformatics methods struggle to accurately analyze this complex process and, currently, there is no established efficient algorithm for investigating SNP interactions.
We developed a novel machine-learning approach, the GEP-CSIs data mining algorithm. This algorithm, an advanced version of GEP, uses linear algebra computations to handle discrete variables. GEP-CSIs calculates a fitness function score from paclitaxel clearance data and genetic polymorphisms in patients with non-small cell lung cancer. The data were divided into a primary set and a validation set for the analysis.
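To make the idea of scoring SNP combinations with a fitness function concrete, the following minimal sketch enumerates every three-SNP combination and scores it on a primary set, then checks the best candidates on a validation set. It is not the GEP-CSIs algorithm: it swaps in a brute-force search and a simple fitness (the fraction of clearance variance explained by the joint genotype), and it runs on simulated data. All function names, the genotype coding (0, 1, 2), and the simulated interaction are assumptions made for illustration only.

```python
import itertools
import random
from collections import defaultdict


def group_means(genotypes, clearance, snps):
    """Mean clearance for each joint genotype observed at the SNP indices in `snps`."""
    groups = defaultdict(list)
    for row, y in zip(genotypes, clearance):
        groups[tuple(row[s] for s in snps)].append(y)
    return {key: sum(ys) / len(ys) for key, ys in groups.items()}


def explained_fraction(genotypes, clearance, snps, means, fallback):
    """1 - SSE/SST when clearance is predicted from joint-genotype means.

    Joint genotypes missing from `means` are predicted with `fallback`.
    """
    overall = sum(clearance) / len(clearance)
    sst = sum((y - overall) ** 2 for y in clearance)
    if sst == 0:
        return 0.0
    sse = sum(
        (y - means.get(tuple(row[s] for s in snps), fallback)) ** 2
        for row, y in zip(genotypes, clearance)
    )
    return 1.0 - sse / sst


def rank_triples(genotypes, clearance, k=5):
    """Score every three-SNP combination in-sample and return the k best."""
    fallback = sum(clearance) / len(clearance)
    scored = []
    for snps in itertools.combinations(range(len(genotypes[0])), 3):
        means = group_means(genotypes, clearance, snps)
        score = explained_fraction(genotypes, clearance, snps, means, fallback)
        scored.append((score, snps))
    scored.sort(reverse=True)
    return scored[:k]


def simulate(n_patients, n_snps, seed):
    """Hypothetical data: clearance depends nonlinearly on SNPs 0, 1 and 2 jointly."""
    rng = random.Random(seed)
    genotypes = [[rng.randint(0, 2) for _ in range(n_snps)] for _ in range(n_patients)]
    clearance = [
        10.0 + 5.0 * ((row[0] + row[1] + row[2]) % 2 == 0) + rng.gauss(0.0, 1.0)
        for row in genotypes
    ]
    return genotypes, clearance


# Split simulated patients into a primary set and a validation set.
genotypes, clearance = simulate(300, 8, seed=0)
primary_g, primary_y = genotypes[:200], clearance[:200]
valid_g, valid_y = genotypes[200:], clearance[200:]
primary_mean = sum(primary_y) / len(primary_y)

for score, snps in rank_triples(primary_g, primary_y, k=3):
    # Re-use the group means fitted on the primary set to predict the validation set.
    means = group_means(primary_g, primary_y, snps)
    held_out = explained_fraction(valid_g, valid_y, snps, means, primary_mean)
    print(snps, round(score, 3), round(held_out, 3))
```

Exhaustive enumeration is only feasible for a small SNP panel, because the number of triples grows cubically with the number of SNPs. The validation step matters here because a joint genotype over three SNPs can define up to 27 groups, so in-sample scores are optimistic on small cohorts.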
We identified and validated the 1184 three-SNP combinations with the highest fitness function values. Notably, , and were found to indirectly influence paclitaxel clearance by coordinating the activity of genes previously reported to be significant in paclitaxel clearance. Particularly intriguing was the discovery of a combination of three SNPs in genes , and . The proteins related to these SNPs were confirmed to interact with one another in the protein-protein interaction network, which provides a basis for further exploration of their functional roles and mechanisms.
We developed an effective machine-learning algorithm tailored to mining SNP interactions from data on paclitaxel clearance and individual genetic polymorphisms.