Li Min, Pan XiaoYong, Zeng Tao, Zhang Yu-Hang, Feng Kaiyan, Chen Lei, Huang Tao, Cai Yu-Dong
School of Life Sciences, Shanghai University, Shanghai 200444, China.
Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China.
Biomed Res Int. 2020 Jun 15;2020:6384120. doi: 10.1155/2020/6384120. eCollection 2020.
Among the risk factors for the initiation and progression of cancer, alternative polyadenylation (APA) is a notable endogenous contributor that directly triggers the malignant phenotype of cancer cells. APA affects biological processes at the transcriptional level in various ways and can therefore be involved in tumorigenesis through changes in gene expression, protein subcellular localization, or transcript splicing patterns. The APA sites and status of different cancer types may have diverse modification patterns and regulatory mechanisms acting on transcripts. In this study, potential APA sites were screened by applying several machine learning algorithms to a TCGA-APA dataset. First, a powerful feature selection method, minimum redundancy maximum relevance (mRMR), was applied to the dataset, yielding a ranked feature list. This list was then fed into incremental feature selection (IFS), with a support vector machine (SVM) as the classification algorithm, to extract key APA features and build a classifier. The classifier can classify cancer patients into cancer types with perfect performance. The key APA-modified genes showed potential prognostic value, as they had significant power in the survival analysis of TCGA pan-cancer data.
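The feature selection pipeline described in the abstract (an mRMR ranking followed by incremental feature selection with an SVM) can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the use of scikit-learn, the quantile discretization used to estimate mutual information, the RBF kernel and its parameters, the 5-fold cross-validation, and the hypothetical helper names `discretize`, `mrmr_rank`, `incremental_feature_selection`, and `make_toy_data`. The toy data are synthetic and are not the TCGA-APA dataset.

```python
import numpy as np
from sklearn.metrics import mutual_info_score
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def discretize(X, n_bins=5):
    """Quantile-bin each column so mutual information can be estimated by counting."""
    Xd = np.empty(X.shape, dtype=int)
    inner_quantiles = np.linspace(0, 1, n_bins + 1)[1:-1]
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], inner_quantiles)
        Xd[:, j] = np.digitize(X[:, j], edges)
    return Xd


def mrmr_rank(X, y, n_bins=5):
    """Greedy mRMR ranking: at each step pick the feature that maximizes
    relevance to the label minus mean redundancy with already-ranked features."""
    Xd = discretize(X, n_bins)
    n_features = Xd.shape[1]
    relevance = np.array([mutual_info_score(Xd[:, j], y) for j in range(n_features)])
    ranked = [int(np.argmax(relevance))]
    remaining = set(range(n_features)) - set(ranked)
    redundancy_sum = np.zeros(n_features)
    while remaining:
        last = ranked[-1]
        for j in remaining:
            redundancy_sum[j] += mutual_info_score(Xd[:, j], Xd[:, last])
        best = max(
            sorted(remaining),
            key=lambda j: relevance[j] - redundancy_sum[j] / len(ranked),
        )
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(X, y, ranked, cv=5):
    """Evaluate an SVM on the top-k ranked features for k = 1..n and
    return the feature subset with the best cross-validated accuracy."""
    scores = []
    for k in range(1, len(ranked) + 1):
        clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # assumed settings
        scores.append(cross_val_score(clf, X[:, ranked[:k]], y, cv=cv).mean())
    best_k = int(np.argmax(scores)) + 1
    return ranked[:best_k], scores


def make_toy_data(seed=0, n_per_class=50, n_features=12):
    """Synthetic stand-in data: 3 classes, only features 0-2 carry signal."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(3), n_per_class)
    X = rng.normal(size=(y.size, n_features))
    for c in range(3):
        X[y == c, c] += 3.0  # feature c is shifted for class c
    return X, y
```

Calling `mrmr_rank(X, y)` on the output of `make_toy_data()` returns a full ordering of the feature indices, and `incremental_feature_selection(X, y, ranked)` then returns the best-scoring prefix of that ordering together with the accuracy obtained at each prefix length. In a real analysis the rows would be patient samples, the columns APA features, and the labels cancer types.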