Risk stratification of cervical lesions using capture sequencing and machine learning method based on HPV and human integrated genomic profiles.

Author information

Department of Obstetrics and Gynecology, Precision Medicine Institute, Sun Yat-sen University, Yuexiu, Guangzhou, Guangdong, China.

Department of Neurology, The First Affiliated Hospital, Sun Yat-sen University, Yuexiu, Guangzhou, Guangdong, China.

Publication information

Carcinogenesis. 2019 Oct 16;40(10):1220-1228. doi: 10.1093/carcin/bgz094.

Abstract

From initial human papillomavirus (HPV) infection through the precursor stages, the development of cervical cancer takes decades. High-sensitivity HPV DNA testing is currently recommended as the primary screening method for cervical cancer, whereas better triage methodologies are encouraged to provide accurate risk management for HPV-positive women. Given that virus-driven genomic variation accumulates during cervical carcinogenesis, we designed a 39 Mb custom capture panel targeting 17 HPV types and 522 mutant genes related to cervical cancer. Using capture-based next-generation sequencing, HPV integration status, somatic mutation and copy number variation were analyzed in 34 paired samples, including 10 cases of HPV infection (HPV+), 10 cases of cervical intraepithelial neoplasia grade 1 (CIN1) and 14 cases of CIN2+ (CIN2: n = 1; CIN2-3: n = 3; CIN3: n = 9; squamous cell carcinoma: n = 1). Finally, a machine learning algorithm (Random Forest) was applied to build a risk stratification model for cervical precursor lesions based on CIN2+ enriched biomarkers. Overall, HPV integration events (11 in HPV+, 25 in CIN1 and 56 in CIN2+), non-synonymous mutations (2 in CIN1, 12 in CIN2+) and copy number variations (19.1 in HPV+, 29.4 in CIN1 and 127 in CIN2+) increased from HPV+ to CIN2+. Interestingly, the 'common' deletion of the mitochondrial chromosome was significantly enriched in CIN2+ (P = 0.009). Together, the CIN2+ enriched biomarkers, classified as HPV information, mutation, amplification, deletion and mitochondrial change, successfully predicted CIN2+ with an average accuracy probability score of 0.814, and amplification and deletion ranked as the most important features. Our custom capture sequencing combined with a machine learning method effectively stratified the risk of cervical lesions and provided valuable integrated triage strategies.
