Identification of Hürthle cell cancers: solving a clinical challenge with genomic sequencing and a trio of machine learning algorithms.

Authors

Hao Yangyang, Duh Quan-Yang, Kloos Richard T, Babiarz Joshua, Harrell R Mack, Traweek S Thomas, Kim Su Yeon, Fedorowicz Grazyna, Walsh P Sean, Sadow Peter M, Huang Jing, Kennedy Giulia C

Affiliations

Department of Research & Development, Veracyte, Inc, 6000 Shoreline Court, Suite 300, South San Francisco, CA, 94080, USA.

Department of Surgery, Section of Endocrine Surgery, University of California San Francisco, San Francisco, CA, USA.

Publication

BMC Syst Biol. 2019 Apr 5;13(Suppl 2):27. doi: 10.1186/s12918-019-0693-z.

Abstract

BACKGROUND

Identifying Hürthle cell cancers by non-operative fine-needle aspiration biopsy (FNAB) of thyroid nodules is challenging. As a result, non-cancerous Hürthle lesions have conventionally been distinguished from Hürthle cell cancers by histopathological examination of tissue following surgical resection. Reliance on histopathological evaluation requires patients to undergo surgery to obtain a diagnosis, even though most lesions are non-cancerous. It is highly desirable to avoid surgery and to classify nodules accurately as benign or malignant from FNAB preoperatively. In our first-generation algorithm, the Gene Expression Classifier (GEC), we achieved this goal by applying machine learning (ML) to gene expression features. The classifier is sensitive but not specific, due in part to the presence of non-neoplastic benign Hürthle cells in many FNAB samples.

RESULTS

We sought to overcome this low-specificity limitation by expanding the ML feature set using next-generation whole-transcriptome RNA sequencing, and we called the improved algorithm the Genomic Sequencing Classifier (GSC). Hürthle cell identification leverages mitochondrial gene expression, and we developed novel feature extraction methods to measure chromosome-level and genome-level loss of heterozygosity (LOH) for the algorithm. Additionally, we developed a multi-layered system of cascading classifiers to sequentially triage Hürthle cell-containing FNAB samples by: (1) presence of Hürthle cells, (2) presence of neoplastic Hürthle cells, and (3) presence of benign Hürthle cells. The final Hürthle cell Index uses 1048 nuclear and mitochondrial genes, and the Hürthle cell Neoplasm Index uses LOH features as well as 2041 genes. Both indices are based on Support Vector Machines (SVMs). The third classifier, the GSC Benign/Suspicious classifier, uses 1115 core genes and is an ensemble classifier incorporating 12 individual models.
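To make the idea of sequential triage concrete, the following is a minimal sketch of a generic three-stage cascade in Python. It is not the authors' implementation: the abstract does not say how the stage outputs are combined, so the stop-at-first-failure routing, the thresholds, the feature names (`mito_expr`, `loh_fraction`, `core_expr`), and the toy scoring functions are all assumptions made purely for illustration. Only the stage order and the count of 12 ensemble members come from the text.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]  # hypothetical feature summary for one FNAB sample


@dataclass
class Stage:
    name: str
    score: Callable[[Sample], float]
    threshold: float  # hypothetical cut-off, not taken from the paper


def run_cascade(sample: Sample, stages: List[Stage]) -> List[Tuple[str, bool]]:
    """Evaluate stages in order and stop at the first one that is not passed."""
    trace = []
    for stage in stages:
        passed = stage.score(sample) >= stage.threshold
        trace.append((stage.name, passed))
        if not passed:
            break
    return trace


# Toy stand-ins for the two SVM-based indices (assumed, not the real models).
def hurthle_cell_index(s: Sample) -> float:
    return s["mito_expr"]


def hurthle_neoplasm_index(s: Sample) -> float:
    return s["loh_fraction"]


# Toy ensemble: 12 weighted scorers averaged, mirroring only the model count.
ENSEMBLE_WEIGHTS = [0.5 + 0.05 * i for i in range(12)]


def ensemble_score(s: Sample) -> float:
    return mean(w * s["core_expr"] for w in ENSEMBLE_WEIGHTS)


STAGES = [
    Stage("hurthle_cell_index", hurthle_cell_index, 0.5),
    Stage("hurthle_neoplasm_index", hurthle_neoplasm_index, 0.5),
    Stage("benign_suspicious_ensemble", ensemble_score, 0.5),
]

if __name__ == "__main__":
    example = {"mito_expr": 0.9, "loh_fraction": 0.1, "core_expr": 0.8}
    for name, passed in run_cascade(example, STAGES):
        print(name, passed)
```

Returning the full trace rather than a single label keeps the sketch neutral about how a real system would turn the stage results into a final call.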

CONCLUSIONS

Accurately modeling this complex biological system across Hürthle subtypes substantially improves classification performance: specificity among Hürthle cell neoplasms increases from 11.8% with the GEC to 58.8% with the GSC, while sensitivity is maintained at 89%.
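For readers less familiar with the two metrics quoted above, the short Python sketch below shows how specificity and sensitivity are computed from confusion-matrix counts. The counts are hypothetical: the abstract reports only percentages, and the numbers here were chosen solely because their ratios round to the quoted figures. They should not be read as the study's actual sample sizes.

```python
def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly benign samples that are called benign."""
    return true_neg / (true_neg + false_pos)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly malignant samples that are called suspicious."""
    return true_pos / (true_pos + false_neg)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to reproduce the quoted percentages.
    print(f"GEC-like specificity: {specificity(2, 15):.1%}")
    print(f"GSC-like specificity: {specificity(10, 7):.1%}")
    print(f"Sensitivity: {sensitivity(8, 1):.0%}")
```

Under these illustrative counts, the gain in specificity corresponds to more benign samples being correctly called benign, while the number of missed cancers stays the same.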

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5ac/6450053/2a7735908c7c/12918_2019_693_Fig1_HTML.jpg