Abrams Zachary B, Tally Dwayne G, Zhang Lin, Coombes Caitlin E, Payne Philip R O, Abruzzo Lynne V, Coombes Kevin R
Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA.
The Center for Genomic Advocacy at Indiana State University, Terre Haute, IN, 47809, USA.
BMC Bioinformatics. 2021 Mar 1;22(1):100. doi: 10.1186/s12859-021-03992-1.
There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics. For example, the CytoGPS algorithm has enabled the use of text-based karyotypes by transforming them into a binary model. However, such advances are accompanied by new problems of data sparsity, heterogeneity, and noise that are magnified by the large-scale, multidimensional nature of the data. To address these problems, we developed the Mercator R package, which processes and visualizes binary biomedical data. We use Mercator to address biomedical questions about cytogenetic patterns in lymphoid hematologic malignancies, which include a broad set of leukemias and lymphomas. Karyotype data are one of the most common forms of genetic data collected on lymphoid malignancies, because karyotyping is part of the standard of care in these cancers.
In this paper, we combine the analytic power of CytoGPS and Mercator to perform a large-scale, multidimensional pattern recognition study on 22,741 karyotype samples across 47 different hematologic malignancies obtained from the public Mitelman database.
Our findings indicate that Mercator was able to identify both known and novel cytogenetic patterns across different lymphoid malignancies, furthering our understanding of the genetics of these diseases.