ChromosomeNet: A massive dataset enabling benchmarking and building basedlines of clinical chromosome classification.

Author affiliations

School of Computer Science, South China Normal University, Guangzhou 510631, China; Key Lab on Cloud Security and Assessment technology of Guangzhou, Guangzhou 510631, China; SCNU & VeChina Joint Lab on BlockChain Technology and Application, Guangzhou 510631, China.

Medical Genetic Centre and Maternal and Children Metabolic-Genetic Key Laboratory, Guangdong Women and Children Hospital, Guangzhou 511400, China.

Publication information

Comput Biol Chem. 2022 Oct;100:107731. doi: 10.1016/j.compbiolchem.2022.107731. Epub 2022 Jul 16.

DOI:10.1016/j.compbiolchem.2022.107731

PMID:35907293

Abstract

Chromosome karyotype analysis is a vital cytogenetic technique for diagnosing genetic and congenital malformations and for analyzing gestational and implantation failures, among other uses. Chromosome classification, an essential stage of karyotype analysis, is highly time-consuming, tedious, and error-prone, and it requires a large amount of manual work by experienced cytogeneticists. Many deep learning-based methods have been proposed to address it. However, two challenges remain. First, most existing methods were developed on different private datasets, which makes them difficult to compare on a common basis. Second, because most existing methods are published without the details needed to reproduce them, they are difficult to apply widely in clinical chromosome classification. To address these challenges, this work builds and publishes a massive clinical dataset that enables benchmarking and the construction of chromosome classification baselines suited to different scenarios. The dataset consists of 126,453 privacy-preserving G-band chromosome instances from 2,763 karyotypes of 408 individuals. To the best of our knowledge, this is the first work to collect, annotate, and release a publicly available clinical chromosome classification dataset, and the dataset contains more than 120,000 instances. Experimental results show that the proposed dataset improves the performance of existing chromosome classification models to varying degrees, with the largest accuracy improvement being 5.39 percentage points. Moreover, the best baseline reaches 99.33% accuracy, which we report as state-of-the-art classification performance. The clinical dataset and the state-of-the-art baselines are available at https://github.com/CloudDataLab/BenchmarkForChromosomeClassification.
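To put the dataset figures in the abstract into proportion, the following is a minimal sketch that derives a few ratios from the quoted counts. The constant names and the function `dataset_summary` are hypothetical, and the reference value of 46 chromosomes per karyotype is an assumption (the normal human diploid count); the abstract does not state how many instances each karyotype contributes or why the total falls short of that reference.

```python
# Counts quoted in the abstract.
N_INSTANCES = 126_453    # G-band chromosome instances
N_KARYOTYPES = 2_763     # karyotypes
N_INDIVIDUALS = 408      # individuals
BEST_ACCURACY = 0.9933   # best reported baseline accuracy

# Assumption (not from the abstract): a normal human karyotype has 46 chromosomes.
EXPECTED_PER_KARYOTYPE = 46


def dataset_summary():
    """Derive simple ratios from the quoted counts."""
    return {
        # Average number of chromosome instances per karyotype.
        "instances_per_karyotype": N_INSTANCES / N_KARYOTYPES,
        # Average number of karyotypes per individual.
        "karyotypes_per_individual": N_KARYOTYPES / N_INDIVIDUALS,
        # Gap between 46 instances per karyotype and the actual total.
        "shortfall_vs_46": EXPECTED_PER_KARYOTYPE * N_KARYOTYPES - N_INSTANCES,
        # Error rate implied by the best reported accuracy.
        "best_error_rate": round(1 - BEST_ACCURACY, 4),
    }


if __name__ == "__main__":
    for name, value in dataset_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions, each karyotype contributes just under 46 instances on average and each individual contributes between six and seven karyotypes. Because several karyotypes come from the same individual, a train/test split made at the individual level rather than the instance level may be worth considering when reproducing the baselines; the abstract does not say which split the authors used.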

Similar articles

CIR-Net: Automatic Classification of Human Chromosome Based on Inception-ResNet Architecture.

IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1285-1293. doi: 10.1109/TCBB.2020.3003445. Epub 2022 Jun 3.

EDFace-Celeb-1M: Benchmarking Face Hallucination With a Million-Scale Dataset.

IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3968-3978. doi: 10.1109/TPAMI.2022.3181579. Epub 2023 Feb 3.

A Clinical Dataset and Various Baselines for Chromosome Instance Segmentation.

IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):31-39. doi: 10.1109/TCBB.2021.3089507. Epub 2022 Feb 3.

An improved deep convolutional neural network architecture for chromosome abnormality detection using hybrid optimization model.

Microsc Res Tech. 2022 Sep;85(9):3115-3129. doi: 10.1002/jemt.24170. Epub 2022 Jun 16.

TEM virus images: Benchmark dataset and deep learning classification.

Comput Methods Programs Biomed. 2021 Sep;209:106318. doi: 10.1016/j.cmpb.2021.106318. Epub 2021 Jul 29.

ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data.

Gigascience. 2020 Apr 1;9(4). doi: 10.1093/gigascience/giaa026.

Tufts Dental Database: A Multimodal Panoramic X-Ray Dataset for Benchmarking Diagnostic Systems.

IEEE J Biomed Health Inform. 2022 Apr;26(4):1650-1659. doi: 10.1109/JBHI.2021.3117575. Epub 2022 Apr 14.

Chromosome classification via deep learning and its application to patients with structural abnormalities of chromosomes.

Med Eng Phys. 2023 Nov;121:104064. doi: 10.1016/j.medengphy.2023.104064. Epub 2023 Oct 17.

Hierarchical, multi-sensor based classification of daily life activities: comparison with state-of-the-art algorithms using a benchmark dataset.

PLoS One. 2013 Oct 9;8(10):e75196. doi: 10.1371/journal.pone.0075196. eCollection 2013.

Cited by

Optimization of diagnosis and treatment of hematological diseases via artificial intelligence.

Front Med (Lausanne). 2024 Nov 7;11:1487234. doi: 10.3389/fmed.2024.1487234. eCollection 2024.
