Department of Biology, Brigham Young University, Provo, Utah, United States.
Cancer Biol Ther. 2021 Sep 2;22(7-9):417-429. doi: 10.1080/15384047.2021.1953902. Epub 2021 Aug 19.
Scholarly requirements have led to a massive increase in transcriptomic data in the public domain, with millions of samples available for secondary research. We identified gene-expression datasets representing 10,214 breast-cancer patients in public databases. We focused on datasets that included patient metadata on race and/or immunohistochemistry (IHC) profiling of the ER, PR, and HER-2 proteins. This review summarizes these datasets and describes findings from 32 research articles associated with them. These studies have helped to elucidate relationships between IHC, race, and/or treatment options, as well as relationships between IHC status and the breast-cancer intrinsic subtypes. We have also identified broad themes across the analysis methodologies used in these studies, including subtyping breast cancers, deriving predictive biomarkers, identifying differentially expressed genes, and optimizing data processing. Finally, we discuss limitations of prior work and recommend future directions for reusing these datasets in secondary analyses.