Stellenbosch University.
Pharmaceutical Sciences Department, Massachusetts College of Pharmacy and Health Sciences, USA.
Brief Bioinform. 2019 Mar 22;20(2):504-514. doi: 10.1093/bib/bbx138.
Breast cancer prognosis and administration of therapies are aided by knowledge of hormonal and HER2 receptor status. Breast cancers lacking estrogen receptors, progesterone receptors and HER2 receptors are difficult to treat. In large data repositories such as The Cancer Genome Atlas, the available wet-lab methods for establishing the presence of these receptors do not conclusively cover every sample. To this end, we introduce median-supplement methods to identify hormonal and HER2 receptor status phenotypes of breast cancer patients using gene expression profiles. In these approaches, supplementary instances based on median patient gene expression are introduced to balance a training set, from which we build simple models to identify the receptor expression status of patients. In addition, for the purpose of benchmarking, we examine major machine learning approaches that are also applicable to the problem of determining receptor status in breast cancer. We show that our methods are robust and have high sensitivity with extremely low false-positive rates compared with well-established methods. A successful application of these methods will permit the simultaneous study of large collections of breast cancer patient samples, and will save time and cost while standardizing the interpretation of outcomes of such studies.
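The balancing idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the details of how the median instances are built or how many are added, so the sketch assumes a binary receptor label, computes the per-gene median over the under-represented class, and appends copies of that median profile until both classes have the same number of instances. The function name `median_supplement`, the toy expression values, and the labels `"pos"` and `"neg"` are all hypothetical.

```python
from statistics import median


def median_supplement(samples, labels, minority_label):
    """Balance a binary training set with median-based supplementary instances.

    Assumption (not taken from the paper): each supplementary instance is the
    per-gene median of the minority class, repeated until classes are equal.
    """
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no samples carry the minority label")

    majority_count = len(samples) - len(minority)

    # zip(*minority) iterates over genes (columns); take the median per gene.
    median_profile = [median(gene_values) for gene_values in zip(*minority)]

    # Number of supplementary instances needed to match the majority class.
    n_extra = max(0, majority_count - len(minority))

    balanced_samples = list(samples) + [list(median_profile) for _ in range(n_extra)]
    balanced_labels = list(labels) + [minority_label] * n_extra
    return balanced_samples, balanced_labels


# Toy expression matrix: rows are patients, columns are two hypothetical genes.
X = [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0], [4.0, 8.0], [9.0, 1.0], [11.0, 3.0]]
y = ["pos", "pos", "pos", "pos", "neg", "neg"]

balanced_X, balanced_y = median_supplement(X, y, minority_label="neg")
print(len(balanced_X), balanced_y.count("pos"), balanced_y.count("neg"))
```

A simple classifier would then be trained on `balanced_X` and `balanced_y` instead of the original, imbalanced set. Whether identical median copies or some other median-derived instances are the better choice is a design question this sketch does not settle.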