Prediction potential of candidate biomarker sets identified and validated on gene expression data from multiple datasets.

Authors

Gormley Michael, Dampier William, Ertel Adam, Karacali Bilge, Tozeren Aydin

Affiliation

School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA.

Publication information

BMC Bioinformatics. 2007 Oct 26;8:415. doi: 10.1186/1471-2105-8-415.

DOI:10.1186/1471-2105-8-415

PMID:17963508

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2211325/

Abstract

BACKGROUND

Independently derived expression profiles of the same biological condition often have few genes in common. In this study, we used an iterative machine learning algorithm to create populations of expression profiles from publicly available microarray datasets of cancer (breast, lymphoma and renal) samples linked to clinical information. ROC curves were used to assess the prediction error of each profile for classification. We compared the prediction error of profiles correlated with molecular phenotype against that of profiles correlated with relapse-free status. The prediction error of profiles identified with supervised univariate feature selection algorithms was compared to that of profiles selected randomly from a) all genes on the microarray platform and b) a list of known disease-related genes (a priori selection). We also determined the relevance of expression profiles on test arrays from independent datasets, measured on either the same or different microarray platforms.

RESULTS

Highly discriminative expression profiles were produced on both simulated gene expression data and expression data from breast cancer and lymphoma datasets on the basis of ER and BCL-6 expression, respectively. Use of relapse-free status to identify profiles for prognosis prediction resulted in poorly discriminative decision rules. Supervised feature selection resulted in more accurate classifications than random or a priori selection; however, the difference in prediction error decreased as the number of features increased. These results held when decision rules were applied across datasets to samples profiled on the same microarray platform.

CONCLUSION

Our results show that many gene sets predict molecular phenotypes accurately. Given this, expression profiles identified using different training datasets should be expected to show little agreement. In addition, we demonstrate the difficulty in predicting relapse directly from microarray data using supervised machine learning approaches. These findings are relevant to the use of molecular profiling for the identification of candidate biomarker panels.
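The comparison described in the abstract (supervised univariate feature selection versus random selection, scored with ROC curves) can be illustrated with a minimal sketch on simulated data. This is not the authors' algorithm or data: the t-statistic ranking, the centroid-based decision rule, the simulation parameters, and all function names below are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons (Mann-Whitney U)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

def simulate(n_samples, n_genes, n_informative, shift, rng):
    """Hypothetical expression matrix: only the first n_informative genes differ by class."""
    labels = np.arange(n_samples) % 2          # alternating classes, balanced
    x = rng.normal(size=(n_samples, n_genes))
    x[labels == 1, :n_informative] += shift    # mean shift in the informative genes
    return x, labels

def t_statistics(x, labels):
    """Welch t-statistic for every gene (univariate, supervised ranking)."""
    a, b = x[labels == 1], x[labels == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

def centroid_scores(x_train, y_train, x_test, genes):
    """Project test samples onto the difference of the two class centroids."""
    mu1 = x_train[y_train == 1][:, genes].mean(axis=0)
    mu0 = x_train[y_train == 0][:, genes].mean(axis=0)
    return (x_test[:, genes] - (mu1 + mu0) / 2) @ (mu1 - mu0)

def compare(n_features, seed=0):
    """Return (AUC with supervised selection, AUC with random selection)."""
    rng = np.random.default_rng(seed)
    x, y = simulate(n_samples=200, n_genes=1000, n_informative=20, shift=1.0, rng=rng)
    x_train, y_train, x_test, y_test = x[:100], y[:100], x[100:], y[100:]
    # Features are chosen on the training half only, then scored on the test half.
    t = t_statistics(x_train, y_train)
    supervised = np.argsort(-np.abs(t))[:n_features]
    random_genes = rng.choice(x.shape[1], size=n_features, replace=False)
    auc_sup = roc_auc(centroid_scores(x_train, y_train, x_test, supervised), y_test)
    auc_rnd = roc_auc(centroid_scores(x_train, y_train, x_test, random_genes), y_test)
    return auc_sup, auc_rnd

if __name__ == "__main__":
    for m in (5, 50, 500):
        auc_sup, auc_rnd = compare(m)
        print(f"{m:4d} genes  supervised AUC={auc_sup:.3f}  random AUC={auc_rnd:.3f}")
```

Running the script prints both AUC values for several profile sizes, so the gap between supervised and random selection can be inspected as the number of features grows. The behaviour on this toy simulation should not be read as a reproduction of the paper's results.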

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f39/2211325/2ef76b4540ed/1471-2105-8-415-1.jpg
