Department of Mathematical Sciences, University of Massachusetts, Lowell, MA, USA.
BMC Med Genomics. 2011 Jan 24;4:10. doi: 10.1186/1755-8794-4-10.
Molecular classification of tumors can be achieved by global gene expression profiling. Most machine learning classification algorithms furnish global error rates for the entire population. A few algorithms estimate the probability of malignancy for each queried patient, but the accuracy of these estimates is unknown. Local minimax learning, by contrast, provides such probability estimates together with the best finite-sample bounds on expected mean squared error, individually for each queried patient. This allows a significant percentage of patients to be identified as confidently predictable, meaning that the machine learning algorithm has an error rate below the tolerable level when applied to those patients.
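To make the idea of a "confidently predictable" patient concrete, here is a minimal sketch of one possible decision rule. It is an illustration only and not the rule used in the paper: the function name `confident_call`, the parameter `rmse_bound` (a per-patient bound on the error of the probability estimate), and the default `tolerance` of 0.05 are all assumptions introduced for this example.

```python
def confident_call(p_hat, rmse_bound, tolerance=0.05):
    """Label one patient, or abstain, from an estimated probability of malignancy.

    p_hat      -- estimated probability of malignancy for this patient
    rmse_bound -- assumed per-patient bound on the error of p_hat (hypothetical)
    tolerance  -- assumed tolerable error rate (hypothetical default)
    """
    # Widen the point estimate by its error bound, clipped to [0, 1].
    lower = max(0.0, p_hat - rmse_bound)
    upper = min(1.0, p_hat + rmse_bound)

    # Commit to a label only when the whole interval sits in a safe region.
    if lower >= 1.0 - tolerance:
        return "malignant"
    if upper <= tolerance:
        return "benign"
    return "uncertain"
```

Under this assumed rule, a patient is labeled only when the estimate stays on one side of the tolerance threshold even after allowing for its error bound; every other patient is reported as uncertain rather than forced into a class.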
We devise a new learning method that implements (i) feature selection using the k-TSP algorithm and (ii) classifier construction by local minimax kernel learning. We test our method on three publicly available gene expression datasets and achieve a significantly lower error rate for a substantial identifiable subset of patients. Our final classifiers are simple to interpret, and they make predictions on an individual basis with an individualized confidence level.
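The feature selection step can be sketched as follows, assuming the usual top-scoring-pair formulation: each gene pair (i, j) is scored by how much the frequency of "gene i is expressed below gene j" differs between the two classes, and the k best disjoint pairs are kept. The function name `top_scoring_pairs` and the 0/1 class coding are assumptions for this illustration, and the sketch omits the tie-breaking rules of the published k-TSP algorithm.

```python
import numpy as np

def top_scoring_pairs(X, y, k=1):
    """Return up to k disjoint gene pairs as (i, j, score) tuples.

    X -- array of shape (n_samples, n_genes) with expression values
    y -- array of 0/1 class labels, one per sample (assumed coding)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # less[s, i, j] is True when gene i is expressed below gene j in sample s.
    less = X[:, :, None] < X[:, None, :]

    # Per-class frequency of the ordering "gene i < gene j".
    p0 = less[y == 0].mean(axis=0)
    p1 = less[y == 1].mean(axis=0)
    score = np.abs(p0 - p1)

    # Rank each unordered pair once (upper triangle), best score first.
    rows, cols = np.triu_indices(X.shape[1], k=1)
    order = np.argsort(-score[rows, cols], kind="stable")

    chosen, used = [], set()
    for idx in order:
        i, j = int(rows[idx]), int(cols[idx])
        if i in used or j in used:
            continue  # keep the selected pairs disjoint
        chosen.append((i, j, float(score[i, j])))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen
```

The pairwise comparison uses memory proportional to n_samples × n_genes², so on a real expression matrix this sketch would need a pre-filtered gene list or a blockwise loop.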
Patients whom the classifiers confidently predict to have cancer can receive immediate and appropriate treatment, while patients confidently predicted to be healthy are spared unnecessary treatment. We believe that our method can be a useful tool for translating gene expression signatures into clinical practice for personalized medicine.