Mamtani Manju R, Thakre Tushar P, Kalkonde Mrunal Y, Amin Manik A, Kalkonde Yogeshwar V, Amin Amit P, Kulkarni Hemant
Lata Medical Research Foundation, Nagpur, India.
BMC Bioinformatics. 2006 Oct 10;7:442. doi: 10.1186/1471-2105-7-442.
Despite the recognized diagnostic potential of biomarkers, the search continues for ways to suppress noise and extract more information from a given set of biomarkers. Here, we propose a statistical algorithm that treats each molecular biomarker as a diagnostic test and uses established statistical techniques to improve the diagnostic performance of an optimized set of independent biomarkers. We validated the proposed algorithm on several simulated datasets and on four publicly available real datasets that compared i) subjects with cancer and those without; ii) subjects with two different cancers; iii) subjects with two different types of one cancer; and iv) subjects with the same cancer but different times to metastasis.
Our algorithm comprises three steps: estimating the area under the receiver operating characteristic curve for each biomarker, identifying a subset of biomarkers using linear regression, and combining the chosen biomarkers using linear discriminant function analysis. Combining these established statistical methods, which are available in most statistical packages, we observed diagnostic accuracies of 100%, 99.94%, 96.67% and 93.92% for the four real datasets used in the study. These estimates were comparable to or better than those previously reported using alternative methods. In a synthetic dataset, we also observed that all the biomarkers chosen by our algorithm were truly differentially expressed.
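The three steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state how the regression step picks its subset, so the greedy forward selection by residual sum of squares used here is an assumption, as are the function names, the `top_k` and `n_select` parameters, and the midpoint threshold (which assumes equal class priors).

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

def rank_by_auc(X, y, top_k):
    """Step 1: rank markers by how far their AUC lies from 0.5 (no discrimination)."""
    aucs = np.array([auc(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-np.abs(aucs - 0.5))
    return [int(j) for j in order[:top_k]], aucs

def select_by_regression(X, y, candidates, n_select):
    """Step 2 (assumed rule): greedy forward selection on a least-squares fit of y."""
    chosen = []
    for _ in range(min(n_select, len(candidates))):
        best, best_rss = None, np.inf
        for j in candidates:
            if j in chosen:
                continue
            # Intercept plus the markers chosen so far plus the candidate marker.
            A = np.column_stack([np.ones(len(y)), X[:, chosen + [j]]])
            coef = np.linalg.lstsq(A, y, rcond=None)[0]
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best, best_rss = j, rss
        chosen.append(best)
    return chosen

def fit_lda(X, y):
    """Step 3: two-class linear discriminant with a pooled covariance matrix."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    scatter = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    pooled = scatter / (len(y) - 2)
    w = np.linalg.solve(pooled, mu1 - mu0)
    threshold = w @ (mu0 + mu1) / 2.0  # midpoint rule, assumes equal priors
    return w, threshold

def predict_lda(X, w, threshold):
    """Assign class 1 when the discriminant score exceeds the threshold."""
    return (X @ w > threshold).astype(int)
```

Here `X` is a samples-by-biomarkers array and `y` holds the 0/1 disease labels. A typical call sequence would be `rank_by_auc`, then `select_by_regression` on the returned candidates, then `fit_lda` and `predict_lda` on the selected columns. Accuracy measured on the same samples used for fitting is optimistic, so a held-out set or cross-validation would be needed for a fair estimate.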
The proposed algorithm can be used for accurate diagnosis in the setting of dichotomous classification of disease states.