A composite model for subgroup identification and prediction via bicluster analysis.

Author information

Chen Hung-Chia, Zou Wen, Lu Tzu-Pin, Chen James J

Affiliations

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America; Graduate Institute of Biostatistics and Biostatistics Center, China Medical University, Taichung, Taiwan.

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America.

Publication information

PLoS One. 2014 Oct 27;9(10):e111318. doi: 10.1371/journal.pone.0111318. eCollection 2014.

Abstract

BACKGROUND

A major challenge in the analysis of large and complex biomedical data is to develop an approach for 1) identifying distinct subgroups in the sampled populations, 2) characterizing the relationships among subgroups, and 3) developing a prediction model to classify subgroup memberships of new samples by finding a set of predictors. Each subgroup can represent different pathogen serotypes of microorganisms, different tumor subtypes in cancer patients, or different genetic makeups of patients related to treatment response.

METHODS

This paper proposes a composite model for subgroup identification and prediction using biclusters. A biclustering technique is first used to identify a set of biclusters from the sampled data. For each bicluster, a subgroup-specific binary classifier is built to determine whether a given sample falls inside or outside the bicluster. A composite model, which consists of all binary classifiers, is then constructed to classify samples into several disjoint subgroups. The proposed composite model does not depend on any specific biclustering algorithm, bicluster pattern, or classification algorithm.
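The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the biclusters are assumed to have been found already and are hard-coded, the per-bicluster binary classifier is a simple nearest-centroid rule chosen only for brevity, and the rule for combining classifiers (take the highest positive score, otherwise leave the sample unassigned) is an assumption. All names (`fit_bicluster_classifier`, `fit_composite`, `predict`) and the toy data are hypothetical.

```python
from math import dist


def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_bicluster_classifier(data, sample_idx, feature_idx):
    """Build an inside/outside classifier for one bicluster.

    Assumption: a nearest-centroid rule on the bicluster's own features.
    Any binary classifier could be substituted here.
    """
    members = set(sample_idx)
    inside = [[data[i][j] for j in feature_idx] for i in members]
    outside = [[data[i][j] for j in feature_idx]
               for i in range(len(data)) if i not in members]
    c_in, c_out = centroid(inside), centroid(outside)

    def score(sample):
        x = [sample[j] for j in feature_idx]
        # Positive when the sample is closer to the "inside" centroid.
        return dist(x, c_out) - dist(x, c_in)

    return score


def fit_composite(data, biclusters):
    """One binary classifier per bicluster, keyed by bicluster name."""
    return {name: fit_bicluster_classifier(data, rows, cols)
            for name, (rows, cols) in biclusters.items()}


def predict(composite, sample):
    """Assign the sample to a single subgroup, or None if no classifier
    places it inside its bicluster (assumed combination rule)."""
    scores = {name: clf(sample) for name, clf in composite.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical toy data: 6 samples x 4 features.
data = [
    [5.0, 5.1, 0.1, 0.0],
    [4.9, 5.2, 0.0, 0.2],
    [5.1, 4.8, 0.2, 0.1],
    [0.1, 0.0, 5.0, 4.9],
    [0.0, 0.2, 5.2, 5.1],
    [0.2, 0.1, 4.8, 5.0],
]
# Hypothetical biclusters: (sample indices, feature indices).
biclusters = {"A": ([0, 1, 2], [0, 1]), "B": ([3, 4, 5], [2, 3])}

composite = fit_composite(data, biclusters)
print(predict(composite, [5.0, 5.0, 0.1, 0.1]))
print(predict(composite, [0.1, 0.1, 5.0, 5.0]))
print(predict(composite, [0.0, 0.0, 0.0, 0.0]))
```

Because each classifier looks only at the features of its own bicluster, the biclustering step and the classification step stay interchangeable, which mirrors the algorithm-independence claimed in the abstract. The sketch assumes every bicluster has at least one sample inside and one outside.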

RESULTS

The composite model achieved an overall accuracy of 97.4% on a synthetic dataset consisting of four subgroups. The model was then applied to two datasets in which the samples' subgroup memberships were known. The procedure showed 83.7% accuracy in discriminating the adenocarcinoma and squamous carcinoma subtypes of lung cancer, and was able to identify 5 serotypes and several subtypes with about 94% accuracy in a pathogen dataset.

CONCLUSION

The composite model presents a novel approach to developing a biclustering-based classification model from unlabeled sampled data. The proposed approach combines unsupervised biclustering and supervised classification techniques to classify samples into disjoint subgroups based on their associated attributes, such as genotypic factors, phenotypic outcomes, efficacy/safety measures, or responses to treatments. The procedure is useful for identification of unknown species or new biomarkers for targeted therapy.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9cc/4210136/c3467685787f/pone.0111318.g001.jpg
