Efficiency analysis of competing tests for finding differentially expressed genes in lung adenocarcinoma.

Author information

Jordan Rick, Patel Satish, Hu Hai, Lyons-Weiler James

Affiliation

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, USA.

Publication information

Cancer Inform. 2008;6:389-421. doi: 10.4137/cin.s791. Epub 2008 Jul 14.

Abstract

In this study, we introduce and use Efficiency Analysis to compare the apparent internal and external consistency of competing normalization methods and of tests for identifying differentially expressed genes. Two publicly available lung adenocarcinoma datasets were analyzed with caGEDA (http://bioinformatics2.pitt.edu/GE2/GEDA.html) to measure the degree of differential expression of genes between two populations. Each dataset was randomly split into at least two subsets; each subset was analyzed for genes differentially expressed between the two sample groups, and the resulting gene lists were compared for overlapping genes. Efficiency Analysis is an intuitive method that, for each test in a range of testing methods, compares the percentage of overlap among the gene lists that the test produces from two or more data subsets. Tests that yield consistent gene lists across independently analyzed splits are preferred to those that yield less consistent inferences. For example, a method that exhibits 50% overlap in the top 100 genes from two studies should be preferred to a method that exhibits 5% overlap in the top 100 genes. The same procedure was repeated for every normalization and transformation method available in caGEDA. The 'best' test was then further evaluated by internal cross-validation, using a Naïve Bayes classification algorithm, to estimate generalizable sample classification error. A novel test, termed D1 (a derivative of the J5 test), was found to be the most consistent and to exhibit the lowest overall classification error and the highest sensitivity and specificity. The D1 test relaxes the assumption that few genes are differentially expressed. Efficiency Analysis can be misleading if the tests exhibit a bias in any particular dimension (e.g. expression intensity); we therefore explored intensity-scaled and segmented J5 tests using data in which all genes were scaled to share the same intensity distribution range. Efficiency Analysis correctly predicted the 'best' test and normalization method for the Beer dataset and also performed well on the Bhattacharjee dataset by both efficiency and classification accuracy criteria.
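
The overlap computation at the heart of Efficiency Analysis can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not caGEDA's implementation: each sample group is split at random into two halves (an assumption about how the splits are formed), a test scores every gene on each split, and the percentage of genes shared between the two top-n lists is taken as that test's efficiency. The `mean_difference` scoring function is a hypothetical stand-in; it is not the J5 or D1 statistic, whose definitions are not given here. All function names are illustrative.

```python
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Samples are rows; each row holds one expression value per gene (same gene order everywhere).
Matrix = List[List[float]]
# A "test" maps (group A samples, group B samples) to one score per gene; higher = more differential.
Test = Callable[[Matrix, Matrix], List[float]]


def mean_difference(group_a: Matrix, group_b: Matrix) -> List[float]:
    """Hypothetical stand-in test: |mean_A - mean_B| per gene (not J5 or D1)."""
    n_genes = len(group_a[0])
    scores = []
    for g in range(n_genes):
        mean_a = sum(row[g] for row in group_a) / len(group_a)
        mean_b = sum(row[g] for row in group_b) / len(group_b)
        scores.append(abs(mean_a - mean_b))
    return scores


def top_genes(scores: Sequence[float], n: int) -> Set[int]:
    """Indices of the n highest-scoring genes."""
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return set(ranked[:n])


def percent_overlap(genes_1: Set[int], genes_2: Set[int], n: int) -> float:
    """Share of the top-n list found by both splits, as a percentage."""
    return 100.0 * len(genes_1 & genes_2) / n


def split_in_half(samples: Matrix, rng: random.Random) -> Tuple[Matrix, Matrix]:
    """Randomly partition one sample group into two disjoint halves."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def efficiency(test: Test, group_a: Matrix, group_b: Matrix, n: int,
               rng: random.Random) -> float:
    """Overlap of the top-n genes found independently in two random splits."""
    a1, a2 = split_in_half(group_a, rng)
    b1, b2 = split_in_half(group_b, rng)
    genes_1 = top_genes(test(a1, b1), n)
    genes_2 = top_genes(test(a2, b2), n)
    return percent_overlap(genes_1, genes_2, n)


def rank_tests(tests: Dict[str, Test], group_a: Matrix, group_b: Matrix, n: int,
               repeats: int = 10, seed: int = 0) -> List[Tuple[str, float]]:
    """Mean overlap per test over repeated splits, most consistent first."""
    results = {}
    for name, test in tests.items():
        rng = random.Random(seed)  # same seed, so every test sees the same splits
        runs = [efficiency(test, group_a, group_b, n, rng) for _ in range(repeats)]
        results[name] = sum(runs) / len(runs)
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # The abstract's example: 50 of the top 100 genes shared between two splits.
    print(percent_overlap(set(range(100)), set(range(50, 150)), 100))  # 50.0
    # Synthetic data (an assumption): genes 0-9 are shifted upward in group B.
    rng = random.Random(1)
    group_a = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(10)]
    group_b = [[rng.gauss(0, 1) + (5.0 if g < 10 else 0.0) for g in range(50)]
               for _ in range(10)]
    print(rank_tests({"mean_difference": mean_difference}, group_a, group_b, n=10))
```

Passing several scoring functions to `rank_tests` orders them from most to least consistent. Reseeding the generator for each test is a deliberate design choice: because the shuffles depend only on group sizes, every test is evaluated on identical splits, so differences in overlap reflect the tests rather than the luck of the split.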
