GSNFS: Gene subnetwork biomarker identification of lung cancer expression data.

Authors

Doungpan Narumol, Engchuan Worrawat, Chan Jonathan H, Meechai Asawin

Affiliations

Biological Engineering Program, Faculty of Engineering, King Mongkut's University of Technology Thonburi, Bangkok, Thailand.

The Centre for Applied Genomics, Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada.

Publication information

BMC Med Genomics. 2016 Dec 5;9(Suppl 3):70. doi: 10.1186/s12920-016-0231-4.

DOI:10.1186/s12920-016-0231-4
PMID:28117655
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5260788/
Abstract

BACKGROUND

Gene expression has been used to identify disease gene biomarkers, but challenges remain. Single-gene or gene-set biomarkers do not provide sufficient understanding of complex disease mechanisms or of the relationships among the genes involved. Network-based methods have thus been considered for inferring the interactions within a group of genes to further study the disease mechanism. The recently proposed Gene-Network-based Feature Set (GNFS) method can handle case-control and multiclass expression data for gene biomarker identification and partly takes network topology into account. However, its performance relies on a greedy search for building subnetworks and thus requires further improvement. In this work, we establish a new approach named Gene Sub-Network-based Feature Selection (GSNFS) by implementing the GNFS framework with two proposed searching and scoring algorithms, namely gene-set-based (GS) search and parent-node-based (PN) search, to identify subnetworks. An additional dataset is used to validate the results.

METHODS

The two proposed searching algorithms of the GSNFS method for subnetwork expansion are concerned with the degree of connectivity and the scoring scheme for building subnetworks and their topology. In each iteration of expansion, the neighbour genes of the current subnetwork whose expression data improve the overall subnetwork score are recruited. Whereas the GS search calculates the subnetwork score using the activity score of the current subnetwork and the gene expression values of its neighbours, the PN search uses the expression value of the corresponding parent of each neighbour gene. Four lung cancer expression datasets were used for subnetwork identification. In addition, the use of pathway data and protein-protein interaction data as network data, in order to consider the interactions among significant genes, is discussed. Classification was performed to compare the performance of the identified gene subnetworks with three subnetwork identification algorithms.
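The expansion loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the scoring formulas, so the Welch t-statistic in `t_score`, the mean-expression `activity` score, and the names `expand`, `network`, and `expr` are all assumptions made for the example. In `GS` mode a neighbour is scored together with the whole current subnetwork; in `PN` mode it is scored only against the parent gene through which it was reached.

```python
from math import sqrt
from statistics import mean, stdev


def t_score(values, labels):
    """Absolute Welch t-statistic between class-1 and class-0 samples (assumed score)."""
    case = [v for v, y in zip(values, labels) if y == 1]
    ctrl = [v for v, y in zip(values, labels) if y == 0]
    se = sqrt(stdev(case) ** 2 / len(case) + stdev(ctrl) ** 2 / len(ctrl))
    return abs(mean(case) - mean(ctrl)) / se if se > 0 else 0.0


def activity(genes, expr):
    """Per-sample activity of a gene group: mean expression of its members (assumed)."""
    n_samples = len(expr[genes[0]])
    return [mean(expr[g][i] for g in genes) for i in range(n_samples)]


def expand(seed, network, expr, labels, mode="GS"):
    """Grow a subnetwork from `seed`, recruiting every neighbour that improves the score."""
    sub = [seed]
    while True:
        base = t_score(activity(sub, expr), labels)
        # Map each unvisited neighbour to the subnetwork gene (parent) that reaches it.
        frontier = {}
        for gene in sub:
            for nb in network.get(gene, ()):
                if nb not in sub and nb not in frontier:
                    frontier[nb] = gene
        recruited = []
        for nb, parent in sorted(frontier.items()):
            if mode == "GS":
                # GS: neighbour scored with the activity of the whole current subnetwork.
                new, ref = t_score(activity(sub + [nb], expr), labels), base
            else:
                # PN: neighbour scored only against its parent gene's expression.
                new = t_score(activity([parent, nb], expr), labels)
                ref = t_score(expr[parent], labels)
            if new > ref:
                recruited.append(nb)
        if not recruited:
            return sub
        sub.extend(recruited)


# Toy data (hypothetical): genes A and B separate the classes, C is noise.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "A": [1.0, 1.2, 0.9, 2.0, 2.1, 1.9],
    "B": [1.1, 0.9, 1.0, 2.2, 1.8, 2.0],
    "C": [1.5, 0.2, 2.8, 0.3, 2.7, 1.5],
}
network = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
print(expand("A", network, expr, labels, mode="GS"))
print(expand("A", network, expr, labels, mode="PN"))
```

On this toy network both modes recruit B and reject C. On larger networks the two modes can diverge, because the PN criterion depends only on a local parent-neighbour pair rather than on the whole subnetwork.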

RESULTS

The two searching algorithms resulted in better classification and gene/gene-set agreement compared to the original greedy search of the GNFS method. The identified lung cancer subnetwork using the proposed searching algorithm resulted in an improvement of the cross-dataset validation and an increase in the consistency of findings between two independent datasets. The homogeneity measurement of the datasets was conducted to assess dataset compatibility in cross-dataset validation. The lung cancer dataset with higher homogeneity showed a better result when using the GS search while the dataset with low homogeneity showed a better result when using the PN search. The 10-fold cross-dataset validation on the independent lung cancer datasets showed higher classification performance of the proposed algorithms when compared with the greedy search in the original GNFS method.
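Cross-dataset validation, as mentioned above, trains a classifier on one dataset and evaluates it on an independent one. The sketch below shows only that idea under stated assumptions: the abstract does not name the classifier or the fold scheme, so the nearest-centroid classifier, the function names, and the toy feature values (standing in for subnetwork activity scores) are hypothetical, and the 10-fold splitting is omitted.

```python
from statistics import mean


def fit_centroids(features, labels):
    """Return the mean feature vector of each class."""
    centroids = {}
    for cls in set(labels):
        rows = [f for f, y in zip(features, labels) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, row):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(cls):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[cls]))
    return min(sorted(centroids), key=dist)


def cross_dataset_accuracy(train, test):
    """Train on one dataset and report accuracy on an independent dataset."""
    centroids = fit_centroids(*train)
    features, labels = test
    hits = sum(predict(centroids, row) == y for row, y in zip(features, labels))
    return hits / len(labels)


# Toy subnetwork-activity features for two independent datasets (hypothetical values).
dataset_a = ([[1.0, 0.9], [1.1, 1.0], [2.0, 2.1], [1.9, 2.2]], [0, 0, 1, 1])
dataset_b = ([[0.8, 1.1], [1.2, 0.9], [2.2, 1.8], [1.4, 1.6]], [0, 0, 1, 1])

print(cross_dataset_accuracy(dataset_a, dataset_b))  # train on A, test on B
print(cross_dataset_accuracy(dataset_b, dataset_a))  # train on B, test on A
```

Running the evaluation in both directions gives a rough sense of how compatible two datasets are, which is the role the homogeneity measurement plays in the study.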

CONCLUSIONS

The proposed searching algorithms recruit a higher number of genes in the subnetwork expansion step than the greedy algorithm. As a result, the subnetworks identified by the GSNFS method improved in classification performance and in gene/gene-set level agreement, depending on the homogeneity of the datasets used in the analysis. Some of the common genes obtained from the four datasets using the different searching algorithms are known to play a role in lung cancer. The improvements in classification performance and gene/gene-set level agreement, together with the biological relevance of the results, indicate the effectiveness of the GSNFS method for gene subnetwork identification using expression data.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b44d/5260788/9a1b5f8dea45/12920_2016_231_Fig1_HTML.jpg