Text mining for modeling of protein complexes enhanced by machine learning.

Affiliations

Computational Biology Program.

Department of Molecular Biosciences, The University of Kansas, Lawrence, KS 66045, USA.

Publication information

Bioinformatics. 2021 May 1;37(4):497-505. doi: 10.1093/bioinformatics/btaa823.

DOI:10.1093/bioinformatics/btaa823
PMID:32960948
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8088328/
Abstract

MOTIVATION

Procedures for structural modeling of protein-protein complexes (protein docking) produce a number of models that need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of the amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on protein-protein interactions may generate such constraints. However, the absence of post-processing of the spotted residues reduced the usability of the constraints, because a significant number of the residues were not relevant to the binding of the specific proteins.
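The idea of spotting residues in text and using them as scoring constraints can be illustrated with a minimal sketch. It is not the authors' pipeline: the regular expression, the function names `spot_residues` and `constraint_score`, the example sentence, and the two docking models (given simply as sets of interface residue positions) are all assumptions made for illustration.

```python
import re

# Three-letter amino acid codes mapped to one-letter codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Hypothetical pattern: matches mentions such as "Arg45", "Tyr-102" or "Gly 200".
RESIDUE_RE = re.compile(r"\b(" + "|".join(THREE_TO_ONE) + r")[- ]?(\d{1,4})\b")


def spot_residues(text):
    """Return the set of (one_letter_code, position) pairs mentioned in text."""
    return {
        (THREE_TO_ONE[m.group(1)], int(m.group(2)))
        for m in RESIDUE_RE.finditer(text)
    }


def constraint_score(interface_positions, spotted):
    """Fraction of spotted residue positions that lie at a model's interface."""
    if not spotted:
        return 0.0
    hits = sum(1 for _, pos in spotted if pos in interface_positions)
    return hits / len(spotted)


# Invented example sentence and docking models (illustrative only).
abstract = (
    "Mutation of Arg45 and Tyr-102 abolished binding, "
    "whereas Gly 200 was dispensable."
)
spotted = spot_residues(abstract)
models = {"model_1": {45, 102, 77}, "model_2": {200, 13}}

# Rank models by how many text-mined residues they place at the interface.
ranked = sorted(
    models,
    key=lambda name: constraint_score(models[name], spotted),
    reverse=True,
)
```

In this toy sentence, Gly 200 is spotted even though the text says it is dispensable for binding. This is the kind of irrelevant residue that, without post-processing, ends up among the constraints.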

RESULTS

We explored filtering of the irrelevant residues by two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models, with different training/testing schemes. The results showed that the DRNN model is superior to the SVM model when training is performed on the PMC-OA full-text articles and the model is applied to classification (interface or non-interface) of the residues spotted in the PubMed abstracts. The reason is that SVM success is often determined by the similarity in data/text patterns between the training and the testing sets, whereas the sentence structures in the abstracts are, in general, different from those in the full-text articles. When both training and testing are performed on full-text articles, or both on abstracts, the performance of the two models is similar. In such cases, there is no need to use the DRNN approach, which is computationally expensive, especially at the training stage.
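The interface/non-interface classification step can be sketched as a bag-of-words linear classifier trained with the hinge loss, the loss that underlies a linear SVM. This is a simplified stand-in, not the SVM or DRNN models of the study: it has no regularization term or kernel, the training sentences are invented, and all function names are hypothetical.

```python
import re


def tokens(sentence):
    """Lowercase bag-of-words features (letters only, so 'Arg45' becomes 'arg')."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def score(weights, bias, sentence):
    """Linear decision value: sum of the word weights plus the bias."""
    return sum(weights.get(t, 0.0) for t in tokens(sentence)) + bias


def train_hinge(examples, epochs=20, lr=0.1):
    """Fit a linear classifier by subgradient descent on the hinge loss.

    Simplification: no regularization term and no kernel, unlike a full SVM.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for sentence, label in examples:
            # Update only when the example violates the unit margin.
            if label * score(weights, bias, sentence) < 1:
                for t in tokens(sentence):
                    weights[t] = weights.get(t, 0.0) + lr * label
                bias += lr * label
    return weights, bias


def predict(weights, bias, sentence):
    """Return +1 (interface) or -1 (non-interface)."""
    return 1 if score(weights, bias, sentence) > 0 else -1


# Invented toy sentences; +1 = interface residue, -1 = non-interface residue.
TRAIN = [
    ("Mutation of Arg45 abolished binding", 1),
    ("Tyr102 is essential for the interaction", 1),
    ("Gly200 is located in the catalytic core", -1),
    ("Phosphorylation of Ser10 regulates stability", -1),
]

weights, bias = train_hinge(TRAIN)
```

A sentence that reuses the training vocabulary, such as "Binding was abolished by mutation", is classified from the learned word weights. A sentence made only of unseen words gets no evidence from the weights at all. This toy case shows why a classifier built on surface text patterns depends on the training and testing texts being similar.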

AVAILABILITY AND IMPLEMENTATION

The code and the datasets generated in this study are available at https://gitlab.ku.edu/vakser-lab-public/text-mining/-/tree/2020-09-04.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
