Functional Site Discovery From Incomplete Training Data: A Case Study With Nucleic Acid-Binding Proteins.

Authors

Wang Wenchuan, Langlois Robert, Langlois Marina, Genchev Georgi Z, Wang Xiaolei, Lu Hui

Affiliations

SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, College of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.

Department of Bioengineering and Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States.

Publication information

Front Genet. 2019 Aug 30;10:729. doi: 10.3389/fgene.2019.00729. eCollection 2019.

Abstract

Function annotation efforts provide a foundation for our understanding of cellular processes and the functioning of the living cell. This motivates high-throughput computational methods to characterize new protein members of a particular function. Research work has focused on discriminative machine-learning methods, which promise to make efficient predictions of protein function. Furthermore, available function annotation exists predominantly for individual proteins rather than for residues, of which only a subset is necessary for the conveyance of a particular function. This limits discriminative approaches to predicting functions for which there is sufficient residue-level annotation, e.g., identification of DNA-binding proteins, or where an excellent global representation can be divined. Complete understanding of the various functions of proteins requires discovery and functional annotation at the residue level. Herein, we cast this problem into the setting of multiple-instance learning, which only requires knowledge of the protein's function yet identifies functionally relevant residues and need not rely on homology. We developed a new multiple-instance learning algorithm derived from AdaBoost and benchmarked this algorithm against two well-studied protein function prediction tasks: annotating proteins that bind DNA and RNA. This algorithm outperforms certain previous approaches in annotating protein function while identifying functionally relevant residues involved in binding both DNA and RNA, and on one protein-DNA benchmark, it achieves near perfect classification.
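To make the multiple-instance setting concrete, the sketch below treats a protein as a "bag" of residues ("instances") and shows how per-residue scores could be combined into a protein-level prediction while still pointing at the highest-scoring residues. It is an illustration only: the noisy-OR rule is one common aggregation in multiple-instance learning, and the abstract does not state which rule the AdaBoost-derived algorithm uses. The function names and the per-residue scores are hypothetical.

```python
import numpy as np

def noisy_or(instance_probs):
    """Bag-level probability that at least one instance is positive.

    Assumption: noisy-OR aggregation, a common multiple-instance choice,
    not necessarily the rule used in the paper.
    """
    p = np.asarray(instance_probs, dtype=float)
    return float(1.0 - np.prod(1.0 - p))

def rank_residues(instance_probs, top_k=3):
    """Indices of the top_k highest-scoring residues (0-based)."""
    p = np.asarray(instance_probs, dtype=float)
    order = np.argsort(-p, kind="stable")
    return order[:top_k].tolist()

# Hypothetical per-residue scores from some residue-level classifier.
binding_like = [0.05, 0.10, 0.90, 0.85, 0.02]
non_binding_like = [0.05, 0.10, 0.08, 0.03, 0.02]

print(noisy_or(binding_like))       # close to 1: a few residues dominate
print(noisy_or(non_binding_like))   # much lower: no residue stands out
print(rank_residues(binding_like, top_k=2))
```

Only the protein-level label is needed to train such a model; the residue-level ranking falls out of the instance scores, which is the property the abstract emphasizes.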


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/abe2/6729729/3a5d4c3dc810/fgene-10-00729-g001.jpg
