Information Communication Technology Centre, Bangabandhu Sheikh Mujibur Rahman Maritime University, Pallabi, Mirpur-12, Dhaka, Bangladesh.
Department of CSE, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh.
BMC Bioinformatics. 2021 Sep 11;22(1):435. doi: 10.1186/s12859-021-04341-y.
Proteins are an integral part of all living beings and are built from chains of amino acids. To become functionally active, an amino acid chain folds in a complex way that gives each protein a unique 3D shape, and even a minor error can produce a misfolded structure. Genetic disorders such as Alzheimer's disease and Parkinson's disease arise from protein misfolding. Identifying amino acid patterns is therefore important for inferring protein-associated genetic diseases. Recent studies on predicting amino acid patterns have applied association rule mining only to a simple protein misfolding disease, e.g. chromaffin tumor. More complex diseases have yet to be attempted. Moreover, the association rules obtained in these studies were not verified with measures of usefulness.
In this work, we analyzed protein sequences associated with complex protein misfolding diseases (i.e., Sickle Cell Anemia, Breast Cancer, Cystic Fibrosis, Nephrogenic Diabetes Insipidus, and Retinitis Pigmentosa 4) using association rule mining and objective interestingness measures. Experimental results show the effectiveness of our method.
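As a rough illustration of the kind of analysis described here, the sketch below mines pairwise association rules from an amino acid sequence and scores them with three common objective interestingness measures (support, confidence, and lift). It is only a minimal sketch under stated assumptions: the abstract does not say how transactions were built or which measures were used, so the sliding-window transactions, the helper names (`windows`, `support`, `pair_rules`), the thresholds, and the toy sequence are all hypothetical.

```python
from itertools import permutations


def windows(sequence, size):
    """Cut a sequence into overlapping windows (assumed transaction scheme).

    Each window is reduced to the set of residues it contains.
    """
    return [frozenset(sequence[i:i + size])
            for i in range(len(sequence) - size + 1)]


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(items <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return rules a -> b as (a, b, support, confidence, lift), best lift first."""
    residues = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(residues, 2):
        s_ab = support(transactions, {a, b})
        if s_ab < min_support:
            continue  # pair is too rare to be a dominating pattern
        confidence = s_ab / support(transactions, {a})
        if confidence < min_confidence:
            continue  # rule is too unreliable
        # lift > 1 means a and b co-occur more often than independence predicts
        lift = confidence / support(transactions, {b})
        rules.append((a, b, s_ab, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


if __name__ == "__main__":
    toy_sequence = "ACDACDAECACD"  # made-up residues, not a real protein
    for a, b, s, c, l in pair_rules(windows(toy_sequence, 3)):
        print(f"{a} -> {b}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Filtering on support and confidence keeps only frequent, reliable rules, while lift gives one way to check whether a surviving rule is more than a by-product of residues that are simply common.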
By adopting quantitative experimental methods, this work can form more reliable, useful, and strong association rules, i.e. dominant amino acid patterns of complex protein misfolding diseases. In addition to the usual applications, the identified patterns may therefore be more useful in discovering medicines for protein misfolding diseases, and may open up new opportunities in medical science for handling genetic disorders.