Azuwar Mohd Amin, Muhammad Nor Azlan Nor, Afiqah-Aleng Nor, Ab Mutalib Nurul-Syakima, Md Yusof Najwa Farhah, Mohd Yunos Ryia Illani, Ishak Muhiddin, Saidin Sazuita, Rose Isa Mohamed, Sagap Ismail, Mazlan Luqman, Mohd Azman Zairul Azwan, Mazlan Musalmah, Ab Rahim Sharaniza, Wan Ngah Wan Zurinah, Nathan Sheila, Hashim Nurul Azmir Amir, Mohamed-Hussein Zeti-Azura, Jamal Rahman
Center for Bioinformatics Research, Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Malaysia.
Institute of Marine Biotechnology, Universiti Malaysia Terengganu, Kuala Nerus 21030, Malaysia.
Life (Basel). 2022 May 24;12(6):772. doi: 10.3390/life12060772.
Colorectal cancer (CRC) is the second most common cancer in Malaysia, yet its pathobiology remains unknown. Omics technologies, which generate vast amounts of molecular data, can provide a detailed understanding of CRC pathobiology. However, the volume of omics data introduces a new challenge: data organization. A knowledge-based repository, TCGA-My, was therefore developed to systematically store and organize CRC omics data from Malaysian patients. TCGA-My stores genomic and metabolomic data from Malaysian CRC patients. The genome and metabolome datasets were organized using the Python library pandas. The variants and metabolites were first annotated with biological information using the Gene Ontology (GO) vocabulary. The TCGA-My relational database was then built using HeidiSQL Portable 9.4.0.512, and the web interface was designed with Laravel. Currently, TCGA-My stores 1,517,841 variants, 23,695 genes, and 167,451 metabolites from the samples of 50 CRC patients. Data entries can be accessed through the search and browse menus. TCGA-My aims to offer effective and systematic omics data management, with the goal of becoming the main resource for Malaysian CRC research, particularly for biomarker identification in precision medicine.
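The abstract states that the datasets were organized with pandas, annotated with GO terms, and then loaded into a relational database. The following is a minimal sketch of that general pattern, not the authors' pipeline: the table and column names (`calls`, `variant`, `patient_variant`, `gene_go`), the toy coordinates, the variant key format, and the use of an in-memory SQLite database in place of the server the authors administered through HeidiSQL are all assumptions made for illustration.

```python
import sqlite3

import pandas as pd

# Toy per-patient variant calls; column names and coordinates are hypothetical.
calls = pd.DataFrame({
    "patient_id": ["P01", "P01", "P02"],
    "chrom": ["chr5", "chr17", "chr5"],
    "pos": [101, 202, 101],
    "ref": ["G", "C", "G"],
    "alt": ["A", "T", "A"],
    "gene": ["APC", "TP53", "APC"],
})

# Hypothetical gene-to-GO mapping used for annotation.
gene_go = pd.DataFrame({
    "gene": ["APC", "TP53"],
    "go_id": ["GO:0016055", "GO:0006915"],
    "go_term": ["Wnt signaling pathway", "apoptotic process"],
})

# Build a variant key (assumed format chrom:pos:ref>alt) so a variant seen in
# several patients is stored only once in the variant table.
calls["variant_id"] = (
    calls["chrom"] + ":" + calls["pos"].astype(str) + ":"
    + calls["ref"] + ">" + calls["alt"]
)
variants = calls[["variant_id", "chrom", "pos", "ref", "alt", "gene"]].drop_duplicates()
patient_variants = calls[["patient_id", "variant_id"]]

# Attach GO annotations to each unique variant through its gene symbol.
annotated = variants.merge(gene_go, on="gene", how="left")

# Load the normalized tables into a relational store (SQLite as a stand-in).
conn = sqlite3.connect(":memory:")
variants.to_sql("variant", conn, index=False)
patient_variants.to_sql("patient_variant", conn, index=False)
gene_go.to_sql("gene_go", conn, index=False)

# Example "browse" query: number of patients carrying each variant, with its GO term.
query = """
SELECT v.variant_id, v.gene, g.go_term, COUNT(pv.patient_id) AS n_patients
FROM variant v
JOIN patient_variant pv ON pv.variant_id = v.variant_id
LEFT JOIN gene_go g ON g.gene = v.gene
GROUP BY v.variant_id, v.gene, g.go_term
ORDER BY n_patients DESC, v.variant_id
"""
summary = pd.read_sql_query(query, conn)
print(summary)
```

Splitting unique variants from patient-to-variant links keeps each variant's annotation in one place, which is one common way to avoid repeating the same record across 50 patients; the real TCGA-My schema may differ.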