Departments of Cancer Biology and of Molecular and Cellular Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA.
Methods Mol Biol. 2022;2444:1-13. doi: 10.1007/978-1-0716-2063-2_1.
The massive amount of experimental DNA and RNA sequence information provides an encyclopedia for cell biology, one that requires computational tools for efficient interpretation. The ability to write and apply simple computing scripts carries the investigator beyond the limits of online analysis tools, making it possible to interrogate laboratory experimental data more broadly and to integrate them with all available datasets to test and challenge hypotheses. Here we describe robust prototypic bash and C++ scripts, which we have made publicly available, together with metrics and methods for their validation. The scripts address the roles of non-B DNA-forming motifs in eliciting genetic instability and query The Cancer Genome Atlas (TCGA). Importantly, the methods presented provide practical data interpretation tools to examine fundamental relationships and to enable insights into, and correlations between, alterations in gene expression patterns and patient outcome. The example source code described is simple and can be efficiently modified, extended, and applied to other relationships and areas of investigation.