Rostami Zahra, Fooshee David, Carlsson Gunnar, Subramaniam Shankar
Department of Computer Science and Engineering, University of California San Diego, CA 92093 USA.
BluelightAI Inc., Stanford, CA 94305 USA.
IEEE Open J Eng Med Biol. 2025 May 21;6:465-471. doi: 10.1109/OJEMB.2025.3558670. eCollection 2025.
High-throughput biological data, with its vast complexity and high dimensionality, continues to require innovative analytic methods for meaningful exploration. Most dimensionality reduction methods overlook the shape and topology of the data, even though these are vital components of its structure and complexity. Using breast cancer (BC) gene expression data as an illustrative example, this study applies topological data analysis (TDA) and shows the power of incorporating the shape of data. In addition to delineating the known BC subtypes, TDA identifies a new subtype within luminal B cancer, together with the features that define it. The results are presented as three-dimensional (3D) scatter plots that show how the underlying patterns identified through TDA map to 3D space. The new subtype, obtained in an unsupervised manner and validated against prior knowledge, demonstrates the power of embedding the topology and shape of data in the analysis.
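The abstract does not say which TDA algorithm the study used. As a hedged illustration of how TDA can retain the shape of data, the Python sketch below implements a minimal version of Mapper, a widely used TDA construction; treating it as the method behind this study is an assumption. The function names `mapper_graph` and `single_linkage_clusters`, the one-dimensional lens (filter), and all parameter values are hypothetical choices for this example only.

```python
# Hypothetical sketch of a generic 1-D Mapper graph; not the study's pipeline.
import math
from itertools import combinations


def single_linkage_clusters(points, idx, eps):
    """Split the indices in idx into clusters: points at distance at most eps
    are linked, and clusters are the connected components (union-find)."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(idx, 2):
        if math.dist(points[a], points[b]) <= eps:
            parent[find(a)] = find(b)

    groups = {}
    for i in idx:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Cover the range of the lens values with overlapping intervals, cluster
    the points falling in each interval, and join two clusters with an edge
    whenever they share a point."""
    lo, hi = min(lens), max(lens)
    length = (hi - lo) / n_intervals or 1.0  # guard against a constant lens
    pad = length * overlap  # how far each interval extends past its base
    nodes = []
    for k in range(n_intervals):
        start = lo + k * length - pad
        end = lo + (k + 1) * length + pad
        idx = [i for i, v in enumerate(lens) if start <= v <= end]
        if idx:
            nodes.extend(single_linkage_clusters(points, idx, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges


if __name__ == "__main__":
    # 12 points on the unit circle; the lens is the x-coordinate.
    circle = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
              for k in range(12)]
    lens = [x for x, _ in circle]
    nodes, edges = mapper_graph(circle, lens, n_intervals=4, overlap=0.25, eps=0.6)
    print(len(nodes), "nodes,", len(edges), "edges")  # 6 nodes, 6 edges
```

With the x-coordinate as the lens, the circle's points fall into six clusters whose overlaps form a single loop, so the graph keeps the circular topology even though the lens values alone only span an interval. The lens, number of intervals, overlap, and clustering threshold are free parameters in this sketch; changing them can merge or split nodes.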