Linking Signaling Pathways to Transcriptional Programs in Breast Cancer

Hatice Ulku Osmanbeyoglu1, Raphael Pelossof1, Jacqueline F. Bromberg2 and Christina S. Leslie1
1. Computational Biology Program, Memorial Sloan-Kettering Cancer Center, New York, NY
2. Department of Medicine, Memorial Sloan-Kettering Cancer Center and Weill Cornell Medical College, New York, NY

Overview of study

Cancer cells acquire genetic and epigenetic alterations that often lead to dysregulation of oncogenic signal transduction pathways, which in turn alters downstream transcriptional programs. Numerous methods attempt to deduce aberrant signaling pathways in tumors from mRNA data alone, but these pathway analysis approaches remain qualitative and imprecise. In this study, we present a statistical method to link upstream signaling to downstream transcriptional response by exploiting reverse phase protein array (RPPA) and mRNA expression data in The Cancer Genome Atlas (TCGA) breast cancer project. Formally, we use an algorithm called affinity regression to learn an interaction matrix between upstream signal transduction proteins and downstream transcription factors (TFs) that explains target gene expression. The trained model can then predict TF activity given a tumor sample’s protein expression profile, or infer signaling protein activity given a tumor sample’s gene expression profile. Breast cancers comprise molecularly distinct subtypes that respond differently to pathway-targeted therapies. We trained our model on the TCGA breast cancer data set and identified subtype-specific and common TF regulators of gene expression. We then used the trained tumor model to predict signaling protein activity in a panel of breast cancer cell lines for which gene expression and drug response data were available. Correlations between inferred protein activities and drug responses in these cell lines grouped together drugs that are clinically used in combination. Finally, inferred protein activity predicted clinical outcome within the METABRIC Luminal A cohort, identifying high- and low-risk patient groups within this heterogeneous subtype.
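
To make the idea of an interaction matrix concrete, here is a minimal Python sketch of a bilinear model of the form Y ≈ D W Pᵀ. It is an illustration under stated assumptions, not the code used in the study. The matrix names (D for a gene-by-TF target matrix, P for a sample-by-protein RPPA matrix, Y for gene-by-sample expression), the function names, and the plain ridge-regularized least-squares solver are all assumptions introduced here. The published method may use a different formulation and additional transformations.

```python
import numpy as np

def fit_interaction_matrix(D, P, Y, ridge=1e-3):
    """Fit W (TFs x proteins) so that Y is approximately D @ W @ P.T.

    Assumed shapes (hypothetical names, not from the study):
      D: genes x TFs       (TF-to-target-gene associations)
      P: samples x proteins (protein expression, e.g. RPPA)
      Y: genes x samples    (mRNA expression)
    Uses ridge-regularized least squares on the vectorized problem.
    """
    n_tfs, n_proteins = D.shape[1], P.shape[1]
    # Row-major identity: vec(D W P^T) = (D kron P) vec(W)
    A = np.kron(D, P)
    y = Y.ravel()
    gram = A.T @ A + ridge * np.eye(A.shape[1])
    w = np.linalg.solve(gram, A.T @ y)
    return w.reshape(n_tfs, n_proteins)

def predict_tf_activity(W, protein_profile):
    """Map one sample's protein profile (proteins,) to TF space (TFs,)."""
    return W @ protein_profile

def infer_protein_activity(W, D, expression_profile):
    """Map one sample's expression profile (genes,) to protein space (proteins,)."""
    return (expression_profile @ D) @ W
```

The Kronecker formulation keeps the sketch short but builds a matrix with (genes × samples) rows, so it is only practical for small toy inputs. Both helper functions simply apply the learned W in one direction or the other, mirroring the two uses of the trained model described above.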

Code and Data sets

  • The R code for training and testing affinity regression models on TCGA data will be posted here.
  • Elastic net protein-drug associations are available for download here.

Related papers

Also see our paper, Learning Transcriptional Factor to DNA binding interactions using affinity regression.