PANORAMA (Pan-cancer atlas of transcriptional dependence on DNA methylation and copy number aberrations) is a tool for exploring the associations between gene expression, DNA methylation and copy number in cancer. PANORAMA has two main modes: pan-cancer and single model. In the pan-cancer view, it is possible to explore how the strength of expression-copy number (E-C) and expression-methylation (E-M) associations varies across tumor types. A combined model of gene expression as a function of both copy number and methylation (E-CM) is also available. Data in the pan-cancer view are comparable across tumor types. In the single model view, it is possible to perform detailed analyses of E-M and E-C associations in individual tumor types (or to analyze tissue-agnostic/pan-cancer associations). Analyses in the single model view are heavily customizable, and raw data can be downloaded. Model statistics from the single model view are, however, not comparable across tumor types. PANORAMA uses data from The Cancer Genome Atlas. For full methodological details, please read the manuscript associated with PANORAMA, available at bioRxiv.
Please cite any analyses using PANORAMA as follows:
Fougner, C., Höglander, E.K., Lien, T.G., Sørlie, T.S., Nord, S. & Lingjærde, O.C. A pan-cancer atlas of transcriptional dependence on DNA methylation and copy number aberrations. bioRxiv 2020.05.04.076901 (2020). doi: 10.1101/2020.05.04.076901
In the pan-cancer view, model statistics from the same gene should be considered comparable across tumor types. Across genes, model statistics related to copy number should also be considered comparable. Model statistics related to methylation are to an extent comparable across genes, but certain biases introduced by the methylation arrays are unavoidable (e.g. some genes may only be associated with a few CpG probes, while other genes may be associated with several hundred CpG probes). These biases are partially addressed by our PCA-based approach, but should nonetheless be kept in mind when interpreting results.
Methylation-based and copy number-based model statistics are not directly comparable, as different methods are used to model them. Expression-methylation associations are modeled using five covariates (MethSigs) and generalized additive models, whereas expression-copy number associations are modeled using one covariate and linear regression.
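To illustrate the copy number side of this comparison, the following is a minimal Python sketch of an adjusted R² for a linear regression with a single covariate. It is not the PANORAMA implementation; the function name adjusted_r2_simple and the toy inputs are assumptions made purely for illustration, and the generalized additive models used for methylation are not covered here.

```python
def adjusted_r2_simple(x, y):
    """Adjusted R^2 of an ordinary least squares fit y ~ a + b*x.

    Hypothetical helper for illustration only: x would be log2 copy number
    and y log2 expression for one gene across tumors.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")

    # For a single covariate, R^2 equals the squared Pearson correlation.
    r2 = sxy ** 2 / (sxx * syy)

    # Penalize for the number of covariates (p = 1 here).
    p = 1
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

Because the adjustment depends on the number of covariates, a one-covariate linear model and a five-covariate additive model are penalized differently, which is one reason the two sets of statistics should not be compared directly.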
The tumor types in The Cancer Genome Atlas have variable sample numbers. In order to make model statistics comparable across tumor types (in pan-cancer analyses), we randomly downsampled to one hundred tumors, modeled transcriptional associations, and repeated sampling/modeling one hundred times. The median values of these hundred runs were used as the best estimate of model statistics. As a result, all tumor types with fewer than one hundred samples for which all required data were available were excluded from pan-cancer analyses. All tumor types (except for ovarian serous cystadenocarcinoma) can be analyzed in the single model view; however, statistics here are not comparable across tumor types. Data from Illumina HumanMethylation450 arrays were only available for ten ovarian serous cystadenocarcinoma samples, and the tumor type was therefore fully excluded.
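The downsampling procedure described above can be sketched in Python as follows. This is an illustrative outline, not the code used for PANORAMA: the function name downsampled_median, the statistic callback and the fixed seed are hypothetical, and sampling without replacement is an assumption.

```python
import random
import statistics


def downsampled_median(samples, statistic, n_draw=100, n_repeats=100, seed=0):
    """Median of a model statistic over repeated random downsampling.

    samples   : list of per-tumor records for one tumor type
    statistic : callable that fits a model to a subset and returns one number
                (hypothetical; e.g. an adjusted R^2)
    """
    if len(samples) < n_draw:
        # Tumor types with too few samples are excluded from pan-cancer analyses.
        raise ValueError("fewer than n_draw samples available")

    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    values = []
    for _ in range(n_repeats):
        subset = rng.sample(samples, n_draw)  # assumption: without replacement
        values.append(statistic(subset))
    return statistics.median(values)
```

Every tumor type is thus modeled on the same number of tumors, so differences in a statistic are not driven by differences in sample size.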
Results differ between the pan-cancer view and the single model view due to the downsampling method described above. In the single model view, all available samples are used for modeling.
Gene expression (RNA-Seq, FPKM-UQ normalized RSEM data) is in the form of log2(ReadCount + 1). Copy number (based on Affymetrix SNP6.0 arrays, segments derived using circular binary segmentation, made gene centric using Ziggurat Deconstruction in GISTIC2.0) is in the form of log2(copynumber/2), meaning 0 represents a copy number of 2. Methylation data (from HumanMethylation450 arrays) are principal components (MethSigs) derived from all CpGs associated with a gene (by default defined as being within 50 000 bases of coding regions). See the associated article for further details regarding methylation data. MethSigs are directionless (the sign of a principal component is arbitrary) and can therefore be mirrored.
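The two log transformations described above can be written out as a short Python sketch. The function names are hypothetical and chosen only for this illustration.

```python
import math


def log2_expression(read_count):
    """log2(ReadCount + 1); the +1 keeps a count of zero defined."""
    return math.log2(read_count + 1)


def log2_copy_ratio(copy_number):
    """log2(copynumber / 2); requires copy_number > 0."""
    return math.log2(copy_number / 2)


def copy_number_from_log2_ratio(value):
    """Invert log2(copynumber / 2) to recover the copy number."""
    return 2 * 2 ** value
```

On this scale a value of 0 corresponds to two copies, 1 to four copies, and -1 to a single copy.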
Only protein coding genes with data available for all three data levels (gene expression, methylation and copy number) are included in PANORAMA. In the pan-cancer view, only genes associated with at least five CpGs are included, but genes with fewer than five associated CpGs can be analyzed in the single model view. Genes with variation in expression below a certain level are also excluded (see associated manuscript).
Abbreviations for tumor types can be found here.
The shaded area represents the 95% confidence interval.
Raw data from individual probes are not provided when using the standard window size (50 000 bases), as MethSigs are pre-computed for improved performance. Raw probe-level data are, however, provided when using custom window sizes. To find out which CpGs are included in an analysis using the standard window size, simply enter the gene location (provided above the plot) with 50 000 bases subtracted from the start position and 50 000 bases added to the end position.
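The window arithmetic can be sketched in Python as below. The function name cpg_window is hypothetical, the coordinates in the usage note are placeholders rather than a real gene, and clamping the lower bound at position 1 is an assumption for genes close to the start of a chromosome.

```python
def cpg_window(start, end, window=50_000):
    """Return (window_start, window_end) for a gene spanning start..end.

    The default of 50 000 bases matches the standard window size; pass a
    different value to mimic a custom window.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    if window < 0:
        raise ValueError("window must be non-negative")
    # Assumption: coordinates are 1-based, so the window cannot go below 1.
    return max(1, start - window), end + window
```

For a placeholder gene located at 1 000 000 to 1 020 000, cpg_window(1_000_000, 1_020_000) gives (950000, 1070000).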
The source code for the web application can be found here. The source code for the analyses underlying results shown in the pan-cancer view can be found here.
The full PANORAMA dataset can be downloaded from here. Adjusted R² values used for expression-copy number associations are found in the column lm_logE_logC_r2_adj. Adjusted R² values used for expression-methylation associations are found in the column gam_logE_M_r2_adj. Adjusted R² values used for the combined model are found in the column methGAM_cnaLM_logE_logC_M_r2_adj.
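As a rough guide to working with these columns, the Python sketch below reads the three adjusted R² columns from a delimited text file. The file format is an assumption (the download may be distributed in another format), the gene and tumor_type column names are hypothetical, and the rows in the usage example are made-up placeholder values, not PANORAMA results. Only the three R² column names are taken from the description above.

```python
import csv
import io

# Column names as documented for the PANORAMA dataset.
R2_COLUMNS = {
    "E-C": "lm_logE_logC_r2_adj",
    "E-M": "gam_logE_M_r2_adj",
    "E-CM": "methGAM_cnaLM_logE_logC_M_r2_adj",
}


def read_adjusted_r2(handle, delimiter="\t"):
    """Read rows from an open text file, converting R^2 columns to float.

    Missing, empty or 'NA' entries become None. Assumes a header row and a
    tab-delimited layout, which may not match the actual download.
    """
    rows = []
    for record in csv.DictReader(handle, delimiter=delimiter):
        row = dict(record)
        for column in R2_COLUMNS.values():
            text = row.get(column)
            row[column] = None if text in (None, "", "NA") else float(text)
        rows.append(row)
    return rows


# Usage with placeholder data (values are invented for illustration only).
toy = io.StringIO(
    "gene\ttumor_type\tlm_logE_logC_r2_adj\tgam_logE_M_r2_adj\t"
    "methGAM_cnaLM_logE_logC_M_r2_adj\n"
    "GENE_A\tTYPE_1\t0.25\t0.5\t0.625\n"
    "GENE_B\tTYPE_1\tNA\t0.125\t0.125\n"
)
rows = read_adjusted_r2(toy)
```

With a real file, the StringIO object would be replaced by open(path, newline=""), where path is wherever the downloaded file was saved.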
The data underlying all analyses are from The Cancer Genome Atlas pan-cancer dataset.
The code used for chromosome ideograms is based on code from the copynumber R package by Nilsen et al.