Association of expression of epigenetic molecular factors with DNA methylation and sensitivity to chemotherapeutic agents in cancer cell lines

Background Altered DNA methylation patterns play important roles in cancer development and progression. We examined whether expression levels of genes directly or indirectly involved in DNA methylation and demethylation may be associated with response of cancer cell lines to chemotherapy treatment with a variety of antitumor agents. Results We analyzed 72 genes encoding epigenetic factors directly or indirectly involved in DNA methylation and demethylation processes. We examined association of their pretreatment expression levels with methylation beta-values of individual DNA methylation probes, DNA methylation averaged within gene regions, and average epigenome-wide methylation levels. We analyzed data from 645 cancer cell lines and 23 cancer types from the Cancer Cell Line Encyclopedia and Genomics of Drug Sensitivity in Cancer datasets. We observed numerous correlations between expression of genes encoding epigenetic factors and response to chemotherapeutic agents. Expression of genes encoding a variety of epigenetic factors, including KDM2B, DNMT1, EHMT2, SETDB1, EZH2, APOBEC3G, and other genes, was correlated with response to multiple agents. DNA methylation of numerous target probes and gene regions was associated with expression of multiple genes encoding epigenetic factors, underscoring complex regulation of epigenome methylation by multiple intersecting molecular pathways. The genes whose expression was associated with methylation of multiple epigenome targets encode DNA methyltransferases, TET DNA methylcytosine dioxygenases, the methylated DNA-binding protein ZBTB38, KDM2B, SETDB1, and other molecular factors which are involved in diverse epigenetic processes affecting DNA methylation. While baseline DNA methylation of numerous epigenome targets was correlated with cell line response to antitumor agents, the complex relationships between the overlapping effects of each epigenetic factor on methylation of specific targets and the importance of such influences in tumor response to individual agents require further investigation. Conclusions Expression of multiple genes encoding epigenetic factors is associated with drug response and with DNA methylation of numerous epigenome targets that may affect response to therapeutic agents. Our findings suggest complex and interconnected pathways regulating DNA methylation in the epigenome, which may both directly and indirectly affect response to chemotherapy.

Background Cancer cells acquire multiple epigenomic alterations, including aberrant DNA methylation and DNA hydroxymethylation of genes and genome regions, loss or gain of imprinting and allele switching of imprinted loci, and global DNA hypomethylation [1][2][3][4][5][6]. Epigenetic changes in malignant cells result in transcriptional and posttranscriptional rewiring, influencing cell cycle, growth, and proliferation. Epigenetic dysregulation in tumors leads to silencing of tumor suppressor genes and of genes involved in DNA repair, activates oncogene expression, alters gene function, affects transcriptional regulatory networks, and increases genome instability [1,4,[6][7][8][9][10][11]. Global DNA hypomethylation of malignant cells has been associated with tumor evasion of the immune response [12].
A number of epigenetic factors have complex and intertwined roles affecting DNA methylation. There is an extensive cross-talk among the DNA methylation, demethylation, and histone modification pathways in germ line, embryonic stem, normal somatic, and malignant cells [26,27,31,35,[40][41][42]. DNA methylation is influenced by histone modifications, and histone methylation and acetylation marks directly affect DNMT localization, binding, and activities [27,35,40,43]. Specific GMD roles in DNA methylation and demethylation and examples of their interactions are presented in Additional file 1: Table S1 and accompanying text. GMD components may directly or indirectly affect sensitivity of cancer cells to treatment. DNMTs are directly inhibited by DNA hypomethylating agents, while other antitumor agents target additional GMD products [1,7,[44][45][46][47][48][49]. The Hsp90 inhibitor 17-DMAG diminishes the binding of DNMT1 and of the histone methyltransferase EZH2 to Hsp90, attenuates the interaction between DNMT1 and EZH2, and mediates the depletion of DNMT1 and EZH2 [50]. HDAC inhibitors (HDACi) affect DNA methylation through a variety of mechanisms. Vorinostat downregulates transcription of DNMT1 and DNMT3B and changes DNA methylation of TERT and DLC1 [51][52][53]. Panobinostat depletes protein levels of DNMT1 and EZH2 and disrupts DNMT1 interaction with EZH2 and the polycomb repressive complex 2 (PRC2) [50]. Trichostatin A downregulates gene and protein expression of DNMT1 and induces global DNA hypomethylation [54]. Belinostat reduces global DNA methylation and depletes protein levels of the PRC2 subunits EZH2 and SUZ12 [55].
Due to the significance of epigenetic factors in regulation of DNA methylation, it is important to investigate how GMD expression may directly or indirectly affect tumor response to treatment (Fig. 1). We used cancer cell line data from two public resources, the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer [67][68][69][70][71][72], to examine associations of drug response with 72 GMDs (Additional file 1: Table S1) that are directly or indirectly involved in DNA methylation or demethylation. We investigated correlations of their pretreatment expression with methylation of their putative genome targets and with cancer cell line response to a variety of antitumor agents with different mechanisms of action.

Selection of candidate genes involved in DNA methylation and demethylation
Additional file 1: Table S1 provides the list of the 72 GMDs analyzed in this study. Their products are directly or indirectly involved in DNA methylation or demethylation in human tissues. Information about their biological roles in DNA methylation or demethylation was obtained from the biomedical literature and from GeneCards [73] and the Online Mendelian Inheritance in Men (OMIM) [74].

Drug response data
To examine the relationship between pretreatment GMD expression and tumor response to antitumor agents, we used gene expression, DNA methylation, and drug response data for 645 cell lines, the identity of which was matched between the Cancer Cell Line Encyclopedia and the Genomics of Drug Sensitivity in Cancer datasets [67][68][69][70][71][72] (Additional file 2: Table S2). The IC50 measures of drug response, representing the total drug concentration that reduced cell activity by 50%, were available for 24 agents from CCLE [67,68,72]. Additional IC50 values for 251 agents were obtained from the Genomics of Drug Sensitivity in Cancer portal [69,71,75]. Below were refer to these drug response measures as GDSC measures. After our analysis was completed, GDSC released a second batch of drug response values, referring in their release to the initial dataset as GDSC1 and the second dataset as GDSC2 [75]. All GDSC data analyzed in our study were from the GDSC1 dataset.
All CCLE and GDSC drug sensitivity values were transformed to the log 10 (IC50) scale. Cell line identities in the CCLE and GDSC datasets were verified using Cellosaurus [76]. Response measures for 11 agents which were present in both CCLE and GDSC data were analyzed separately, without combining the CCLE and GDSC measures. For those agents in the GDSC dataset that had duplicate measurements [71], we used the combined average of their drug response measures from separate experiments. The resulting dataset included 275 CCLE and GDSC drug response measures for 255 distinct antitumor agents. The concordance of drug response measures between the CCLE and GDSC datasets has been reported previously [77][78][79]. Information about mechanisms of action of the agents was collected from the CCLE and GDSC portals, their accompanying publications [67,68,71,75] and biomedical literature.

Gene expression data retrieval
For the RNA-seq data used in this project, RPKM gene expression values were downloaded from the CCLE portal of the Broad Institute [72,80]. RNA sample library preparation using Illumina TruSeq RNA Sample Preparation protocol, RNA-sequencing using Illumina HiSeq 2000 and HiSeq 2500, and initial data processing was previously described by the CCLE project [81].

DNA methylation data filtering
Cell line methylation data for 485,512 probes, generated by the GDSC project [71] using Illumina Infinium HumanMethylation450 (450 K) BeadChip array (Illumina, Inc.), were downloaded from NCBI GEO [82]. Methylation probe beta-values for individual cell lines Direct mechanism affecƟng drug response Fig. 1 Possible hypothetical mechanisms by which GMD expression may directly or indirectly affect response of cancer cells drugs. GMDs may directly influence drug response through a variety of mechanisms. Among indirect influences of GMDs on drug response examined in this study, we focused on the effect of GMD expression on DNA methylation of epigenome targets with detection p-values ≥ 10 -3 and 340 entire probes with median detection p-values ≥ 10 -6 were excluded. In addition, 60,332 probes overlapping with single nucleotide polymorphisms were filtered out based on the probe masking recommendations for hg19 (GRCh37) [83,84]. The final methylation dataset used in analysis had methylation beta-values for 424,840 probes that passed all filtering. Chromosomal regions (cytobands) were identified according to the UCSC genome annotation for the hg19 (GRCh37) human genome assembly based on the probe coordinates in the Illumina Infinium HumanMethylation 450 K BeadChip annotation.

Calculation of gene region-averaged methylation values
In order to compute gene region-averaged methylation beta-values from individual probe measures, we developed an R program (available upon request) which followed the algorithm developed previously by the authors of the IMA software [85]. We recently reported a version of our software adapted for the Illumina Infinium MethylationEPIC BeadChip array [86]. For this study, we used a similar version which we adapted for the Illumina Infinium HumanMethylation450 BeadChip array. We used the Illumina Infinium HumanMethylation450 BeadChip annotation of each probe [87] according to the UCSC genome browser data to compute gene regionaveraged methylation for 6 gene regions: TSS1500 (200-1500 bases upstream of the transcriptional start site, or TSS), TSS200 (0-200 bases upstream of the TSS), 5′UTR (within the 5′ untranslated region, between the TSS and the ATG start site), 1st exon, gene body (between the ATG start site and the stop codon), and 3′UTR (within the 3′ untranslated region, between the stop codon and poly A signal). The resulting methylation values were computed for 93,591 regions in 20,643 genes and ncRNA, with each gene represented by up to 6 regions. Additional file 20: Fig. S1 shows the distribution of methylation betavalues among 424,840 individual probes, the combined distribution among 93,591 gene regions, and separate distributions for each gene region category (TSS1500, TSS200, 5′UTR, 1st exon, gene body, and 3′ UTR) in 645 cell lines.

Association analysis of GMD expression, epigenome-wide methylation of individual probes and gene regions, and drug response
To examine possible direct influences of GMD expression on drug response (Fig. 1), we analyzed Spearman correlation between RPKM expression measures of 72 GMDs listed in Additional file 1: Table S1 and log(IC50) of 255 antitumor agents. Significance of the associations was evaluated using the Benjamini-Hochberg adjustment procedure for false discovery rate (FDR) [17], while accounting for 255 agents and 72 genes. We identified the associations between GMD expression and log(IC50) which were both statistically significant (satisfying FDR adjusted p < 0.05) and strong (satisfying the absolute value of Spearman correlation coefficient |ρ| > 0.5) (Fig. 2a). Here and below, we refer to the FDR adjusted p-values as p FDR . We discuss the strength of statistically significant associations based on the absolute value of their Spearman correlation coefficient |ρ|.
All association analyses were performed in the combined dataset of different cancer categories (pancancer analysis including 645 cell lines), and also separately within each of the 23 individual cancer categories with at least 10 cell lines that had both methylation and expression data (Additional file 2: Table S2). The initial information about tumor sites was obtained from GDSC, CCLE, and Cellosaurus [67-69, 71, 72, 75, 76]. While many cancer categories used in our analysis were based on the Cancer Genome Atlas (TCGA) definitions, some cancer types from the same organ were grouped into broader categories in order to allow for an inclusion of a broader range of similar cell lines than those defined by TCGA. Several additional categories not presented in TCGA (e.g., small cell lung cancer, neuroblastoma, and others) were also analyzed (Additional file 2: Table S2). In the analysis stratified by individual cancer categories with ≥ 10 cell lines, we accounted for 23 cancer types in the FDR adjustment.
We also explored potential indirect mechanisms which may mediate the associations between GMD expression and drug response. We used Spearman correlation to identify the strongest significant associations between expression of the 72 GMDs and methylation of their epigenome targets by using methylation beta-values of 424,840 individual probes and 93,591 gene region methylation values averaged among the probes within each region (Fig. 2b). Among individual probes, in the combined pancancer analysis of all tumor types we searched for associations between GMD expression and methylation beta values with p < 10 −6 72 , i.e., p < 1.389 × 10 -8 , based on published recommendations [88] for the p-value threshold that would be appropriate for finding single gene associations with methylation probes of the HumanMethylation 450 K BeadChip array. We adjusted it by the number of GMDs for which the associations were examined. When analyzing associations between GMD expression and individual probes within each of the 23 cancer categories, we further adjusted this threshold by using p < 1.38910 −8 23 = 6.039 × 10 -10 . In the pancancer correlation analysis between GMD expression and gene region methylation, we used the FDR adjustment that accounted for 72 GMDs and 93,591 gene regions. In correlation analyses between GMD expression and gene region methylation stratified by cancer types, we also accounted for the 23 cancer types. In addition to using the p-value threshold, we focused on the strongest correlations that had the absolute value of Spearman |ρ| > 0.5. We made a distinction between the cis-correlations of expression of a GMD with methylation of its own probes (which could suggest the regulation of expression of that GMD by its methylation, or a possible copy number variation of that GMD which may affect both its methylation and expression measures) and the trans-correlations of each GMD with the probes located in other genes, according to the UCSC annotation of the HumanMethylation 450 K BeadChip array.
After identifying putative epigenome targets that were strongly and significantly correlated with GMD expression (Fig. 2b), we examined associations of DNA methylation of these epigenome targets with drug response. Spearman correlation analysis of methylation measures with log(IC50) of each of the 255 agents was performed for methylation beta-values of the target methylation probes and gene regions that had been strongly and significantly associated with GMD expression. Significance of correlation of methylation of individual target probes or gene regions with GMD expression and with log(IC50) was evaluated using the FDR adjustment, accounting for 255 agents and the number of target methylation probes and gene regions. Analysis within cancer categories also accounted for 23 cancer types. We focused on the strongest significant correlations with |ρ| > 0.5. If the number of such correlations was small, we  Fig. 2 Overall design of the study. a Analysis of direct associations between GMD expression and drug response. b A two-step approach to identify possible indirect GMD effects on drug response. Epigenetic targets significantly associated with GMD expression were identified first, and then, the correlation of methylation of these significant targets with drug response was analyzed. c. Analysis of average DNA methylation values. In each analysis, we examined associations in all cancer categories combined (pancancer analysis) and in individual cancer types with ≥ 10 cell lines. GMD expression data included RNA-seq RPKM values for 72 GMDs. Epigenome-wide DNA methylation data included beta-values for 424,840 individual probes and gene region-averaged values for 93,591gene regions from 20,643 genes and noncoding RNA. Average epigenome-wide methylation values were computed as a mean of beta-values for 424,840 probes which passed the QC and filtering. Drug response measures consisted of 275 log(IC50) values for 255 anticancer agents obtained from CCLE and GDSC datasets. The criteria for identifying significant strong associations are provided below each diagram. p FDR , p-value adjusted for false discovery rate according to Benjamini-Hochberg procedure provided an additional discussion of more modest significant correlations with |ρ| > 0. 4. In addition to the analysis of individual probes and gene regions, we also examined the association of GMD expression with epigenome-averaged DNA methylation and of epigenome-averaged DNA methylation with log(IC50) of antitumor agents (Fig. 2c). Epigenome-wide averaged DNA methylation was computed as a mean of beta-values among 424,840 methylation probes which passed the quality control (QC) and probe filtering. The resulting p-values were FDR adjusted for multiple testing. Separate analyses of average epigenome-wide DNA methylation were performed in the pancancer data and within 23 individual cancer categories with ≥ 10 cell lines.

Regression analysis of associations of cell line response to trametinib
Among the agents which were associated with GMD expression or with methylation status of epigenome targets in our study, sensitivity and resistance to the MEK inhibitor trametinib have been previously associated with specific DNA and protein sequence changes including BRAF V600E and KRAS or NRAS protein-changing variants [89,90]. For those GMDs and target gene regions and probes which were significantly and strongly (|ρ| > 0.5) associated with response to trametinib either in the pancancer dataset or in any individual cancer category, we performed a regression analysis conditional on the presence of BRAF V600E or any non-synonymous KRAS or NRAS variant as predictors of trametinib response. We examined whether the association of GMD expression or methylation of their epigenome targets with response to trametinib remained statistically significant after accounting for the gene sequence variants known to affect sensitivity or resistance to trametinib. Information about the sequence variants in BRAF, KRAS, and NRAS was obtained from GDSC whole exome sequencing data [75]. Regression analysis was performed using the Imtest R package v. 0.9-36 for testing linear regression models, using log(IC50) as a dependent variable, and gene mutation status and GMD expression or probe or gene region methylation as predictor variables. The p-values for association of response to trametinib with GMD expression or with target probe and gene region methylation were FDR adjusted for multiple testing.

Validation of the top study findings in publicly available independent datasets
In order to validate the top results from our correlation analyses between GMD expression, epigenome target methylation, and drug response, we used publicly available comprehensive independent datasets containing drug response, DNA methylation, and gene expression measures. Our first validation analysis used the NCI-60 cancer cell line panel dataset, previously screened by the National Cancer Institute, which we analyzed using CellminerCDB v. 1.2 [48,[91][92][93]. In the CellminerCDB analysis of NCI-60 cell line panel data, we examined Pearson correlation between GMD expression (measured as log2 of averaged gene expression measures from five microarray platforms, Affymetrix Human Genome HG-U95, Affymetrix Human Genome HG-U133, Affymetrix Human Genome U133 Plus 2.0, Affymetrix GeneChip Human Exon 1.0 ST, and Agilent Whole Human Genome Oligo arrays) and log (GI50) measures of drug response (representing drug activity measures in CellminerCDB multiplied by -1, in order to make the correlations in the CCLE-GDSC and NCI-60 datasets directly comparable) [48,94]. We also analyzed NCI-60 data using CellminerCDB in order to validate significant correlations of GMD expression with target DNA methylation in the pancancer data. Because CellminerCDB utilizes gene level DNA methylation values which are inferred from probes located predominantly in the upstream gene regions [95], we used CellminerCDB NCI-60 DNA methylation data to confirm significant CCLE-GDSC associations of DNA methylation of upstream gene regions (TSS1500, TSS200, 5′UTR, and the 1st exon). CellminerCDB employs Pearson correlation in its analyses.
The second validation analysis used the NCI SCLC cell line dataset, containing measures for 66 small cell line cancer cell lines [86,96]. It is available from the NCI Small Cell Lung Cancer Project site [97], with SCLC DNA methylation and transcript expression data also available from NCBI GEO (accession numbers GSE145156 and GSE73160). In the validation analysis using the NCI SCLC cell line data, we used Pearson correlation to examine associations of GMD expression, measured using Affymetrix GeneChip ® Human Exon 1.0 ST Array, with log(IC50) measures of drug response, and Spearman correlation to analyze associations between DNA methylation of individual probes, measured using Illumina Infinium MethylationEPIC BeadChip, and average methylation of gene regions, with GMD expression and with log(IC50) measures of drug response, using methodology described in our earlier report [86]. Measures of miRNA methylation were not included in the validation analysis of SCLC data.
For additional validation of the epigenome targets identified in our analysis of CCLE-GDSC data, we explored the clinical relevance of the findings in pancreatic ductal adenocarcinoma (PAAD) based on the literature reports which analyzed patient survival data in TCGA and in other patient datasets.

Searchable online resource
Our analysis generated an extensive set of tables with detailed information about the associations of genes affecting DNA methylation or demethylation. In order to provide the scientific community with the opportunity to independently explore these associations, we developed a web resource with dynamic searching and filtering features. The web resource is available at https ://brb.nci.nih.gov/gmdta bles/. It was developed using HTML, CSS, and the DataTables Javascript plug-in as highly flexible tools that allow researchers to visualize, search, filter, and download our results data for their own use. The online site also provides information about the 645 cancer cell lines used in our analysis. Table 1 summarizes significant associations of GMD expression with log(IC50) which satisfied Spearman |ρ| > 0.4 and p FDR < 0.05. Seven negative and positive correlations in individual cancer categories satisfied p FDR < 0.05. All of them were strong (0.5171 ≤|ρ|≤ 0.7900; Table 1). The highest number (4) of significant associations was observed in breast cancer.

Association of GMD expression with drug response
Pancancer correlations were highly significant but did not reach |ρ| > 0.5. Four genes had modest correlations with |ρ| > 0.4 (Fig. 3a), all of which were negative, indicating that increased GMD expression was associated with drug sensitivity. They were KDM2B (13 correlations), DNMT1 (3), APOBEC3G (1), and EHMT2 (1). Additional file 3: Table S3 and Fig. 3b provide an expanded list of 379 significant pancancer correlations satisfying a relaxed threshold of |ρ| > 0.3 and p FDR < 0.05. In the majority of them (91.8%, or 348 out of 379 correlations), increased GMD expression was associated with  Table S3). In contrast, ZBTB4, SMUG1, and CDKL5 expression was predominantly associated with drug resistance. Many epigenetic drugs were associated with GMD expression. Increased expression of the histone demethylase KDM2B gene was associated with sensitivity to the HDACi vorinostat, tubastatin A, panobinostat, belinostat, CAY10603, VNLG/124, and AR-42, the bromodomain inhibitor I-BET-762, the SIRT1 inhibitor selisistat, the EHMT1/EHMT2 inhibitor UNC0638, and the DOT1L protein methyltransferase inhibitor SGC0946 (Table 1; Additional file 3: Table S3). Correlations of KDM2B expression with agents targeting histone modifications likely involve the epigenetic role of KDM2B and its role in gene regulation [98].
We observed many associations of sensitivity to HDACi and the bromodomain inhibitor I-BET-762 with elevated expression of a number of GMDs, e.g., SETDB1, EHMT2, SUZ12, MBD1, UHRF2, and TDG (Additional file 3: Table S3). MBD1 expression was associated with sensitivity to the EHMT1/EHMT2 inhibitor UNC0638. In contrast, ZBTB4 and GADD45A expression was associated with resistance to HDACi. Associations with pretreatment expression of multiple GMDs are in agreement with the multifaceted actions of epigenetic agents which affect multiple molecular components [1,100,103].
Other associations suggest indirect involvement of the epigenetic pathways in drug response. Elevated expression of KDM2B, DNMT1, EHMT2, and UHRF1 was associated with sensitivity to daporinad, a nicotinamide phosphoribosyltransferase inhibitor (Table 1; Additional  file 3: Table S3). KDM2B, DNMT1, EZH2, UHRF1, and MBD1 were associated with sensitivity to the endotelin receptor A inhibitor zibotentan. Expression of KDM2B, DNMT1, EHMT2, EZH2, SUZ12, MBD1, UHRF2, and SETDB1 was associated with sensitivity, and that of ZBTB4 with resistance to the RIPK inhibitor XMD13-2 (Table 1; Additional file 3: Table S3). As sample sizes in individual tumor types were modest (Additional file 2: Table S2), many associations were significant in the pancancer analysis only. Their strength could be influenced by the differences in GMD expression and drug response among cancer types. Pancancer associations may indicate the GMD importance in response to the agents with similar activity across different tumor types.

Association of GMD expression with DNA methylation of epigenome targets
In order to examine indirect modulation of drug response by GMDs via their influence on DNA methylation, we identified their genome methylation targets which were strongly and significantly associated with their expression (Fig. 2b).
GMD expression was associated with methylation of probes and regions in many important cancer genes (Additional files 4, 5: Tables S4 and S5). Selected examples of such associations are discussed in detail in Additional file 19: Data S1. For example, we observed epigenetic regulation of methylation of probes and/or gene regions of ABL1, ABL2, MET, XRCC5, KIFC3, and TIMP. Similarly, we observed associations of expression of multiple GMDs with a probe in TGFBI, whose product has been associated with poor prognosis in colorectal cancer and is a predictive biomarker for dasatinib sensitivity [110,111]. Among the ABC family transporter genes, methylation of ABCC1 and ABCC3 was associated with GMD expression, suggesting multiple epigenetic pathways of their regulation.
Expression of multiple GMDs was associated with methylation of many probes and regions in genes involved in inflammation, e.g., IRAK2 which encodes an activator of the NF-κB pathway, and tumor necrosis factor receptor genes TNFRSF10B and TNFRSF1A. Among the Hippo pathway components, methylation of WWTR1 (TAZ) and TEAD1 was significantly correlated with GMD expression (Additional files 4, 5: Tables S4 and S5; Additional file 19: Data S1).
GMD expression was correlated with methylation of other genes involved in epigenetic processes or global transcriptional regulation (Additional files 4, 5: Tables S4 and S5; Additional file 19: Data S1). Expression of the histone methyltransferase SETDB1 and histone lysine demethylase KDM2B genes was positively correlated with methylation of the histone deacetylase HDAC9, suggesting HDAC9 regulation by SETB1 and KDM2B or coregulation among different histone modifiers. KDM2B and TET3 expression was positively associated with methylation of FTO, which participates in RNA methylation [13]. TET3 and SETDB1 expression was positively correlated with methylation of NNMT, whose product promotes tumorigenesis and regulates the availability of methyl groups for cellular methylation reactions [112,113]. MED1 and MED20 methylation was correlated with multiple GMDs, suggesting a potential influence of GMD expression on global transcriptional regulation. MED1 and MED20 are subunits of the mediator of RNA polymerase transcription. They participate in the Mediator complex, which is involved in transcriptional regulation of RNA polymerase II-dependent genes [114].

Table 2 GMDs with the highest numbers of strong transcorrelations between their expression and methylation levels of target individual probes and gene regions in pancancer analysis
Listed are the counts of correlations satisfying p < 1.389 × 10 -8 and Spearman |ρ| > 0.5 for individual probes and p FDR < 0.05 and Spearman |ρ| > 0. 5  and region methylation of GMDs with their expression was strongly negative (Additional files 4, 5, 6, 7: Tables S4-S7), suggesting regulation of expression of these GMDs by their promoter methylation. Consistent with the well-documented repressive effect of the MGMT promoter methylation on its expression [37], methylation of the 5′ UTR, the 1st exon, and several individual probes of MGMT was negatively correlated with its expression.

Analysis of individual cancer categories
We observed strong and significant correlations of GMD expression with methylation of probes and gene regions in the stratified analysis among cancer categories. The use of the threshold of p < 6.039 × 10 -10 for the probes identified 372 very strong correlations with 0.5801 ≤|ρ|≤ 1 (Additional file 8: Table S8), including 259 trans-correlations between 44 GMDs and probe methylation in 166 target genes. They represent the strongest and highly significant associations of GMD expression with individual probes. Methylation of many other probes was also correlated with GMD expression but did not satisfy the stringent p-value threshold (data not shown). Correlation analysis of GMD expression with methylation of gene regions identified 14,609 associations with 0.5 < |ρ|≤ 1 and p FDR < 0.05, including 14,558 trans-correlations between expression of all 72 GMDs and the gene regions in 8,336 target genes (Additional file 9: Table S9). Expression of many GMDs was correlated with methylation of multiple probes and gene regions (Table 3). A large number of associations was observed in chronic leukocytic leukemia (CLLE; Additional file 21: Fig. S2; Additional file 11: Table S11 (17), AIDCDA (15), and APOBEC3B (11). Consistent with their roles in DNA methylation and with the direction of associations in the pancancer dataset, DNTM1, DNMT3A, and DNTM3B expression was associated with increased methylation of many individual probes and gene regions in multiple tumor types (Table 3; Additional file 21: Fig. S2; Additional files 10, 11: Tables S10 and S11). Expression of UHRF1, whose product has multiple roles in DNA methylation including interactions with DNMT1, DNMT3a, DNMT3b and G9a, control of DNMT1 abundance, and targeting DNMT1 to hemimethylated DNA during replication (Additional file 1: Table S1) [24,25,32,40,115], was strongly associated with methylation of multiple probes and regions in many tumor types (Table 3; Additional file 21: Fig. S1; Additional files 10, 11: Tables S10 and S11). While UHRF1 associations did not reach |ρ| > 0.4 in the pancancer data ( Fig. 4; Additional files 6, 7: Tables S6 and S7), its expression had weaker positive significant associations with 12,611 probes with p < 1.389 × 10 -8 and 0.3 < ρ < 0.4 and only 10 negative correlations with p < 1.389 × 10 -8 and − 0.4 < ρ < − 0.3 (data not shown). This highlights the importance of URHF1 in DNA methylation in tumors.
Associations of some GMDs were specific to individual cancer categories, suggesting heterogeneity of the mechanisms and of the strength of epigenetic interactions among cancer histologies. For example, in CLLE, ZBTB38 expression was significantly (p FDR < 0.05) negatively correlated with ρ < − 0.5 with 869 regions of other genes and had no positive trans-associations. By contrast, when using this threshold, ZBTB38 had only 3 negative and 1 positive correlations with gene regions in NSCLC and only 1 negative correlation in breast cancer cell lines (Additional file 21: Fig. S2; Additional file 11: Table S11). It had no negative trans-correlations and 1 and 11 positive trans-correlations with gene regions in the COAD/ READ and PAAD categories, respectively. Many other GMDs also had variable numbers of positive and negative strong associations in different tumors (Table 3; Additional file 21: Fig. S2; Additional files 8, 9, 10, 11: Tables S8-S11).
Similar to the pancancer analysis, we observed multiple strong significant associations of GMD expression with methylation of other GMDs and other genes involved in epigenetic processes and chromatin structure, maintenance, and regulation. For example, APOBEC2 expression in bladder cancer was associated with methylation of the 5′ UTR and the 1st exon of the DNA demethylase ALKBH2, whose product removes N 1 -meA and N 3 -meC (ρ = − 0.8687; Additional files 1, 9: Tables S1 and S9) [38,39].

Table 3 GMDs with the highest numbers of strong correlations between their expression and methylation levels of target probes and gene regions within individual cancer categories
Listed are the counts of correlations satisfying p < 6.039 × 10 -10 and Spearman |ρ| > 0.5 for individual probes and p FDR < 0.05 and Spearman |ρ| > 0. 5  We observed many strong tumor type-specific significant correlations of GMD expression with methylation of genes important in cancer. The detailed results are presented in Additional files 8, 9: Tables S8 and S9, and selected examples are discussed in Additional file 19: Data S1. They include associations of methylation of regions of the ABL2 oncogene in breast cancer and in COAD/READ, and of the epidermal growth factor receptor EGFR gene in CLLE. Upstream gene regions of the tumor suppressor RUNX1 were positively correlated with DNMT3A expression in liver hepatocellular carcinoma. RUNX1 is downregulated in the early stages in hepatocellular carcinoma [116], and our findings suggest a potential role of DNA methylation in its regulation. The upstream region of MYCN, which may play a regulatory role in MYCN expression [86,117], was associated with GMD expression in breast cancer and in COAD/READ. In CLLE, the body of EGFR was strongly positively correlated with EHMT2 and PHC2.
In several tumor types, GMD expression was associated with methylation of the regions of MLKL and RIPK3 encoding key players in necroptosis [118], RIPK2 and RIPK4, which are involved in inflammatory signaling and NF-κB activation [119,120], and IRAK2, IRAK3, and IRAK4, which mediate the toll-like receptor and interleukin-1 receptor signaling pathways and are involved in the NF-κB activation [121] (Additional file 9: Table S9; Additional file 19: Data S1). Numerous GMDs were strongly associated with methylation of components of the TNF-α signaling pathway [122] including TNF and other TNF family members, e.g., TNSF11 (RANKL) and TNSF13B (BAFF) involved in activation of NF-κB signaling [123], TNFAIP3 and TNFAIP8L2 encoding TNF-α induced proteins, and the TNF receptor superfamily members.
GMD expression was strongly correlated with methylation of multiple components of the Hippo signaling pathway [122] (Additional files 8, 9: Tables 8 and S9; Additional file 19: Data S1), including YAP1 in stomach adenocarcinoma (STAD), WWTR1 (TAZ) in STAD, NSCLC, breast cancer, sarcoma, CLLE, and COAD/ READ, TEAD1 in CLLE, TEAD2 in glioma and SCLC, LATS1 in CLLE, LATS2 in mature B-cell lymphoma, and MST1 in CLLE. Methylation of RASSF1, RASSF2, RASSF3, RASSF6, RASSF7, and RASSF9, from the RASSF regulator family [122] was strongly associated with expression of several GMDs in a variety of cancer categories. As discussed above, WWTR1 and TEAD1 methylation was also associated with GMD expression in the pancancer data. These findings indicate a strong epigenetic regulation of the Hippo pathway.

Association between methylation of target probes and gene regions and drug response
After identifying 1,306 target probes in 45 genes and 11,754 gene regions in 8,374 genes, which were strongly and significantly associated with GMD expression in pancancer analysis or in individual cancer types (Additional files 4, 5, 8, 9: Tables S4, S5, S8, and S9), we examined the association of their methylation with log(IC50) (Fig. 2). Only 4 probes and 3 regions had significant correlations with |ρ| > 0.5 in the pancancer data (p FDR ≤ 2.94 × 10 -14 for the probes and p FDR ≤ 9.26 × 10 -17 for the gene regions; Additional files 12, 13: Tables S12 and S13). The probe cg16411668 in a non-coding region was associated with KDM2B, DNMT3A, and SETDB1 expression and with panobinostat sensitivity (ρ = -0.5245). The probes cg08422793 and cg20824939 in intergentic regions and cg20092122 in BST2, the bone marrow stromal antigen 2, were associated with sunitinib resistance (0.5027 ≤ ρ ≤ 0.5162) and with APOBEC3G and/or APOBEC3C expression. Consistent with the cg20092122 association, the TSS1500, TSS200, and the 1st exon of BST2 were also associated with sunitinib resistance (ρ = 0.5167, ρ = 0.4622 and 0.4507; Additional file 13: Table S13). All these upstream regions were associated with CBX2 expression. The 5′UTR of SELPLG was also associated with sunitinib resistance (ρ = 0.5305), and with HDAC1 expression. The CYR61 body was associated with panobinostat sensitivity (ρ = -0.5021) and with KDM2B and SETDB expression.
We observed modest (|ρ| > 0.4) significant correlations involving both probes and the entire regions of many important genes (Additional files 12, 13: Tables S12 and S13). The 5′ UTR, 1st exon, and multiple probes in the tumor suppressor gene DAPK3 were associated with the HDACi vorinostat and panobinostat. Panobinostat sensitivity was also correlated with methylation of a probe and the entire TSS1500 in NNMT, which controls the methylation potential of tumor cells [112], consistent with NNMT upregulation in a panobinostat resistant glioma cell line [139] and with the correlation of NNMT expression with vorinostat resistance [140]. Individual probes and the body of the oncogene DDA1 were associated with sensitivity to the HDACi vorinostat and panobinostat, the bromodomain inhibitor I-BET-762, the PDK1 inhibitor BX-912, the LXR agonist T0901317, and the HER2 inhibitor TL-2-105 (Additional files 12, 13: Tables S12 and S13). The 5′UTR and its probes in RUNX1 were associated with resistance to sunitinib, cyclopamine, and Z-LLNle-CHO. The 5′UTR, the body, and their probes in the transcriptional regulator SP1 gene were associated with resistance to refametinib and tanespimycin. The 5′UTR and its probes in the transcriptional regulator MAFK gene were associated with sensitivity to the IKK inhibitor BMS-345541, the CRAF inhibitor TL-2-105, and the HDACi vorinostat. Many MAFK probes were also associated with other agents. The TSS200 of TREX1 was associated with vorinostat sensitivity (ρ = -0.4101, p FDR = 5.28 × 10 -19 ), and the TSS200 of TMEM173 (STING) was associated with sunitinib resistance (ρ = 0.4202, p FDR = 5.09 × 10 -9 ).

Regression analysis of response to trametinib
After identifying significant methylation probes and gene regions associated with trametinib (Additional files 4, 5: Tables S4 and S5) and of the GMDs whose expression was associated with response to that agent with |ρ| > 0.5 (Table 1), we included them individually as predictor variables in multivariate regression analysis of trametinib response. We also used the mutation status of BRAF, KRAS, and NRAS as additional predictor variables. When BRAF V600E and non-synonymous changes in KRAS or NRAS were considered, methylation of the 5′UTR of C7orf49 and of the probe cg00172872 in the intergenic region on 12q21.33 remained significantly associated with trametinib in pancancer and breast cancer (5.24 × 10 -13 ≤ p FDR ≤ 0.0023). BRAF V600E was also highly significant in these models both in breast cell lines and in the pancancer data (p ≤ 2.52 × 10 -5 ), while the variants in KRAS or NRAS were significant in the pancancer data (p ≤ 2.23 × 10 -17 ; data not shown). cg00172872 was associated with CBX2, SETDB1, and TET3 expression (Additional files 4, 14: Tables S4 and S14), while the 5′UTR of C7orf49 was associated with GADD45A (Additional files 9, 15: Tables S9 and S15). GADD45A expression was also strongly correlated with trametinib response in breast cancer (ρ = -0.7012; p FDR = 0.0399; Table 1). When adding the BRAF V600E, KRAS, and NRAS mutation status to the model, association of GADD45A expression with trametinib in breast cell lines had p = 0.0003 prior to FDR adjustment, and p FDR = 0.1566 after the adjustment (data not shown). These results suggest the importance of the GADD45A expression and C7orf49 methylation in trametinib response. C7orf49 (CYREN) is a cell-cyclespecific inhibitor of classical non-homologous end joining of DNA double-strand break repair, regulating the selection of DNA double-strand repair pathway [141].
Stratified analysis within tumor types identified a strong and significant correlation between epigenome methylation in bladder cancer and sensitivity to the CDK inhibitor THZ-2-49 (ρ = -0.8596, p FDR = 0.0243; data not shown). Two correlations in COAD/READ were strong with p FDR < 0.15, including sensitivity to the PDK-1 inhibitor BX795 and the proteasome inhibitor MG-132 (ρ = -0.6376 and -0.8791, p FDR = 0.126 for both; data not shown). The biological mechanisms of these associations require further investigation.

Validation of the findings and their clinical significance using independent datasets
Among pancancer correlations of GMD expression with drug response presented in Table 1, which satisfied Spearman |ρ| > 0.4 and p FDR < 0.05, seven associations had both GMD expression data and log(GI50) drug response data for the same agents available in the NCI-60 dataset in CellminerCDB (Additional file 16: Table S16). Among them, KDM2B expression was strongly and highly significantly correlated with sensitivity to the HDAC inhibitor vorinostat in the NCI-60 cancer cell lines (Pearson r = -0.51, p = 4.3 × 10 -5 ), providing a strong support for our initial finding of this association in the CCLE-GDSC dataset. Four additional associations for KDM2B, DNMT1, and APOBEC3G had the same direction of association between GMD expression and drug sensitivity both in the CCLE-GDSC and NCI-60 datasets, but they did not reach statistical significance in the NCI-60 data (Additional file 16: Table S16).
Additional file 17: Table S17 shows the strength of Pearson correlation in the NCI-60 data from CellminerCDB, used for validation of significant correlations between GMD expression and DNA methylation of upstream gene regions (TSS1500, TSS200, 5′UTR, and the 1st exon) in the pancancer CCLE-GDSC data from Additional file 5: Table S5. Among the 116 significant correlations from CCLE-GDSC data listed in Additional file 17: Table S17 which also had comparable NCI-60 data (GMD expression and gene-averaged methylation derived from the upstream probes) in CellminerCDB, 63 (54% of the total) had both the Pearson correlation p < 0.05 in the NCI-60 data and the same direction of association in both datasets, confirming our initial findings. Many additional GMD-target gene associations in Additional file 17: Table S17 had the same direction of correlation both in the CCLE-GDSC and NCI-60 datasets but did not reach statistical significance.
We also observed a strong and consistent confirmation of our findings in an independent NCI SCLC dataset consisting of 66 small cell lung cancer cell lines, which we had generated previously [86]. Additional file 18: Table S18 provides Spearman correlation results between GMD expression and DNA methylation of gene regions in 66 SCLC cell lines. They validate the significant findings in the SCLC category of the CCLE-GDSC data from Additional file 9: Table S9 that had p FDR < 0.05 and |ρ| > 0.5. Among 734 significant GMD-gene region correlations with available data in both datasets, 521 (71%) had both associations in the same direction and p < 0.05 in the independent NCI SCLC dataset. Among validated results for multiple associated GMDs in Additional file 18: Table S18, we note multiple correlations involving the KMT2A (MLL) gene which is frequently mutated in SCLC, and EZH2, an important epigenetic drug target in SCLC, pharmacologic inhibition of which suppresses SCLC growth and chemoresistance [144][145][146].
We were also able to validate several significant correlations of individual target epigenome probes and gene regions with drug response in the SCLC category in the CCLE-GDSC dataset (Additional files 14, 15: Tables S14 and S15) using the associations in the NCI SCLC dataset, even though these two datasets contained many different agents and used two different Illumina methylation arrays. We used the NCI SCLC dataset to confirm the associations of the TSS1500 of CXCL17 and TSS1500 of PPR18 with response to docetaxel and of the probes cg0260189 in the body of BIK with docetaxel, cg04619882 in the body of KIAAA1324 with dactolisib, and cg04619885 in the body of UBE2O with PD0325901 (0.2503 ≤|ρ|≤ 0.3623, 0.0029 ≤ p ≤ 0.0427, and the direction of associations was also identical in both datasets; data not shown). For some agents which were unique to the CCLE-GDSC screen, confirmation of clinically important associations with epigenomic targets may be suggested based on indirect evidence. For example, methylation of the TSS1500 of TEAD2 in the SCLC category of the CCLE-GDSC dataset was associated with resistance to the mTOR inhibitor temsirolimus (Additional file 15: Table S15). It is consistent with an earlier report by an independent group using SCLC CCLE cell lines and with our previous findings in the NCI SCLC dataset, which showed that increased methylation and low expression of the genes encoding TEAD co-activators YAP1 and TAZ in the Hippo pathway in SCLC were associated with resistance to multiple mTOR inhibitors [86,147,148].
We further evaluated the available indirect support for the potential clinical significance of our findings in pancreatic adenocarcinoma, by examining published reports based on patient data. Among the five genes whose probes and/or region methylation was associated with in vitro drug response in our analysis of the PAAD category in the CCLE-GDSC dataset (Additional files 14, 15: Tables S14 and S15), FMOD had been previously reported to be associated with patient survival. It encodes fibromodulin, an extracellular matrix protein overexpressed in pancreatic ductal adenocarcinoma [149]. In our study, methylation of the FMOD gene body in PAAD was associated with response to the Hsp90 inhibitor 17-AAG (17-allylamino-17-demethoxygeldanamycin; Additional file 15: Table S15). FMOD protein expression had been previously associated with PAAD patient survival in the Queensland Centre for Medical Genomics dataset [149]. In other cancer categories, multiple studies have reported an association of upregulation of FMOD with poor patient survival in TCGA glioblastoma patients, and its product has been suggested to have an immunosuppressive role, whereas the silencing of FMOD leads to apoptosis in CCLE [150][151][152].
In our analysis of the CCLE-GDSC data, methylation of a probe and the gene body of TPO (encoding thyroid peroxidase) was associated with response to the c-Met and NPM-ALK inhibitor PF-2341066 in the PAAD category (Additional files 14, 15: Tables S14 and S15). TPO was previously reported to be among the most mutated genes in PAAD patient tumors in TCGA, suggesting a possible combined influence of epigenetic regulation of this gene and the mutational landscape on treatment response [153].

Discussion
Using patient-derived cell line genomic and drug response data we identified significant associations of 72 important GMDs with drug response and with DNA methylation based on multiple probes across the epigenome. We were able to confirm many associations in independent datasets using direct validation of comparable associations in the NCI-60 and NCI SCLC cancer cell line panels, and indirect evidence from reports on PAAD patient data. Our results provide a resource for future studies of GMDs which may influence methylation of a particular gene of interest, or analyses to explore direct and indirect associations of GMDs with tumor cell line response for specific therapeutic and pharmacological agents. Expression of multiple GMDs was strongly and significantly correlated with response to a variety of agents, even though the associations in the pancancer data were modest. GMD expression had widespread associations with methylation of genes involved in tumor development and progression and in drug response, suggesting multiple overlapping regulatory influences on the epigenome.
When analyzing indirect GMD effects on drug response (Fig. 2B), we used the threshold of the Spearman correlation coefficient ρ, to focus on the strongest significant correlations of GMD expression with methylation of their targets, and on correlations of methylation of the most strongly associated targets with drug response. Individual GMDs also had multiple statistically significant weaker correlations with their targets which we did not report. For example, we identified 1,905 strong significant correlations of GMD expression with methylation of individual probes in the pancancer dataset satisfying p < 1.389 × 10 -8 and Spearman |ρ| > 0.5 (Additional file 4: Table S4). When examining weaker GMD-probe correlations using the same significance threshold of p < 1.389 × 10 -8 , we found 24,904 associations with |ρ| > 0.4, and 254,827 correlations with |ρ| > 0.3 (data not shown). These results suggest common and complex influence of GMDs on DNA methylation in tumor cells. Weaker associations may indicate important biological influences of GMDs on cancer cell regulation, possibly under specific conditions or in subsets of tumor cells with specific mutational and/or expression profiles.
While the associations of methylation of the target probes and gene regions with log(IC50) may suggest a possible regulation of drug sensitivity or resistance resulting from DNA methylation on gene expression, many correlations with methylation targets involved epigenetic agents, which may suggest additional epigenetic mechanisms. Examples include the HDACi panobinostat, vorinostat, and AR-42 and the bromodomain inhibitor I-BET-762 (Additional files 12, 13, 14, 15: Tables S12-S15). Examples of correlations of methylation of target genes with response to epigenetic drugs include methylation of DAPK3, DDA1, NNMT, MAPK, TREX1, and ABCC3. Even though methylation of those genes was measured prior to treatment, such genes may not necessarily directly affect sensitivity or resistance to epigenetic drugs. While a direct involvement of their products in the response to epigenetic agents is possible, another potential explanation could be that correlations involving methylation of specific target genes may indicate more global influences of different levels of GMD expression on epigenome methylation prior to treatment. In that case, methylation of specific target genes could be a marker of the overall epigenetic activity of one or more GMDs affecting multiple target genes, rather that indicate a direct influence of a specific target gene on drug response. Furthermore, in addition to their effect on DNA methylation, many GMDs analyzed in this study have other epigenetic or regulatory roles which are targeted by some of the agents. Some GMDs, e.g., HDACs, may indirectly regulate gene expression by modifying a diverse set of protein targets including transcription factors [154]. Further biological investigation may be needed to address whether the correlations of drug response with DNA methylation of target genes which were associated with GMD expression may be explained by the mechanisms involving the action of specific target gene products (e.g., by an effect of a transporter on a drug concentration within a cancer cell) or by broad non-specific effects of pretreatment GMD expression, which affects DNA methylation of multiple genes in the epigenome.
Our methylation dataset was restricted to the combined measurements of 5-mC and 5-hmC using the Illumina Infinium HumanMethylation450 BeadChip array. The products of several GMDs analyzed in this study, e.g., TET1, TET2, TET3, and TDG, generate 5-fC and 5-caC, whereas MGMT demethylates O 6 -meG, and the action of ALKBH2, and ALKBH3 results in the removal of N 1 -meA and N 3 -meC [1,13,[37][38][39] (Additional file 1: Table S1). Drug resistance mechanisms involving some of these pathways, such as the role of MGMT expression in temozolomide resistance, were not detected in our study which used in vitro assay measures, likely because temozolomide is a prodrug which is converted to an active compound in the body, but possibly inconsistently in in vitro screening assays [155]. Similarly, we did not analyze methyladenine modifications as they were not captured in the available methylation data.
Altered GMD function in tumors can arise both from DNA mutations and transcriptional changes [4,10,144]. We analyzed the variation in GMD expression and did not examine GMD mutation status. As some GMDs may also have gain-of-function or loss-of-function variants in malignant cells, future large-scale analyses may investigate how drug response of tumor cells may be jointly influenced by DNA and protein sequence changes in GMDs, their copy number variation, gene fusions involving GMDs, and variation in GMD expression. Drug response may also be affected by the sensitivity or resistance mutations acquired by the genes encoding drug targets or by additional genes. Our regression analysis of trametinib response confirmed associations for GADD45A and its putative epigenome targets while accounting for the mutation status of BRAF V600E, KRAS, and NRAS.
Our study provides an extensive reference set of associations between expression of GMDs, their methylation of their epigenome targets, and response to drug treatment in a variety of cancer categories. These results provide a new insight into the epigenetic landscape of molecular interactions in tumors and suggest potential mechanisms of epigenetic influences on tumor cell response to a variety of chemotherapeutic agents.

Conclusions
We identified multiple associations of GMD expression with drug response and with DNA methylation of individual probes and gene regions in the epigenome. Methylation of many epigenome targets was correlated with response to treatment. Our findings suggest potential direct and indirect influences of GMD expression on drug response, which may be mediated by interconnected regulation of DNA methylation pathways.