Association of variably methylated tumour DNA regions with overall survival for invasive lobular breast cancer

Background Tumour DNA methylation profiling has shown potential to refine disease subtyping and improve the diagnosis and prognosis prediction of breast cancer. However, limited data exist regarding invasive lobular breast cancer (ILBC). Here, we investigated the genome-wide variability of DNA methylation levels across ILBC tumours and assessed the association between methylation levels at the variably methylated regions and overall survival in women with ILBC. Methods Tumour-enriched DNA was prepared by macrodissecting formalin-fixed paraffin embedded (FFPE) tumour tissue from 130 ILBCs diagnosed in the participants of the Melbourne Collaborative Cohort Study (MCCS). Genome-wide tumour DNA methylation was measured using the HumanMethylation 450K (HM450K) BeadChip array. Variably methylated regions (VMRs) were identified using the DMRcate package in R. Cox proportional hazards regression models were used to assess the association between methylation levels at the ten most significant VMRs and overall survival. Gene set enrichment analyses were undertaken using the web-based tool Metaspace. Replication of the VMR and survival analysis findings was examined using data retrieved from The Cancer Genome Atlas (TCGA) for 168 ILBC cases. We also examined the correlation between methylation and gene expression for the ten VMRs of interest using TCGA data. Results We identified 2771 VMRs (P < 10−8) in ILBC tumours. The ten most variably methylated clusters were predominantly located in the promoter region of the genes: ISM1, APC, TMEM101, ASCL2, NKX6, HIST3H2A/HIST3H2BB, HCG4P3, HES5, CELF2 and EFCAB4B. Higher methylation level at several of these VMRs showed an association with reduced overall survival in the MCCS. In TCGA, all associations were in the same direction, however stronger than in the MCCS. The pooled analysis of the MCCS and TCGA data showed that methylation at four of the ten genes was associated with reduced overall survival, independently of age and tumour stage; APC: Hazard Ratio (95% Confidence interval) per one-unit M-value increase: 1.18 (1.02–1.36), TMEM101: 1.23 (1.02–1.48), HCG4P3: 1.37 (1.05–1.79) and CELF2: 1.21 (1.02–1.43). A negative correlation was observed between methylation and gene expression for CELF2 (R = − 0.25, P = 0.001), but not for TMEM101 and APC. Conclusions Our study identified regions showing greatest variability across the ILBC tumour genome and found methylation at several genes to potentially serve as a biomarker of survival for women with ILBC.


Introduction
Invasive lobular breast cancer (ILBC) is the second most common histological subtype of breast cancer accounting for 10-15% of all cases [1][2][3]. ILBCs are typically Suman et al. Clin Epigenet (2021) 13:11 oestrogen receptor (ER) and progesterone receptor (PR) positive and human epidermal growth factor receptor 2 (HER2) negative and are strongly associated with hormonal risk factors for breast cancer [4][5][6][7]. The incidence of ILBC increased sharply in the late 1990s as a consequence of the increased use of hormone replacement therapy (HRT) [8][9][10][11][12][13]. Awareness of the increased risk of breast cancer associated with HRT led to reduced use and a decline in ILBC incidence [14], but it has been shown to increase again recently [15,16].
ILBCs display an obscure growth pattern with small, round and discohesive cells growing in a single file without forming any distinct clusters [17]. This is likely to be related to a loss of E-cadherin protein which is common in ILBC tumourigenesis and is a hallmark of this subtype [18]. Compared with other breast cancer types, ILBCs are less likely to form a firm and distinct lump and often present as undefined palpable masses on mammography [19,20]. This poses a significant challenge for its early detection by routine mammographic screening [19,[21][22][23]. This occult nature may explain the detection of ILBC cases at advanced stages [4,[24][25][26]. ILBCs display a unique metastatic behaviour and often metastasis to the gastrointestinal tract [27,28], colon [29], ovaries [30] and uterus [31], which is uncommon for other breast cancers types.
ILBC is biologically and histologically heterogeneous with several histological subtypes described that show distinct clinical behaviour and outcomes [3,17,[32][33][34][35]. Aberrant tumour DNA methylation is a hallmark of cancer that occurs early in cancer development and is thus a potentially valuable marker of tumour progression and patient survival. Alterations in tumour DNA methylation have been investigated in detail for many types of cancer, including breast cancer but ILBCs are largely underrepresented in these studies [36,37]. Studies focusing on ILBC-specific DNA methylation alterations have mainly used a candidate gene approach and have reported aberrant promoter methylation status for specific genes such as CDH1 [38][39][40][41], RASSF1A, HIN-1, RAR-β, cyclin-D2, TWIST [42], ADAM33 [43], SFRP1 [44] and DAPK1 [45]. Moelans et al. (2015) compared the methylation profiles of classic ILBC (n = 20), pleomorphic ILBC (n = 16) and IDBC (n = 20) for 24 established and putative tumour suppressor genes and found lower TP73 and MLH1 promoter methylation and higher RASSF1 promoter methylation in pleomorphic compared with classic ILBC [46]. Bae et al. (2004) compared the methylation profiles of ILBC (n = 19), IDBC (n = 60) and mucinous breast cancer (n = 30) for a panel of 12 genes and found BRCA1 promoter hypermethylation in 92% of mucinous breast cancer compared with 39% in ILBC and 28% in IDBC. They also reported ILBC and mucinous breast cancer samples to be more frequently methylated for other genes in the panel compared with IDBC [47].
In this study, we hypothesised that genome-wide variations in DNA methylation patterns within the ILBC group may guide or reflect different tumour biologies leading to subgroups of tumours that differ in their clinical behaviour. Our aims were twofold: i) to investigate the genome-wide DNA methylation variability within the ILBC group and ii) to assess associations between tumour methylation at the most variable methylated regions and overall survival for women with ILBC.

Variably methylated regions in ILBC
We identified 2,771 regions across the genome that showed substantially variable methylation (P < 10 −8 ) across ILBCs in the MCCS (Additional file 2: Table S1). These VMRs corresponded to 2,208 genes and 563 intergenic regions. The most significant regions (P < 10 −8 ) and the genes associated with these regions were chr20:13199787-13201844 (ISM1,  29 (Fig. 1). There was some tendency for VMRs including more CpGs to be more highly ranked (Additional file 1: Fig. S1). We found a significant enrichment for CpG island-associated regions compared to all probes included in the HM450K array (Fig. 2a). Gene annotation also showed that 62% of the VMRs were located in gene promoter regions (1st Exon, 5 prime UTR, TSS1500 and TSS200) compared with 20% in gene body regions and 23% in enhancer regions (Fig. 2b). The pathway enrichment analysis showed that the genes associated with the VMRs were enriched for 1,973 terms (FDR-adjusted P < 0.05) including 54 KEGG pathways with stronger evidence for neuroactive ligand-receptor interaction (hsa04080), breast cancer (hsa05224), pathways in cancer (hsa05200), hippo signalling pathway (hsa04390), Rap1 signalling pathway (hsa04015) and PI3K-Akt signalling pathway (hsa04151). Figure 3 shows the twenty most significant KEGG pathways enriched in the VMRs.
Replication of the VMR analysis in TCGA dataset (n = 168), identified 2760 VMRs, of which 763 (28%) overlapped with the MCCS. The ten most significant VMRs identified in the MCCS ranked highly in the TCGA dataset (Table 2).
Pathway enrichment analysis of the 763 overlapping VMRs resulted in 416 enriched functional terms (FDRadjusted P < 0.05) including nine enriched KEGG pathways. Of these, 369 overlapped with pathways identified for all MCCS VMRs; neuroactive ligand-receptor interaction (hsa04080) and hippo signalling pathway (hsa04390) were among the KEGG pathways that were also found to be significantly enriched using all MCCS VMRs.

VMRs and association with overall survival
In the MCCS, higher tumour methylation showed association with shorter overall survival for APC (HR = 1.28, 95% CI: 1.07-1. 53 Table 3, all VMRs had an average methylation level below 0.5 and the direction of association was positive (gains in methylation associated with shorter survival).
In TCGA dataset, the crude HRs were all positive, consistent with the MCCS dataset, albeit generally greater, in particular for ISM1 (HR = 1. 48 Table 4).

Correlation with gene expression
A relatively strong negative correlation between DNA methylation and gene expression was observed for six of the nine tested VMRs in TCGA (Fig. 4). These included No or slightly positive correlation between DNA methylation and gene expression levels was observed for APC, TMEM101 and NKX6. The feature-by-feature analysis of correlations with gene expression was very consistent with the analysis using average methylation, virtually all associations being in the same direction, with only moderate variation in effect estimates (Additional file 3: Table S2).

Discussion
We investigated the genome-wide DNA methylation pattern of ILBC tumours, with the aim of identifying methylation markers predictive of patient outcome. Scanning of the ILBC methylome revealed regions of variable methylation in ILBC tumours. The VMRs were primarily located in CpG island regions and were significantly enriched in pathways such as breast cancer (hsa05224), pathways in cancer (hsa05200), hippo signalling pathway (hsa04390), Rap1 signalling pathway (hsa04015) and PI3K-Akt signalling pathway (hsa04151). These pathways have previously been found to be dysregulated in cancer tissue [48][49][50][51][52][53]. Some of the key genes involved in the enriched pathways included APC, DAPK1, BMP2 and CCND2. DAPK1 is an important regulator of cell apoptotic pathways [54] and DAPK1 promoter hypermethylation has previously been reported in ILBCs with a potential role in tumour progression [45,55]. BMP2 is a member of the TGF-ß superfamily and is involved in cell proliferation and differentiation during tumour formation [56]. Promoter methylation of BMP2 has been associated with breast cancer progression and drug resistance [57]. CCND2 promoter methylation was previously reported to be a common event in breast cancer and have prognostic value [58]. We found a similar DNA methylation variability profile in TCGA dataset, in particular for the VMRs showing strongest variability in the MCCS. Several previous studies have reported tumour DNA methylation to have prognostic value in cancer [59][60][61][62][63][64]. Methylation at many gene promoters has been reported to have independent prognostic value in breast cancer including HOXA11 [65], ESR1 and PITX2 [66], HOXD13 [67] CDH22 [68] BRCA1 and RASSF1 [69,70]. Tumour DNA methylation and its prognostic significance has also been investigated for certain breast cancer subtypes, in particular gene expression-based subtypes. Thomas et al. (2017) used hierarchical clustering based on DNA methylation to further segregate luminal A tumours into two subgroups and found that the subgroup with lower relative methylation showed better prognosis [71], similar to the findings of our study. Another study using whole-genome methylation sequencing stratified triplenegative breast cancers into three methylation-defined clusters and found the hypomethylated cluster to show better prognosis compared with the other two highly methylated clusters [72], also consistent with our results. However, to our knowledge, no study has reported on the overall tumour methylation variability in ILBC and tested the potential for the variably methylated regions to be used as prognostic markers. The assessment of VMRs was genome-scale but only the highest ranking VMRs were tested for their association with survival. Although many of the tested VMRs showed a significant association with overall survival, there could be other VMRs or individual CpG sites for which methylation is associated with survival. We found that promoter hypermethylation at APC, TMEM101 and HCG4P3 was associated with shorter overall survival in the MCCS after adjustment for age and tumour stage. The results in TCGA were largely consistent with the MCCS, although associations generally appeared stronger; this might suggest that the prognostic value of these DNA methylation markers is greater for women with more advanced ILBC. In the pooled analysis, DNA methylation at four genes (APC, TME101,  ISM1, b APC, c TMEM101, d ASCL2, e NKX6, f HIST3H2A, g HCG4P3, h HES5, i CELF2 [73]. The association of APC promoter methylation with reduced survival has also been reported for other cancer types, such as

Table 3 Hazard ratios (HRs) for the association between the methylation levels at the ten most significant variably methylated regions (VMRs) and overall survival in the Melbourne Collaborative Cohort Study (MCCS) and The Cancer Genome Atlas (TCGA) dataset
*Gene: Gene associated with the variably methylated regions (VMRs), most of the VMRs were located in the promoter region of the genes, **Average methylation: Average methylation level (beta-value) of the samples across the VMRs, HR hazard ratio, CI confidence interval  [75] and prostate cancer [76,77]. CELF2, an RNA binding protein involved in alternative splicing, has also been reported to be involved in breast cancer growth and progression. Piqué et al., (2019) found that CELF2 promoter methylation led to a loss of CELF2 expression that had a growth promoter effect in breast tumours. They also found that CELF2 promoter methylation was associated with worse patient outcome [78]. In TCGA data, we found a strong, negative correlation between CELF2 promoter methylation and the gene expression levels. TMEM101 is a transmembrane protein that has been shown to activate NF-kappa-beta signalling pathways. There is to our knowledge no previous literature suggesting a role of TMEM101 promoter methylation in relation to cancer progression/survival. HCG4P3 is also known as HLA complex group 4 pseudogene 3, and there is to our knowledge no record of this gene being involved in cancer.

MCCS
The main limitation of this study was the relatively small sample size that limited our analysis to all-cause death as an endpoint. The MCCS and TCGA data had different characteristics in terms of their study design and sample variation. The two studies had different follow-up times, and TCGA data had more young women and generally higher tumour stage (Table 1). Our findings for both the VMR and survival analysis were nevertheless consistent across the two studies. We considered the main factors that we thought could impact methylation profiles in tumours and ILBC survival, i.e. age and stage. Factors such as smoking, alcohol consumption or diabetes, and perhaps family history (via underlying genetic sequence) likely play some role, but it is presumably less important, so we did not include them in the analysis. These variables are not systematically collected with precision (questionnaires) in the clinical setting. In this context, our study identified methylation biomarkers, and it is likely that many factors worthy of investigation (genetic and lifestyle and environmental) play a role in explaining the observed associations. Finally, while we identified a large number of regions across the ILBC genome that showed substantial variable methylation pattern, only the strongest ten VMRs were tested for association with survival to minimise the multiple testing burden. If replicated by other studies, the methylation markers identified in our study may contribute to the development of molecular signatures for enhanced prediction of ILBC survival.

Conclusions
Our study indicates that methylation levels at the most variable regions across the genome may explain differences in tumour prognosis within the ILBC subtype. We identified APC, TMEM101, HCG4P3 and CELF2 promoter methylation as possibly relevant prognostic biomarkers for women with ILBC. Further studies are required to confirm our findings and to assess their utility in a clinical setting.

Study samples
The samples included in this study were obtained from the Melbourne Collaborative Cohort Study (MCCS) [79]. The MCCS was set up in 1990 with the aim of investigating the role of diet and lifestyle in cancer and other diseases. Between 1990 and 1994, 41,513 participants, aged 40-69, were recruited in the study, and baseline information on lifestyle, health and diet was collected through interviews. Women with ILBC included in this Table 4 Pooled hazard ratios for the association between methylation levels at the ten most VMRs and overall survival: meta-analysis of the MCCS and TCGA results *Gene: Gene associated with the variably methylated regions (VMRs), most of the VMRs were located in the promoter region of the genes, HR hazard ratio, CI confidence interval  Clinical and pathological characteristics of the study participants are listed in Table 1.

Endpoints
Incidences of cancer cases and deaths in the MCCS participants are regularly updated by linkage to the Victorian and national cancer and death registries, which are considered to be virtually complete. The latest linkage was completed on 31 March 2017 and death data were considered to be complete up to 31 December 2016. Overall survival was defined as the time (in years) from breast cancer diagnosis to death (from any cause) or end of follow-up.

DNA extraction from formalin-fixed paraffin embedded breast tumour tissue
Pathology material related to each ILBC case had previously been retrieved from the diagnostic service laboratory and reviewed by qualified pathologists. Unstained sections had been prepared and stored desiccated at 4 °C for up to 20 years. DNA extraction was conducted as described in Wong et al. [80]. Briefly, the tumour areas most suitable for macrodissection were identified  [81] and recorded by directly marking up representative H&E stained sections. An average of two corresponding 3um methyl green stained FFPE sections were macrodissected as described in Wong et al. [82], and DNA was extracted using the QIAamp DNA FFPE protocol. Tumour purity was estimated using the R tool Infini-umPurify [83] that takes methylation beta-values of the tumour samples and uses the methylation levels of pre-selected informative differentially methylated CpG sites (iDMCs) identified from TCGA data (when data from normal-adjacent tissue are not available) to estimate tumour purity for each tumour sample by density evaluation of Gaussian kernel. Tumour purity estimates, obtained as the proportion of tumour cells in each sample, were high, ranging from 37 to 88% across samples; 88% of the samples had an estimated tumour purity greater than 50%.

Genome-wide DNA methylation profiling
Genome-wide DNA methylation was measured using the HumanMethylation450K (HM450K) BeadChip array (Illumina). For each sample, a total of 300-500 ng of tumour DNA was bisulfite-converted using Zymo Gold EZ-DNA kit (Irvine, CA) and restored using the DNA Restoration Kit as per the manufacturer's instructions (Illumina, CA, United States). Sample DNA quantity was assessed using an in-house modified quality control protocol [80]. Samples that passed the final quality check were run on the HM450K array (Illumina) according to manufacturer's instructions.

Data pre-processing and normalisation
Raw intensity files (IDAT files) were imported into the R computing environment using the Bioconductor package minfi [84], and all samples were pre-processed and normalised together. Data quality was first evaluated by assessing the detection P-value, which was obtained for every CpG site in every sample. Samples with an average detection P-value > 0.01 were considered poor quality and were removed from further analysis. CpG probes with a detection P-value > 0.05 in at least one sample were considered unreliable and were removed from further analysis. Data were normalised using the minfi functional normalisation (FNORM) method to correct for both within-array (technical bias between type I and type II probes) as well as between-array unwanted variations [85]. After data pre-processing and normalisation, a total of 449,005 CpG sites remained for analysis. Betavalues (ratio of the methylated probe intensity and the sum of methylated and unmethylated probe intensity) and M-values (log 2 beta-value) were calculated. M-values were used in all statistical analyses, while beta-values were used for data exploration and visualisation, as suggested in [86].

TCGA data
Raw DNA methylation data (IDAT files) for 168 ILBC cases were downloaded from the TCGA legacy database (Study Accession: phs000178) using the R package TCGABiolinks [87]. Methylation data were pre-processed and normalised similarly to the MCCS, and methylation values (beta-values and M-values) were calculated for 168 ILBCs at 440,380 CpG positions across the genome. Survival data were retrieved for 159 (95%) ILBC cases. Gene expression data in the form of normalised counts (RNA sequencing-Illumina Hi-Seq) were retrieved for 159 (95%) ILBC cases. Cases of ILBC in the TCGA dataset were diagnosed between 1992 and 2013. Clinical characteristics of the TCGA samples are listed in Table 1.

Statistical analysis Variable methylation analysis
Variable methylation analysis was performed using the DMRcate package in R [88]. To identify the variably methylated regions (VMRs), the variance of M-values was computed across 130 ILBCs in the MCCS, and Gaussian smoothing was applied to the resulting per-CpG-site test statistics using the default DMRcate options. DMRcate uses the method of Satterthwaite to smooth test statistics and derive respective P-values. Nearby significant CpG sites were collapsed in clusters using a bandwidth of 1000 base pairs (bp). The clusters that showed the highest variability in DNA methylation (i.e. regions with a minimum adjusted P-value (minfdr) of less than 10 −8 ) were defined as the VMRs. There were 396 solo CpGs that were not included in the VMR calculation. This analysis was replicated using the 168 ILBC samples from TCGA.

Gene set enrichment analysis
Gene set enrichment analysis was performed on all the genes associated with the VMRs using the web-based tool Metaspace using the default settings [89]. Pathway and gene set enrichment analysis were carried out using the KEGG Pathway database [90]. All genes in the human genome were used as the enrichment background. Pathways and biological terms with a P-value < 0.01, a minimum count of 3 and an enrichment factor > 1.5 (the ratio between the observed counts and the counts expected by chance) were selected and grouped into clusters.

Survival analysis
Survival analyses were undertaken for the ten most variably methylated regions identified across the MCCS ILBC samples. Follow-up started at the date of diagnosis and ended at the date of death or end of follow-up, whichever came first. Cox proportional hazards regression models were used to calculate hazard ratios (HRs) and 95% confidence intervals (CI) for the association between DNA methylation levels (M-values) and risk of death. Three models were fitted: (1) univariable, with DNA methylation as a crude predictor, and multivariable, (2) with additional adjustment for age at diagnosis and (3) with adjustment for age at diagnosis and tumour stage. For each VMR, the methylation level was defined as the average methylation value across all CpG sites covering the VMR. The same analysis was carried out using the 168 ILBC samples from TCGA. Survival analyses were undertaken using the R package Survival [91]. HRs from the two individual studies were then pooled using fixedeffects meta-analysis with inverse variance weights.

Association with gene expression
To test if DNA methylation correlated with gene expression at the ten strongest VMRs (identified in the MCCS), we assessed the correlation between average methylation levels (average M-values for all CpGs covering a VMR) and gene expression levels using Pearson's correlation;we used matching gene expression and DNA methylation data available in the TCGA dataset for nine of the ten strongest VMRs.The correlations with gene expression were also assessed for individual CpG sites of each VMR.