DNA methylation in ductal carcinoma in situ related with future development of invasive breast cancer

Ductal carcinoma in situ (DCIS) is a heterogeneous, pre-invasive lesion associated with an increased risk for future invasive ductal carcinoma. However, accurate risk stratification for development of invasive disease and appropriate treatment decisions remain clinical challenges. DNA methylation alterations are early events in the progression of cancer and represent emerging molecular markers that may predict invasive recurrence more accurately than traditional measures of DCIS prognosis. We measured DNA methylation in estrogen-receptor positive DCIS (n = 40) and adjacent-normal (n = 15) tissues from subjects in the New Hampshire Mammography Network longitudinal breast imaging registry using the Illumina HumanMethylation450 array. We identified locus-specific methylation differences between DCIS and matched adjacent-normal tissue (95,609 CpGs, Q < 0.05). Among 40 DCIS cases, 13 later developed invasive disease, and we identified 641 CpG sites that exhibited differential DNA methylation (P < 0.01 and median |∆β| > 0.1) in these cases compared with age-matched subjects without invasive disease. The set of differentially methylated CpG loci associated with disease progression was enriched in homeobox-containing genes (P = 1.3E-09) and genes involved with limb morphogenesis (P = 1.0E-05). In an independent cohort, a subset of genes with progression-related differential methylation between DCIS and invasive breast cancer was confirmed. Further, the functional relevance of these genes’ regulation by methylation was demonstrated in early stage breast cancers from The Cancer Genome Atlas database. This work contributes to the understanding of epigenetic alterations that occur in DCIS and illustrates the potential of DNA methylation alterations as markers of DCIS progression.


Background
Ductal carcinoma in situ (DCIS) is a non-invasive precursor lesion to invasive ductal carcinoma and is frequently diagnosed upon screening mammography. In 2015, more than 60,000 DCIS diagnoses are expected in the USA, accounting for over 25 % of new breast cancer cases [1]. Importantly, the risk of developing invasive breast cancer among women with DCIS is significantly increased compared to the general population [2,3]. Standard treatment of DCIS focuses on preventing the development of invasive breast cancer with at least local surgical excision to remove the proliferative lesion, although more extensive treatment including mastectomy is sometimes performed [4,5]. The risk of local recurrence in patients treated with excision alone has been reported to be as high as 25 % over 10 years [6]. However, a majority of patients will never develop invasive disease once treated, and so there exists a growing concern that a subset of DCIS patients may be receiving excessive treatment [7,8]. Overtreatment can potentially subject the patient to unnecessary risk of treatment, without potential benefit, and burden the health care system [9]. Thus, identifying differential risks of developing invasive breast cancer among DCIS patients may improve patient outcomes and reduce healthcare costs.
The deregulation of epigenetic modifications, such as DNA methylation, is an early event in breast carcinogenesis [10,11,5,12]. The addition of a methyl group to a cytosine residue followed by a guanine (CpG) is recognized to be an important regulator of gene expression and chromosomal stability [13]. Although DNA methylation deregulation occurs early in carcinogenesis, the patterns of epigenetic alterations associated with development of invasive disease remain unclear. DNA methylation alterations are potential markers for DCIS outcome prediction and may identify cellular changes that modulate tumor progression. Previous studies have highlighted that DNA methylation alterations are widespread in invasive breast carcinoma and result in cellular dysfunction [14][15][16][17][18]. However, few studies have investigated the epigenetic alterations that occur in earlier stages of disease, and of those that have, many used a candidate gene approach [5,19,20,10,21,22]. Accordingly, the identification of DNA methylation alterations that can inform risk of subsequent invasive disease has the potential to both impact clinical treatment decisions and improve our understanding of cancer progression.
In this study, we identified subjects through the New Hampshire Mammography Network (NHMN) and investigated DNA methylation patterns in estrogen-receptor (ER) positive DCIS samples for their relation with time to diagnosis of invasive breast cancer. We have made use of a unique resource in the NHMN, which follows patients longitudinally, to examine DCIS of women who did and did not develop invasive disease over the same period of time. Here, we have used genome-wide DNA methylation arrays to comprehensively identify DCIS-specific alterations in DNA methylation that differ in these two groups. We subsequently extended our findings to independent populations of DCIS and invasive ductal carcinomas to further explore disease-specific methylation deregulation and provide evidence for the potential impact of these alterations on gene expression [15]. Through these analyses, we have identified epigenetic changes in ER positive DCIS that may contribute to increased risk of developing invasive breast cancer.

Results
Unsupervised clustering of DNA methylation in ductal carcinoma in situ
Patient demographic data and DCIS pathologic characteristics are presented in Table 1. To characterize DNA methylation patterns of DCIS tissues (n = 40) and available matched adjacent-normal samples (n = 15), we measured genome-wide DNA methylation using the Infinium HumanMethylation450 BeadArray. The scheme of our analysis strategy for identifying deregulated methylation in DCIS is depicted in Additional file 1: Figure S1. First, we performed unsupervised hierarchical clustering of the 10,000 most variable DNA methylation loci and observed two distinct clusters of methylation profiles among DCIS samples (Fig. 1). The optimal number of two clusters was verified by a resampling-based unsupervised consensus clustering method (Additional file 2: Figure S2) [23]. There was no significant enrichment for membership in either methylation cluster for DCIS grade, subsequent diagnosis of invasive breast cancer, or subject age. Next, we assessed locus-specific patterns of differential methylation between DCIS tissues and matched adjacent-normal breast tissue at single base resolution fitting linear mixed effects models across 55 tissues (40 DCIS and 15 adjacent normal). This analysis revealed profound alterations in global methylation patterns (Additional file 3: Figure S3) with the identification of 95,609 differentially methylated CpGs significant after correction for multiple comparisons (Q < 0.05, 20.3 % of CpGs).

Methylation patterns and development of invasive breast cancer
To identify DNA methylation related with subsequent development of invasive breast cancer, we fit Cox proportional hazards models to examine the association between time-to-event data and the DNA methylation values at each CpG locus independently (beta-value) and adjusted for age and DCIS grade. We identified 641 CpGs, representing 397 genes, whose methylation was associated with development of invasive disease (P < 0.01, with at least 10 % change in methylation, Additional file 4: Table S1). The strongest associations with development of invasive ductal carcinoma (IDC) are presented in Table 2 and representative plots of cumulative incidence for high and low methylation groups (stratified by median DNA methylation values at a given CpG) are shown in Fig. 2. Notably, among the 641 CpG progression-associated loci there were 276 CpGs (Additional file 4: Table S1) that exhibited differential methylation between DCIS and adjacent normal as well as further directionally consistent methylation alterations in subjects that ultimately progressed to invasive disease (Additional file 5: Figure S4 for representative plots). As expected, a stratified analysis determined that the strength of association was stronger with an ipsilateral IDC to DCIS outcome compared with a contralateral IDC outcome (Additional file 6: Table S2). Among DCIS patients with a subsequent diagnosis of invasive breast cancer, sidedness was not related with the methylation clusters defined in Fig. 1.
We next sought to identify common biological processes, pathways, and molecular functions with enrichment among genes with progression-related methylation using the set of 641 CpG loci. Among the 375 Database for Annotation, Visualization, and Integrated Discovery (DAVID) genes that had at least one of the 641 differentially methylated CpGs, an overrepresentation of homeobox genes (8.54-fold enrichment, P = 1.3E-09) and limb morphogenesis genes (4.9-fold enrichment, P = 1.0E-05) was identified (Additional file 7: Table S3). A majority (70.3 %, Additional file 4: Table S1) of progression-related CpGs are located within or in close proximity to genes. Nearly 50 % of progression-related CpGs that exhibited losses in methylation were open sea probes, while only 9.5 % (Additional file 8: Table S4) tracked to CpG islands. In contrast, 31.4 % of CpG loci with gains in DNA methylation tracked to CpG islands (Additional file 8: Table S4). To further examine the nature of progression-related methylation in DCIS, we evaluated the potential enrichment of genomic features among the 641 CpG loci. We observed a substantial enrichment of polycomb group target genes (PCGTs) (P = 7.3E-07, Table 3), open sea probes (P = 1.5E-04, Table 3), and informatically predicted enhancer regions (P = 1.8E-22, Table 3) among progression-related CpG loci. CpG loci that were located in CpG islands were depleted among progression-related loci (P = 9.8E-07, Table 3), and we did not observe enrichment of progression-related CpGs in CpG island shores (P = 0.06, Table 3), CpG island shelves (P = 0.36, Table 3), or transcription factor binding sites (TFBSs) (P = 0.89, Table 3).

Extension to an independent set of DCIS and IDC samples
An independent cohort of unmatched ER positive pure DCIS (n = 17) and ER positive invasive ductal carcinomas (IDC, n = 115) with methylation data from the same array platform was identified to validate differential methylation of progression-related genomic regions between DCIS and IDC. While data on subsequent diagnosis of invasive breast cancer were not available for DCIS cases in the independent cohort, increasing invasive potential may be reflected in the methylation changes that exist between DCIS and IDC. Comparing DCIS and IDC in the independent cohort revealed 3000 CpGs that were differentially methylated (P < 0.01, at least 10 % change, Additional file 9: Table S5). Overall, four CpGs were significantly differentially methylated in both populations (P < 0.01, at least 10 % change, Additional file 10: Table S6). In an expanded gene-level analysis, there were 397 genes related with the 641 progression-associated CpG loci. Among this gene list were 72 genes (18.1 %, Additional file 9: Table S5) that experienced differential methylation between DCIS and IDC in the independent cohort. An unsupervised clustering of DNA methylation from the 72 shared genes (77 CpGs) distinguished a subset of DCIS subjects who later developed invasive disease (Additional file 11: Figure S5). In the independent cohort results, 28 out of 72 genes had a methylation change in the same genomic region (i.e., promoter region or gene body region) and in the same direction as the CpGs that were identified in the discovery cohort. Restriction to these 28 genes did not improve clustering (Additional file 11: Figure S5). A DAVID analysis revealed that genes associated with mesenchymal cell differentiation experienced the greatest enrichment among the 72-gene set, but the finding did not reach statistical significance (1.73-fold enrichment, P = 0.72).

Assessment of methylation alterations and gene expression in a set of primary TCGA breast tumors
Next, we investigated the relation of methylation alteration with gene expression for progression-related CpGs. Methylation of progression-related CpGs was strongly correlated with gene expression in Stage I invasive breast cancers; 101 out of 384 CpGs were significantly correlated (P < 0.01, Additional file 12: Table S7). Among the strongest associations between methylation and expression were CpG sites associated with TMEM139, HOXB2, and TBX1 genes as well as the long non-coding RNA HOTAIR (Fig. 3). As expected, a large majority (87.1 %) of significant associations between methylation and gene expression were negative correlations. The dependency of regulation of gene expression by methylation on genomic context was also apparent. A majority of the CpGs whose methylation was positively correlated with expression were located in gene bodies (69.3 %), while those CpGs negatively correlated with expression were equally likely to be associated with promoter regions and first exons of genes (48.9 %) or the gene body (47.7 %).

Discussion
A challenge in the management of DCIS is variability in risk of developing invasive breast cancer. Decisions regarding treatment are complicated by a lack of reproducible clinical and pathologic factors that can reliably predict risk of future invasive disease after surgical excision. Identification of molecular alterations differentiating lesions that will remain indolent from those that will likely acquire an invasive phenotype begins to address a critical need to provide more accurate risk assessment. Emerging evidence from the initial stages of cancer development suggests that the patterns of DNA methylation, a stable mark capable of transcriptional control, are deregulated early and may serve as a predictor of malignant potential [19,21,5]. The present study aimed to characterize the patterns of epigenetic deregulation that occur in the early stages of breast cancer by investigating genome-wide methylation patterns in DCIS and adjacent-normal tissue. We identified methylation of CpG loci that were strongly associated with development of invasive breast cancer. Notably, some of these alterations exhibited further change when comparing DCIS with invasive tumors. The CpG loci related to disease progression were highly enriched for homeobox family of proteins and genomic features such as polycomb group gene targets. Importantly, CpGs with progression-related changes in methylation in DCIS had strong correlations with gene expression in an independent set of early stage ER positive breast cancers, suggesting that differential methylation of these sites may contribute to an increased risk of invasion through aberrations in gene expression.
DCIS represents a direct precursor to invasive ductal carcinoma. Altered patterns of DNA methylation can modify the regulation of gene expression and impact chromosomal instability, and have frequently been observed in the early stages of breast cancer [18]. However, the precise timing of molecular alterations during tumorigenesis and how these changes to the breast epigenome may influence the risk of becoming invasive cancer remain incompletely defined. In prior work and in this study, extensive epigenetic differences were observed between adjacent-normal and DCIS tissues, confirming that pre-invasive DCIS lesions harbor a high degree of disruption in the methylation patterning of ductal epithelial cells [24]. The striking difference between DCIS and tissues with no histologic evidence of malignancy provides additional evidence that epigenetic alterations in breast cancer are early events. Importantly, the delineation of methylation changes in DCIS compared to normal tissue helps to differentiate those early events that prime cells with oncogenic insults from later events (i.e., events measured in invasive breast cancer) that may increase malignant potential. It is clear that pronounced molecular alterations occur during the normal to DCIS transition; however, genetic alterations to explain the divergent gene expression patterns between in situ and invasive breast carcinomas have not been observed [25]. Epigenomic changes represent potential molecular attributes that define whether a carcinoma remains indolent or shifts toward an invasive phenotype. Indeed, we observed that altered DNA methylation associated with an invasive disease diagnosis preferentially targeted gene groups involved in key developmental pathways.
For instance, DNA methylation of informatically predicted enhancer regions was enriched among progression-related CpGs. Interestingly, dysregulation of DNA methylation at these distal regulatory sites has previously been related with the expression of cancer genes [26]. Homeobox genes and other developmental transcription factors become preferential targets of de novo methylation in DCIS, consistent with previous associations between homeobox gene methylation and recurrence in invasive breast cancers [21,27]. Together, these data provide evidence that perturbations in methylation among homeobox genes may promote a transformed and invasive phenotype. Similarly, biological processes that were not shown to be enriched via DAVID, but present among the differentially methylated loci, included Polycomb group protein target genes (PCGTs). Polycomb group proteins reversibly repress their target genes that are required for differentiation and necessary to promote the self-renewal potential of stem cells [28]. In our study, the PCGTs were substantially more likely to have progression-specific DNA hypermethylation than non-targets, providing evidence that acquisition of DNA methylation at these genes could predispose cells to malignant transformation. In summary, genes involved in developmental processes are dysregulated early in disease development and may represent critical events that are exacerbated as ductal cells progress to a more malignant phenotype.
Fig. 2 Cumulative incidence of invasive breast cancer diagnosis stratified by the median methylation in DCIS of each CpG into high or low methylation groups. a Increased methylation in DCIS of a CpG in the south shore of a HOXB13 promoter CpG island is associated with invasive breast cancer outcome (HR = 1.86, 95 % CI, 1.37-2.53) (high-methylation beta-value range, 0.14-0.71, low-methylation beta-value range, 0.03-0.13). b Increased methylation in DCIS of a gene body CpG island site in EN1 is associated with invasive breast cancer outcome (HR = 3.61, 95 % CI, 1.90-6.88) (high-methylation beta-value range, 0.27-0.58, low-methylation beta-value range, 0.069-0.27). c Increased methylation in DCIS of a gene body CpG island site in DLX4 is associated with invasive breast cancer outcome (HR = 1.92, 95 % CI, 1.37-2.69) (high-methylation beta-value range, 0.31-0.78, low-methylation beta-value range, 0.078-0.31). d Increased methylation in DCIS of a CpG site in the south shore of a CpG island in the 5′ UTR of TBX15 is associated with invasive breast cancer outcome (HR = 2.12, 95 % CI, 1.43-3.15) (high-methylation beta-value range, 0.42-0.82, low-methylation beta-value range, 0.13-0.39)
At the gene level, the strongest locus-specific associations with progression tracked to genes with previously demonstrated involvement in carcinogenesis. For example, overexpression of EN1 is associated with pro-survival in basal-like breast cancer [29]; HOXB13 hypermethylation has been shown to be a late event in breast tumorigenesis and is associated with invasiveness [30,31]; increased expression of DLX4 reduces invasion in vitro and metastasis in vivo [32]; and increased TBX15 methylation was a poor prognostic indicator in prostate cancer [33]. Moreover, there were 276 CpG loci including those that track to the HOXB13, DLX4, and TBX15 genes that exhibited a significant gain or loss of methylation from normal to DCIS and were further hypermethylated or hypomethylated in lesions from women with a subsequent diagnosis of invasive disease. Additional deregulation of methylation at these sites suggests that a greater proportion of cells harbor alterations associated with an invasive phenotype. Based on these preliminary results, it seems reasonable to envision a clinical scenario where leveraging both pathologic and molecular characteristics of DCIS and surrounding normal tissues will contribute to decisions of whether more or less aggressive treatment and/or monitoring is warranted.
The results from comparing methylation in an independent set of DCIS tissues with IDC indicated that differences between invasive disease and DCIS overlap with those that potentiate development of an invasive phenotype. Although DAVID is best suited for gene sets of 100 to 2000 genes, our DAVID analysis of the 72 genes shared between data sets suggests that epigenetic deregulation of genes involved in cellular development and differentiation is likely a defining feature of progression to invasive disease [34]. Further, alterations that are critical for development of an invasive phenotype may occur in DCIS, and once the cancerous cells escape from the duct, additional epigenetic programming may be required for further progression or metastasis. Regardless, in early stage invasive breast tumors from TCGA, we observed that progression-related CpGs exhibit strong correlations with gene expression. Our results provide evidence that the acquisition or loss of DNA methylation at the identified progression-related CpGs has a functional impact on gene expression. Examples of potentially critical methylation changes include the following: GATA2, whose expression (methylation was positively correlated with expression among the TCGA samples) has been shown to be elevated in breast cancer and serves to repress PTEN expression [35]; HOXB2, for which increased methylation is associated with bladder cancer invasiveness [36]; and the long non-coding RNA HOTAIR, which demonstrated increases in methylation associated with progression and was also positively correlated with expression in TCGA tumors. Interestingly, HOTAIR is responsible for Hox gene silencing. HOTAIR is increased in expression in primary breast tumors and metastases, and loss of HOTAIR can inhibit cancer invasiveness [37]. Early methylation events that target homeobox genes in DCIS may, in part, be responsible for the aberrant gene expression of homeobox genes that has been observed in cancer [38]. Overall, we observed that progression-related methylation patterning is strongly related with transcription of genes that are known to be associated with the invasive phenotype.
The major objective of our work was to determine epigenetic events that contribute to development of invasive breast cancer. Our study is strengthened by the use of time-to-event data for subsequent diagnosis of invasive breast cancer. The longitudinal aspect of the cohort is an important distinction from previous cross-sectional approaches that have examined DCIS and invasive disease with a lack of clinical follow-up. In spite of this, each cohort in our investigation had a limited sample size. Therefore, future studies of methylation alterations in additional DCIS specimens with follow-up data are needed to further confirm and more precisely define potential biomarkers of risk stratification among DCIS subjects. In addition, future population-based studies that collect samples from the same individual across the spectrum of disease and investigate the relation of DNA methylation with breast cancer risk factors may also be informative for enhanced prediction of risk for disease progression.

Conclusions
Our approach identified methylation alterations present in DCIS that may contribute to breast tumorigenesis and development of invasive disease. Progression from DCIS to invasive disease represents a substantial increase in the risk of breast cancer mortality, and identification of disease progression markers for DCIS is needed. We have provided evidence for methylation alterations that are associated with development of invasive disease and defined specific epigenetic programs in DCIS disease progression that are particularly susceptible to deregulation. Further examination of these alterations is warranted to demonstrate their potential utility in defining risk for subsequent invasive disease and thus impact treatment decision-making.

Availability of supporting data
Data for all NHMN DCIS DNA methylation microarray experiments are available in the National Center for Biotechnology Information's Gene Expression Omnibus under the accession number GSE66313.

Study design and patient population
Pure ductal carcinoma in situ (DCIS) samples for 43 subjects were identified through the New Hampshire Mammography Network (NHMN), under the approval of the Institutional Review Board. The NHMN is a state-based mammography registry that records information from breast imaging exams and subsequent pathology, cancer, and vital status outcomes for consenting women [39]. For the present report, the analyses were based on samples collected from women who underwent resection of a breast lesion at Dartmouth-Hitchcock Medical Center (Lebanon, NH, USA). Three DCIS samples were removed from analysis in the quality assessment and control steps of methylation data processing detailed below. Among ER positive DCIS cases (n = 40), there were 13 patients with a subsequent diagnosis of invasive ductal carcinoma (IDC) and 27 age-matched subjects with similar follow-up time who did not have a subsequent diagnosis of IDC. The time to progression and date of last known follow-up for those without subsequent diagnosis were recorded; median clinical follow-up for this cohort was 7 years. Slides from all subjects underwent central pathology review by a breast pathologist (JDM) to confirm diagnosis of ductal carcinoma in situ and record histopathologic features. Multiple (at least two) 2-mm tissue cores were taken from the archived formalin-fixed paraffin-embedded (FFPE) blocks from selected areas of DCIS and adjacent-normal tissue for DNA extraction and bisulfite modification. The available block with the greatest homogeneous amount of target tissue was selected for DNA extraction. The patient-matched normal breast tissues adjacent to DCIS did not have any histologic evidence of malignancy and were randomly selected from 15 subjects. Patients were not treated with neoadjuvant therapy, and subsequent invasive ductal carcinoma samples from matched DCIS subjects were not evaluated in this study.
Material for the independent population was obtained from ER positive pure DCIS (n = 17) and ER positive IDC (n = 115) as described in GSE60185 [24]. All samples were taken in compliance with the Helsinki Declaration.

Array-based DNA methylation assessment and quality control
FFPE tissue samples were disrupted before subsequent DNA purification using the TissueLyserII (Qiagen, Valencia, CA, USA) for 1 min at 30 Hz. DNA was then isolated using the QIAamp DNA FFPE tissue kit (Qiagen, Valencia, CA, USA) according to the manufacturer's protocol. Genomic DNA was bisulfite modified using the EZ DNA methylation kit (Zymo Research, Irvine, CA), and bisulfite-converted DNA from FFPE samples was processed as described in the Infinium FFPE Restoration guide (Illumina, San Diego, CA, USA) [40][41][42]. A recent study established the minimal effect that FFPE restoration has on methylation values via the demonstration of a very strong correlation between methylation β-values from paired fresh frozen and FFPE tissues [42]. After FFPE restoration, the resulting material was used as input for the hybridization on the Infinium HumanMethylation450 BeadChip (Illumina, San Diego, CA, USA), which has demonstrated high concordance with other bisulfite modification techniques such as pyrosequencing [17,43,44,45]. Samples were randomized to plates and subjected to epigenome-wide DNA methylation assessment. The methylation status for each CpG locus was calculated as the ratio of fluorescent signals (β = Max(M, 0) / [Max(M, 0) + Max(U, 0) + 100]), ranging from 0 (non-methylated) to 1 (completely methylated), using average probe intensity for the methylated (M) and unmethylated (U) alleles. For preprocessing of the methylation data, we used the Chip Analysis Methylation Pipeline (ChAMP version 0.98.3) package in R [46]. Briefly, ChAMP imports the raw IDAT files from the arrays, assesses the quality of probes and samples, provides adjustment for probe type bias, and allows for adjustment for batch effects.
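As an illustration of the β-value definition given above, the following is a minimal Python sketch of the formula only; it is not the ChAMP implementation, and the function name `beta_value` and the example intensities are hypothetical.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Return beta = max(M, 0) / (max(M, 0) + max(U, 0) + offset).

    `methylated` (M) and `unmethylated` (U) are average probe intensities.
    The offset of 100 is the constant stated in the text; it stabilises
    the ratio when both intensities are low.
    """
    m = max(methylated, 0.0)  # negative intensities are clipped to zero
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


# Hypothetical intensities for three probes: (M, U)
for m, u in [(900.0, 0.0), (450.0, 450.0), (-5.0, 2000.0)]:
    print(m, u, round(beta_value(m, u), 3))
```

Because of the constant offset in the denominator, a computed β approaches but never reaches exactly 1, even for a fully methylated locus.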
Fig. 3 Genes whose expression levels correlated with methylation level of progression-related CpGs from Stage I ER positive TCGA breast tumors. Bubble points represent genes that demonstrate a significant relationship between methylation of a CpG and expression of its nearby gene in a genomic-context dependent manner. Increasing bubble diameter corresponds with decreasing P value. CpG position relative to gene location is plotted versus correlation coefficient between CpG methylation and gene expression
As a part of the quality control step, three DCIS samples were removed because greater than 20 % of probes among these samples had a detection P-value greater than 0.01. In total, there were 40 DCIS and 15 adjacent-normal samples that passed quality control (QC) and were included in subsequent analyses. Probes were dropped from future analyses if the median detection P-value was >0.01 across all samples, which left a total of 397,600 out of 485,577 probes that passed QC. In the ChAMP package, we implemented beta-mixture quantile normalization (BMIQ) to adjust the data for bias introduced by the Infinium type 2 probe design [47]. Next, we corrected for potential batch effects by applying the ComBat normalization method using the R package "sva" [48]. Data for all DNA methylation microarray experiments are available on National Center for Biotechnology Information's Gene Expression Omnibus [35] in accordance with MIAME under the accession number GSE66313.
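The two detection P-value thresholds used in quality control can be sketched as follows. This is a simplified Python illustration, not the ChAMP code; the function name `qc_filter`, the input layout, and the toy values are hypothetical, and it assumes that the probe-level median is taken over the samples that survive the sample-level filter.

```python
from statistics import median


def qc_filter(detp, sample_frac=0.20, p_cut=0.01):
    """Apply sample-level, then probe-level, detection P-value filters.

    detp: dict mapping sample id -> list of detection P-values, with the
          same probe order in every sample.
    Returns (kept_samples, kept_probe_indices).
    """
    # Remove a sample if more than `sample_frac` of its probes fail.
    kept_samples = [
        s for s, ps in detp.items()
        if sum(p > p_cut for p in ps) / len(ps) <= sample_frac
    ]
    n_probes = len(next(iter(detp.values())))
    # Drop a probe if its median detection P-value exceeds the cutoff
    # (assumption: median computed across retained samples only).
    kept_probes = [
        i for i in range(n_probes)
        if median(detp[s][i] for s in kept_samples) <= p_cut
    ]
    return kept_samples, kept_probes


# Toy example with three samples and six probes
toy = {
    "A": [0.001, 0.5, 0.001, 0.001, 0.001, 0.001],
    "B": [0.5, 0.5, 0.001, 0.001, 0.001, 0.001],
    "C": [0.001, 0.02, 0.001, 0.001, 0.001, 0.001],
}
print(qc_filter(toy))
```

In the toy data, sample B fails because two of its six probes exceed the cutoff, and the second probe is dropped because its median detection P-value in the remaining samples is above 0.01.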

Statistical analysis
Data assembly
All methylation data were analyzed using the R software environment, version 3.0.3 (www.r-project.org).

Unsupervised hierarchical clustering
Hierarchical clustering was based on Manhattan distance and average linkage of the 10,000 most variable CpG loci. The optimal number of clusters was established based on 1000 resampling iterations of K-means clustering for K = 2, 3, 4, 5, with Euclidean distance as the distance metric, as implemented in the "ConsensusClusterPlus" R package [49].
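The clustering recipe described above (selection of the most variable loci, Manhattan distance, average linkage) can be illustrated with a small pure-Python sketch. The study itself used R; the function names, CpG identifiers, and sample profiles below are hypothetical, and the naive agglomeration loop is intended only for toy inputs.

```python
from statistics import pvariance


def most_variable(rows, k):
    """rows: dict CpG id -> list of beta-values across samples.
    Return the k CpG ids with the highest variance."""
    return sorted(rows, key=lambda c: pvariance(rows[c]), reverse=True)[:k]


def manhattan(x, y):
    """Manhattan (L1) distance between two methylation profiles."""
    return sum(abs(a - b) for a, b in zip(x, y))


def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering with average linkage.
    profiles: dict sample id -> vector of beta-values."""
    clusters = [[s] for s in profiles]

    def dist(c1, c2):
        # Average linkage: mean pairwise distance between two clusters.
        total = sum(manhattan(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


rows = {"cg1": [0.1, 0.1, 0.1], "cg2": [0.1, 0.9, 0.5], "cg3": [0.4, 0.5, 0.6]}
print(most_variable(rows, 2))

profiles = {"s1": [0.10, 0.10], "s2": [0.12, 0.11],
            "s3": [0.90, 0.80], "s4": [0.88, 0.85]}
print(average_linkage(profiles, 2))
```

Cutting the merge sequence at two clusters mirrors the two-cluster solution reported in the Results; the consensus-clustering step that selects the number of clusters is not reproduced here.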

Locus-by-locus analysis for detecting differentially methylated CpG loci
Locus-specific patterns of differential methylation between DCIS tissues (n = 40) and matched adjacent-normal breast tissue (n = 15) were identified via linear mixed effects models fit to each CpG independently, modeling logit-transformed β-values as the dependent variable and tissue type and subject age as the independent variables. The linear mixed effects models included a random effect term for subject to account for repeat measures on the same subject. P values obtained from our linear mixed effects models were adjusted for multiple testing using false discovery rate estimation. The Q values were computed by the "qvalue" package in R, and a Q value < 0.05 was deemed statistically significant. To identify epigenetic alterations associated with development of invasive disease among DCIS patients, locus-specific differences in DNA methylation of DCIS samples were examined using Cox proportional hazards models adjusted for subject age and DCIS grade (low/intermediate or high) [50]. To reduce Type I error and maintain sensitivity of discovery, we selected CpGs whose methylation was associated with development of invasive breast cancer at P < 0.01 and then limited our analyses to those CpG sites that exhibited a median |Δβ| greater than 0.1. We also implemented a stratified Cox proportional hazards analysis to explore whether the strength of association differed between cases with IDC ipsilateral (n = 7) and contralateral (n = 6) to the DCIS.
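Two steps of this analysis lend themselves to a short sketch: the logit transformation of β-values and the two-stage selection of CpGs (P < 0.01, then median |Δβ| > 0.1). The Python below is illustrative only. It assumes a natural-log logit (the base is not stated in the text), clips β away from 0 and 1 with a hypothetical `eps`, and interprets the median |Δβ| as the absolute difference between group medians; the P values are taken as given rather than computed from a Cox model.

```python
import math
from statistics import median


def logit_beta(beta, eps=1e-6):
    """Logit-transform a beta-value, clipping to (eps, 1 - eps) so that
    values of exactly 0 or 1 remain finite (eps is an assumption)."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log(b / (1.0 - b))


def select_cpgs(results, p_cut=0.01, delta_cut=0.1):
    """results: dict CpG id -> (p_value, betas_progressed, betas_not_progressed).
    Keep CpGs with P < p_cut and an absolute difference in group medians
    greater than delta_cut."""
    selected = []
    for cpg, (p, progressed, not_progressed) in results.items():
        delta = median(progressed) - median(not_progressed)
        if p < p_cut and abs(delta) > delta_cut:
            selected.append(cpg)
    return selected


# Hypothetical per-CpG results
results = {
    "cgA": (0.001, [0.6, 0.7, 0.8], [0.3, 0.4, 0.5]),     # passes both filters
    "cgB": (0.001, [0.50, 0.52, 0.51], [0.50, 0.49, 0.48]),  # effect too small
    "cgC": (0.2, [0.9, 0.9, 0.9], [0.1, 0.1, 0.1]),       # P value too large
}
print(select_cpgs(results))
print(logit_beta(0.5))
```

Requiring a minimum effect size in addition to a P value threshold discards loci whose association is statistically detectable but too small to be measured reliably on the array.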

Independent cohort of pure DCIS and pure IDC
Preprocessing and normalization of these data were performed as described above (ChAMP), and the raw data are available in Gene Expression Omnibus with accession number GSE60185. To identify epigenetic alterations that differentiate disease states (i.e., IDC compared with DCIS), we fit linear models to each CpG independently, modeling logit-transformed β-values as the dependent variable and disease state as the independent variable.

The Cancer Genome Atlas data
Level 3 normalized Illumina Infinium HumanMethylation450 BeadChip and RNASeqV2 rsem.genes.normalized_results data were downloaded from the TCGA (http://cancergenome.nih.gov/). All Stage I ER positive breast cancers from Caucasians were selected (n = 65). Expression data and CpG sites from the 641 differentially methylated loci were paired by gene symbol for the 65 samples, resulting in a total of 384 unique methylation and expression pairs used in the correlation analysis. The relation of methylation with gene expression was evaluated with Spearman correlations.
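For readers unfamiliar with the statistic, a Spearman correlation between CpG methylation and gene expression is the Pearson correlation of the ranked values. A minimal Python sketch follows; the function names and the four-sample example are hypothetical, ties receive average ranks, and no P value is computed.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical beta-values and normalized expression for four tumors
methylation = [0.1, 0.4, 0.7, 0.9]
expression = [50.0, 30.0, 20.0, 5.0]
print(spearman(methylation, expression))
```

A rank-based measure is a natural choice here because β-values are bounded and RNA-seq expression values are typically skewed, so a monotonic rather than linear relationship is the quantity of interest.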

Enrichment analyses of biological pathways and common sequence features
The Database for Annotation, Visualization, and Integrated Discovery was used for an analysis of molecular pathways [34,51]. In the DAVID analysis, the set of genes represented on the Illumina HumanMethylation450 array that remained after QC was used as the referent set, and the set of genes associated with the 641 progression-related CpG loci composed the gene set tested. Enrichment analyses of common sequence features among progression-related CpG loci were performed using two-tailed Fisher's exact tests. The Polycomb group target gene (PCGT) status for a given CpG locus was based on whether the gene associated with that CpG was described as a PcG target in previously published works [52][53][54][55]. Putative transcription factor binding sites (TFBSs) located within 50 bp of progression-associated loci were obtained from the tfbsConsSites track of the UCSC Genome browser [56].
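The feature-enrichment test can be sketched as a two-tailed Fisher's exact test on a 2 × 2 table (for example, progression-related versus other CpGs, cross-classified by whether they lie in a given feature). The Python below is a minimal illustration using the common convention of summing the probabilities of all tables that are no more likely than the observed one; the function name and the example counts are hypothetical and are not taken from Table 3.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: rows = in feature / not in feature,
# columns = progression-related / other CpGs
print(fisher_exact_two_tailed(8, 2, 1, 5))
```

An odds ratio computed from the same table indicates the direction of the effect, which is how enrichment is distinguished from depletion (as reported for CpG islands).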

Additional files
Additional file 1: Supplemental Figure S1. Diagram of analytic strategy used to identify deregulated epigenetic patterns in DCIS and their relation with invasive breast cancer development.
Additional file 2: Supplemental Figure S2. Consensus clustering of 40 samples and 10,000 most variable CpGs identified 2 clusters as the optimal number.
Additional file 3: Supplemental Figure S3. Results from the locus-by-locus examination of differential methylation between DCIS and adjacent-normal tissues.