Epigenome-wide association study reveals decreased average methylation levels years before breast cancer diagnosis
- Karin van Veldhoven†1, 2,
- Silvia Polidoro†2,
- Laura Baglietto2,
- Gianluca Severi2,
- Carlotta Sacerdote2,
- Salvatore Panico3,
- Amalia Mattiello3,
- Domenico Palli4,
- Giovanna Masala4,
- Vittorio Krogh5,
- Claudia Agnoli5,
- Rosario Tumino6,
- Graziella Frasca6,
- Kirsty Flower7,
- Ed Curry7,
- Nicholas Orr9,
- Katarzyna Tomczyk9,
- Michael E. Jones10,
- Alan Ashworth8,
- Anthony Swerdlow9, 10,
- Marc Chadeau-Hyam1,
- Eiliv Lund11,
- Montserrat Garcia-Closas8, 10,
- Torkjel M. Sandanger11,
- James M. Flanagan†7 and
- Paolo Vineis†1, 2
© van Veldhoven et al. 2015
Received: 1 April 2015
Accepted: 29 June 2015
Published: 4 August 2015
Interest in the potential of DNA methylation in peripheral blood as a biomarker of cancer risk is increasing. We aimed to assess whether epigenome-wide DNA methylation measured in peripheral blood samples obtained before onset of the disease is associated with increased risk of breast cancer. We report on three independent prospective nested case-control studies from the European Prospective Investigation into Cancer and Nutrition (EPIC-Italy; n = 162 matched case-control pairs), the Norwegian Women and Cancer study (NOWAC; n = 168 matched pairs), and the Breakthrough Generations Study (BGS; n = 548 matched pairs). We used the Illumina 450k array to measure methylation in the EPIC and NOWAC cohorts. Whole-genome bisulphite sequencing (WGBS) was performed on the BGS cohort using pooled DNA samples, combined to reach 50× coverage across ~16 million CpG sites in the genome including 450k array CpG sites. Mean β values over all probes were calculated as a measurement for epigenome-wide methylation.
In EPIC, we found that high epigenome-wide methylation was associated with lower risk of breast cancer (odds ratio (OR) per 1 SD = 0.61, 95 % confidence interval (CI) 0.47–0.80; −0.2 % average difference in epigenome-wide methylation between cases and controls). Specifically, this was observed in gene bodies (OR = 0.51, 95 % CI 0.38–0.69) but not in gene promoters (OR = 0.92, 95 % CI 0.64–1.32). The association was not replicated in NOWAC (OR = 1.03, 95 % CI 0.81–1.30). The reasons for heterogeneity across studies are unclear. However, data from the BGS cohort were consistent with epigenome-wide hypomethylation in breast cancer cases across the overlapping 450k probe sites (difference in average epigenome-wide methylation between case and control DNA pools = −0.2 %).
We conclude that epigenome-wide hypomethylation of DNA from pre-diagnostic blood samples may be predictive of breast cancer risk and may thus be useful as a clinical biomarker.
Keywords: EWAS, Methylation, Risk, Biomarker, Breast cancer, Peripheral blood
Differences in DNA methylation observed in human tumour tissue compared to normal tissue were reported 30 years ago [1]. Early reports showed hypomethylation of oncogenes in several carcinomas versus healthy tissues [2, 3]. Numerous studies since have established that hypermethylation, mainly of CpG islands (CGIs) on promoters of tumour suppressor genes [4, 5], and global (or genome-wide) hypomethylation in tumours relative to non-tumorous tissues occur in a wide variety of cancers [6, 7].
Although most studies have measured global methylation in repetitive elements, other studies suggest that hypomethylation in cancer is not limited to repeats but also occurs in gene regions [8–10]. In tumour DNA, Irizarry et al. found hypomethylation of CpG shores, but not of CpG islands, and Hansen et al. reported hypomethylated blocks across the epigenome [11, 12]. It was not the presence of repetitive sequences but rather these hypomethylated blocks across unique sequences that caused most of the overall hypomethylation in tumours [11, 12]. For this reason, we hypothesised that it would be possible to use the Illumina Infinium HumanMethylation450 (HM450) BeadChip array to assess genome-wide methylation levels. This array measures DNA methylation at approximately 485,000 CpG sites distributed across the entire genome, including CpGs on islands, shores, and shelves, as well as gene promoters and bodies, intergenic regions, and other areas. This covers ~1.5 % of the 28 million CpG sites known in the genome.
In the last few years, there has been increasing interest in using blood samples to measure DNA methylation in cancer cases and controls [14, 15]. The most robust candidate gene studies have used pre-diagnostic blood samples to report associations between breast cancer risk and methylation of ATM and BRCA1 genes [16–18]. However, most previously conducted studies—including genome-wide studies—have been retrospective, cross-sectional studies. A recent review and meta-analysis concluded that there could be great potential for DNA methylation in peripheral white blood cells (WBCs) as a biomarker for cancer risk when total 5-methylcytosine levels were measured; however, methylation measured by surrogate assays for repetitive elements was not associated with cancer risk, and factors such as study design and data analysis methods were often suboptimal [19, 20]. In addition, two other reviews highlighted challenges such as sample selection and population choice when planning epigenome-wide association studies (EWAS) [21, 22].
In the current study, we describe the results of nested case-control studies from three prospective cohorts in which we measured genome-wide methylation in peripheral WBCs of subjects who later developed breast cancer compared to subjects who remained cancer free during follow-up. We also compare our results with a recent report from the Melbourne Cancer Cohort Study (MCCS), which used the same Illumina 450k methodology as our study and reported a significant association between epigenome-wide methylation and breast cancer risk (odds ratio (OR) per 1 SD = 0.69 (0.50–0.95), p = 0.02). We estimated genome-wide methylation from the 450k methylation array and from overlapping CpG sites in whole-genome bisulphite sequencing, positing that genome-wide hypomethylation may be present before diagnosis and could be useful as a biomarker for early detection or risk of breast cancer.
Epigenome-wide hypomethylation is associated with risk of breast cancer
[Table: Association between average methylation and breast cancer risk in EPIC and NOWAC. Odds ratios (95 % CI) per 1 SD, overall and by time to diagnosis; surviving heterogeneity p values: 0.483, 0.725, 0.351, and 0.276. Table body not recovered.]
[Figure: Average methylation and breast cancer risk in four studies.]
[Table: Association between global methylation and breast cancer risk by CpG genomic feature per 1 SD in EPIC. Surviving labels: # CpG loci, (95 % CI), including all probes, excluding SNP probes, gene region feature category. Surviving values: 8.93 × 10−6 and 7.44 × 10−6 (listed after the all-probes and SNP-excluded labels); 1.58 × 10−5, 2.90 × 10−7, and 6.86 × 10−5 (listed around the gene region feature category label). Table body not recovered.]
[Table: Association between principal components and subject variables in EPIC. Surviving labels: % of variance explained, minimum p value, chips, physical activity (cat), age at menarche (cat), age at menopause. Table body not recovered.]
Comparison of study-specific probe signatures associated with breast cancer risk
In the EPIC cohort, conditional logistic regression identified 26 probes significantly associated with breast cancer risk (p < 1.2 × 10−7) (Additional file 1: Table S7). In the NOWAC study, no probes reached this threshold, and the 26 probes identified in EPIC could not be replicated. As in a previous study, we found that the majority of probes were hypomethylated in cases compared with controls in the EPIC cohort, consistent with the overall epigenome-wide hypomethylation.
In this study, we report genome-wide hypomethylation among breast cancer cases compared with matched controls in three out of four cohorts using the Illumina 450k array and WGBS. Specifically, in EPIC-Italy, hypomethylation was observed in gene body probes but not in gene promoters. This association did not vary with time to diagnosis, indicating that it is unlikely to be attributable to an early process of carcinogenesis. We further evaluated these findings using WGBS of pooled DNA samples from cases and controls. Results were consistent with overall genome-wide hypomethylation in cases compared to controls, specifically in gene body sequences compared with CpG islands. Principal component analysis in the EPIC cohort highlighted other factors that may affect genome-wide methylation, such as age, menopausal status, and folate levels.
The significant heterogeneity between the three Illumina 450k studies was primarily driven by results from the Norwegian population (NOWAC) that differed from those in the Italian (EPIC) or Australian (MCCS) populations. This could be explained by differences in the distribution of environmental, lifestyle, or other subject characteristics. We observed differences in the distribution of several breast cancer risk factors between EPIC and NOWAC, which might explain the heterogeneity of results, including mean age, weight, height, smoking status, and menopausal status (Additional file 1: Table S8). There is also a significant difference in follow-up time between EPIC (mean 8.9 years (range 0.04–15.7 years)) and NOWAC (mean 4.8 years (range 3.1–6.6 years)) (Additional file 1: Table S8). However, further studies will be needed to confirm the association between epigenome-wide methylation and breast cancer risk and the possible modification by menopausal status, which was associated with principal components of methylation variation in both NOWAC and EPIC.
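One way to make the heterogeneity discussed above concrete is to pool the three per-1-SD odds ratios quoted in this article (EPIC, NOWAC, and MCCS) with a DerSimonian-Laird random-effects model. The Python below is an illustrative sketch only: the function names are hypothetical, the standard errors are back-calculated from the quoted intervals on the assumption that they are 95 % CIs, and the output should not be read as the summary estimate obtained by the authors with the “rmeta” R package.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Convert an OR and its (assumed) 95 % CI into a log-OR and standard error."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se


def dersimonian_laird(estimates):
    """Random-effects pooling of (log_or, se) pairs.

    Returns (pooled_log_or, pooled_se, cochran_q, tau_squared).
    """
    weights = [1 / se ** 2 for _, se in estimates]
    effects = [log_or for log_or, _ in estimates]
    total_w = sum(weights)
    # Fixed-effect (inverse-variance) mean, needed for Cochran's Q.
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(estimates) - 1
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at 0
    re_weights = [1 / (se ** 2 + tau2) for _, se in estimates]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    pooled_se = math.sqrt(1 / sum(re_weights))
    return pooled, pooled_se, q, tau2


# OR per 1 SD with interval bounds, as quoted in the text of this article.
studies = {
    "EPIC": (0.61, 0.47, 0.80),
    "NOWAC": (1.03, 0.81, 1.30),
    "MCCS": (0.69, 0.50, 0.95),
}
pooled_log_or, pooled_se, cochran_q, tau2 = dersimonian_laird(
    [log_or_and_se(*values) for values in studies.values()]
)
pooled_or = math.exp(pooled_log_or)
```

Cochran's Q is compared with a chi-squared distribution on 2 degrees of freedom (critical value 5.99 at the 0.05 level), so a Q above that value is consistent with the heterogeneity described in the text.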
Previous studies assessing peripheral blood DNA methylation and breast cancer risk have produced inconsistent results. Most studies assessing “global methylation” used a retrospective or cross-sectional design and did not measure sequence-specific genome-wide methylation, but rather methylation in various repetitive elements (such as LINE-1, ALU, and Sat2) as a surrogate measure, using different types of assays and methods, making it difficult to compare results across studies [19, 20]. One of the few large prospective breast cancer studies, which assessed genome-wide levels of LINE-1 DNA methylation in three independent cohort studies (each consisting of >200 cases and >200 controls) using pre-diagnostic blood samples, concluded that there was no difference between cases and controls in LINE-1 methylation, even after adjustment for confounding. In contrast, a recent report from the prospective Sister Study (n = 294 cases) shows hypomethylation in LINE-1 associated with breast cancer risk. These conflicting results suggest that new standardised methods are required to interpret and analyse epigenome-wide methylation using repetitive element assays.
Overall, the mean genome-wide methylation level is ~0.2 % lower in cases compared to controls, which may be interpreted as representing a larger difference in a smaller proportion of probes. Several studies have previously reported genome-wide signatures of breast cancer using the 450k array or the predecessor 27k array [24, 28]. Using the 27k array, one study reported 250 CpG sites to be differentially methylated between 289 cases and 612 controls in pre-diagnostic blood samples, of which the majority (75 %) were hypomethylated in cases compared with controls. Another report identified a 92-probe signature (FDR q < 0.05), with a larger signature of n = 1850 probes (raw p < 0.037) also reported in the study. Using our data from the 450k array, we attempted to replicate independently the specific probes reported in these two 27k studies (the majority of 27k probes are also present on the 450k array) but failed to find any overlap between the two 27k studies [24, 28] and the two 450k studies (EPIC and NOWAC), with the direction of changes not significantly different from chance for these probes. These differences may be attributable to differences in the subject populations and tumour pathologies but are most likely due to low power in each of these studies. These data indicate that the study-specific signatures reported here and in other reports do not yet converge on a robust and validated set of individual probes. This further supports the need for larger studies to robustly identify individual CpG sites associated with breast cancer risk before proceeding with extensive validation of these top hits.
While most investigators have used whole blood DNA for epigenetic epidemiology studies, it is well known that the epigenetic state of various subsets of CpG sites in the genome depends on blood cell type, age, and various exposures. While various methods can be used to account for each of these possibilities, such as excluding the probes affected or adjusting for the confounders, these are not always perfect. Our results suggest that the association with risk is unlikely to be explained by a different white blood cell composition among cases and controls, as there was no change in the results with or without accounting for blood cell type. However, these analyses do not adjust for immune cell activation and clonal expansion, which might also contribute to epigenetic variation in white blood cell DNA samples, as reported recently. The most appropriate study design to address these limitations would be to collect blood samples and sort them into different cell types prior to storage in a prospectively collected cohort with many years of follow-up to accumulate incident cancer cases.
We observed hypomethylation for CpGs located on shores and shelves of CpG islands and in gene bodies but not in promoters, supporting the lack of variability in CpG island promoters. Like many previous studies, we also observed hypomethylation of probes that map to all categories of repetitive elements (data not shown). However, our observation of increasing hypomethylation across the whole genome with increasing breast cancer risk, measured both continuously and categorically, supports the hypothesis that hypomethylation is not restricted to repetitive elements but includes all areas of the genome [11, 30]. One hypothesis for a mechanism driving this hypomethylation is a general deficiency in methylation enzymes or substrates due to the complex interaction between folate, alcohol use, and one-carbon metabolism genes in relation to breast cancer risk and methylation. While we show an association between genome-wide methylation and folate levels in EPIC (Additional file 1: Table S2, p = 0.04), further validation of this finding is needed to support this hypothesis.
In conclusion, the results of this study indicate that genome-wide hypomethylation, measured in pre-diagnostic blood samples using the Illumina HM450 array or by WGBS, could predict breast cancer risk. However, additional studies with larger sample sizes, WBC counts, as well as additional breast cancer risk factor information including genetic factors are needed to evaluate its potential value as an independent risk biomarker.
Availability of supporting data
The EPIC data set supporting the results of this article is available in the Gene Expression Omnibus (GEO) repository, accession GSE51057.
For this study, we used three independent cohorts in which we selected incident breast cancer cases and matched cancer-free controls in a nested case-control study design. These were the Italian cohort of the European Prospective Investigation into Cancer and Nutrition (EPIC) study (n = 166 pairs), the Norwegian Women and Cancer (NOWAC) study (n = 192 pairs), and the Breakthrough Generations Study (BGS) (n = 548 pairs). All study participants signed informed consent forms, and each cohort was approved by the national ethical review boards.
Participants for this nested case-control study were selected from the Italian cohort of the European Prospective Investigation into Cancer and Nutrition (EPIC) study. This sub-cohort consists of 46,857 volunteers (including 32,157 women), recruited from 5 different centres within Italy (Varese, Turin, Florence, Naples, and Ragusa). Incident cases were identified through cancer registries with <2 % losses to follow-up. We identified 166 incident female breast cancer cases and matched each to one healthy female control (166 controls; matched on date of birth (±5 years), month of recruitment, and study centre). Average follow-up (cases and controls combined) was 106.8 months (range 0.53–188.8 months), and average time to diagnosis was 63.4 months (range 0.53–187.8 months). Main features of the resulting study population are summarised in Additional file 1: Table S1. For all study participants, detailed baseline information about lifestyle habits and personal and family history was collected through questionnaires, along with blood samples and anthropometric measurements at enrolment between 1993 and 1998. All participants signed an informed consent form, and the ethical review boards of the International Agency for Research on Cancer (IARC) and of local participating centres approved the study protocol.
Participants for this nested case-control study were selected from the Norwegian Women and Cancer (NOWAC) study. This study recruited from 1991 to 2006 and collected questionnaire information from 170,000 women, with repeated collection of information after 4–6 years (2 or 3 times) and a biobank of more than 50,000 blood samples from participants in 2003–2006. Incident breast cancer cases were identified through the Norwegian Cancer Registry. We selected 192 incident female breast cancer cases, matched to 192 healthy female controls (matched on birth year and month of recruitment). Average time to diagnosis was 25.2 months (range 0–60). Main features of the resulting study population (n = 336) are summarised in Additional file 1: Table S2, and differences between the NOWAC and EPIC cohorts are described in Additional file 1: Table S3. All participants signed an informed consent form, and the NOWAC study was approved by the Regional Committee for Medical and Health Research Ethics in North Norway.
The Breakthrough Generations Study (BGS) is a large general population cohort consisting of ~110,000 women enrolled in the UK from 2003 to 2011. For the methylation analyses, we selected DNA samples from a case-control study nested in the BGS cohort. We initially selected all confirmed incident cases and matched controls at the time of selection that met the following inclusion criteria: white ethnicity, subject not related to another previously selected enrolled participant (first family member recruited), blood sample received by post at the processing laboratory <2 days after collection, sample not clotted, and DNA extracted from buffy coats available at a concentration >40 ng/μL. Controls were individually matched to cases on age, ethnicity, and date of recruitment. This resulted in a total of 916 case-control pairs, from whom we selected a random sample of 548 case-control pairs to make four DNA pools of cases and four DNA pools of their matched controls. We stratified the DNA samples into four pathology sub-groups (123 cases with in situ tumours, 66 cases with invasive estrogen receptor (ER)-negative tumours, 179 cases with invasive ER-positive tumours with early onset (age at diagnosis <50 years), and 189 cases with invasive ER-positive tumours with late onset (age at diagnosis >50 years)). Although all cases had a date of diagnosis after blood collection at the time of selection, subsequent record updates identified one case in the in situ pool diagnosed 2 years prior to blood collection, two cases in the in situ and ER-negative pools diagnosed 22 days prior to blood collection, and one case in the ER-positive late onset pool with a previous diagnosis of in situ cancer 22 years prior to the diagnosis of the invasive cancer. Because the samples were pooled, these few subjects cannot be excluded from analyses; however, they are unlikely to change the overall results.
Each pool included 200 ng of peripheral blood DNA from each of the subjects to make a pooled DNA sample that was subsequently processed for library preparation and sequencing. Main features of the resulting study population (n = 548 cases and 548 matched controls) are summarised in Additional file 1: Table S4. All BGS participants signed an informed consent form, and the study was approved by the South East Research Ethics Committee (NREC 03/1/014).
DNA methylation measurement, data pre-processing, and quality control for 450k arrays
DNA extractions and methylation array processing were conducted in the same laboratory (HuGeF, Torino, Italy) for both the EPIC and NOWAC studies. DNA was extracted from buffy coats or blood cell fractions using the QIAsymphony DNA Midi Kit (Qiagen, Crawley, UK). Five hundred nanograms of DNA were bisulphite-converted with the EZ-96 DNA Methylation-Gold™ Kit (Zymo Research, Orange, CA, USA) according to the manufacturer’s protocol. Next, the Illumina Infinium HumanMethylation450 BeadChip was hybridised as per the manufacturer’s protocol. This array measures DNA methylation at 485,512 cytosine positions across the human genome, comprising 482,421 CpG sites and 3091 non-CpG sites; hereafter, the term CpG will be used to refer to all of these, unless otherwise specified. BeadChips were washed and scanned using the Illumina HiScan SQ scanner, and intensities were extracted from the images using GenomeStudio (v.2011.1) and its Methylation module (1.9.0). Bisulphite conversion efficiency was assessed using control probes present on the chip, failing samples outside 3 SD of the sample distribution; all samples passed this initial quality control step. Additional pre-processing included background subtraction and colour correction to account for the dye bias seen in Infinium II probes. This was done by equalising the intensities in the green and red channels to the average intensity across the two colours as measured by normalisation control probes present on the BeadChip. The methylation level at each CpG was expressed as a β value, which represents the fraction of methylated cytosines at that specific location.
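As a minimal sketch of how a β value relates to probe intensities, the Python below computes the methylated fraction from the methylated and unmethylated signals of one probe. The function name and the example intensities are hypothetical, and the stabilising offset of 100 is an assumption based on a commonly used Illumina convention; it is not stated in the text above.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Fraction of total signal that comes from the methylated allele.

    The offset (an assumption here) keeps the ratio stable when both
    intensities are close to zero.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Hypothetical intensities for a single probe in a single sample.
example_beta = beta_value(methylated=4500.0, unmethylated=1400.0)
```

With these made-up intensities the probe is about three-quarters methylated; a fully unmethylated site gives a β value of 0, and a fully methylated site approaches 1.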
Probes that were not detected in >20 % of the samples were excluded from the analyses. The analysis of other quality control measures provided by GenomeStudio suggested that the resulting filtered subset did not show any major quality issues. For the principal components analysis only, missing data were first imputed using the k-nearest neighbours method as implemented in the R package “impute”. We then used the empirical Bayes method of Johnson et al. (commonly referred to as “ComBat”) to minimise potential chip-specific batch effects. Lastly, in order to adjust the distributions of β values across probe type (Infinium I and II) and to enable joint analysis, we performed peak-based correction using two methods as described by Dedeurwaerder et al. and Teschendorff et al. Because the peaks of type I and type II probes are well defined in our study samples, both methods performed sufficiently well. We opted for the beta-mixture quantile normalisation (BMIQ) method for the main analyses.
Probe and sample exclusions following quality control
Probe and sample exclusions are described in Additional file 3: Figure S2. In the EPIC cohort, the DNA methylation was measured at 485,577 loci on the genome in 166 cases and 166 matched controls before quality control exclusions. Sixty-five of these loci were SNPs, which were excluded from the analyses. Out of all 332 subjects, two subjects had to be excluded because of a diagnosis with another cancer prior to developing breast cancer and another two subjects because their matched pair was not located on the same chip. Following these initial sample exclusions, pre-processing of the DNA methylation data excluded 36,655 CpGs from the analyses because of missing values in >20 % of the samples and another three samples because of missing values for >5 % of the remaining CpGs. Finally, one sample (which formed an incomplete match pair) and 40,108 non-specific CpGs were excluded, resulting in 324 samples in which DNA methylation was measured at 408,749 CpGs. In the NOWAC cohort, DNA methylation was measured at 485,577 loci on the genome in all subjects: 192 cases and 192 matched controls. Sixty-five of these loci were SNPs, which were excluded from the analyses, as well as 224 CpGs after applying ComBat. We excluded 9 samples due to missing covariate data. Pre-processing of the DNA methylation data further excluded 28,459 CpGs from the analyses because of missing values in >20 % of the samples and another 14 samples because of missing values for >5 % of the remaining CpGs. Finally, 23 samples (which formed an incomplete match pair) and 40,417 non-specific CpGs were excluded, resulting in 338 samples (169 case-control pairs) in which DNA methylation was measured at 416,412 CpGs. Including only probes overlapping across the two datasets resulted in 407,455 probes.
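The exclusion counts in this section can be checked with simple arithmetic. The Python below is a small bookkeeping sketch (the helper name `apply_exclusions` and the step labels are illustrative, not from the study's pipeline) that subtracts each reported exclusion from the starting totals for EPIC and NOWAC.

```python
def apply_exclusions(start, steps):
    """Subtract each (label, count) exclusion from a starting total."""
    remaining = start
    for label, count in steps:
        remaining -= count
        if remaining < 0:
            raise ValueError(f"exclusion '{label}' removes more than remains")
    return remaining


# EPIC: 485,577 loci and 332 subjects before quality control.
epic_probes = apply_exclusions(485_577, [
    ("SNP loci", 65),
    ("missing in >20 % of samples", 36_655),
    ("non-specific CpGs", 40_108),
])
epic_samples = apply_exclusions(332, [
    ("other cancer before breast cancer", 2),
    ("matched pair on a different chip", 2),
    ("missing values for >5 % of CpGs", 3),
    ("incomplete matched pair", 1),
])

# NOWAC: 485,577 loci and 384 subjects before quality control.
nowac_probes = apply_exclusions(485_577, [
    ("SNP loci", 65),
    ("removed after ComBat", 224),
    ("missing in >20 % of samples", 28_459),
    ("non-specific CpGs", 40_417),
])
nowac_samples = apply_exclusions(384, [
    ("missing covariate data", 9),
    ("missing values for >5 % of CpGs", 14),
    ("incomplete matched pair", 23),
])
```

The totals reproduce the figures given in the text: 408,749 CpGs in 324 EPIC samples and 416,412 CpGs in 338 NOWAC samples.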
White blood cell type adjustment
Previous studies have highlighted the importance of taking WBC composition into account when analysing DNA methylation in whole blood [40, 41]. WBC differentials were not available for our samples. To address this, we used HM450 methylation data obtained from purified CD4 T-cells, CD8 T-cells, CD19 B-cells, monocytes, natural killer (NK) cells, neutrophils and eosinophils, and whole PBMCs (n = 6 subjects). We identified the probes that differed significantly between each individual cell type and PBMC (linear regression using β values, p < 1e−07 and delta-β > 0.05). This identified n = 10,082 unique probes, which were subsequently removed from the statistical analyses, assuming as a first approach that blood composition only marginally affected methylation patterns at other sites (n = 444,054 remaining probes). In addition, cell proportions were estimated genome-wide using the reference-based method, rather than the reference-free adjustment method; adjusting for these estimates did not change the results. Methylation array data from the EPIC cohort are available at GEO with accession GSE51057.
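The probe-removal step described above can be sketched as a simple filter. In the Python below, the record type, the function name, and the probe identifiers are all hypothetical; the thresholds come from the text, and applying the delta-β cut-off to the absolute difference is an assumption, since the direction of the difference is not specified.

```python
from typing import NamedTuple


class CellTypeTest(NamedTuple):
    """One cell-type-versus-PBMC comparison for a single probe."""
    probe_id: str
    cell_type: str
    p_value: float
    delta_beta: float


def cell_type_specific_probes(results, p_cut=1e-7, delta_cut=0.05):
    """Return the unique probes passing both thresholds in any cell type."""
    return {
        r.probe_id
        for r in results
        if r.p_value < p_cut and abs(r.delta_beta) > delta_cut
    }


# Made-up results: probe_A passes in two cell types but is counted once.
example_results = [
    CellTypeTest("probe_A", "CD4 T-cells", 3e-9, 0.12),
    CellTypeTest("probe_A", "monocytes", 5e-10, -0.20),
    CellTypeTest("probe_B", "neutrophils", 2e-8, 0.01),   # difference too small
    CellTypeTest("probe_C", "NK cells", 4e-4, 0.30),      # not significant
]
probes_to_remove = cell_type_specific_probes(example_results)
all_probes = {"probe_A", "probe_B", "probe_C"}
remaining_probes = all_probes - probes_to_remove
```

Collecting the hits in a set mirrors the "unique probes" wording: a probe that discriminates several cell types is removed once.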
DNA methylation measurement, data pre-processing, and quality control for whole-genome bisulphite sequencing
DNA samples from the BGS cohort case-control study were stratified into four pathology sub-groups (in situ cases, ER-negative cases, ER-positive early onset <50 years, and ER-positive late onset >50 years, see Additional file 1: Table S4). Due to the high cost of whole genome sequencing, we used a pooling approach where incident breast cancer cases (n = 548) were pooled into 4 pools of DNA, and the matched healthy controls (n = 548) were pooled into matched pools. We pooled 200 ng of DNA from each subject into the 8 DNA pools that were then processed for WGBS using a published protocol for library preparation. Libraries were sequenced using PE100bp reads using the HiSeq2500 with 2 lanes per library. Sequencing was conducted by the Institute of Cancer Research Tumour Profiling Unit. Data processing followed a standard pipeline. The quality of reads was analysed using SolexaQA. Mate pairs were trimmed to 80 bp, reflecting a balance between uniquely mappable, high-quality reads. Bismark was used to map trimmed read pairs to a bisulphite-converted representation of the hg19 (GRCh37) genome, using Bowtie 2. Bismark then calculated the proportion of methylated reads at each CpG site, after removing duplicated reads. This provided single nucleotide level resolution with approximately 50-fold coverage of ~14 million mappable CpG sites (13,903,531 CpGs). All subsequent analysis was performed in R, using the “GRanges” package to generate coverage-weighted summary methylation values for different genomic categories/regions. We observed that the raw average methylation across CpG sites was dependent on coverage and therefore calculated a coverage-weighted mean methylation for each CpG site. Coverage-weighted mean was calculated with the following formula: Wmean = (M1*W1 + M2*W2 + M3*W3…)/sum(weights), where the CpG site was weighted (w = 1) if the coverage was greater than the median coverage in that pool and scaled down (w = 0.9, 0.8, 0.7, etc.) with each 10 % decrease in coverage from the median. We selected the 450k array CpG locations from the array annotation file and calculated coverage-weighted averages across all CpG sites that mapped to each genomic range and averaged across the CpG sites. We present the data from the CpG sites overlapping the 450k array for validation, with analysis of the whole data set to be reported elsewhere (Flanagan and Garcia-Closas, in preparation). We observed strong correlation between methylation values as measured by WGBS and Illumina 450k arrays for all probes (R² > 0.97) and for probes with methylation values between 20 and 80 % methylated (R² > 0.77).
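The coverage-weighted mean described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code: the function names are hypothetical, sites at exactly the median coverage are given full weight, and the weight is assumed to drop by 0.1 for every started 10 % shortfall below the median (so a site at 95 % of the median gets w = 0.9), since the text does not specify how partial steps are rounded.

```python
import math
from statistics import median


def coverage_weight(coverage, median_coverage):
    """Weight of 1 at or above the median, minus 0.1 per started 10 % shortfall."""
    if coverage >= median_coverage:
        return 1.0
    steps = math.ceil(10 * (median_coverage - coverage) / median_coverage)
    return max(0, 10 - steps) / 10


def coverage_weighted_mean(sites):
    """Wmean = sum(M_i * W_i) / sum(W_i) over (methylation, coverage) pairs."""
    med = median(cov for _, cov in sites)
    weights = [coverage_weight(cov, med) for _, cov in sites]
    total = sum(weights)
    if total == 0:
        raise ValueError("all sites have zero weight")
    return sum(m * w for (m, _), w in zip(sites, weights)) / total


# Made-up CpG sites: (methylated fraction, read coverage); median coverage is 95.
example_sites = [(0.80, 100), (0.60, 95), (0.40, 75)]
example_wmean = coverage_weighted_mean(example_sites)
```

In this toy example the low-coverage site (75 reads against a median of 95) receives a weight of 0.7, so it pulls the average down less than it would in an unweighted mean.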
For the 450k array data, the mean β value across all probes was calculated for each sample as a measurement for epigenome-wide methylation, and a paired Wilcoxon test was used to assess differences between cases and controls. An age-adjusted estimate of the odds ratio of breast cancer was obtained from a conditional logistic regression model with case-control status as the outcome and the epigenome-wide methylation measurement as continuous predictor. We adjusted for age due to residual age differences between the controls that were matched to within 5 years in EPIC. The epigenome-wide methylation levels were categorised into quartiles based on the distributions in controls. As a quantitative measure of the overall methylation, each quartile was allocated its median value (pseudo-continuous variable). To ease comparison with the corresponding methylation distribution in controls, medians were centred and standardised using the observed mean and standard deviation over all probes investigated. Odds ratios for epigenome-wide methylation were estimated overall and by time between blood collection and diagnosis. Robust logistic regression was also used to confirm these results. We performed receiver operating characteristic (ROC) curve analysis to assess the classification performance of average DNA methylation levels to predict breast cancer case status. We report the odds ratios (ORs), 95 % confidence intervals (95 % CIs), and corresponding p values. p values <0.05 were considered to be statistically significant. B-spline logistic regression models fitted in the “bs” R package were used to explore the relationship between continuous measures of methylation levels and breast cancer risk and to estimate individual risk distribution. Meta-analysis was conducted using the “rmeta” R package and a random effects model for the summary estimate.
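The first steps of this analysis (a per-sample mean β value, quartile cut-points taken from controls, and standardisation so that effects can be expressed per 1 SD) can be sketched as follows. The Python below uses made-up values and hypothetical function names; standardising against the control distribution and the quantile method used by `statistics.quantiles` are assumptions, and the regression models themselves are not reproduced here.

```python
from statistics import mean, quantiles, stdev


def epigenome_wide_methylation(betas):
    """Mean beta value over all probes for one sample."""
    return mean(betas)


def quartile_cutpoints(control_values):
    """Three cut-points splitting the control distribution into quartiles."""
    return quantiles(control_values, n=4)


def assign_quartile(value, cutpoints):
    """Quartile number (1 to 4) for a value, given control-based cut-points."""
    return 1 + sum(value > cut for cut in cutpoints)


def standardise(values, reference):
    """Centre and scale values by the mean and SD of a reference group."""
    centre, spread = mean(reference), stdev(reference)
    return [(v - centre) / spread for v in values]


# Made-up per-sample mean beta values for eight controls.
control_means = [0.50, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57]
cuts = quartile_cutpoints(control_means)
case_quartile = assign_quartile(0.505, cuts)
control_z = standardise(control_means, control_means)
```

A case with a mean β of 0.505 falls in the lowest quartile of this toy control distribution, which is the category expected to carry the highest risk if hypomethylation is associated with breast cancer.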
Probes were classified into different categories either reflecting their physical location in relation to CpG islands (island, shore, shelf) or based on a functional criterion (promoter, gene body, UTR, intergenic) according to the Illumina manifest file. CpG islands were classified as previously defined. A CpG shore is defined as the area 2 kb on either side of the CpG island, and a CpG shelf is defined as the area 2 kb outside of the CpG shore [12, 48]. As in the work of Sandoval et al., we combined TSS200, TSS1500, 5′UTR, and 1st exon into a single “promoter” region. Mean methylation over all probes within each category was calculated and ORs estimated, as described above.
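The island, shore, and shelf definitions above amount to a distance rule around each CpG island. The Python below is a minimal sketch of that rule for a single island; the function name, the coordinates, and the label "other" for positions beyond the shelf are illustrative assumptions (in the study, categories were taken from the Illumina manifest file rather than computed this way).

```python
def classify_relative_to_island(position, island_start, island_end, flank=2000):
    """Label a CpG position as island, shore (within 2 kb), shelf (2-4 kb), or other."""
    if island_start <= position <= island_end:
        return "island"
    if position < island_start:
        distance = island_start - position
    else:
        distance = position - island_end
    if distance <= flank:
        return "shore"
    if distance <= 2 * flank:
        return "shelf"
    return "other"


# Hypothetical island spanning positions 10,000 to 11,000 on one chromosome.
labels = [
    classify_relative_to_island(pos, 10_000, 11_000)
    for pos in (10_500, 9_000, 7_000, 13_000, 16_000)
]
```

Each position is measured from the nearest island edge, so shores and shelves are symmetric on both sides of the island.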
Probe-wise analysis of 450k arrays was performed by first adjusting for technical confounding effects; DNA methylation levels at each CpG locus were adjusted using a generalised linear model (GLM) with beta-distributed response, including microarray and position on the microarray as technical confounders. Subsequently, to assess the association with case-control status, residuals from these models were entered as independent variable in a Poisson GLM with person-years of follow-up time as offset term and additionally adjusted for age at blood draw; this parameterisation yields results that are practically equivalent to those obtained using Cox proportional hazards model. Multiple comparisons were taken into account by considering a Bonferroni-corrected significance threshold α = 0.05/407,455 ≈ 1.2 × 10−7.
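The Bonferroni threshold quoted above follows directly from the number of probes shared by the two datasets. The short Python sketch below reproduces the arithmetic; the variable and function names and the example p values are illustrative.

```python
N_PROBES = 407_455        # probes overlapping between EPIC and NOWAC
FAMILY_WISE_ALPHA = 0.05

# Bonferroni correction: divide the family-wise alpha by the number of tests.
bonferroni_threshold = FAMILY_WISE_ALPHA / N_PROBES


def is_epigenome_wide_significant(p_value):
    """True if a probe-level p value passes the Bonferroni threshold."""
    return p_value < bonferroni_threshold


# Made-up probe-level p values for illustration.
passes = is_epigenome_wide_significant(5e-8)
fails = is_epigenome_wide_significant(5e-7)
```

The threshold works out to roughly 1.2 × 10−7, matching the value used to declare probe-level significance in both cohorts.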
PV and KvV are funded by the HuGeF Foundation, Torino, Italy. JMF is funded by a Breast Cancer Campaign Fellowship. JMF acknowledges the funding from the Cancer Research UK (A13086), the Imperial Biomedical Research Centre, and the Centre for Systems Oncology and Cancer Innovation (CSOCI). PV and MC-H acknowledge the European FP7 project Exposomics (Grant Agreement 308610 to PV). DP and GM are funded by a grant from Associazione Italiana per la Ricerca sul Cancro (AIRC), Italy. MGC is funded by the Breakthrough Breast Cancer and the Institute of Cancer Research, UK. We thank the Breakthrough Breast Cancer and the Institute of Cancer Research for funding of the Breakthrough Generations Study (awarded to AA and AS). We also acknowledge the NHS funding to the NIHR Institute of Cancer Research/Royal Marsden Hospital Biomedical Research Centre. The authors would like to thank the study participants, study staff, and the doctors, nurses, and other healthcare staff and data providers who have contributed to the BGS. The authors would like to acknowledge Gianluca Campanella for the bioinformatics assistance, Olivia Fletcher for the helpful discussions and contribution to the Breakthrough Generations Study, Penny Coulson for the data management in BGS, and Stuart MacGregor for the helpful discussions on DNA pooling study design. Both PV and JMF had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.
1. Feinberg AP, Vogelstein B. Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature. 1983;301(5895):89–92.
2. Feinberg AP, Vogelstein B. Hypomethylation of ras oncogenes in primary human cancers. Biochem Biophys Res Commun. 1983;111(1):47–54.
3. Gama-Sosa MA, Slagel VA, Trewyn RW, Oxenhandler R, Kuo KC, Gehrke CW, et al. The 5-methylcytosine content of DNA from human tumors. Nucleic Acids Res. 1983;11(19):6883–94.
4. Cheung HH, Lee TL, Rennert OM, Chan WY. DNA methylation of cancer genome. Birth Defects Research Part C: Embryo Today. 2009;87(4):335–50.
5. Esteller M. Dormant hypermethylated tumour suppressor genes: questions and answers. J Pathol. 2005;205(2):172–80.
6. Ehrlich M. DNA hypomethylation in cancer cells. Epigenomics. 2009;1(2):239–59.
7. Nishiyama R, Qi L, Tsumagari K, Weissbecker K, Dubeau L, Champagne M, et al. A DNA repeat, NBL2, is hypermethylated in some cancers but hypomethylated in others. Cancer Biology & Therapy. 2005;4(4):440–8.
8. Grunau C, Brun ME, Rivals I, Selves J, Hindermann W, Favre-Mercuret M, et al. BAGE hypomethylation, a new epigenetic biomarker for colon cancer detection. Cancer Epidemiol Biomark Prev. 2008;17(6):1374–9. doi:10.1158/1055-9965.EPI-07-2656.
9. Lindsey JC, Lusher ME, Anderton JA, Gilbertson RJ, Ellison DW, Clifford SC. Epigenetic deregulation of multiple S100 gene family members by differential hypomethylation and hypermethylation events in medulloblastoma. Br J Cancer. 2007;97(2):267–74.
10. Wasson GR, McGlynn AP, McNulty H, O’Reilly SL, McKelvey-Martin VJ, McKerr G, et al. Global DNA and p53 region-specific hypomethylation in human colonic cells is induced by folate depletion and reversed by folate supplementation. The Journal of Nutrition. 2006;136(11):2748–53.
11. Hansen KD, Timp W, Bravo HC, Sabunciyan S, Langmead B, McDonald OG, et al. Increased methylation variation in epigenetic domains across cancer types. Nat Genet. 2011;43(8):768–75.
12. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet. 2009;41(2):178–86.
13. Sandoval J, Heyn H, Moran S, Serra-Musach J, Pujana MA, Bibikova M, et al. Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics. 2011;6(6):692–702.
14. Cho YH, Yazici H, Wu HC, Terry MB, Gonzalez K, Qu M, et al. Aberrant promoter hypermethylation and genomic hypomethylation in tumor, adjacent normal tissues and blood from breast cancer patients. Anticancer Res. 2010;30(7):2489–96.
15. Choi JY, James SR, Link PA, McCann SE, Hong CC, Davis W, et al. Association between global DNA hypomethylation in leukocytes and risk of breast cancer. Carcinogenesis. 2009;30(11):1889–97.
16. Brennan K, Garcia-Closas M, Orr N, Fletcher O, Jones M, Ashworth A, et al. Intragenic ATM methylation in peripheral blood DNA as a biomarker of breast cancer risk. Cancer Res. 2012;72(9):2304–13.
17. Iwamoto T, Yamamoto N, Taguchi T, Tamaki Y, Noguchi S. BRCA1 promoter methylation in peripheral blood cells is associated with increased risk of breast cancer with BRCA1 promoter methylation. Breast Cancer Res Treat. 2011;129(1):69–77.
18. Wong EM, Southey MC, Fox SB, Brown MA, Dowty JG, Jenkins MA, et al. Constitutional methylation of the BRCA1 promoter is specifically associated with BRCA1 mutation-associated pathology in early-onset breast cancer. Cancer Prevention Research. 2011;4(1):23–33. doi:10.1158/1940-6207.CAPR-10-0212.
19. Brennan K, Flanagan JM. Is there a link between genome-wide hypomethylation in blood and cancer risk? Cancer Prevention Research. 2012;5(12):1345–57. doi:10.1158/1940-6207.CAPR-12-0316.
20. Li L, Choi JY, Lee KM, Sung H, Park SK, Oze I, et al. DNA methylation in peripheral blood: a potential biomarker for cancer molecular epidemiology. Journal of Epidemiology. 2012;22(5):384–94.
21. Rakyan VK, Down TA, Balding DJ, Beck S. Epigenome-wide association studies for common human diseases. Nat Rev Genet. 2011;12(8):529–41.
22. Verma M. Epigenome-wide association studies (EWAS) in cancer. Current Genomics. 2012;13(4):308–13.
23. Severi G, Southey MC, English DR, Jung CH, Lonie A, McLean C, et al. Epigenome-wide methylation in DNA from peripheral blood as a marker of risk for breast cancer. Breast Cancer Res Treat. 2014;148(3):665–73.
24. Xu Z, Bolick SC, DeRoo LA, Weinberg CR, Sandler DP, Taylor JA. Epigenome-wide association study of breast cancer using prospectively collected sister study samples. J Natl Cancer Inst. 2013;105(10):694–700.
25. Deroo LA, Bolick SC, Xu Z, Umbach DM, Shore D, Weinberg CR, et al. Global DNA methylation and one-carbon metabolism gene polymorphisms and the risk of breast cancer in the Sister Study. Carcinogenesis. 2014;35(2):333–8.
26. Nelson HH, Marsit CJ, Kelsey KT. Global methylation in exposure biology and translational medical science. Environ Health Perspect. 2011;119(11):1528–33.
27. Shenker NS, Polidoro S, van Veldhoven K, Sacerdote C, Ricceri F, Birrell MA, et al. Epigenome-wide association study in the European Prospective Investigation into Cancer and Nutrition (EPIC-Turin) identifies novel genetic loci associated with smoking. Hum Mol Genet. 2013;22(5):843–51.
28. Anjum S, Fourkala EO, Zikan M, Wong A, Gentry-Maharaj A, Jones A, et al. A BRCA1-mutation associated DNA methylation signature in blood cells predicts sporadic breast cancer incidence and survival. Genome Medicine. 2014;6(6):47.
29. Chu M, Siegmund KD, Hao QL, Crooks GM, Tavare S, Shibata D. Inferring relative numbers of human leucocyte genome replications. Br J Haematol. 2008;141(6):862–71.
30. Ruike Y, Imanaka Y, Sato F, Shimizu K, Tsujimoto G. Genome-wide analysis of aberrant methylation in human breast cancer cells using methyl-DNA immunoprecipitation combined with high-throughput sequencing. BMC Genomics. 2010;11:137.
31. Stolzenberg-Solomon RZ, Chang SC, Leitzmann MF, Johnson KA, Johnson C, Buys SS, et al. Folate intake, alcohol use, and postmenopausal breast cancer risk in the prostate, lung, colorectal, and ovarian cancer screening trial. Am J Clin Nutr. 2006;83(4):895–904.
32. Christensen BC, Kelsey KT, Zheng S, Houseman EA, Marsit CJ, Wrensch MR, et al. Breast cancer DNA methylation profiles are associated with tumor size and alcohol and folate intake. PLoS Genet. 2010;6(7):e1001043.
33. Palli D, Berrino F, Vineis P, Tumino R, Panico S, Masala G, et al. A molecular epidemiology project on diet and cancer: the EPIC-Italy prospective study. Design and baseline characteristics of participants. Tumori. 2003;89(6):586–93.
34. Lund E, Kumle M, Braaten T, Hjartaker A, Bakken K, Eggen E, et al. External validity in a population-based national prospective study—the Norwegian Women and Cancer Study (NOWAC). Cancer Causes Control. 2003;14(10):1001–8.
35. Swerdlow AJ, Jones ME, Schoemaker MJ, Hemming J, Thomas D, Williamson J, et al. The breakthrough generations study: design of a long-term UK cohort study to investigate breast cancer aetiology. Br J Cancer. 2011;105(7):911–7.
36. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17(6):520–5.
37. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8(1):118–27.
38. Dedeurwaerder S, Defrance M, Calonne E, Denis H, Sotiriou C, Fuks F. Evaluation of the Infinium Methylation 450K technology. Epigenomics. 2011;3(6):771–84.
39. Teschendorff AE, Marabita F, Lechner M, Bartlett T, Tegner J, Gomez-Cabrero D, et al. A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics. 2013;29(2):189–96.
40. Adalsteinsson BT, Gudnason H, Aspelund T, Harris TB, Launer LJ, Eiriksdottir G, et al. Heterogeneity in white blood cells has potential to confound DNA methylation measurements. PLoS One. 2012;7(10):e46705.
41. Reinius LE, Acevedo N, Joerink M, Pershagen G, Dahlen SE, Greco D, et al. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS One. 2012;7(7):e41361.
42. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012;13:86.
43. Houseman EA, Molitor J, Marsit CJ. Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics. 2014;30(10):1431–9.
44. Johnson MD, Mueller M, Game L, Aitman TJ. Single nucleotide analysis of cytosine methylation by whole-genome shotgun bisulfite sequencing. In: Ausubel FM, editor. Current protocols in molecular biology. 2012. Chapter 21:Unit21 3.
45. Cox MP, Peterson DA, Biggs PJ. SolexaQA: at-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics. 2010;11:485.
46. Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011;27(11):1571–2.
47. Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987;196(2):261–82.
48. Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, et al. High density DNA methylation array with single CpG site resolution. Genomics. 2011;98(4):288–95.
49. Ferrari SLP, Cribari-Neto F. Beta regression for modelling rates and proportions. J Appl Stat. 2004;31(7):799–815.
50. Whitehead J. Fitting Cox’s regression model to survival data using GLIM. J R Stat Soc: Ser C: Appl Stat. 1980;29(3):268–75.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.