Subjects and study design
The subjects in the present study were selected from the prospective birth cohort “Environment and Childhood Asthma” (ECA) study in Oslo [24]. The study includes the 19 children with available blood samples for methylation analyses at 2, 10, and 16 years of age (n = 57 samples in total) and with unambiguous clinical phenotypes (no asthma or allergy, consistent asthma and allergy, or crossovers between ages 2 to 16). Replication using EpiTYPER was performed in 31 additional individuals (n = 93 samples) selected from the same cohort.
DNA extraction
From the 2-year-old subjects, DNA was extracted from blood clots obtained from the 2-year investigation (1993–1995) as described elsewhere [34]. From the 10-year-old subjects, DNA was extracted from peripheral whole-blood, using a MagnaPure LC (Roche). From the 16-year-old subjects, DNA was extracted from peripheral whole-blood on Autopure LS (Gentra/Qiagen) using the 2–5 × 107 protocol.
RNase and proteinaseK treatment of DNA
0.3–10 μg DNA was diluted in TE buffer (pH 8.0) to a final volume of 300 μl. RNase (USB) was added (final concentration 0.033 mg/mL), and the reactions were incubated at 37 °C for 30 min, followed by proteinaseK treatment (0.17 mg/mL) for 1 h at 55 °C. Purification was done using Genomic DNA Clean & Concentrator (Zymo Research) and eluted in 18 μl TE buffer.
RRBS library preparation
Sequencing libraries were prepared based on two protocols, depending of the available amount of DNA.
1 μg input
One microgram of RNase- and proteinaseK-treated DNA was digested with MspI (40U, New England Biolabs) at 37 °C for 2 h before purification (QIAquick Nucleotide Removal Kit, Qiagen) and eluted in 50 μl elution buffer (EB). End-repair was done using Illumina End Repair Mix with incubation for 30 min at 30 °C, followed by purification (QIAquick Nucleotide Removal Kit, Qiagen) and elution in 30 μl EB. The total volume of DNA was reduced to 15 μl (on a heating block), and DNA was adenylated using the Illumina A-tailing Mix (37 °C for 30 min). Illumina index-adapters (diluted 1:10 in dH2O) were ligated for 10 min at 30 °C, and the adapter-ligated fragments were purified (QIAquick PCR Purification Kit, Qiagen) and eluted in 30 μl EB. The samples were then separated on a 3 % NuSieve 3:1 agarose gel (Lonza). EtBr-stained gel slices containing adaptor-ligated fragments of 220–400 bp in size were excised and purified using two QIAquick MiniElute Kit (Qiagen) columns, with elution in a total of 20 μl EB. The adaptor-ligated and size-selected fragments were bisulfite-treated with the EpiTect Bisulphite Kit (Qiagen) using the formalin-fixed paraffin-embedded (FFPE) protocol and two consecutive rounds of 95 °C 5 min, 60 °C 25 min, 95 °C 5 min, 60 °C 85 min, 95 °C 5 min, and 60 °C 175 min and a final step at 20 °C 5 min and eluted in 20 μl EB. The final sequencing libraries were amplified in a 50-μl PCR containing 20 μl bisulfite-converted DNA, 14 μl dH2O, 5 μl 10 mM dNTPs, 5 μl 10× PfuTurbo Cx buffer, 5 μl TruSeq primer cocktail, and 1 μl PfuTurbo Cx hotstart DNA polymerase (2.5U, Agilent) at the following conditions: 95 °C for 5 min, followed by 17 cycles of 95 °C 20 s, 60 °C 30 s, 72 °C 30 s, and 72 °C for 7 min, and hold at 4 °C. The PCR reaction was purified using 90 μl of AMPure XP beads (Agencourt) and eluted in a final volume of 30 μl RSB buffer (Illumina) before quantified and analyzed using Qubit 2.0 (Invitrogen) and Bioanalyzer 2100 (Agilent), respectively. Libraries showing adapter dimers on the Bioanalyzer traces were subject to a second AMPure XP cleanup (DNA:bead ratio 1:1.25) and ran again on Qubit and Bioanalyzer. Finally, libraries were diluted to 10 nM in RSB buffer.
150–400 ng input
One hundred fifty to four hundred nanograms of RNase- and proteinaseK-treated DNA was digested with MspI (20U, New England Biolabs) at 37 °C for 2 h before purification (QIAquick Nucleotide Removal Kit, Qiagen) and eluted in 50 μl EB. End repair was done using Illuminas End Repair Mix with incubation for 30 min at 30 °C, followed by purification (QIAquick Nucleotide Removal Kit, Qiagen). EB was diluted twice in nuclease-free water, and samples were eluted in 30 μl of this solution. The total volume of DNA was reduced to 15 μl (on a heating block), and DNA was adenylated using the Illumina A-tailing Mix (37 °C for 30 min). Illumina index-adapters (diluted 1:10 in dH2O) were ligated for 10 min at 30 °C, and the adapter-ligated fragments were purified (QIAquick PCR Purification Kit, Qiagen) and eluted in 30 μl EB. Each sample was mixed with 50 ng Escherichia coli carrier DNA (DNA prepared as described by Gu et al. [35]) and separated on a 3 % NuSieve 3:1 agarose gel (Lonza). EtBr-stained gel slices containing adaptor-ligated fragments of 220–400 bp in size were excised and purified on QIAquick MiniElute Kit (Qiagen), with an elution volume of 11 μl. The adaptor-ligated and size-selected fragments were bisulfite-treated with the EpiTect Bisulphite Kit (Qiagen) using the FFPE protocol and two consecutive rounds of 95 °C 5 min, 60 °C 25 min, 95 °C 5 min, 60 °C 85 min, 95 °C 5 min, and 60 °C 175 min and a final step at 20 °C 5 min and eluted in 20 μl EB. The bisulfite-converted DNA was mixed with 131 μl dH2O, 20 μl 10 mM dNTPs, 20 μl 10× PfuTurbo Cx buffer, 5 μl TruSeq primer cocktail, and 4 μl PfuTurbo Cx hotstart DNA polymerase (10U, Agilent). The mix was divided into eight aliquots of 25 μl in a 96-well PCR plate, and PCR was performed at the following conditions: 95 °C for 5 min, followed by 17 cycles of 95 °C 20 s, 60 °C 30 s, 72 °C 30 s, 72 °C for 7 min, and hold at 4 °C. The aliquots were pooled together, and the PCR was purified by AMPure XP beads (Agencourt) at DNA:bead ratio 1:1.25, eluted in a final volume of 30 μl RSB (Illumina), and quantified and analyzed using Qubit 2.0 (Invitrogen) and Bioanalyzer 2100 (Agilent), respectively. Libraries showing adapter dimers on the Bioanalyzer traces were subject to a second AMPure XP clean up (DNA/bead ratio 1:1.25) and ran again on Qubit and Bioanalyzer. Finally, libraries were diluted to 10 nM in RSB buffer (Illumina).
Sequencing and alignment
In order to estimate an optimal input for library clustering, we performed a qPCR assay designed to amplify only those fragments carrying Illumina adapters at both ends. Inputs for clustering were calculated by comparing the amplification curves of our libraries to those of a previously sequenced RRBS library of known cluster density. For each subject, the three libraries (2, 10, and 16 years) were pooled according to calculated cluster input. Each pool was sequenced on one lane on a HiSeq 2000 (Illumina), generating 50 bp single-end reads. Raw sequencing data were processed by the standard Illumina pipeline for image analysis and base calling. Quality evaluation of raw sequence data was generated using FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc). Genomic alignment was performed by RRBSMAP [36] (v1.6). The genome (hg19) was indexed on CCGG (MspI restriction site), and a maximum of two mismatches on a read allowed and 3′-end adapter sequences were trimmed off to optimize the alignment of short MspI fragments. Alignment efficiency was calculated using the Picard CalculateHsMetrics tool (http://broadinstitute.github.io/picard). A minimum of ten reads was used to calculate the DNA methylation at C in a CpG context. The DNA methylation level at a CpG was extracted from the mapping results as the number of methylated Cs divided by the total number of Cs using the python script included in RRBSMAP. The bisulfite conversion rate was estimated based on the frequency of C-to-T conversions for cytosines that were not in a CpG context. For biological interpretation, CpGs were annotated by using the annotation module in SAAP-RRBS [37] and BEDTools [38].
Sequenom MassARRAY replication
Replication of age-associated DNA methylation in six genes covered by 71 CpGs was performed using the Sequenom MassARRAY EpiTYPER (Sequenom Inc, Hamburg, Germany). PCR primers (Additional file 6: Table S2) were designed using the Sequenom EpiDesigner software (www.epidesigner.com). To reduce variability in DNA methylation resulting from technical variation during PCR, each sample was amplified in triplicate and pooled prior to the MassARRAY analysis.
Statistical analysis
All statistical tests were conducted in R (www.r-project.org).
Analysis of age-associated DNA methylation
To investigate the total effect (TE) of age on DNA methylation in whole blood, we used a linear mixed-effects model with time as a fixed and subjects as a random factor. This model is suitable for repeated measurements and implemented in the lme4 and lmeTest R packages, which are able to deal with missing values. This is an advantage for RRBS datasets in particular, since it is challenging to produce a completely overlapping data set from all samples. Statistical analyses were performed for each CpG with data from at least ten individuals. Pairwise tests were done for all CpGs with at least ten complete pairs (2 to 10, 10 to 16, and 2 to 16 years). Adjustment for multiple testing was performed, considering a false discovery rate (FDR) below 5 % to be genome-wide significant using the method of Benjamini and Hochberg [39].
Analysis of cell-type proportions as a mediator
Cell counts for five cell types were available (lymphocytes, neutrophils, monocytes, eosinophils, and basophils) for all individuals and time points. The cell-type proportions (CTPs) were calculated by dividing the cell counts for a certain cell type by the sum for the cell counts across all cell types. The lymphocyte CTPs were not available for 2-year olds and were imputed by calculating the difference between 100 % and the four remaining CTPs. In addition, there were eight missing values across all cell types at time point 10 years and one at time point 16 years. Here, missing values were imputed by taking the mean within each cell type and time point. As can be seen in Figure S2 (Additional file 2), lymphocytes and neutrophils together explain on average 89 % of all CTPs. Using principal component analysis (irlba package in R) on the CTP data matrix, the first principal component (PC1) was evaluated. The two main cell types are well represented by PC1 with loadings of 98 and 96 %, respectively. The remaining three cell types were neglected because the possible impact of potential cell-type-specific DNA methylation changes would not be captured by DNA methylation values based on whole blood anyway. As an example, a change of 50 % within a cell type with a proportion of 5 % would lead to an overall DNA methylation change of 0.5 × 0.05 = 2.5 % given that DNA methylation in the other cell types remains constant. Thus, we use PC1 when investigating the role of CTPs in our analysis. In order to find out whether CTC acts (totally or partially) as a mediator, modern causal inference theory has been used to estimate different types of effects [40]. The analysis of mixed-models modeling CTPs as a mediator was done by using the mediation R package [41]. The results were corrected for multiple testing by the method of Benjamini and Hochberg [39] and considered genome-wide significant on the 5 % level (FDR).
GO analyses
GO analysis was performed using the DAVID functional annotation tool [26, 27] based on the results of the initial association tests for age and DNA methylation adjusted for CTC. The CpGs included were represented by 16,687 genes, of which 15,339 genes have a DAVID id. Out of these, 87 genes (78 with a DAVID id) had an adjusted p value ≤0.25. The analysis was done with default parameters, and the results were corrected for multiple testing by the method of Benjamini and Hochberg [39].
Investigation of asthma disease status as a confounder
In order to investigate whether the asthma phenotype was confounding the analyses of age-associated DNA methylation, we tested the representativeness of the control samples (with no asthma diagnosis) for the whole sample for all significant aDMPs identified in our study. We used a sampling procedure for investigating the DNA methylation differences between controls and the whole sample. For each time point and significant aDMP, we calculate the mean DNA methylation (%) difference between the control group (eight samples) and the whole group (19 samples). In addition, we derive the empirical distribution for these mean differences (selection versus all) for all possible combinations (eight out of 19). Given this distribution, we can evaluate for each time point whether the number of probes with “extreme differences” in DNA methylation for the control group compared to the overall sample is higher than expected. Overall, there are no larger differences in mean DNA methylation between the control group and the overall group than expected by chance. For example, the positive rates on the 5 % significance level across all aDMPs are 4.3 % at 2 years and 1.4 % at both 10 and 16 years, which is (much) less than expected. In addition, testing the significant aDMPs for an association with asthma revealed no significant results either (results not shown).