Skip to main content

Epigenome-wide contributions to individual differences in childhood phenotypes: a GREML approach



DNA methylation is an epigenetic mechanism involved in human development. Numerous epigenome-wide association studies (EWAS) have investigated the associations of DNA methylation at single CpG sites with childhood outcomes. However, the overall contribution of DNA methylation across the genome (R2Methylation) towards childhood phenotypes is unknown. An estimate of R2Methylation would provide context regarding the importance of DNA methylation explaining variance in health outcomes. We therefore estimated the variance explained by epigenome-wide cord blood methylation (R2Methylation) for five childhood phenotypes: gestational age, birth weight, and body mass index (BMI), IQ and ADHD symptoms at school age. We adapted a genome-based restricted maximum likelihood (GREML) approach with cross-validation (CV) to DNA methylation data and applied it in two population-based birth cohorts: ALSPAC (n = 775) and Generation R (n = 1382).


Using information from > 470,000 autosomal probes we estimated that DNA methylation at birth explains 32% (SDCV = 0.06) of gestational age variance and 5% (SDCV = 0.02) of birth weight variance. The R2Methylation estimates for BMI, IQ and ADHD symptoms at school age estimates were near 0% across almost all cross-validation iterations.


The results suggest that cord blood methylation explains a moderate degree of variance in gestational age and birth weight, in line with the success of previous EWAS in identifying numerous CpG sites associated with these phenotypes. In contrast, we could not obtain a reliable estimate for school-age BMI, IQ and ADHD symptoms. This may reflect a null bias due to insufficient sample size to detect variance explained in more weakly associated phenotypes, although the true R2Methylation for these phenotypes is likely below that of gestational age and birth weight when using DNA methylation at birth.


DNA methylation (DNAm) is an epigenetic process, which involves the attachment of a methyl group to cytosine bases, typically in the context of a cytosine-phosphate-guanine dinucleotide (CpG) site. The methylation status of a CpG site can have an impact on gene expression and downstream phenotypes [1]. In turn, methylation levels are determined by genetics, environment and stochastic processes [2, 3]. DNAm could therefore function as mediator of many genetic and environmental determinants of human development, functioning and pathology. A common research design to query the role of DNAm in these processes is an epigenome-wide association study (EWAS). As a large number of CpG sites are tested, to reliably identify relevant CpG sites, either large samples or big effect sizes are required, which for most traits or CpG sites are not available or unlikely [4].

However, analogous to lessons learned from genome-wide association studies, no matter the number of genome-wide significant CpGs identified in an EWAS, whether it be 0 or thousands, there is always a possibility that more CpGs are associated with a predictor or outcome, but did not reach significance due to lack of power. Since an EWAS estimates the associations of single CpG probes, no conclusions can be drawn about the overall contribution of genome-wide DNAm towards a phenotype. Such an overall estimate of variance explained by genome-wide DNAm (R2Methylation) would be highly informative for several reasons: (1) R2Methylation would provide a picture of how relevant DNAm levels are to an outcome, either as causal determinant or predictor. (2) R2Methylation would provide an upper limit of how much the combined effects of CpG sites identified by an EWAS (e.g. poly-epigenetic score) can explain. While estimates of R2Methylation would be clearly useful, the best approach to derive them is less clear. One option is to adapt the genomic restricted maximum likelihood (GREML) [5] approach used in genetics.

In genetics, the analogous measure of R2Methylation is the single nucleotide polymorphism heritability (SNP h2), i.e. the variance explained by all measured SNPs. A popular method to estimate SNP h2 is through a GREML analysis which consists of two steps: (1) The estimation of genetic relatedness values between participant pairs inferred from their similarity in measured SNP genotypes. (2) Estimating how well genetic relatedness predicts phenotypic similarity between participant pairs. While the GREML approach has been developed for genetic data, the analysis can be applied to any high dimensional data, such as genome-wide methylation data. First papers are now being published using GREML and alternative methods to estimate the variance explained by genome-wide DNAm. An early example is a study by Vazquez et al. [6], who used a Bayesian variant of a GREML model to predict breast cancer survival. The authors found that genome-wide DNAm is more predictive than the structural genome or traditional covariates alone, explaining 16.2% of variance. More recently, Zhang et al. [7] tested the validity of the GREML approach in methylation data using simulations and real data in a sample of adults. The authors estimated that concurrent blood DNAm levels explained 6.5% of the variance in BMI but were not associated with height, when controlling for genetic effects. In contrast, using a Bayesian approach not relying on similarity matrices, Banos et al. [8] estimated the proportion of BMI variance explained by concurrent DNAm to be 75.7% in adulthood. The CpG-level effects estimated by this model explained up to 30.8% in adult replications cohorts, but only 3.3%, 2.05% and 9.65% at birth, age 7 and age 15 respectively, with BMI and DNAm measured at the same time-points. The results suggest highly age specific effects depending on when both BMI and DNAm were measured.

As previous studies focused on DNAm and outcomes in adults, the variance of childhood outcomes explained by cord blood DNAm is unknown. In this study we aimed to use cord blood DNAm to estimate the R2Methylation of five child outcomes, previously addressed in EWAS studies: gestational age and birth weight, as well as BMI, IQ and ADHD symptoms at school age. These outcomes were chosen because they represent childhood outcomes in different areas (general health, cognition and psychopathology). In addition, all of these have been studied in multi-center population-based EWAS before, allowing for a comparison between R2Methylation measures and EWAS findings. Two of the phenotypes most robustly associated with DNAm in EWAS studies are gestational age and birth weight. For gestational age, 8899 CpGs have been found to be significantly associated in a previous EWAS at genome-wide significance [9]. Prediction models based on these CpGs were able to explain 50–80% of the gestational age variance in an independent sample [10, 11]. In the case of birth weight, 914 sites were associated based on an EWAS meta-analysis in 8,825 children [12]. Cord blood has also the potential to predict later development, e.g. nine CpG sites were associated with ADHD symptoms in school-age according to a recent EWAS in 2,477 children [13] and one CpG site predicted BMI in late childhood (n = 4133) [14]. In contrast, no genome-wide significant sites in cord blood were identified for BMI in early childhood [14] nor IQ in school-age (n = 3798) [15]. While the variance explained by specific sets of CpGs is known for some childhood outcomes, the genome-wide contribution has not been studied before. The aim of this study is to estimate the genome-wide contribution of cord blood DNA to various childhood outcomes.

We adapted the GREML approach to DNA methylation data to obtain R2Methylation estimates. To maximize generalizability of results, we applied this method to two population-based birth cohorts: ALSPAC (n = 775) [16, 17] and Generation R (n = 1382) [18]. In addition, we applied a Monte-Carlo cross-validation within cohorts, using a 90/10 training/validation split [19]. Results from cross-validation were pooled to provide a joint estimate of R2Methylation.


Results are based on the analyses of the ALSPAC and Generation R cohorts. For ALSPAC, 775 children were included, who were born in the former English county Avon between 1991 and 1992. For Generation R, 1382 children, who were born in the Dutch city of Rotterdam between 2002 and 2006, were included in the analysis. Both cohorts are population-based studies with mostly comparable participant characteristics, with the exception of maternal education, which was higher for Generation R (Table 1).

Table 1 Participant characteristics

DNAm explained 0 to 46% of the tested outcomes’ variances. See Table 2 for full results and Figs. 1, 2, 3 and 4 for a graphical representation of the estimate distribution across cross-validations.

Table 2 Variance explained by genome-wide DNA methylation
Fig. 1
figure 1

Variance explained in birth outcomes by cord blood DNA methylation (basic adjustment). Cross-validation distribution of ΔR2Methylation, the variance explained by genome-wide DNA methylation minus variance explained by covariates (sex and batch) in ALSPAC (red) and Generation R (blue). Vertical lines indicate mean ΔR2Methylation in ALSPAC (red), Generation R (blue) and a pooled estimate (black)

Fig. 2
figure 2

Variance explained in birth outcomes by cord blood DNA methylation (full adjustment). Cross-validation distribution of ΔR2Methylation, the variance explained by genome-wide DNA methylation minus variance explained by covariates (sex, maternal age, maternal smoking, maternal education, cell type proportions, batch, gestational age*, birth weight* (* not when outcome is gestational age or birth weight)) in ALSPAC (red) and Generation R (blue). Vertical lines indicate mean ΔR2Methylation in ALSPAC (red), Generation R (blue) and a pooled estimate (black)

Fig. 3
figure 3

Variance explained in childhood outcomes by cord blood DNA methylation (basic adjustment). Cross-validation distribution of ΔR2Methylation, the variance explained by genome-wide DNA methylation minus variance explained by covariates (sex and batch) in ALSPAC (red) and Generation R (blue). Vertical lines indicate mean ΔR2Methylation in ALSPAC (red), Generation R (blue) and a pooled estimate (black)

Fig. 4
figure 4

Variance explained in childhood outcomes by cord blood DNA methylation (full adjustment). Cross-validation distribution of ΔR2Methylation, the variance explained by genome-wide DNA methylation minus variance explained by covariates (sex, maternal age, maternal smoking, maternal education, cell type proportions, batch, gestational age, birth weight) in ALSPAC (red) and Generation R (blue). Vertical lines indicate mean ΔR2Methylation in ALSPAC (red), Generation R (blue) and a pooled estimate (black)

Gestational age had the highest R2 with 45.8% of the variance in gestational variance explained by DNAm in cord blood independent of sex and batch. We further tested the independent contribution of DNA methylation additionally controlling for maternal age, maternal smoking, maternal education and cell type proportions. In this fully adjusted model 32.3% (SDCV = 0.057) of variance was explained by DNAm. Notably, the ΔR2Methylation was twice as large in GenR (ΔR2Methylation = 51.7%, SDCV = 0.102) compared to ALSPAC (ΔR2Methylation = 23.4%, SDCV = 0.069). Across both cohorts 95% of cross-validation results ranged from 12.7 to 63.2% (Fig. 2). The ranges of the Generation R and ALSPAC estimate distributions overlapped to a substantial degree, with 54.3% of estimates lying between the Generation R smallest value and ALSPAC’s biggest.

For birth weight, the variance explained was estimated at 12.9% (SDCV = 0.047) with basic adjustment and 4.9% (SDCV = 0.021) with full covariate adjustment. Again, the estimate was much larger in Generation RR2Methylation = 16.9%, SDCV = 0.055) compared to ALSPAC (ΔR2Methylation = 2.8%, SDCV = 0.023). In the fully adjusted model, 95% of estimates were between − 0.1% and 25.0% with a minority of estimates overlapping between these two cohorts (31.8%) (Fig. 2).

DNAm in cord blood did not explain variance in any of the childhood outcomes at school age (BMI, ADHD and IQ). This result was consistent in both cohorts, in which all cross-validation estimates were very close to 0, with the vast majority (97.5%) of estimates being below 4% in both basic and fully adjusted model. Correspondingly, the cross-validation standard deviations were below 0.1%, suggesting that no matter which participants were randomly assigned to training or validation, the estimated effect was always near 0 (Figs. 3, 4).

CpG sites show varying degrees of correlation with neighboring sites. To assess, whether these correlated patterns may bias results, we performed further sensitivity analyses. Specifically, we averaged DNA methylation levels across correlated CpG sites (methylation correlated blocks (MCB)). This resulted in a set of independent CpGs and MCBs. DNAm, as defined by these independent features, explained slightly more variance in the birth outcomes compared to the main analysis (Table 3). Most notably, the independent set explained 3.6% more variance in gestational age and 2.4% in birth weight.

Table 3 Variance explained by genome-wide DNA methylation using independent CpG sites/methylation correlated blocks


This study is the first to report the extent to which childhood outcomes are explained by cord blood genome-wide DNAm. We observed that methylation patterns explained moderate variance for gestational age and birth weight, but no variance explained for prospective associations with BMI, IQ or ADHD symptoms at school-age.

A strength of the study was the use of two cohorts, which are among the largest samples of cord blood methylation currently available. Both cohorts are comparable in many ways, for instance they represent populations of European ancestries living in western European countries and similar outcome assessment ages. In addition, cord blood DNAm assessment was very similar, as both cohorts used the same methylation array and were normalized jointly.

The general trend of results regarding ranking from highest to lowest explained outcomes agreed between the cohorts. The highest estimates across both cohorts were found for gestational age, which is consistent with previous studies. Bohlin et al. tested a prediction model based on 58-132 CpG sites in cord blood using similar covariates (sex, maternal age, maternal smoking, cell composition) as in our study [10]. The authors were able to explain 50–65% of variance in a test sample of 685 participants from the MoBa cohort. Since we modeled a much higher number of probes, we would expect at least equal prediction performance in our study. The previous findings are consistent with the Generation R estimate of 52% variance explained and suggests that adding more probes from the Illumina 450 k array would not increase performance of the prediction model.

However, the previous results are less consistent with the 23.4% estimate in ALSPAC, indicating either a higher variability in lower powered samples or a potential bias towards null effects in lower sample sizes, as we will discuss later. Another contributor to study heterogeneity may be the different methods used to estimate gestational age. Most gestational age estimates in ALSPAC were based on the last reported menstrual period, whereas in Generation R most estimates were based on ultrasound scans. The latter method is expected to have less measurement error and thus higher variance explained assuming constant methylation effects.

Genome-wide DNAm also explained variance in birth weight, albeit less so than for gestational age. Interestingly, the estimate was again higher in GenR than ALSPAC. In contrast to gestational age, there was no apparent noteworthy difference in birth weight assessment, yet the estimates differed even more between cohorts than for gestational age, so other potential causes for the observed study differences must be discussed. One cause could be higher sampling variance in lower sample sizes. The different estimates may hint that the ΔR2Methylation values at sample sizes of around 1000 samples or lower may be highly variable, with lower sample sizes more likely to over or underestimate the true variance explained.

School-age outcomes showed a ΔR2Methylation near zero for BMI, IQ and ADHD symptoms at age 6 in both cohorts. In contrast to gestational-age and birth weight, these analyses present prospective associations over at least 6 years and have resulted in fewer genome-wide significant findings in previous EWAS [13,14,15]. This temporal component together with perhaps lower contribution of DNAm may weaken associations and result in lower variance explained estimates. While these factors lead to the expectation of a lower variance explained estimate in prospective estimates as opposed to cross-sectional analyses, estimates of 0% appear nevertheless unlikely. For instance, for ADHD, 9 CpG sites have been identified in a meta-analysis, in which most participants were drawn from ASLPAC and GenR [13]. Both cohorts showed a high lambda in the EWAS, not accounted for by confounding, suggestive of a highly poly-epigenetic signal. Therefore, 0% variance explained estimates in a subset of the data is implausible. Besides a true lower variance explained for the school-age outcomes, a potential bias towards 0 values in underpowered samples may be at play as well.

Assuming a high uncertainty of ΔR2Methylation, we would expect a large standard deviation in the cross-validation distribution, as some iterations will randomly show a variance explained that is much too high or too low. However, in our study all analyses with outcomes showing a 0% ΔR2Methylation, had an estimate near 0% in almost all cross-validation iterations. This resulted in very small cross-validation standard deviations, much smaller compared to the gestational age or birth weight analysis. This is incompatible with a high estimate uncertainty due to low sample size. Hence, we suspect that a bias towards 0 estimates is at play if outcomes, which are not very strongly associated with DNAm, are analyzed in small samples. Such a behavior has been previously noted by GCTA author Jian Yang in the context of GREML when applied to genetic data ( We therefore speculate that the true ΔR2Methylation values for the school-age outcomes are likely to be higher than 0% and below estimates found for gestational age and birth weight, which themselves did not display a bias towards 0% estimates. Interestingly, early GCTA studies indicated no SNP heritability for child psychiatric phenotypes [19], but later larger multi-center GCTA [20], and LD-score regression studies [21] have since then repeatedly demonstrated a SNP heritable component. Contrary to genetic studies, an additional source of variability in DNA methylation is the assessment time point. Estimates for the school-age outcomes are likely different for concurrent DNA methylation measures than cord blood, but sample size was not sufficient for these analyses in the current study.

A limitation of the current analyses is the coverage of the 450 k methylation array. The CpG sites measured by the array represent less than 2% of all CpG sites in the genome. While neighboring CpG sites tend to be correlated, CpG sites may also represent unmeasured CpG sites to a degree, but the correlations are not as stable or predictable as correlations between single nucleotide polymorphisms in linkage disequilibrium. Thus, the variance explained by array DNAm is unlikely the maximum which can be explained by all DNAm variation in humans. That said, the estimates do in theory represent the maximum that can be explained by the effects found in an EWAS using the same array, as it represents the joint effect of all measured CpG sites.

This study adjusted for a number of potential confounders, such as maternal smoking and education, as well as cell type proportions. Nevertheless, the observational nature of the study design makes it unclear whether the strong association between DNAm and gestational age represent direct effects of DNAm on gestational age, the effects of gestational age on DNAm, or the effect of unmeasured confounding. Furthermore, we only measured DNAm in a single tissue (cord blood). As DNAm can be tissue-specific, other tissue may show higher associations with studied outcomes, e.g. adipose tissue and body weight.

Despite the current limitations due to sample size, the results of the gestational age analysis demonstrate that GREML methods are applicable to studies of DNA methylation. We expect that increases in sample size will make this analytical approach more reliable for outcomes less strongly associated with DNAm. An increase in sample size would also allow for more complex questions to be answered. For example, as the method we utilized enables one to fit multiple similarity matrices, it is in principle possible to estimate ΔR2Methylation adjusted for genetic effects or to estimate the genome-wide interaction between genetic and epigenetic effects. Answers to these questions would not only be helpful in further understanding of how DNAm relates to development and health, but would also inform the design of future EWAS. For instance, EWAS might need to model interactions between genetics and methylation levels, if interactions on a genome-wide level are substantial [22]. We also observed a slight increase in the variance explained, when aggregating methylation levels of correlated CpG sites into MCBs. It is unclear, whether this reflects a downward bias when using a combination of independent and correlated CpG sites, or an increase in performance due the use of a simpler model based on fewer methylation features.

In summary, we showed that genome-wide DNAm in cord blood explains about a third of the variance in gestational age. DNAm was also associated to a lesser degree with birth weight. DNAm at birth, however, did not explain variance in child BMI, IQ and ADHD symptoms at school-age. The GREML approach holds promise for elucidating the relationship between genome-wide DNAm, child development and health outcomes, but increases in sample sizes are required to accurately estimate outcomes that are less strongly associated with DNAm and to explore more complex models, which can integrate different highly dimensional data.



Participants for this study were drawn from two European population-based birth cohorts: The ALSPAC Study and the Generation R study. ALSPAC had recruited 15,454 women with an expected delivery date between April 1991 and December 1992, who were living in the former English county Avon, resulting in 15,589 foetuses. Of these 14,901 were alive at 1 year of age. The development of their children was subsequently studied at multiple assessment waves. Cord blood DNAm was assessed for 1018 children. To avoid potential biases arising from shared family environment or population stratification, only one sibling per family was included in the analyses sample, as well as only children whose parents reported white ethnicity (analysis n = 775). Full cohort descriptions have been published previously [16, 17]. Please note that the study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool ( Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Consent for biological samples has been collected in accordance with the Human Tissue Act (2004). Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time.

Generation R invited all pregnant women living in the city of Rotterdam, the Netherlands, with an expected delivery date between April 2002 and January 2006 to participate in the study, of which 9,778 were enrolled. Cord blood DNA methylation was assessed in a subgroup of 1396 children with parents of reported European national origin. After exclusion of siblings (one of each pair excluded), 1382 participants remained in the analysis. Full study descriptions have been published previously [18], see also for more information. All parents gave informed consent for their children's participation. The Generation R Study is conducted in accordance with the Declaration of Helsinki. Study protocols were approved by the Ethics Committee of Erasmus MC.


DNA methylation

DNAm was measured in cord blood at birth. Bisulfite conversion was performed with the EZ-96 DNAm kit (shallow) (Zymo Research Corporation, Irvine, USA). DNAm levels were then measured with the Illumina Infinium HumanMethylation450 BeadChip array (Illumina Inc., San Diego, USA). Preprocessing in ALSPAC was performed with the meffil package [23]. Quality control check included on a sample levels mismatched genotypes, mismatched sex, incorrect relatedness and on a CpG level low concordance with other time points, extreme dye bias, and poor probe detection. In Generation R, pre-processing was performed with the CPACOR workflow [24]. Quality control exclusion criteria included on a CpG level failed bisulfite conversion, hybridization or extension, and on a sample level sex mismatches and call rate <  = 95%. Both cohorts were normalized using a combined dataset, using meffil functional normalization with ten control probe principal components and slide included as a random effect, see Mulder et al. [25] for further details. To lessen the influence of methylation outliers while retaining a consistent sample size, extreme values were winsorized. Per CpG site, DNAm levels exceeding three times the interquartile range above the third or below the first quartile (3*IQR criterion) were replaced by the maximum or minimum value, respectively, of the sample below the exclusion criterion. Only autosomal probes were considered in this study for consistent interpretation of effects between sexes. This resulted in 470,870 and 473,864 CpG probes in ALSPAC and Generation R, respectively, which were used for the computation of the methylation similarity matrix.

Outcomes and covariates

Birth outcomes

In ALSPAC, birthweight was recorded by healthcare professionals at the time of birth and extracted from birth records [12]. Gestational age at delivery was also extracted from birth records. Obstetric practice and antenatal care at the time means that for most participants gestational age will have been estimated based on the last menstrual period, supplemented by ultrasound scans and paediatric/obstetric assessment of the newborn at birth.

In GenR, midwife and hospital registries were used to obtain information on birth weight. Gestational age was based on ultrasound examinations for mothers who enrolled in early or mid pregnancy, but based on last menstrual period for late pregnancy [26].

Childhood outcomes

In ALSPAC, measurements of height and weight, with the participant in light clothing and without shoes, were obtained at clinic visits when the children were seven years of age to calculate BMI. Non-verbal IQ at age 8 years was measured by the Wechsler Intelligence Scale for Children WISC-III UK [27]. ADHD symptomatology was assessed via maternal ratings at age 7, with the Development and Well-Being Assessment interview (DAWBA) [28].

In Generation R, when children were 6.0 (SD = 0.15) years old, children’s height and weight were measured at the research center without shoes or heavy clothing and used for the calculation of BMI (kg/m2). Non-verbal IQ was assessed at the same age using the Snijder-Oomen nonverbal intelligence test [29]. ADHD symptoms were rated by a primary caregiver (90% mothers) using the Conners’ Parent Rating Scale-Revised (CPRS-R) questionnaire at age 8.1 (SD = 0.15) [30].


In ALSPAC, mothers were asked about their smoking during pregnancy, and these data were used to generate a binary variable of any smoking during pregnancy. Maternal education was collapsed into whether they had achieved a university degree or not. White cell proportions were estimated with the Houseman method using a combination of four reference panels specific for cord blood [31]. In Generation R, maternal age was obtained at enrollment. Maternal smoking was defined as either “Never smoked”, “Quit smoking in early pregnancy”, “Continued smoking during pregnancy”. Maternal education during pregnancy was categorized as “no education”, “primary education”, “secondary education first phase”, “secondary education second phase”, higher education first phase”, higher education second phase”. See Table 1 for descriptive statistics of all variables.

Statistical analysis

We adapted the GREML approach to estimate R2Methylation. The GREML procedure consists of two steps: (1) Compute a genetic relatedness matrix (i.e. how genetically similar two individuals are based on SNP data), (2) Regress the outcome similarity between participants on the genetic relatedness (i.e. to establish whether greater genetic similarity between individuals relates to greater phenotypic similarity).

We refer to a methylation similarity matrix (M) as opposed to a genetic relatedness matrix (G). However, both M and G can be calculated with the same algorithm. First the methylation in beta values were z-score standardized. The resulting matrix (X) of methylation z-scores (columns: CpG sites, rows: participants) was then multiplied with the transpose resulting in XX’. XX’ was then standardized by dividing the matrix with the mean of the diagonal, resulting in an average value of 1 for the diagonal of M. We used the R package BGData 2.1.0 [32] to compute the similarity matrix.

The next step is to regress the outcomes on M and covariates using a mixed effects model fitted with REML. Fixed effects covariates included several variables known to be associated with DNAm levels: sex, maternal age, maternal smoking, maternal education, cell type proportions, gestational age, birth weight (unless a variable was the outcome). M and batch were defined as random effects.

Multiple imputation was used to avoid potential bias due to missing data and to make analyses more comparable between outcomes by including the same set of participants. We used the covariate and outcome variables to predict missing variables in 10 imputations with 30 iterations using MICE [33] in R.

According to power analyses with genetic data, to accurately estimate the variance explained using GREML methods, large sample sizes are necessary. Especially with less heritable traits sample sizes above 5000 participants are recommended [34]. Currently, studies that have measured DNAm and child outcomes in more than 1000 participants are rare. While the power requirements for DNAm data are unclear, there is nevertheless a high risk of sampling variance, with results randomly changing heavily depending on a particular sample composition. We attempted to reduce these risks by estimating R2Methylation in two independent cohorts, as well as by performing cross-validation within cohorts.

Cross-validation (CV) was applied in the following way: (1) M was estimated across all participants and stayed constant independent of later training and test assignment. (2) Ninety percent of the sample was randomly chosen as training sample and the GREML model was fitted in this training sample. (3) Based on the results of the training sample the best linear unbiased predictions (BLUP) were extracted for the test sample. The BLUP estimates are continuous values, which reflect the extent to which participants are predicted to have above or below average outcome values, based on their similarity in genome-wide methylation to other participants. (4) The outcome is predicted based on M and the covariates in the test set. (5) The predictions are correlated with the actual observed outcome in the test set and squared to obtain the variance explained by the model. (6) The variance explained by a covariate only model is subtracted to obtain the variance explained by DNAm beyond the other tested variables (ΔR2Methylation). (7) Step 1–6 are repeated to have results for 100 random training–testing splits (Monte-Carlo cross-validation). (8) The mean estimate of ΔR2Methylation, with standard deviation across the cross-validation splits are extracted. (9) Steps 1–8 are repeated for each imputed dataset (n = 10) and then pooled using Rubin’s rule. (10) Results of both cohorts are averaged, weighted by the inverse of the cross-validation variance.

We reran all analyses with an M matrix based on independent methylation features only. To construct such a matrix, we used EnMCB 1.7.3 to identify MCBs [35]. We aggregated neighboring CpG sites with r =  > 0.3 correlation and maximum 1000 bp gap into MCBs. We then averaged the methylation betas of all CpGs within one MCB resulting in a single value, representing the methylation status of a MCB. The MCB methylation levels were then combined with independent CpG sites (r < 0.3) and used to compute an M matrix. ALSPAC used 60,722 MCBs and 167,440 independent CpGs to produce an independent M matrix and Generation R used 71,175 MCBs alongside 206,062 independent CpGs sites.

These analyses were run with the qgg package in R, which has implemented GREML models with cross-validation [36]. We wrote additional functions, which can be found in the omicsR2 package: The provided functions simplify the process of comparing the predictive performance of DNA methylation compared to a covariates-only baseline model and add support for multiply imputed data.

Availability of data and materials

The datasets generated and analyzed during the current study are not publicly available to ensure participant privacy and compliance with Dutch and UK law, but are available on reasonable request. For Generation R data, please contact management ( and the corresponding author. For ALSPAC, please see for instructions on data access.



Epigenome-wide association studies


Body mass index




DNA methylation


Methylation similarity matrix


Genetic relatedness matrix


  1. Harris CJ, Scheibe M, Wongpalee SP, Liu W, Cornett EM, Vaughan RM, et al. A DNA methylation reader complex that enhances gene transcription. Science. 2018;362(6419):1182–6.

    CAS  PubMed  PubMed Central  Google Scholar 

  2. Hannon E, Knox O, Sugden K, Burrage J, Wong CCY, Belsky DW, et al. Characterizing genetic and environmental influences on variable DNA methylation using monozygotic and dizygotic twins. PLoS Genet. 2018;14(8):1–27.

    Google Scholar 

  3. Van Dongen J, Nivard MG, Willemsen G, Hottenga JJ, Helmer Q, Dolan CV, et al. Genetic and environmental influences interact with age and sex in shaping the human methylome. Nat Commun. 2016;7:1–13.

    Google Scholar 

  4. Dall’ Aglio L, Muka T, Cecil CAM, Bramer WM, Verbiest MMPJ, Nano J, et al. The role of epigenetic modifications in neurodevelopmental disorders: a systematic review. Neurosci Biobehav Rev. 2018;94:17–30.

    Google Scholar 

  5. Yang J, Lee S, Goddard M, Visscher P. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88(1):76–82.

    CAS  PubMed  PubMed Central  Google Scholar 

  6. Vazquez AI, Veturi Y, Behring M, Shrestha S, Kirst M, Resende MFR, et al. Increased proportion of variance explained and prediction accuracy of survival of breast cancer patients with use of whole-genome multiomic profiles. Genetics. 2016;203(3):1425–38.

    CAS  PubMed  PubMed Central  Google Scholar 

  7. Zhang F, Chen W, Zhu Z, Zhang Q, Nabais MF, Qi T, et al. OSCA: a tool for omic-data-based complex trait analysis. Genome Biol. 2019;20(1):1–13.

    Google Scholar 

  8. Trejo Banos D, McCartney DL, Patxot M, Anchieri L, Battram T, Christiansen C, et al. Bayesian reassessment of the epigenetic architecture of complex traits. Nat Commun. 2020;11(1):1–14.

    Google Scholar 

  9. Merid SK, Novoloaca A, Sharp GC, Küpers LK, Kho AT, Roy R, et al. Epigenome-wide meta-analysis of blood DNA methylation in newborns and children identifies numerous loci related to gestational age. Genome Med. 2020;12(1):25.

    CAS  PubMed  PubMed Central  Google Scholar 

  10. Bohlin J, Håberg SE, Magnus P, Reese SE, Gjessing HK, Magnus MC, et al. Prediction of gestational age based on genome-wide differentially methylated regions. Genome Biol. 2016;17(1):1–9.

    Google Scholar 

  11. Knight AK, Craig JM, Theda C, Bækvad-Hansen M, Bybjerg-Grauholm J, Hansen CS, et al. An epigenetic clock for gestational age at birth based on blood methylation data. Genome Biol. 2016;17(1):1–11.

    Google Scholar 

  12. Küpers LK, Monnereau C, Sharp GC, Yousefi P, Salas LA, Ghantous A, et al. Meta-analysis of epigenome-wide association studies in neonates reveals widespread differential DNA methylation associated with birthweight. Nat Commun. 2019;10(1):1–11.

    Google Scholar 

  13. Neumann A, Walton E, Alemany S, Cecil C, Gonzalez JR, Jima DD, et al. Association between DNA methylation and ADHD symptoms from birth to school age: a prospective meta-analysis. Transl Psychiatry. 2020;10(398):1–11.

    Google Scholar 

  14. Vehmeijer FOL, Küpers LK, Sharp GC, Salas LA, Lent S, Jima DD, et al. DNA methylation and body mass index from birth to adolescence: meta-analyses of epigenome-wide association studies. Genome Med. 2020;12(1):105.

    CAS  PubMed  PubMed Central  Google Scholar 

  15. Caramaschi D, Neumann A, Cardenas A, et al. Meta-analysis of epigenome-wide associations between DNA methylation at birth and childhood cognitive skills. Mol Psychiatry. 2022.

    PubMed  Google Scholar 

  16. Niarchou M, Zammit S, Lewis G. The avon longitudinal study of parents and children (ALSPAC) birth cohort as a resource for studying psychopathology in childhood and adolescence: a summary of findings for depression and psychosis. Soc Psychiatry Psychiatr Epidemiol. 2015;50(7):1017–27.

    PubMed  Google Scholar 

  17. Fraser A, Macdonald-Wallis C, Tilling K, Boyd A, Golding J, Smith GD, et al. Cohort profile: the avon longitudinal study of parents and children: ALSPAC mothers cohort. Int J Epidemiol. 2013;42(1):97–110.

    PubMed  Google Scholar 

  18. Kooijman MN, Kruithof CJ, van Duijn CM, Duijts L, Franco OH, van IJzendoorn MH, et al. 2017 The generation R study: design and cohort update 2017. Eur J Epidemiol. 2016;31(12):1243–64.

    PubMed  Google Scholar 

  19. Trzaskowski M, Dale PS, Plomin R. No genetic influence for childhood behavior problems from DNA analysis. J Am Acad Child Adolesc Psychiatry. 2013;52(10):1048-1056.e3.

    PubMed  PubMed Central  Google Scholar 

  20. Pappa I, Fedko IO, Mileva-Seitz VR, Hottenga J-J, Bakermans-Kranenburg MJ, Bartels M, et al. Single nucleotide polymorphism heritability of behavior problems in childhood: genome-wide complex trait analysis. J Am Acad Child Adolesc Psychiatry. 2015;54(9):737–44.

    PubMed  Google Scholar 

  21. Middeldorp CM, Hammerschlag AR, Ouwens KG, Groen-blokhuis MM, Greven CU, Pappa I, et al. A genome-wide association meta-analysis of attention-deficit/hyperactivity disorder symptoms in population-based pediatric cohorts. J Am Acad Child Adolesc Psychiatry. 2016;55(10):896–905.

    PubMed  PubMed Central  Google Scholar 

  22. Czamara D, Eraslan G, Page CM, Lahti J, Lahti-Pulkkinen M, Hämäläinen E, et al. Integrated analysis of environmental and genetic influences on cord blood DNA methylation in new-borns. Nat Commun. 2019;10(1):1–18.

    CAS  Google Scholar 

  23. Min JL, Hemani G, Davey Smith G, Relton C, Suderman M. Meffil: efficient normalization and analysis of very large DNA methylation datasets. Bioinformatics. 2018;34(23):3983–9.

    CAS  PubMed  PubMed Central  Google Scholar 

  24. Lehne B, Drong AW, Loh M, Zhang W, Scott WR, Tan ST, et al. A coherent approach for analysis of the Illumina HumanMethylation450 BeadChip improves data quality and performance in epigenome-wide association studies. Genome Biol. 2015;16(1):1–12.

    CAS  Google Scholar 

  25. Mulder RH, Neumann A, Cecil CAM, Walton E, Houtepen LC, Simpkin AJ, et al. Epigenome-wide change and variation in DNA methylation in childhood: trajectories from birth to late adolescence. Hum Mol Genet. 2021;7(9):13.

    Google Scholar 

  26. Verburg BO, Steegers EAP, De Ridder M, Snijders RJM, Smith E, Hofman A, et al. New charts for ultrasound dating of pregnancy and assessment of fetal growth: longitudinal data from a population-based cohort study. Ultrasound Obstet Gynecol. 2008;31(4):388–96.

    CAS  PubMed  Google Scholar 

  27. Wechsler D, Golombok S, Rust J. WISC-III UK Wechsler intelligence scale for children: UK manual. Sidcup, UK Psychol Corp. 1992.

  28. Goodman A, Heiervang E, Collishaw S, Goodman R. The, “DAWBA bands” as an ordered-categorical measure of child mental health: description and validation in British and Norwegian samples. Soc Psychiatry Psychiatr Epidemiol. 2011;46(6):521–32.

    PubMed  Google Scholar 

  29. Tellegen P, Laros J. The construction and validation of a nonverbal test of intelligence: the revision of the Snijders-Oomen tests. Eur J Psychol Assess. 1993;9(2):147–57.

    Google Scholar 

  30. Conners CK, Sitarenios G, Parker JDA, Epstein JN. The revised Conners’ Parent Rating Scale (CPRS-R): Factor structure, reliability, and criterion validity. J Abnorm Child Psychol. 1998;26(4):257–68.

    CAS  PubMed  Google Scholar 

  31. Gervin K, Salas LA, Bakulski KM, Van Zelm MC, Koestler DC, Wiencke JK, et al. Systematic evaluation and validation of reference and library selection methods for deconvolution of cord blood DNA methylation data. Clin Epigenetics. 2019;11(1):1–15.

    CAS  Google Scholar 

  32. Grueneberg A, de los Campos G. 2019 BGData—a suite of R packages for genomic analysis with big data. Genes Genom Genet. 2019;9(5):1377–83.

    CAS  Google Scholar 

  33. Buuren S, Groothuis-Oudshoorn K. MICE: multivariate imputation by chained equations in R. J Stat Softw. 2011;45:1–67.

    Google Scholar 

  34. Visscher PM, Hemani G, Vinkhuyzen AAE, Chen G-B, Lee SH, Wray NR, et al. Statistical power to detect genetic (co)variance of complex traits using SNP data in unrelated samples. PLoS Genet. 2014;10(4):e1004269.

    PubMed  PubMed Central  Google Scholar 

  35. Yu X, Kong D-X. EnMCB: an R/bioconductor package for predicting disease progression based on methylation correlated blocks using ensemble models. Bioinformatics. 2021;37:4282–4.

    CAS  Google Scholar 

  36. Rohde PD, Sørensen IF, Sørensen P. qgg : an R package for large-scale quantitative genetic analyses implementation and main functions analysing human height. Bioinformatics. 2020;36(8):2614–5.

    CAS  PubMed  Google Scholar 

Download references


We thank all the children and families who took part in this study, as well as the support of general practitioners, hospitals, midwives and pharmacies. ALSPAC We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. GENR The Generation R Study is conducted by Erasmus MC in close collaboration with the Erasmus University Rotterdam, Faculty of Social Sciences, the Municipal Health Service Rotterdam area, the Rotterdam Homecare Foundation, Rotterdam and the Stichting Trombosedienst & Artsenlaboratorium Rijnmond (STAR-MDC), Rotterdam. The generation and management of the Illumina 450K methylation array data (EWAS data) for the Generation R Study was executed by the Human Genotyping Facility of the Genetic Laboratory of the Department of Internal Medicine, Erasmus MC, the Netherlands. We thank Mr. Michael Verbiest, Ms. Mila Jhamai, Ms. Sarah Higgins, Mr. Marijn Verkerk and Dr. Lisette Stolk for their help in creating the EWAS database. We thank Dr. A.Teumer for his work on the quality control and normalization scripts.


ALSPAC The UK Medical Research Council (MRC) and Wellcome (Grant ref: 102215/2/13/2) and the University of Bristol provide core support for ALSPAC. This publication is the work of the authors and EW will serve as guarantors for the contents of this paper. A comprehensive list of grants funding is available on the ALSPAC website ( Methylation data in the ALSPAC cohort were generated as part of the UK BBSRC funded (BB/I025751/1 and BB/I025263/1) Accessible Resource for Integrated Epigenomic Studies (ARIES, EW was partially funded by the Bath Institute for Mathematical Innovation. EW is also funded by the European Union’s Horizon 2020 research and innovation programme (grant nº 848158) and by CLOSER, whose mission is to maximise the use, value and impact of longitudinal studies. CLOSER was funded by the Economic and Social Research Council (ESRC) and the Medical Research Council (MRC) between 2012 and 2017. Its initial five year grant has since been extended to March 2021 by the ESRC (grant reference: ES/K000357/1). The funders took no role in the design, execution, analysis or interpretation of the data or in the writing up of the findings. GENR The general design of the Generation R Study is made possible by financial support from Erasmus MC, University Medical Center Rotterdam, Erasmus University Rotterdam, the Netherlands Organization for Health Research and Development (ZonMw) and the Ministry of Health, Welfare and Sport. The EWAS data was funded by a grant from the Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO) Netherlands Consortium for Healthy Aging (NCHA; project nr. 050-060-810), by funds from the Genetic Laboratory of the Department of Internal Medicine, Erasmus MC, and by a grant from the National Institute of Child and Human Development (R01HD068437). A. Neumann and H. Tiemeier are supported by a grant of the Dutch Ministry of Education, Culture, and Science and the Netherlands Organization for Scientific Research (NWO grant No. 024.001.003, Consortium on Individual Development). A. Neumann is also supported by a Canadian Institutes of Health Research team grant. The work of H. Tiemeier is further supported by a NWO-VICI grant (NWO-ZonMW: 016.VICI.170.200). The work of CC has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement No 707404.This project received funding from the European Union’s Horizon 2020 research and innovation programme (733206, LifeCycle; 633595, DynaHEALTH; 848158, EarlyCause, 874739, LongITools)) and from the European Joint Programming Initiative “A Healthy Diet for a Healthy Life” (JPI HDHL, NutriPROGRAM project, ZonMw the Netherlands no.529051022 and PREcisE project ZonMw the Netherlands no.529051023).

Author information

Authors and Affiliations



AN, EW and CC developed the study design and drafted the manuscript. AN and EW performed statistical analysis on the GenR and ALSPAC data respectively and wrote the omicsR2 package. EW and CC supervised the study and share senior authorship. JFF, JBP and HT advised on research design and statistical analysis. JFF manages the DNA methylation data in GenR. VWJD is GenR director and oversaw data collection. All authors revised the manuscript critically. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Alexander Neumann.

Ethics declarations

Ethics approval and consent to participate

Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Consent for biological samples has been collected in accordance with the Human Tissue Act (2004). Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. All parents gave informed consent for their children's participation. The Generation R Study is conducted in accordance with the Declaration of Helsinki. Study protocols were approved by the Ethics Committee of Erasmus MC.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Neumann, A., Pingault, JB., Felix, J.F. et al. Epigenome-wide contributions to individual differences in childhood phenotypes: a GREML approach. Clin Epigenet 14, 53 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: