Epigenetic gestational age acceleration: a prospective cohort study investigating associations with familial, sociodemographic and birth characteristics

Background: Gestational age at delivery is associated with health and social outcomes. Recently, cord blood DNA methylation data have been used to predict gestational age. The discrepancy between gestational age predicted from DNA methylation and gestational age determined by ultrasound or last menstrual period is known as gestational age acceleration. This study investigated associations of sex, socioeconomic status, parental behaviours and characteristics, and birth outcomes with gestational age acceleration.
Results: Using data from the Avon Longitudinal Study of Parents and Children (n = 863), we found that pre-pregnancy maternal overweight and obesity were associated with greater gestational age acceleration (mean difference = 1.6 days, 95% CI 0.7 to 2.6, and 2.9 days, 95% CI 1.3 to 4.4, respectively, compared with a body mass index < 25 kg/m2, p < .001). There was evidence of an association between male sex and greater gestational age acceleration. Greater gestational age acceleration was associated with higher birthweight, birth length and head circumference of the child (mean differences per week higher gestational age acceleration: birthweight 0.1 kg, 95% CI 0.1 to 0.2, p < .001; birth length 0.4 cm, 95% CI 0.2 to 0.7, p < .001; head circumference 0.2 cm, 95% CI 0.1 to 0.4, p < .001). There was evidence of an association between gestational age acceleration and mode of delivery (assisted versus unassisted delivery: odds ratio = 0.9 per week higher gestational age acceleration, 95% CI 0.7 to 1.3, p = .05; caesarean section versus unassisted delivery: odds ratio = 0.6 per week higher gestational age acceleration, 95% CI 0.4 to 0.9, p = .05). There was no evidence of association for other parental and perinatal characteristics.
Conclusions: The associations of higher maternal body mass index and larger birth size with greater gestational age acceleration may imply that maternal overweight and obesity are associated with more rapid development of the fetus in utero. The implications of gestational age acceleration for postnatal health warrant further investigation.
Electronic supplementary material: The online version of this article (10.1186/s13148-018-0520-1) contains supplementary material, which is available to authorized users.


Background
Preterm birth (< 37 weeks' gestation) is associated with numerous health consequences, such as increased mortality [1], hypertension [2,3], insulin resistance [4] and respiratory problems in later life [5,6]. Indeed, each additional gestational day at birth is associated with improved medical and neuropsychological outcomes in childhood [7]. Gestational age (GA) at delivery is typically determined from an early obstetric ultrasound scan or the last menstrual period (LMP), with ultrasound considered the more reliable, 'gold standard' method [8].
Recently, DNA methylation (DNAm) has been used to predict GA at delivery [9,10]. This method builds on work that used DNAm to predict chronological age [11] and on subsequent work showing that differences between predicted and chronological age are associated with disease outcomes. Individuals whose DNAm-predicted age exceeds their chronological age (age acceleration, AA) have a higher incidence of cancer [12][13][14], Alzheimer's disease [15] and mortality [15][16][17][18][19][20]. Of note, the term AA is commonly used in the literature to describe both positive and negative differences (i.e. predicted ages above or below chronological ages); although this could be misleading, we use the term for consistency with previous literature.
In a similar way, the gestational epigenetic clocks developed by Bohlin et al. [10] and Knight et al. [9] can be compared with actual GA to determine gestational age acceleration (GAA). There is little existing research on potential predictors or potential outcomes of GAA.
The aim of this study was to apply a previously published model for predicting GA from DNAm [10] and use this model to estimate GAA in order to (i) explore potential predictors of GAA by assessing the association of a broad range of socioeconomic variables and parental characteristics with GAA and (ii) explore potential outcomes of GAA by assessing associations of GAA with delivery and postnatal factors, using data from the Accessible Resource for Integrated Epigenomic Studies (ARIES) project, a subsample of child-mother pairs from the Avon Longitudinal Study of Parents and Children (ALSPAC).

Study sample characteristics
The characteristics of the 863 participants from the ARIES cohort included in our analysis are displayed in Tables 1 and 2, and Additional file 1: Table S1 describes the differences between these participants and the full ALSPAC cohort from which ARIES is a subsample. Additional file 1: Table S2 shows the full range of GAs of the 863 participants included in the analysis.

Modelling of GA estimates
GA was estimated from ARIES cord blood DNAm using the model of Bohlin et al. [10]. The correlation between estimated and reported GA was high (r = 0.65), though not as high as that reported in the original publication (r = 0.81). Some reduction was expected, however, because the Bohlin et al. [10] model was trained and tested in distinct subsets of the same Norwegian cohort. We elected not to use the model of Knight et al. [9] because, given its low correlation with GA in ARIES (r = 0.37) [21], we were less confident that it would produce meaningful GAA estimates.
Associations of sex, socioeconomic and parental factors with GAA
Females had lower GAA than males by 0.8 days after adjusting for cell type proportion (mean difference [MD] = −0.8 days, 95% CI −1.4, −0.1, p = .024; Table 3).
Maternal pre-pregnancy overweight and obesity were associated with higher GAA compared with a maternal pre-pregnancy body mass index (BMI) < 25 kg/m2 (MD = 1.6 days, 95% CI 0.7, 2.6 for overweight; MD = 2.9 days, 95% CI 1.3, 4.4 for obesity; p < .001; Table 4) after adjusting for sex, cell type proportion and parental socioeconomic factors. Because GAA calculated using the Bohlin et al. method [10] is correlated with birthweight [21], we additionally adjusted these models for birthweight to assess whether birthweight might be driving this association. The results were not substantially different after this additional adjustment (Additional file 1: Table S3).
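These adjusted mean differences come from linear regression with BMI category dummy-coded against the < 25 kg/m2 reference group. The Python sketch below illustrates that set-up on simulated data only: the effect sizes used to generate the data are chosen to mirror the reported estimates, the covariates (sex and a single hypothetical cell-type proportion) are simplified, and the small least-squares solver is an illustrative stand-in for the statistical software actually used.

```python
import random


def ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


rng = random.Random(2018)
X, y = [], []
for _ in range(5000):
    u = rng.random()
    overweight = 0.6 <= u < 0.9        # reference group: BMI < 25 kg/m2
    obese = u >= 0.9
    male = rng.random() < 0.5
    cell_prop = rng.uniform(0.1, 0.3)  # hypothetical cell-type proportion
    # Simulated GAA in days; "true" effects chosen to mirror the reported MDs.
    gaa = (1.6 * overweight + 2.9 * obese + 0.8 * male
           - 5.0 * (cell_prop - 0.2) + rng.gauss(0, 3))
    X.append([1.0, float(overweight), float(obese), float(male), cell_prop])
    y.append(gaa)

intercept, md_overweight, md_obese, md_male, b_cell = ols(X, y)
print(f"overweight MD = {md_overweight:.2f} days, obese MD = {md_obese:.2f} days")
```

Each dummy coefficient is the mean difference in GAA for that BMI category relative to the reference group, holding the other covariates fixed; adding birthweight as a further column would mirror the sensitivity analysis described above.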
There were no clear associations of GAA with parental education, relationship status, smoking, alcohol consumption, depression or age, or with housing tenure, financial difficulties, parity or pregnancy complications (Tables 3 and 4).
There were no substantial differences between the imputed and observed data sets (Additional file 1: Table S4). Adjusting for cell type proportion did not substantially alter our results (see Additional file 1: Tables S5 to S7 for unadjusted results), nor did adjustment for socioeconomic status (SES) and other potential confounders.

Discussion
Our analyses indicated that male sex, higher maternal pre-pregnancy BMI and vaginal delivery are associated with higher GAA. Our results also indicated that higher GAA was associated with larger birth size (birthweight, birth length and head circumference). There was no clear evidence of associations of GAA with parental education, relationship status, smoking, alcohol consumption, depression or age, or with housing tenure, financial difficulties, parity, pregnancy complications or APGAR scores. Unlike AA, which has been associated with negative outcomes such as all-cause mortality and Alzheimer's disease [15][16][17][18][19][20], it is currently unclear whether accelerated GA at birth is beneficial or detrimental to a fetus or newborn. Previous research using ARIES data has shown that sex, birthweight, caesarean section and maternal BMI are also associated with AA [22]. A recent study has also shown associations of GAA with birth size and sex [23]. However, in its main analysis, which used the raw difference between DNAm-predicted GA and reported GA as the outcome, associations with birth size were in the opposite direction to ours, i.e. higher GAA was associated with smaller size at birth. When GAA was instead calculated in that study as the residuals from a regression of DNAm-predicted GA on reported GA, as in this study, there was no clear association between GAA and size at birth. In further contrast, we did not replicate their associations with maternal age, APGAR scores (at 1 min) and pregnancy complications (pre-eclampsia). These discrepancies could be related to the focus on raw differences rather than residuals in the main analysis of Girchenko et al. [23]. We did not estimate GAA as the raw difference between DNAm-predicted GA and reported GA because this approach does not account for the confounding effect of GA, whereas the residual-based approach ensures that GAA is uncorrelated with GA.
Another explanation for the discrepancies between our findings and those of Girchenko et al. [23] is their use of the Knight et al. [9] GA prediction model rather than the Bohlin et al. [10] model applied in this study. We have previously noted several key methodological differences in the derivation of the Knight et al. [9] and Bohlin et al. [10] models that influence the accuracy of GA prediction in this cohort [21]. These include the inclusion of preterm infants in the test set of the Knight et al. model [9], which is inappropriate for a data set with few preterm births (as in this study). Additionally, the number of CpGs in the Knight et al. [9] model (148) was close to the size of its training set (207), which may have resulted in overfitting when the model was then tested in a much larger sample. In contrast, the number of CpGs in the Bohlin et al. [10] model (96) was much lower than the size of its training set (1068). Thus, the Bohlin et al. [10] model provided the best estimate of GAA for this cohort [21], as reflected in its stronger correlation with reported GA in ARIES compared with the Knight et al. [9] model. The good performance of the Bohlin et al. [10] model in our data adds support to the notion that DNAm could potentially be used as a marker of GA in data sets where GA has not been measured.
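The overfitting concern can be put in rough numbers with a standard result for ordinary least squares: with p predictors plus an intercept fitted to n observations and no true association, the in-sample R² follows a Beta(p/2, (n − p − 1)/2) distribution, whose mean is p/(n − 1). This is only a heuristic, not an analysis of either published clock: penalised regression, as typically used for epigenetic clocks, shrinks this optimism, whereas selecting the CpGs from several hundred thousand candidates inflates it.

```latex
\mathbb{E}\left[R^2 \mid \text{no true signal}\right] = \frac{p}{n-1},
\qquad
\frac{148}{207-1} \approx 0.72 \;\; (\text{Knight et al.}),
\qquad
\frac{96}{1068-1} \approx 0.09 \;\; (\text{Bohlin et al.})
```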
In this study, we were able to apply the epigenetic clock created by Bohlin et al. [10] to ARIES data. Although the original model was trained only in a Norwegian cohort, it transferred to a UK cohort with considerable accuracy (r = 0.65). Additionally, the use of ARIES and ALSPAC data (a large and rich source of longitudinal data for children and their families) allowed us to assess the associations of GAA with a wide range of socioeconomic, parental and perinatal factors. The longitudinal nature of ARIES also enabled comparisons between GAA and AA at multiple ages.
Table note: Results are from multiply imputed data; coefficients are mean differences adjusted for sex (except when sex is the exposure) and cell type proportion.
Although the ARIES sub-sample is more affluent than the full ALSPAC cohort, our results were robust to adjustment for SES. This is in line with some evidence that such differences are unlikely to severely bias association studies [24][25][26][27]. To account for missing data, we used multiple imputation, which maximised our statistical power; the results were consistent when using complete-case observed data.
Following the methods of Gervin et al. [28], the regression analyses were adjusted for cell type proportions, even though GA is associated with variation in cell type proportions and this adjustment could therefore bias the results. However, there were no substantial differences between the results before and after adjustment. Additionally, only weak associations were found between GA and cell type composition in this study (Additional file 1: Figure S1). Another potential issue with adjusting for cell type composition is that the cord blood reference data were derived from full-term births only, so estimated cell counts may be inaccurate for the cord blood methylation profiles of preterm infants. However, only a small proportion of the participants in our study were preterm (3%), so this is unlikely to have substantially affected the results.
Interestingly, there is little overlap between the probes used in the Bohlin et al. [10] model of GA prediction and the Horvath [11] model of age prediction, with only one CpG site (cg08965235, in the latent-transforming growth factor beta-binding protein 3 gene) shared between the models. In contrast to the Horvath [11] model, which compares accurate measures of chronological age with epigenetic predictions, the Bohlin et al. [10] model compares potentially inaccurate LMP/ultrasound estimates with epigenetic predictions. Consequently, inaccurate GA estimates may have affected the GAA estimates in this research, especially as the majority of GA estimates in ALSPAC are derived from LMP, since ultrasound estimation of GA was not common at the time of recruitment. Additionally, the Bohlin et al. [10] model predicted GA more accurately when GA had been determined by ultrasound rather than by LMP. Our estimates may therefore be less accurate than if ultrasound-based GA had been available, which could explain the discrepancy in accuracy between the Bohlin et al. [10] test data and the ARIES data, as detailed by Simpkin and colleagues [21].
Table note: Results are from multiply imputed data; coefficients are mean differences (MD) adjusted for sex and cell type proportion (model 1) and additionally for parental social class, education, housing tenure and financial difficulties (model 2). Parental depression was measured using the Edinburgh Postnatal Depression Scale. The pregnancy complication analysis was additionally adjusted for all other parental behaviour covariates. BMI, body mass index.
Table note: Results are from multiply imputed data; coefficients are mean differences (MD) or odds ratios (OR) adjusted for sex and cell type proportion in model 1 and additionally adjusted for parental social class, education, smoking, alcohol use, depression, body mass index, age and relationship status, as well as housing tenure, financial difficulties and parity, in model 2. APGAR scores are based on Appearance, Pulse, Grimace, Activity and Respiration at birth.

Conclusions
Our results suggest that higher maternal BMI is strongly associated with higher GAA and that higher GAA is strongly associated with larger size at birth (birthweight, birth length and head circumference). In addition, we found weaker associations of sex and delivery method with GAA. Our results may indicate that a maternal pre-pregnancy BMI of 25 kg/m2 or above is associated with more rapid development of the fetus in utero. The implications of GAA for postnatal growth, development and health warrant further investigation.

Study population
This study used DNAm data generated under the auspices of the ALSPAC [29,30]. ALSPAC recruited 14,541 pregnant women with expected delivery dates between April 1991 and December 1992. Of these initial pregnancies, there were 14,062 live births and 13,988 children who were alive at 1 year of age. The study website contains details of all the data that are available through a fully searchable data dictionary (http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary).
As part of the ARIES [31] project (http://www.ariesepigenomics.org.uk), a sub-sample of 1018 ALSPAC child-mother pairs had DNAm measured using the Infinium HumanMethylation450 BeadChip (Illumina, Inc.) [32]. The ARIES sub-sample was selected based on availability of DNA samples at three time points (birth, mean 7.5 years and mean 15.5 years). DNAm was measured three times in ALSPAC offspring, from cord blood at birth and from peripheral blood at approximately ages 7 and 17.
Laboratory methods, quality control and pre-processing
All DNAm wet-lab and pre-processing analyses were performed at the University of Bristol as part of the ARIES project. Following extraction, DNA was bisulphite converted using the Zymo EZ DNA Methylation™ kit (Zymo, Irvine, CA). Infinium HumanMethylation450 BeadChips were used to measure genome-wide DNAm levels at over 485,000 CpG sites. The arrays were scanned using an Illumina iScan, with initial quality review using GenomeStudio. The level of DNAm is expressed as a 'beta' value (β value), ranging from 0 (no cytosine methylation) to 1 (complete cytosine methylation). β values are reported as percentages. Several quality control steps, described in detail elsewhere [33], were included in the laboratory pipeline.
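For reference, the Python sketch below computes a β value in the conventional Illumina form, β = M / (M + U + 100), where M and U are the methylated and unmethylated signal intensities and the offset of 100 stabilises β when both intensities are low; it then expresses β as a percentage, as in this study. The probe names and intensities are hypothetical, and the normalisation applied in the ARIES pipeline [33] is not reproduced here.

```python
def beta_value(meth, unmeth, offset=100.0):
    """Illumina-style beta: M / (M + U + offset), negative intensities clipped to 0."""
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return m / (m + u + offset)


def beta_percent(meth, unmeth, offset=100.0):
    """Beta value expressed as a percentage (0 to 100)."""
    return 100.0 * beta_value(meth, unmeth, offset)


# Hypothetical methylated (M) and unmethylated (U) intensities for three probes.
probes = {"probe_a": (9000.0, 900.0), "probe_b": (1200.0, 10800.0), "probe_c": (0.0, 0.0)}
for name, (m, u) in probes.items():
    print(name, round(beta_percent(m, u), 1))
```

Because of the offset, β approaches but never reaches 1, and a probe with no signal returns 0 rather than raising a division-by-zero error.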

Epigenetic GA prediction
Using a recently published model [10], we derived epigenetic gestational age (EGA) from cord blood DNAm. The Bohlin et al. [10] model was chosen over the Knight et al. [9] model because of its much stronger correlation with GA in ARIES (r = 0.65 compared with r = 0.37). This epigenetic clock for GA at delivery uses 96 CpG sites to predict GA from cord blood methylation. We obtained GAA as the residuals from a regression of EGA on observed GA. GA was obtained from clinical records and was determined by LMP for the majority of participants; on some occasions, however, this measure was updated following a dating ultrasound. It is not known for which individuals GA was based on LMP or on ultrasound, but because updating GA based on ultrasound was not common practice at the time, their number is likely to be low. To be consistent with previous literature, we have used the terms 'AA' and 'GAA' to describe both positive and negative differences (i.e. predicted ages above or below chronological/gestational ages). A positive GAA corresponds to an EGA that was higher than actual GA and vice versa.

Socioeconomic, parental and perinatal characteristics
Socioeconomic factors included housing tenure, social class, parental education and financial difficulties. Parental factors included parental smoking, alcohol use, mental health, relationship status, BMI and age. Finally, the perinatal variables considered were the child's sex, birthweight, birth length, head circumference and APGAR score at 5 min, as well as the occurrence of any pregnancy complications and the delivery type. Socioeconomic and parental variables were measured through questionnaires at different times during pregnancy, anthropometry at birth was measured by trained ALSPAC staff shortly after birth, and pregnancy complications, the child's sex and APGAR score were obtained from obstetric records. Full details of the measurement of these factors are in Additional file 1.

Statistical analysis
Sex, SES, parental behaviours and pregnancy complications were analysed as potential determinants of GAA. Associations between these factors and GAA were assessed in linear regression models with GAA as the outcome. Models with parental behaviours as the exposure were adjusted for SES variables as confounders. Birth size, delivery method and APGAR score were considered as potential outcomes of GAA. Associations of GAA with these factors were assessed using linear or multinomial logistic regression as appropriate, with GAA as an exposure and parental behaviour and SES variables included as potential confounders. We performed the analysis in this way due to the temporal ordering of the variables, although we do not necessarily hypothesise a direct causal effect of GAA on these outcome variables. The associations were analysed in two models: (1) adjusted for sex and cell type proportion and (2) with additional adjustment for potential confounders, as appropriate for the specific model. The Gervin et al. [28] methods were used for cell-type proportion estimations. Due to missingness in the observed data set, analyses were completed using 100 multiply imputed data sets (see Additional file 1: Table S8 for further information). There were no substantial differences between the analysis of the observed data and the multiply imputed data sets.
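Estimates from multiply imputed data sets are conventionally combined with Rubin's rules: the pooled estimate is the mean of the per-data-set estimates, and its variance adds the average within-imputation variance to (1 + 1/m) times the between-imputation variance. The text does not specify the pooling implementation, so the Python sketch below illustrates the rules themselves rather than the study's code. It uses a normal approximation for the confidence interval, whereas standard software uses a t reference distribution with Rubin's degrees of freedom, and the inputs are made up.

```python
import math
import statistics
from statistics import NormalDist


def pool_rubin(estimates, std_errors, level=0.95):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    Returns (pooled estimate, pooled SE, (lower, upper)). The interval uses a
    normal approximation; Rubin's t reference distribution is slightly wider.
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need one estimate and one SE per imputed data set, m >= 2")
    q_bar = statistics.fmean(estimates)                      # pooled point estimate
    within = statistics.fmean(se ** 2 for se in std_errors)  # mean within-imputation variance
    between = statistics.variance(estimates)                 # between-imputation variance
    total = within + (1 + 1 / m) * between                   # Rubin's total variance
    se = math.sqrt(total)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return q_bar, se, (q_bar - z * se, q_bar + z * se)


# Hypothetical per-data-set results for one coefficient from m = 100 imputations.
estimates = [1.5, 1.7] * 50
std_errors = [0.5] * 100
q, se, (lo, hi) = pool_rubin(estimates, std_errors)
print(f"pooled = {q:.2f}, SE = {se:.3f}, 95% CI {lo:.2f}, {hi:.2f}")
```

The between-imputation term is what makes the pooled SE larger than any single data set's SE, reflecting uncertainty about the missing values.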