DNA methylation links prenatal smoking exposure to later life health outcomes in offspring

Background Maternal smoking during pregnancy is associated with adverse offspring health outcomes across their life course. We hypothesize that DNA methylation is a potential mediator of this relationship. Methods We examined the association of prenatal maternal smoking with offspring blood DNA methylation in 2821 individuals (age 16 to 48 years) from five prospective birth cohort studies and perform Mendelian randomization and mediation analyses to assess whether methylation markers have causal effects on disease outcomes in the offspring. Results We identify 69 differentially methylated CpGs in 36 genomic regions (P value < 1 × 10−7) associated with exposure to maternal smoking in adolescents and adults. Mendelian randomization analyses provided evidence for a causal role of four maternal smoking-related CpG sites on an increased risk of inflammatory bowel disease or schizophrenia. Further mediation analyses showed some evidence of cg25189904 in GNG12 gene mediating the effect of exposure to maternal smoking on schizophrenia-related outcomes. Conclusions DNA methylation may represent a biological mechanism through which maternal smoking is associated with increased risk of psychiatric morbidity in the exposed offspring. Electronic supplementary material The online version of this article (10.1186/s13148-019-0683-4) contains supplementary material, which is available to authorized users.


Background
Maternal smoking during pregnancy is associated with increased risk for pre-term birth, fetal growth restriction, and low birth weight [1][2][3], as well as neurodevelopmental impairments and respiratory and cardiovascular diseases later in life [4][5][6][7][8]. Despite these well-known risks, many women who commence pregnancy as smokers continue to smoke throughout gestation. According to a recent meta-analysis, the global prevalence of maternal smoking during pregnancy varies widely from a few percentages up to nearly 40% in Ireland [9]. Thus, cigarette smoking continues to be one of the most important modifiable risk factors for the health of mothers and their children.
Cigarette smoke is a potent environmental modifier of DNA methylation [10]. In support of this, an epigenome-wide meta-analysis of 13 birth cohort studies identified over 6000 differentially methylated CpGs in the cord blood of newborns exposed to prenatal smoking [11]. Several smaller studies have suggested that some of these methylation changes may persist across childhood and adolescence into adulthood [12][13][14][15]. However, questions remain concerning whether such DNA methylation changes endure across the life course and whether they play a mediating role in linking prenatal smoke exposure to later life health outcomes.
Here, we combine data from five prospective birth cohort studies to investigate associations between prenatal smoking exposure and offspring blood DNA methylation in 2821 adolescents and adults. We first examine the associations of prenatal smoking exposure with DNA methylation in each cohort and then meta-analyze the results across all studies. We focus on the > 6000 CpG sites previously identified in the cord blood of newborns exposed to prenatal smoking [11]. We further (i) assess the impact of current smoking by the participant on DNA methylation, (ii) explore the dose-dependent effects of prenatal smoking exposure on methylation at key CpG sites, (iii) examine the potential intrauterine effect of smoking exposure on offspring DNA methylation by using paternal smoking as a negative control, (iv) assess the persistence of DNA methylation changes by investigating longitudinal associations from 30 to 48 years of age, and (v) conduct Mendelian randomization (MR) and mediation analyses to examine the potential causal effects of DNA methylation changes on disease outcomes in the offspring (Fig. 1). Our results show that prenatal smoking has persistent effects on the offspring epigenome and provide evidence for a causal role of DNA methylation in adverse health effects that may arise from exposure to tobacco smoke in utero.

Cohort-specific characteristics of the study participants
We analyzed the association of prenatal smoking exposure with blood DNA methylation in altogether 1366 adolescents (age 16 to 18 years) and 1455 adults (age 30 to 31 years). Of these, 1145 were from two independent Northern Finland Birth cohorts (NFBC1966 and NFBC1986), 257 were from the Isle of Wight Birth Cohort (IOWBC), and 1419 from two Avon Longitudinal Study of Parents and Children cohorts (ALSPAC mothers and ALSPAC children). Additional file 1, Additional file 2, and Additional file 3 show the characteristics of each study cohort. Overall, 18.4% of the NFBC 1966 and 13.2% of the NFBC 1986 were prenatally exposed to maternal smoking. The corresponding figures were 11.8% for ALSPAC children, 28.7% for ALSPAC mothers, and 16.3% for IOWBC.

DNA methylation meta-analysis
We found evidence for 69 differentially methylated CpGs in 36 genomic regions ( Table 1). All of these CpG sites showed directionally concordant effects with previously reported associations in newborns [11], e.g., hypermethylation of cg04180046 in MYOG1 and cg05549655 in CYP1A1 and hypomethylation of cg05575921 in AHRR and cg14179389 in GFI1 in the exposed offspring compared with their unexposed counterparts.

Sensitivity and downstream analyses
To examine whether offspring's own smoking had influenced the results, we repeated the main analysis including only those individuals who had never smoked regularly in their life. The results were similar, in both Dagger symbol denotes CpG sites identified previously in the cord blood of newborns exposed to maternal smoking in utero [11]. Asterisk denotes methylation data for persistence analysis direction and magnitude, across all 36 genomic regions as in the full meta-analysis (Fig. 2), indicating that the association between maternal smoking and blood DNA methylation was not mediated through offspring's own smoking behavior. We then examined the dose-response relationship between maternal smoking and blood DNA methylation in the offspring. Methylation differences between the exposed and unexposed offspring became larger with increased smoking intensity across most CpG sites, e.g., each additional three cigarettes smoked per day during pregnancy was associated with 0.23 standard deviation (SD) increase in methylation level in cg05549655 in CYP1A1 gene (Table 2). Figure 3 shows the visual representations of the dose-response effect of maternal smoking on offspring blood DNA methylation of top CpGs in four top loci.
To assess potential unmeasured confounding and to establish a causal intrauterine effect between maternal smoking and the offspring DNA methylation, we used paternal smoking as a negative control. Maternal smoking and paternal smoking showed similar directions of effect; however, the effect estimates for exposure to paternal smoking were considerably smaller ( Table 2). Adjusting for paternal smoking had no significant effect on maternal smoking estimates (Additional file 4).
We performed a longitudinal analysis to examine whether the maternal smoking-associated alterations in DNA methylation persisted from early adulthood (age 30-31 years) into midlife (age 46-48 years) in the NFBC 1966 and ALSPAC mothers' cohorts. We found no evidence for change in direction or magnitude of associations in blood DNA methylation between the two time points (Fig. 4), suggesting that DNA methylation levels The analyses were conducted separately in each participating cohort adjusted for study-specific covariates as necessary, and combined using inverse-variance weighted fixed-effects meta-analysis Chr chromosome, β effect size estimate, SE standard error remain relatively stable for several decades after prenatal exposure to maternal smoking.

Mendelian randomization analysis
We estimated the causal effects of DNA methylation changes on disease outcomes using MR. We extracted the effect sizes of SNP-CpG associations for the 69 differentially methylated CpGs available in the Accessible Resource for Integrated Epigenomic Studies (ARIES) mQTL database [16] (http://www.mqtldb.org/) and found strong instruments for 15 CpG sites. Of these 15 CpG sites, three (cg15578140 in microRNA 548f-3 (MIR548F3), cg09935388 in Growth Factor Independent Protein 1 (GFI1), cg04598670 (unknown gene)) showed potential causal associations with inflammatory bowel diseases and one (cg25189904 in Guanine Nucleotide Binding Protein Gamma 12 (GNG12)) with schizophrenia (P FDR < 0.05, Table 3).

Mediation analysis
We then sought to test whether methylation changes in these four CpGs mediated the association between maternal smoking and disease outcomes. However, since the prevalence of inflammatory bowel disease is relatively low in the general population, we assessed the associations of maternal smoking and CpGs on irritable bowel syndrome (IBS), which is a constellation of functional gastrointestinal disorder symptoms. These data were obtained from self-administered questionnaires in NFBC1966 at 46 years [17]. Prevalence of schizophrenia is also low in the general population. Therefore, instead of diagnosed schizophrenia, we used personality trait scales measuring schizotypal and affective symptoms as an outcome. Such personality scales were derived from questionnaires available in the NFBC 1966 data at 31 years, and they can be used to identify subjects with latent personality with genetic vulnerability for schizophrenia [18]. We found Fig. 2 Comparison of meta-analysis effect size estimates and their 95% confidence intervals in all participants (x-axis) and never-smokers (y-axis) for the 36 top CpG sites. All effect size estimates are adjusted for study-specific covariates as necessary and meta-analyzed using inverse-variance weighted fixed-effects model The analyses were conducted separately in each participating cohort, adjusted for study-specific covariates as necessary, and combined using inverse-variance weighted fixed-effects meta-analysis. The effect size estimates for the dose-response analysis represent the difference in blood DNA methylation (in standard deviation units) per three additional cigarettes smoked per day in pregnancy Chr chromosome, β effect size estimate, SE standard error evidence for cg25189904 mediating the association between exposure to maternal smoking and Bipolar II Scale (P = 0.024) and Hypomanic Personality Scale (P = 0.018) ( Fig. 5a and b). The estimated mediated proportions were 30% and 28%, respectively (Additional file 5). We did not find evidence for a mediating effect of blood DNA methylation on IBS (P > 0.3 for all CpGs, Additional file 5).

Discussion
We combined data from five studies in adolescents and adults to examine the association between maternal smoking during pregnancy and blood DNA methylation in the offspring from 16 to 48 years of age. We identified 69 differentially methylated CpGs in 36 genomic regions. The top differentially methylated CpG sites showed a clear dose-response relationship with number of cigarettes smoked during pregnancy. The associations observed in adulthood were robust to adjustment for multiple potential confounding factors and persisted into middle age with no significant change in direction and magnitude of associations.
Mendelian randomization and mediation analyses suggested that alterations in DNA methylation may link maternal smoking during pregnancy to increased risk of psychiatric morbidity and potentially with inflammatory bowel disease in the exposed offspring. Fig. 4 Longitudinal analysis of association between exposure to maternal smoking and offspring blood DNA methylation. Effect size estimates (adjusted for study-specific covariates and meta-analyzed using inverse-variance weighted fixed-effects model) and their 95% confidence intervals at age 30-31 years (red) and age 46-48 years (blue) for top CpG sites and P values for the test of equality of the effect size estimates The findings of our study confirm and extend the results of earlier reports by demonstrating that maternal smoking during pregnancy is associated with alterations in offspring blood DNA methylation not only in newborns [11,19,20], children, and adolescents [12,13], but also in adults, several decades following the exposure. The similarity in differentially methylated CpG sites and the consistency in direction of methylation changes between our study and earlier EWAS imply that the smoke exposure-induced methylation changes may be somawide and persist throughout life. However, the effects of smoking may also be targeted to specific regions of the epigenome, as indicated by the observations that both prenatal smoke exposure and active smoking affect the methylation patterns of same gene regions, e.g., AHRR and CYP1A1, which are involved in chemical detoxification [10]. Because of these similar effects, the methylation changes found in people exposed to prenatal smoking may also reflect current or past smoking by the people themselves or some other passive smoking exposure. Adjusting for offspring active smoking did not substantially change the results in the present study. However, parental smoking is known to associate with their offspring's smoking behavior also via genetic predisposition [21,22] and thus own smoking may serve as a mediator on the path between maternal smoking and DNA methylation. Therefore, simply adjusting for own smoking can lead to erroneous conclusions about the direct effects of maternal smoking [23]. We therefore performed a sensitivity analysis including only offspring who themselves had never smoked in their life and found that the associations were similar across all CpG sites as in the full meta-analysis.
We also used paternal smoking as a negative control by comparing the associations of maternal smoking during pregnancy and paternal smoking with offspring methylation and found that the effect estimates were substantially greater for maternal smoking, and adjusting for paternal smoking had virtually no effect on maternal smoking estimates. This indicates it is unlikely that the associations between maternal smoking and offspring methylation were attributable to post-natal passive smoking exposure or some unmeasured confounding. These results together with the finding of a clear dosedependent relationship of methylation with increased smoking intensity during pregnancy suggest a direct biological effect of in utero exposure to cigarette smoke on DNA methylation.
The longitudinal analysis showed that differentially methylated CpGs observed around age 30 persisted into middle age (around age 48) without significant change in direction or magnitude of methylation levels. This corroborates the findings of recent smaller studies, which found several differently methylated CpGs in middle-aged women exposed to maternal smoking in utero [14,15], and suggests that some of the prenatal smoking exposure-associated methylation changes are largely irreversible and unaffected by age and/or environmental exposures later in life. To assess whether such persistent changes in DNA methylation are causally implicated with disease, we performed a Mendelian randomization analysis using summary data from large genome-wide association studies [24]. We found evidence for potential causal associations for three CpGs (cg15578140, cg09935388, cg04598670) with inflammatory bowel disease and one CpG (cg25189904) with schizophrenia. To strengthen the evidence for these potentially causal associations, we also performed a formal mediation analysis in the NFBC1966 cohort and found evidence for differential methylation in cg25189904 mediating the association between maternal smoking and Bipolar II Scale and Hypomanic Personality Scale, explaining 30% and 28% of the total effect, respectively. These results corroborate the findings of previous observational studies that maternal smoking during pregnancy is associated with increased risk of psychiatric morbidity in the exposed offspring [25][26][27][28]. However, we found no evidence for mediating effect of differential methylation cg15578140, cg09935388, and cg04598670 in the association of maternal smoking and irritable bowel syndrome. Such discrepant results could be due to relatively small sample size in the mediation analysis, or because the irritable bowel syndrome is not a good proxy for inflammatory bowel disease, or because the causal effect estimates for inflammatory bowel disease in the MR analysis were biased due to, for example, pleiotropic effects of genetic instruments on the outcome. Thus, additional studies are needed to assess whether prenatal smoking is associated with increased risk of inflammatory bowel disease in the exposed offspring and whether alterations in DNA methylation mediates this association.
Our results may provide insights into potential mechanisms linking prenatal smoking exposure to psychiatric disorders. Experimental studies suggest that GNG12 is an important regulator of inflammatory signaling in microglia cells, which are the resident macrophages of the central nervous system [29]. A role of inflammation in the etiology of schizophrenia and psychotic illness has been suggested [30,31], and in line with this, a large meta-analysis of 2424 cases and over 1.2 million controls indicated that childhood central nervous system infections are associated with nearly twofold risk of schizophrenia in adulthood [32]. Our DNA methylation data were from the whole blood while the pathogenic processes for psychiatric disorders, including schizophrenia, occur primarily in brain tissue. We believe that methylation in blood mirrors the corresponding sites in diseaserelevant tissues [33]. Such mirror sites can occur if the exposure occurs during early stages of prenatal development, thus affecting multiple tissues [33]. Therefore, blood DNA methylation may act as a marker for differential DNA methylation in the primary disease tissue that is mediating the effects of intrauterine smoke exposure. There is support justifying the use of blood samples to discover genes related to brain phenotypes and diseases [34]. However, further studies are needed to validate our findings and investigate the biological relevance of GNG12 in the corresponding tissue.
Our study has both strengths and limitations. The large sample size of males and females and similar ages from different cohorts enabled us to obtain precise estimate of the long-term effects of maternal smoking on DNA methylation. Several downstream analyses and use of paternal smoking as a negative control allowed us to distinguish the associations from potential confounding, and the follow-up analysis from young adulthood to middle age allowed us to examine the persistence of methylation changes. The limitations are that we did not have tissue-specific DNA methylation data as indicated above and that maternal smoking was determined from self-reported questionnaires. As self-reports may be biased by under-reporting or recall bias, our findings may underestimate true effects. In the ALSPAC mothers' cohort, the adult offspring reported their mothers' smoking, although this could also be subject to recall bias. False reporting may also concern the adolescents in our study since they might have been reluctant to disclose their true smoking behavior, although in the IOWBC adolescent smoking was confirmed by urinary cotinine measurement. Another limitation is that the subjects in the ALSPAC children and ALSPAC mothers' cohorts are related individuals. However, excluding either one of the related ALSPAC data sets did not notably affect the results (data not shown).

Conclusions
Maternal smoking during pregnancy has long-lasting effects on offspring epigenome. DNA methylation may represent a biological mechanism through which maternal smoking is associated with increased risk of psychiatric morbidity and potentially inflammatory bowel disease in the exposed offspring.

Study cohorts Northern Finland Birth Cohort 1966
The Northern Finland Birth Cohort 1966, previously described in detail [35,36], targeted all pregnant women, residing in the two northernmost provinces of Finland with expected dates of delivery between 1 January and 31 December 1966. Over 96% of eligible women participated in the study, thus comprising 12,055 mothers followed prospectively on average from 16th gestational week and 12,058 live-born children. In 1997, at offspring age of 31 years, all cohort participants with known addresses were sent a postal questionnaire on health and lifestyle and those living in Northern Finland or Helsinki area were invited to a clinical examination which included blood sampling. In total, both questionnaire and clinical data were collected for 6007 participants. DNA was successfully extracted for 5753 participants from fasted blood samples [37]. In 2012, all individuals with known address in Finland were sent postal questionnaires and an invitation for clinical examination. Both questionnaire and clinical data was collected for 5539 participants. DNA methylation at 31 years was extracted for 807 randomly selected subjects of whom both questionnaire and clinical data with cardio-metabolic measures were available at both 31 and 46 years. Of these individuals, DNA methylation data at 46 years was extracted for 766 subjects.

Northern Finland Birth Cohort 1986
The Northern Finland Birth Cohort 1986 includes all mothers (prospective data collection from 10th gestational week) with children whose expected date of delivery fell between July 1, 1985, andJune 30, 1986, in the two northernmost provinces of Finland (99% of all births during that time) [38]. The cohort consists of 9362 women and 9432 live-born children. In 2001, all individuals with known address received a postal questionnaire on health and lifestyle and invitation to a clinical examination. DNA were extracted from fasting blood samples, and DNA methylation was measured for 546 randomly selected subjects with full data available.
In both NFBC cohorts, complete data included singleton births and subjects with complete set clinical followup and DNA methylation data, excluding subjects with missing information and twins. A written informed consent for the use of the data including DNA was obtained from all study participants and their parents. Ethical approval for the study was received from Ethical Committee of Northern Osthrobothnia Hospital District and Oulu University, Faculty of Medicine.

Isle of Wight Birth Cohort
Isle of Wight Birth Cohort is a general population-based birth cohort recruited on the Isle of Wight in 1989 to assess the role of heredity and environment on development of allergic disorders and allergen sensitization. The details of this birth cohort have been described in previous reports [39]. In brief, both the Isle of Wight and the study population are 99% Caucasian. Ethics approvals were obtained from the Isle of Wight Local Research Ethics Committee (now named the National Research Ethics Service, NRES Committee South Central-Southampton B) at recruitment and for the 1, 2, 4, 10, and 18 years follow-up. Exact age at 18-year follow-up was calculated from the date of blood sample collection for the 18-year follow-up and the date of birth. DNA methylation in peripheral blood samples was analyzed from randomly selected subjects (n = 257) at the 18-year follow-up.

Avon Longitudinal Study of Parents and Children
Pregnant women resident in the former county of Avon, UK, with expected dates of delivery 1 April 1991 to 31 December 1992 were invited to take part in the study. The initial number of pregnancies enrolled is 14, 541 (for these at least one questionnaire has been returned or a "Children in Focus" clinic had been attended by 19 July 1999). Of these initial pregnancies, there was a total of 14,676 fetuses, resulting in 14,062 live births and 13,988 children who were alive at 1 year of age [40,41].
The Accessible Resource for Integrated Epigenomic Studies (ARIES) is a sub study of ALSPAC, which includes 1018 mothers and their children for whom methylation data has been created [42]. The ARIES participants were selected based on the availability of DNA samples at two time points for the women (antenatal [mean age 30 years] and at follow-up [mean age 48 years] when the offspring were adolescents) and three time points for their offspring (neonatal, childhood [mean age 7.5 years], and adolescence [mean age 17.1 years]). A web portal allows openly accessible browsing of aggregate ARIES DNA methylation data (ARIES-Explorer) (http://www.ariesepigenomics.org.uk/). Please note that the study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool: http://www.bristol. ac.uk/alspac/researchers/our-data/. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees.

Definition of maternal smoking during pregnancy
In NFBCs and ALSPAC studies, expectant mothers were asked whether they had smoked cigarettes before or at the beginning of the pregnancy, how many years they had smoked, the number of cigarettes smoked per day, and whether they had changed their smoking habits during the pregnancy. Offspring were considered to be prenatally exposed to cigarette smoking if mother reported smoking regularly (at least one cigarette per day) from pregnancy week 8 onwards. The ALSPAC mothers were also asked whether their mothers had smoked and were asked whether they had smoked when they were pregnant with them. In the IOWBC, maternal smoking status in pregnancy was self-reported and defined as any smoking in pregnancy or no smoking during pregnancy.

Measurement of DNA methylation
Methylation of genomic DNA was quantified using the Illumina HumanMethylation450 array (ALSPAC, ARIES, IOWBC, and NFBC1966 at age 31, NFBC1986) or Illumina EPIC array (NFBC1966 at age 46) according to the manufacturer's instructions. Bisulfite conversion of genomic DNA was performed using the EZ DNA methylation kit according to the manufacturer's instructions (Zymo Research, Orange, CA).

Quality control of methylation data
In NFBCs and IOWBC, quality control and quantile normalization for DNA methylation data were adapted from the CPACOR pipeline [43]. Illumina Background Correction was applied to the intensity values, a detection P value threshold was set at P < 10 −16 , and samples with call rate < 98% were excluded. Quantile normalization was done separately for six probe-type categories, and these normalized intensity values were used to calculate the methylation beta value at each CpG site, ranging between 0 (no methylation) and 1 (full methylation). Probes with call rate < 95% were excluded from the analyses. A principal component analysis (PCA) was carried out for array control probes, and the first 30 principal components (PCs) were used as explanatory variables in the subsequent regression models [43]. White blood cell subpopulation estimates were obtained using the software provided by Houseman et al. [44], and these estimates were also added as covariates in the regression models. In ARIES, the DNA methylation wet-laboratory and preprocessing analyses were performed as previously described [42]. In brief, samples from all time points were distributed across slides using a semi-random approach to minimize the possibility of confounding by batch effects. Samples failing quality control (average probe P value ≥ 0.01, those with sex or genotype mismatches) were excluded from further analysis and scheduled for repeat assay, and probes that contained < 95% of signals detectable above background signal (detection P value < 0.01) were excluded from the analysis. Methylation data were pre-processed using R software, with background correction and subset quantile normalization performed using the pipeline described by Touleimat and Tost [45].

Meta-analysis of 6073 CpG sites in five studies
Study design and analytical flow of the study are shown in Fig. 1, and the data availability for each analysis is presented in Table 4. All analyses were conducted using R software [46]. Linear regression was used to examine the association between sustained maternal smoking during pregnancy (from pregnancy week 8 onwards) and offspring peripheral blood DNA methylation at 6073 CpG sites that were previously identified to be differentially methylated in newborns exposed to maternal smoking in utero in recent epigenome-wide association study (EWAS) (false discovery rate-corrected P value < 0.05) [11]. The final model was adjusted for studyspecific covariates as necessary (offspring's sex, BMI, smoking status, and social class for IOWBC; additionally first four genetic PCs for NFBC cohorts; offspring age, maternal age, and social class for ALSPAC cohorts). The model was run independently in each study, and the results were then meta-analyzed over all five studies (NFBC1986 (age 16 years), NFBC1966 (age 31 years), IOWBC (age 18 years), ALSPAC mothers (age 30 years) and ALSPAC children (age 17 years)) using an inverse variance weighted fixed-effects model. Statistical significance level was set at P < 1 × 10 −7 , which corresponds approximately to a Bonferroni-corrected significance level of 0.05 for 450,000 independent tests. Such a conservative threshold was robust, and thus, the significant probes were considered worthy of further examination in a series of sensitivity and downstream analyses. The leading CpG site from each gene region (1-Mb window centered on the CpG site with the strongest association) was selected for these analyses.
We note that ALSPAC children were part of the earlier study from where the 6073 CpG sites were selected [11]. However, the earlier study examined DNA methylation in the cord blood, whereas the current study uses blood DNA methylation data from the same cohort at 17 years. If the associations with exposure to maternal smoking in cord blood DNA methylation were due to confounding, we would not expect the signal to persist until adolescence. Furthermore, removal of the ALSPAC children from the meta-analysis made no material difference to the effect size estimates (data not shown).

Sensitivity analyses
Impact of offspring's own smoking on their DNA methylation To assess the impact of participants' own smoking on methylation level by maternal smoking exposure, the In the ALSPAC mothers' cohort, smoking behavior was queried at two time points. At age 30 years, women were asked whether they had smoked regularly before pregnancy. At age 48 years, women were asked whether they were current or former smokers, and in case of the latter, whether they had smoked every day. From these data, a dichotomous variable for any smoking for each of the time points was derived. In the IOWBC, participant's own smoking status was defined as having ever or never smoked asked via a questionnaire administered at age 18 years. The model was run independently in each study with the same covariates as above (excluding adjustment for offspring's smoking as all individuals were non-smokers) and meta-analyzed using an inversevariance weighted fixed-effects model.

Impact of a mother's smoking intensity on offspring DNA methylation
Further analyses were performed to investigate whether the intensity of maternal smoking during pregnancy had a differential impact on the level of offspring blood DNA methylation. For this, the association between the number of cigarettes smoked per day during pregnancy and offspring blood DNA methylation was assessed in the NFBC studies. The association with the number of cigarettes smoked and offspring blood DNA methylation was assessed using linear regression with the same covariates as in the main analysis and meta-analyzed using an inverse variance weighted fixed-effects model.

Negative control design to distinguish intrauterine effects from confounding
Potential unmeasured confounding was examined in the NFBC studies by using paternal smoking status during pregnancy as a negative control. This method compares the associations of maternal and paternal smoking during pregnancy with offspring methylation outcomes. Use of paternal smoking as a negative control is based on the assumption that the biological effects of paternal smoking on intrauterine exposure are negligible compared to the effects of maternal smoking during pregnancy. If there is an intrauterine effect of cigarette smoke exposure, the associations are expected to be stronger for maternal smoking than paternal smoking behavior. If effects are of similar magnitude, the associations between maternal smoking during pregnancy and offspring methylation are likely attributable to unmeasured confounding, either by shared environmental or genetic factors [47]. The association with exposure to paternal smoking and offspring blood DNA methylation was assessed using linear regression with the same covariates as in the main analysis and meta-analyzed using an inverse-variance weighted fixed-effects model.

Persistence of DNA methylation into adulthood
We also examined whether the methylation changes associated with maternal smoking persisted into middle age. DNA methylation data were available at two time points in NFBC 1966 (age 31 years and 46 years) and ALSPAC mother (age 30 years and 48 years). Generalized least squares were used to examine the longitudinal change in association between exposure to maternal smoking and blood DNA methylation. DNA methylation at each time point was regressed on the technical and white blood cell covariates, and the corresponding residuals were used as the outcome. Study-specific covariates (offspring sex, smoking, BMI, and social class at each time point in NFBC1966; maternal age, social class, and offspring age and smoking status at each time point in ALSPAC) were added in the model. Time point of measurement and its interaction with the exposure were added as additional terms to the regression model, and the model residuals were allowed to be correlated within each individual and be heteroskedastic between time points. The effect estimates at both time points can be derived from this model, and the test for equality of the estimates at both time points is equivalent to testing the interaction term being equal to zero [48]. The analyses were conducted separately in NFBC1966 and ALSPAC mothers and meta-analyzed using an inverse-variance weighted fixed-effects model.

Mendelian randomization analysis for the effect of DNA methylation on disease outcomes
We next sought to assess the potential causal relationship between DNA methylation as the exposure and 106 different diseases as outcomes available through the MR-Base platform (available at http://www.mrbase. org/) using two-sample Mendelian randomization (MR) . The two-sample MR approach uses gene-exposure and gene-outcome associations from different data sources of comparable populations and allows the interrogation of summary estimates available from large genome-wide association study (GWAS) consortia [24]. If instrumental variable assumptions for the genes associated with the exposure are fulfilled [49], then MR estimates can give evidence for a causal effect of exposure on the outcome. We first looked up proxy single nucleotide polymorphisms (SNPs) for each of the 69 top maternal smoking-associated CpG sites in the publicly available ARIES database containing methylation quantitative trait loci (mQTL) at four different life stages (birth, childhood, adolescence, middle age) in human blood [42]. We selected SNPs associated with each CpG at P < 10 −7 at any of the other four time points. After clumping SNPs (using 1-Mb window and R 2 < 0.001) and pruning the CpG sites to one per locus, we found strong instruments for 15 CpG sites (Additional file 6). These SNP-CpG associations were consistent across all time points (Additional file 7), except rs4306016-cg01825213 association, which was excluded from the final MR analysis. We selected the SNP-CpG and SNP-disease effect sizes at middle age and aligned these to the same allele. MR effect estimates were then calculated using Wald ratio or, in case of cg04598670, which had two SNP instruments available, inverse-variance weighted method. The resulting effect estimate represents the change in outcome per unit increase in the exposure.

Mediation analysis
The CpGs that showed evidence for causal relationship with disease outcomes in the MR analysis were tested for mediation in the association between maternal smoking during pregnancy and disease outcomes using the NFBC1966 data at 31 years and 46 years. We performed model-based causal mediation analysis using R package "mediation" [50] by first estimating both the effect of maternal smoking on the CpG site and the effect of CpG site on the outcome, adjusted for exposure to maternal smoking (Fig. 6). Both of these effects were additionally adjusted for sex, offspring's own smoking, and technical covariates. We generated the estimates for the total effect, average direct effect, and average causal mediation effect using quasi-Bayesian Monte Carlo method based on normal approximation with 2000 simulations, with robust standard errors. The proportion that the mediating CpG explains of the association between maternal smoking and disease outcome was calculated as described [51].