Birthweight DNA methylation signatures in infant saliva

Low birthweight has been repeatedly associated with long-term adverse health outcomes and many non-communicable diseases. Our aim was to look-up cord blood birthweight-associated CpG sites identified by the PACE Consortium in infant saliva, and to explore saliva-specific DNA methylation signatures of birthweight. DNA methylation was assessed using Infinium HumanMethylation450K array in 135 saliva samples collected from children of the NINFEA birth cohort at an average age of 10.8 (range 7–17) months. The association analyses between birthweight and DNA methylation variations were carried out using robust linear regression models both in the exploratory EWAS analyses and in the look-up of the PACE findings in infant saliva. None of the cord blood birthweight-associated CpGs identified by the PACE Consortium was associated with birthweight when analysed in infant saliva. In saliva EWAS analyses, considering a false discovery rate p-values < 0.05, birthweight as continuous variable was associated with DNA methylation in 44 CpG sites; being born small for gestational age (SGA, lower 10th percentile of birthweight for gestational age according to WHO reference charts) was associated with DNA methylation in 44 CpGs, with only one overlapping CpG between the two analyses. Despite no overlap with PACE results at the CpG level, two of the top saliva birthweight CpGs mapped at genes associated with birthweight with the same direction of the effect also in the PACE Consortium (MACROD1 and RPTOR). Our study provides an indication of the birthweight and SGA epigenetic salivary signatures in children around 10 months of age. DNA methylation signatures in cord blood may not be comparable with saliva DNA methylation signatures at about 10 months of age, suggesting that the birthweight epigenetic marks are likely time and tissue specific.


Background
The existence of a relationship between intrauterine or early life exposures and health during the lifecourse has come to attention in the 1990s [1] and is nowadays recognized as the developmental origins of health and diseases (DOHAD) hypothesis [2]. The intrauterine life is a critical period for adverse exposures to exert their effect [3], as fetal organs start developing and are sensitive to environmental stimuli that may cause an indelible imprint on future development and function.
In a hostile uterine environment caused by insults, for example poor nutrition, the fetus, to slow down its growth rate in order to match the nutrient supply, responds by developing adaptations including downregulation of metabolic or organs function. The adaptive process, however, may cause irreversible changes in the development of some tissues and organs and predispose an individual to a higher risk of diseases not only early in life, but during the entire lifecourse [2].
The Pregnancy and Childhood Epigenetics (PACE) Consortium conducted so far the largest cord blood epigenome-wide DNA methylation study of birthweight using 8825 neonatal blood samples from 24 birth cohorts [18] and found that 914 CpGs, located in or near 729 genes, were associated with birthweight treated as a continuous variable. In the same study, methylation variation in 51 CpG sites was associated with high birthweight, as compared to normal birthweight, and 4 CpGs appeared to be associated with low versus normal birthweight. In additional analyses conducted on blood from 7278 children at later ages, only 1.3% of the 914 birthweightassociated differentially methylated CpGs in cord blood remained associated in childhood (2-13 years; n = 2756 children from ten studies), one in adolescence (16)(17)(18) years; n = 2906 from six studies), and none in adulthood (30-45 years; n = 1616 from three studies).
Although it seems that cord blood DNA methylation markers of birthweight do not persist at later ages, the associations observed at birth are extensive and it is important to confirm their persistence or variations over time. Also, DNA methylation profiles are tissue specific, and it would be optimal to analyse the DNA profile linked to birthweight in different tissues. Most tissues are not accessible with non-invasive methods, and blood is typically used as a surrogate, with the assumption that, being a universal body fluid, it may capture epigenetic changes of target tissues [19]. However, blood samples collection may be difficult in large population studies, as parents may be less prone to expose their children, and especially infants, to vein puncture for blood donation. Saliva and nasal brushes are easily accessible tissues, but to date they have been much less studied [20,21]. Replication in saliva of blood epigenetic signatures of exposures and/or outcomes and identification of saliva-specific methylation markers could make saliva an optimal candidate for studies in infants and children. Thanks to its non-invasive collection method, it would allow obtaining repeated samples in short time periods, which is necessary requisite for monitoring changes in DNA methylation over time.
Our aim was to perform a look-up, in infant saliva samples, of birthweight-related DNA methylation variation identified in cord blood, and to conduct, to our knowledge, the first saliva epigenome-wide association study (EWAS) of birthweight, to identify methylation markers that may be specific for saliva collected in infancy.

Study population
The data were derived from an epigenome-wide casecontrol study on wheezing nested within the NIN-FEA birth cohort [22]. The NINFEA study is an Italian web-based multi-purpose mother-child cohort, aimed at exploring the relationship between early-life exposures and long-term health outcomes [23]. Members of the cohort are children born from approximately 7500 pregnant women who between 2005 and 2016 volunteered to participate in the study, had Internet access, and had enough knowledge of Italian to complete webbased questionnaires. The children are followed up with six questionnaires completed by their mothers 6 and 18 months after delivery, and when they turn 4, 7, 10, and 13 years of age. When children were aged approximately 6 months, mothers were asked to donate their and their children's saliva samples using a mailed sponge Oragene ™ DNA self-collection kits (OG-250; DNA Genotek, Inc, Ottawa, Ontario, Canada). Approximately half of the participating mothers donated their and their child saliva samples, which are stored in a biobank at −80 ºC.
The case-control study was designed as EWAS of early childhood wheezing, consisting of 72 cases with at least one episode of wheezing between 6 and 18 months of age, and 72 infants without wheezing matched to cases by sex, age, and seasonality/calendar year of saliva sampling. Cases and controls were selected from a subpopulation of singletons, residents in the City of Turin, Italy, and born to mothers who did not report having asthma active during pregnancy. The baseline NINFEA questionnaire, completed at any time during pregnancy, was used to derive information on maternal and pregnancy factors, while child-related variables were collected at the first follow-up questionnaire completed approximately 6 months after delivery. Although information on children ethnic background was not available in the NINFEA cohort, almost the entire study population has both parents born in Italy, and only few study children have one of the parents born in other European countries. Therefore, the ethnic background of the children included in the study is, if not entirely, largely European. In this study, we used the following variables: maternal age at delivery (years), maternal education (low = no education, primary or secondary school vs. high = university degree or higher), parity (nulliparous vs. at least one previous pregnancy > 22 gestational weeks), maternal pre-pregnancy body mass index (BMI; kg/m 2 ), sex, gestational age at birth (weeks), birthweight (grams), small for gestational age (yes vs. no), age at saliva sampling (continuous in months) (see Table 1 for detail).
The Illumina Infinium ® HumanMethylation450 BeadChip (Illumina,Inc, San Diego, CA, USA) was employed to evaluate DNA methylation status of over 485,000 probes in saliva samples. Details on pre-processing of samples and data quality control can be found in the Additional File 1 (Methods, DNA methylation measurement, data pre-processing, and quality control). Quality controls and probes filtering led to the exclusion of three samples and 63,218 probes, leading to a total of 141 saliva samples and 421,782 probes for analyses.
The NINFEA study was approved by Ethical Committee of the San Giovanni Battista and CTO/CRF/Maria Adelaide Hospital of Turin, and all participating mothers gave informed consent at enrolment and at saliva donation.

Statistical analyses
For all the analyses, we pooled together cases and controls, leading to a total of 141 subjects. The analyses were, however, based on 135 subjects, as 6 (4.3%) infants had missing values in at least one of the variables included in multivariable models. Methylation levels were analysed as β-values (ratio of methylated probe intensity to overall intensity, representing 0 to 100% methylation at each probe). Although, for normality assumption, log 2-transformed β-values (M-values) may perform better when DNA methylation is used as an outcome, the interpretation of coefficients may be less intuitive. We, therefore, used β-values for more intuitive biological interpretation and easier comparability with other studies, in particular with the PACE birthweight EWAS. All the analyses were performed using R statistical computing software (version 3.6.0) and RStudio (version 1.2.1335) [24].

Look-up of PACE findings in infant saliva
Out of the 914 birthweight-associated CpGs identified by the PACE Consortium, 891 (97.5%) were available in the NINFEA study after quality checks and probes filtering. We used the same confounding variables of the PACE analysis, but, differently than in the PACE study, we used birthweight as the exposure and DNA methylation variation as the outcome to meet the biological temporality from birth to infancy. Both birthweight and DNA methylation were modeled as continuous variables. We used robust linear regression models adjusted for maternal age at delivery, maternal education, parity, maternal prepregnancy BMI, child sex, gestational age at birth, age at saliva sampling, batch, and case-control status of the original nested case-control study (wheezing between 6 and 18 months of age). Using vcovHC function in the sandwich R package [25], we calculated heteroscedasticity-consistent standard errors.
Although maternal smoking during pregnancy affects both birthweight [26] and offspring DNA methylation [27], we did not adjust for smoking, as it was rather infrequent in our study sample (2% prevalence). In order to account for residual technical variability and for cell-type heterogeneity, we performed surrogate variables analysis (sva) [28] and estimated 7 surrogate variables that were also included as covariates in the model. Cell composition was additionally estimated using the reference-based projections for saliva proposed by Zheng [29], and given a high correlation between the epithelial tissue component estimate and the first sva component (rho = 0.99), we only used the 7 estimated surrogate variables in the analyses. p-Values adjusted for multiple comparisons were calculated using the Bonferroni correction and the Benjamini and Hochberg false discovery rate (FDR), while histograms and quantile-quantile (QQ) plots were used to graphically compare the observed distribution of p-values versus the expected uniform distribution under the null hypothesis. Given that the direction of the association was determined by the PACE study, we also calculated one sided p-values for each CpG. It has been shown that CpGs associated with a certain trait tend to be highly correlated and that this correlation affects standard procedures for multiple testing, such as Bonferroni and Benjamini and Hochberg corrections. In the NINFEA saliva samples, the correlation between the 891 CpGs identified in the PACE EWAS was 0.48, which is indeed much higher than the reported saliva genome-wide mean correlation of 0.12 [30]. We, therefore, reported the permutation p-values that take into account the distribution of p-values under the null hypothesis and are suggested as a gold standard for the settings where the underlying correlation between CpGs is high.

Epigenome-wide association analyses
Two epigenome-wide association analyses were conducted, first with birthweight as a continuous exposure variable and, second, with small for gestational age (SGA) as a binary exposure variable. The latter was defined as the lowest 10 th percentile of the World Health Organization birthweight for sex and gestational age charts [31], with the remaining population above the 10th percentile as the reference group. As in our study population only 7 children (5%) were classified as large for gestational age (based on the 90 th percentile), we had no power to analyse this trait separately and opted for a more conservative approach by including them in the reference group.
In both EWAS analyses, we used robust linear regression models adjusted for sex, age at saliva sampling, gestational age, maternal age, parity, maternal pre-pregnancy BMI, maternal education, batch, estimated surrogate variables, and case-control status of the original nested case-control study. p-Values adjusted for multiple comparisons were calculated using the Benjamini and Hochberg false discovery rate (FDR), and Volcano plots were used to visually present the results. In the EWAS of SGA, as a sensitivity analysis we provisionally excluded children born pre-term (< 37 gestational weeks at birth).
To assess whether the age at saliva sampling could have influenced the findings on the top CpG sites identified in the EWAS analyses, we tested the associations of the age at saliva sampling as a continuous variable (in months) with the methylation levels in the top CpG sites using the robust linear regression models adjusted for sex, batch and cell type composition estimated with the referencebased projections for saliva proposed by Zheng [29].

CpGs annotation and functional analysis
Gene Ontology (GO) and Kyoto Encyclopedia of gene and Genomes (KEGG) enrichment analyses were carried out to identify possible functional pathways in the saliva birthweight-associated CpGs set.
In order to compare previously reported associations of epigenome-wide birthweight-associated CpGs and our own results, we searched for findings reported in the EWAS Catalog (http:// www. ewasc atalog. org; accessed on 12 February 2021) and EWAS Atlas (https:// bigd. big. ac. cn/ ewas/ tools; accessed on 12 February 2021), looking for both CpG and gene level match.
After this first step based on the overlap between our results and those of other EWAS on the same trait, we further accessed EWAS Atlas to examine whether CpGs identified in our study were previously associated with traits other than birthweight.
We also looked in the GWAS Catalog for traits associated with specific single-nucleotide polymorphisms (SNPS) on genes on which these CpGs mapped. Table 1 shows the characteristics of the study population. The mean maternal age was 35 years; 72% of the mothers had a high educational level. The mean age at saliva sampling was 10.8 months (median 10.4, range 7-17). The mean birthweight was 3242 g, 5% of the children were born pre-term and 15.6% of children were born small for gestational age, respectively).

Look-up of PACE findings in infant saliva
Of the 891 CpG sites associated with birthweight in the PACE study, 52 (5.8%) had a one sided p-value < 0.05 in our study (Fig. 1, Table 2). However, none of the CpGs survived the Bonferroni correction (p < 5.61 × 10 −5 ) or had a FDR < 0.05 (Table 2, Additional file 2: Table S1). There was a 47% concordance in the direction of the coefficients between the PACE and our study (binomial sign test p-value = 0.11, 95% confidence intervals (CIs) 0.44; 0.51). When we used the permutation test to take into account strong mean pairwise correlation between the DNA methylation values of the 891 CpG sites, the minimum permutation p-value was 0.56.

EWAS for continuous birthweight
Out of the 421,782 probes analysed, 8.9% (N = 37,365) were associated with birthweight with a nominal p-value < 0.05. After correction for multiple testing, 44 CpG sites had a FDR < 0.05 (Table 3 and Fig. 2a). Their coefficient estimates ranged from -0.31 to 0.57, which corresponds to a methylation increase of 0.57% with a 100 gr increase in birthweight. The largest effect was observed for cg02727104 located 13 kb down SOHLH2.
None of the 44 saliva birthweight-associated CpGs overlapped with the birthweight related CpGs identified in cord blood in the PACE study under the Bonferroni threshold. When considering a nominal p-value < 0.10 in the PACE study (N = 78,489 CpGs), there was an overlap in 6 CpG sites, but only one with the same direction of the effect (cg22896429) (Additional file 3: Table S1). At gene level, there was an overlap in 12 genes identified in PACE under the FDR-corrected p-value threshold (with at least one CpG with the same direction of the effect). The average absolute pairwise Pearson's correlation coefficient between the β-values of the identified 44 CpGs was 0.23, which is higher than the mean pairwise genome-wide correlation coefficient in the NINFEA saliva samples but lower than the mean pairwise correlation coefficient between 891 CpGs identified by the PACE Consortium [30].

EWAS for small for gestational age
We found 44 CpGs associated with SGA at a FDR of less than 0.05 (Table 4 and Fig. 2b). The largest coefficient showed a 4.1% difference in methylation when comparing small with non-small for gestational age (cg12322146, located 125 kb down RBMS3); the largest negative association was -2.3% (cg03066788, located at PNOC). Only one CpG (cg18072629, located at GATA3) overlapped with the 44 CpGs found to be    Table S1). At gene level, there was an overlap in 9 genes identified in PACE under the FDR-corrected p-value threshold (with at least one CpG with consistent direction of the effect). Findings were practically unchanged when we excluded preterm infants from the EWAS analysis (data not shown). We found no association between age at saliva sampling and methylation status for any of the top CpGs associated with continuous birthweight or with SGA in our two EWAS analyses.
Out of 44 CpG sites associated with continuous birthweight in our EWAS, 28 (64%) showed a nominal p-value < 0.05 in the EWAS for SGA; all with a direction of the effect that was consistent with the results for birthweight. Likewise, 26 (59%) out of 44 SGArelated CpGs reveal nominal p-value < 0.05 in the EWAS of continuous birthweight, with a consistent direction of the effect.

CpGs Annotation and functional analysis
None of the two sets of CpGs identified in our study, the 44 birthweight-associated CpGs and the 44 SGA-associated CpGs, showed functional enrichment of GO or KEGG terms.

Discussion
In this study, we investigated the association between birthweight and methylation patterns in saliva samples taken at around 10 months of age. The cord blood methylation signatures of birthweight, found by a large study of the PACE Consortium, were not confirmed in infant saliva.
We identified 87 infant saliva-specific signatures of birthweight or SGA, of which two overlap with PACE results at gene level with the same direction of effect (MACROD1 and RPTOR). Interestingly, single-nucleotide polymorphisms in these genes have been previously associated with obesity, adult BMI, BMI-adjusted waist-hip ratio, high-density lipoprotein cholesterol and visceral adipose tissue levels. Moreover, expanding the look-up in the full PACE results, 13 CpGs find correspondence with a nominal p-value < 0.10 but only 3 have the same direction of the effect, namely cg22896429, cg06234201, cg15847996.
DNA methylation variation at some of the 87 loci identified in our study has been previously associated with multiple traits, such as insulin resistance, colorectal cancer, obesity, and gestational diabetes mellitus.
Moreover, the SGA-associated locus cg26615232 maps within USH2A, which variant has been associated with birthweight in a study [63] that performed GWAS metaanalyses of fetal genetic variants in 321,223 individuals of European ancestry.
It has been repeatedly shown that DNA methylation at many sites is not temporally stable and that each tissue has its unique epigenetic landscape that likely reflects its specific function and response to environmental exposures [21,64]. For example, a cross-sectional study on 1019 infants [65] compared the associations between preterm birth and genome-wide DNA methylation profiles using both cord tissue and cord blood samples. The results highlighted differences between the two tissues in DNA methylation variation associated with preterm birth, with only a minority of overlapping CpGs. In DNA from cord tissue, DNA methylation analysis showed enrichment of differentially methylated regions in genes involved in molecular pathways related to fetal growth and development (i.e. Wnt signaling, bone remodeling, and extracellular matrix organization), while in cord blood immune response pathways (i.e. regulation of T cell differentiation, inositol lipid-mediated signaling, and regulation of RNA stability) were enriched. Therefore, it is reasonable to speculate that saliva and blood, which have different functions, include different cell types and have different embryonic origin, and different mechanisms of exposure and responses to environmental factors do not share the same DNA methylation response to fetal growth, birthweight, and their risk factors. Although our results suggest different cord blood and saliva methylation patterns related to birthweight, we cannot exclude that the relatively small sample size of our study contributed to the lack of overlap with the PACE results, due to a reduced power to detect small effects.
In addition to the tissue specificity of DNA methylation, there are also age-related changes (birth vs. infancy) that could explain differences between our and the PACE Consortium findings.
In most tissues, DNA methylation may also vary substantially with time, especially during periods of life associated with high plasticity and fast development. Consistently, the PACE study found that differential methylation associated with birthweight in neonates persisted only minimally across childhood and disappeared by adulthood [18].
This result is consistent with another longitudinal study in which birthweight-and gestational age-related DNA methylation changes were investigated in cord blood and peripheral blood at ages 7 and 17 in more than 900 children [66]. Across the majority of CpG sites that showed differential methylation in cord blood, a pattern of fast evolution was observed during early childhood that stops with adolescence, providing evidence for the lack of persistence of early life methylation differences. This is evident also in two studies that analysed infant saliva samples. A longitudinal study with repeated saliva samples collected at birth and at 1 year of age from 50 preterm and 40 infants born at term showed that DNA methylation at the differentially methylated region (DMR) of IGF2 and FKBP5 at birth was lower in preterm infants compared with term infants, but these differences did not persist at 1 year of age [67]. Also, another study on 214 infant saliva samples (62 collected at 6 weeks of age, 30 collected at 52 weeks of age and 61 collected at both ages) showed an age-dependent variation of DNA methylation, with a clear difference between saliva DNA methylation at 6 and 52 weeks of age [68].
In our study, we could not distinguish between the tissue-and the time-related differences in DNA methylation of birthweight-associated CpG sites, as we had no repeated saliva samples or blood samples collected simultaneously with saliva. It would be interesting in future studies with repeat samples and also with different tissues, to investigate whether the differences between our and PACE findings are due to the time-and tissuedependent DNA methylation changes, as reported by previous studies both for birthweight and gestational age. Even though the age at saliva sampling varied between 7 and 17 months in our study, age was not associated with DNA methylation variation in any of the CpG sites associated with continuous birthweight and SGA, suggesting that the observed associations are unlikely influenced by the different age of saliva collection. It should be, however, noted that the age range of our population was rather narrow (7-17 months), so we cannot exclude that age could influence DNA methylation in these CpGs when considered across childhood/adolescence.
We supplemented the analysis of birthweight with the analysis on SGA and found different CpG sites associated with the two exposures, with only 1 overlapping CpG when considering FDR-corrected p-values. Although this is in line with the PACE analyses, where the 4 CpGs associated with low birthweight (< 2500 g) did not overlap with the 914 associated with continuous birthweight, approximately 60% of CpGs identified in our two EWAS overlapped at the nominal p-value of 0.05, with seemingly the same direction.
The main strength of our study was the possibility to analyse DNA methylation in saliva samples, which are easy to collect at any age in childhood and are, therefore, good candidates for future large DNA methylation studies. Moreover, while it is probably unfeasible to collect repeated blood samples over relatively short time periods, and especially in the context of large population studies based on children, it is possible to use saliva to monitor changes in DNA methylation over time. As salivary DNA methylation is poorly studied, especially in newborns and infants, it would be important to understand if, and for which specific traits and exposures, salivary and blood DNA methylation can be used interchangeably, and when they clearly show distinct methylation signatures. Previous DNA methylation studies using saliva samples focused on neurobehavioural conditions [69][70][71], respiratory traits [22], and cancer research [72], but to our knowledge the associations between birthweight and saliva DNA methylation has not been studied so far.
Although the small sample size of our study, especially in comparison with the PACE study, may have had an impact on EWAS analyses, after an FDR-correction, we identified some novel CpGs associated with birthweight, which are likely to be saliva-specific birthweight signatures. These findings need replication in independent saliva EWAS, and we cannot exclude that additional saliva DNA methylation variations related to offspring birthweight may be identified by larger studies. We argue that the sample size had less impact on our look-up analysis of PACE findings in saliva, as the number of tests performed in the look-up analysis was much lower than the number of tests of the two EWAS.
Our study population was not selected at random from the entire NINFEA cohort, but on the presence/ absence of infant wheezing between 6 and 18 months of age, which was a selection factor for the original case-control study. As SGA is a well-known risk factors for wheezing [73,74], this selection led to a relatively high prevalence of SGA infants (15.6%; 21% in wheezing cases and 10% in controls). To account for this selection, all the analyses were adjusted also for infant wheezing. Moreover, SGA definition is based on an arbitrary population-based reference cut-off, while, biologically, size for gestational age can be considered as a continuous trait. Therefore, the high prevalence of SGA in our study population allowed us to capture the lowest quintile of the size for gestational age distribution, providing more power for EWAS.
Finally, it should be noted that the prevalence of maternal smoking during pregnancy was rather low in our study population (2% compared with 7.7% of maternal smoking during pregnancy and with 14% of maternal smoking just before pregnancy in the entire cohort) [4]. In addition to random variation, the selection process may have played a role also in this case, although with an unexpected direction. Therefore, our findings are less generalizable to the population of women who smoke during pregnancy, but remain applicable to all non-smoking women and those who stopped smoking before pregnancy, which represent the majority of pregnant women.

Conclusion
In conclusion, our study provides an indication of the birthweight and small for gestational age epigenetic salivary signatures in children around 10 months of age and suggests that DNA methylation signatures of birthweight likely differ between cord blood and infant saliva. Further insights are needed to understand whether these differences are due to biological differences between the two tissues or could be attributed to age-related DNA methylation changes.