Skip to main content

DNA methylation sites in early adulthood characterised by pubertal timing and development: a twin study



Puberty is a highly heritable and variable trait, with environmental factors having a role in its eventual timing and development. Early and late pubertal onset are both associated with various diseases developing later in life, and epigenetic characterisation of pubertal timing and development could lead to important insights. Blood DNA methylation, reacting to both genotype and environment, has been associated with puberty; however, such studies are relatively scarce. We investigated peripheral blood DNA methylation profiles (using Illumina 450 K and EPIC platforms) of 1539 young adult Finnish twins associated with pubertal development scale (PDS) at ages 12 and 14 as well as pubertal age (PA).


Fixed effect meta-analysis of the two platforms on 347,521 CpGs in common identified 58 CpG sites associated (p < 1 × 10−5) with either PDS or PA. All four CpGs associated with PA and 45 CpGs associated with PDS were sex-specific. Thirteen CpGs had a high heritability (h2: 0.51–0.98), while one CpG site (mapped to GET4) had a high shared environmental component accounting for 68% of the overall variance in methylation at the site. Utilising twin discordance analysis, we found 6 CpG sites (5 associated with PDS and 1 with PA) that had an environmentally driven association with puberty. Furthermore, genes with PDS- or PA-associated CpGs were consistently linked to various developmental processes and diseases such as breast, prostate and ovarian cancer, while methylation quantitative trait loci of associated CpG sites were enriched in immune pathways developing during puberty.


By identifying puberty-associated DNA methylation sites and examining the effects of sex, environment and genetics, we shed light on the intricate interplay between environment and genetics in the context of puberty. Through our comprehensive analysis, we not only deepen the understanding of the significance of both genetic and environmental factors in the complex processes of puberty and its timing, but also gain insights into potential links with disease risks.


Puberty is a necessary developmental phase between childhood and adulthood, enabling sexual reproduction. In addition to pubertal timing and pubertal development differing between the sexes, they are highly variable between individuals of the same sex. Early pubertal development is associated with an increased risk of many diseases in later life, such as breast, endometrial and prostate cancer [1,2,3]. On the other hand, late pubertal development is associated with asthma in both sexes and cervical cancer in females [3]. Nevertheless, the mechanisms behind these links remain elusive. Thus, understanding puberty from a genetic and epigenetic perspective may provide further insight into this important period of life and its links with disease risk in later life.

Pubertal timing and development are highly influenced by genetic factors, (twin-based heritability estimates range from 37 to 91%) [4,5,6,7]. Studies have identified numerous genetic loci associated with pubertal timing [1, 8, 9], with a large proportion of the implicated genes involved in neural processes. These identified genetic signals partially explained some diseases, such as breast and prostate cancers, linked to the timing of puberty [1, 8, 9]. Additionally, recent evidence indicates a shift towards an earlier onset of puberty in girls, suggesting that key nongenetic factors play a role in pubertal timing [10, 11]. Indeed, prenatal growth, diet, maternal smoking, psychological distress in childhood, and exposure to endocrine-disrupting chemicals such as phytoestrogen have been associated with early onset of puberty [12]. In addition, although mainly explained through genetic overlap with pubertal timing, high BMI has been associated with early puberty [13, 14]. Epigenetic marks react to the environment and may provide a mechanism for the environment to act on genome function and ultimately lead to phenotypic consequences. Indeed, DNA methylation, the most studied epigenetic mark, has been associated with pubertal onset [15,16,17]. As the focus of these studies was mainly the change in methylation before and after pubertal timing, there is still a need to identify consistent CpG sites associated with pubertal timing or development in (early) adulthood, where the methylation profile relevant to puberty is more stable than that within early adolescence and could therefore be more reliably linked to disease risk in adulthood. Considering the high heritability of puberty and the wide heritability range of DNA methylation at each CpG site [18], it is necessary to investigate the variance components of DNA methylation sites associated with puberty and to analyse them in the context of methylation quantitative trait loci (meQTLs) [19].

In this epigenome-wide association study (EWAS), we profiled DNA methylation in the blood of young adults to further understand pubertal timing, pubertal development and potential diseases they associate with. We investigated genome-wide DNA methylation and its association with pubertal timing and pubertal development in both sexes. As puberty differs between males and females, we additionally stratified the analyses by sex. We also employed twin study designs to evaluate heritability and within-pair differences and to explore qualitative and quantitative sex differences in DNA methylation at the associated CpG sites. The twin analyses coupled with meQTLs were utilised to gain insight into the genetic and environmental factors affecting methylation values of CpG sites associated with puberty.

Materials and methods


This study was based on the participants of two longitudinal and comprehensive birth cohorts of Finnish twins, namely the FinnTwin12 (twins born in 1983–1987) and FinnTwin16 study (twins born in 1974–1979). Both cohorts have been described in detail elsewhere [20, 21]. In brief, FinnTwin12 was initiated when the twins were 11–12 years of age, and the follow-up questionnaires were sent at ages 14, 17, and 24, while the FinnTwin16 baseline questionnaire was at the age of 16, with follow-up at the ages of 17, 18, 25, and 35. A subsample of these twins participated in onsite visits, where biological samples were collected, and anthropometric measures were taken (N = 1539, 54.6% female). The current study included both monozygotic (MZ) and dizygotic (DZ) twin pairs who had completed the questionnaires regarding puberty (see below) and had blood DNA methylation data (Illumina 450 K or EPIC) in early adulthood available.

Data on puberty

The pubertal development scale (PDS) was measured twice in the FinnTwin12 cohort, at ages 12 and 14 years, based on the discrete self-reported markers of puberty: growth spurt, body hair, skin changes, voice change (boys)/breast development (girls) and facial hair (boys)/menarche (girls). At the age of 12, growth spurt was coded as 1 = growth spurt has not taken place, 2 = growth spurt in beginning, 3 = growth spurt is rapidly underway; body hair as 1 = no body hair, 2 = some body hair, 3 = definite body hair; skin changes as 1 = no, 2 = some, 3 = definite skin changes; voice change as 1 = no, 2 = voice is beginning to change, 3 = voice change is underway; breast development as 1 = no, 2 = breast growth is beginning, 3 = breasts are clearly growing; facial hair as 1 = no, 2 = facial hair is beginning to grow, 3 = facial hair is clearly growing; and menarche as 1 = no, 4 = yes. At the age of 14, an additional 4 = “the developmental phase has been completed” was added to each marker of puberty (range 1–4) but facial hair growth and menarche, which were coded as at age 12. PDS was calculated by summing the markers of puberty to obtain a total score, which was then divided by five to maintain the individual item value range [22]. Hence, the higher the mean score the further puberty has progressed. The scores cannot be transposed in Tanner stages but serve to identify individual variation in pubertal development. All individuals with any missing value on PDS variables were excluded from the analysis. PDS data were not available for the FinnTwin16 cohort. Self-reported pubertal age (PA), defined as the age of menarche (girls) or voice break (boys), was obtained from FinnTwin12 participants at the age of 17 and FinnTwin16 participants at the age of 16. All individuals with missing or inconsistent information on PA were excluded. Moreover, individuals who reported that they did not have menarche/voice break yet were coded as the event happening at the age of 17 [13].

Other phenotypic data

For each twin included in the analysis, we also extracted the following items: sex, zygosity, age at questionnaire wave as well as age, smoking status and alcohol consumption at the time of blood sampling. Smoking status was categorised into current, former (abstinent for at least six months) and never smokers, while alcohol consumption was calculated in units of ethanol g/day during the past week, as described previously [23].

DNA methylation data

Blood samples for DNA methylation analyses were collected during in-person visits in adulthood (age range: 21–33.7 and 23.3–42.7 for FinnTwin12 and FinnTwin16, respectively). High molecular weight DNA extracted from whole blood by standard protocols was bisulfite converted with the EZ-96 DNA Methylation-Gold Kit (Zymo) according to the manufacturer’s instructions. Methylation was quantified using Illumina Infinium HumanMethylation450 and EPIC BeadChip platforms that cover more than 450,000 and 850,000 CpG sites, respectively.

The DNA methylation data were pre-processed in the R package meffil [24]. Bad quality samples were excluded based on the following criteria: i) sex mismatch, ii) median methylation vs. unmethylated signal > 3 standard deviations (SD), iii) failed control probe metrics and if > 20% of probes per sample had iv) detection p value > 0.05 and v) bead number < 3. To remove technical variation between the samples, functional normalisation including the control probe principal components was performed, followed by bad quality probe removal: i) probes with detection p value > 0.05 in more than 20% samples, ii) bead number < 3 in more than 20% samples, iii) sex chromosome probes and iv) cross-reactive and ambiguously mapped probes as well as probes on polymorphic CpGs [25, 26]. BMIQ normalisation implemented in the R package wateRmelon [27] was then performed to adjust for type 2 probe bias. The same pipeline was used for 450 K and EPIC data. The data pre-processing resulted in 390,304 and 765,385 probes in 450 K and EPIC data, respectively. Thirty-nine and 85 samples were excluded after pre-processing in 450 K and EPIC data, respectively. Beta-values representing DNA methylation were used in all analyses.

Statistical analysis

All analyses were performed in R software (version 4.1.1) [28].

EWAS designs

The EWAS analyses were performed using the limma package [29]. For each design, we created a linear model using the “lmFit” function, and the model statistics were computed using the “eBayes” function. The EWAS on PDS was performed separately on individuals who reported the PDS items at ages 12 and 14. Both sex-stratified and combined models for PDS were performed. The EWAS on PA was performed on the merged cohorts of individuals who reported their PA at age 17 in FinnTwin12 and 16 in FinnTwin16. The EWAS on PA was performed separately in males and females. All EWAS designs are reported in the Supplementary Methods (see Additional file 1). For both 450 K and EPIC data, the cell-type proportions were calculated using Flow-Sorted.Blood. EPIC [30] R package, which is based on a modified version of the Houseman algorithm [31].

We evaluated various combinations of covariates by utilising the AIC to determine the best model fit for the data. This was done using the “selectModel” function from the limma package. For each EWAS run, we evaluated the test statistics inflation and bias using the BACON package [32]. Additionally, for visual investigation of inflation and bias, QQ plots and test statistic histograms were used, respectively. The QQ and Manhattan plots were created using the qqman package [33].

All EWAS models were corrected for smoking, alcohol consumption, age (at the time of blood sampling), cell-type proportions, date of BeadChip run and row of the array slide. Additionally, the family ID was included as a random effect to account for the relatedness of twins within a pair. The models that included both sexes were corrected for sex, and the models on PA were corrected for the cohort. The addition of family ID as a random effect was done using the “duplicateCorrelation” function to estimate the intra-block correlations of methylation data with family ID as the blocking vector and by including the mentioned covariates. The estimated correlations were then included in the “lmFit” function where the family ID was again included as a blocking vector.

EWAS meta-analysis

The EWAS results from the 450 K and EPIC platforms, based on the same design, were merged using a meta-analysis. A fixed effect meta-analysis with inverse-variance weights was performed using the “rma” function from the metafor package [34]. The standardised effect sizes and sampling variances were calculated using the “escalc” function. The computed metric was the partial correlation coefficient [35].

The input parameters were the t statistic, sample size and number of covariates in the regression model. The output parameters were the standardised meta-analysed effect size, p value, Q heterogeneity estimate and its corresponding p value. We corrected the heterogeneity p values using the Benjamini‒Hochberg (BH) method. All CpG sites with a corrected Q p value < 0.01 were considered heterogeneous, and a random effect meta-analysis was performed on such CpG sites.

For the discovery of CpG sites, we used two significance cut-offs on the p values obtained from the meta-analysis: the cut-off recommended for the 450 K platform 2.4 × 10−7 [36] and a suggestive p value cut-off at 1 × 10−5 [37]. We annotated all CpG sites with a standardised effect size larger than the absolute value of 0.13 using the IlluminaHumanMethylation450kanno.ilmn12.hg19 package [38]. This effect size cut-off value corresponds to the maximum effect size at the 99th percentile across the analysed eight models.

Analyses of genetic and environmental influences on CpG sites

To follow-up on results from the EWAS, we ran four sets of analyses to further understand the genetic and environmental influences on CpG sites significantly associated with puberty. These analyses, described in detail below, include univariate twin modelling, discordant twin pair analyses, meQTL analysis and analysis of sex differences in CpG sites. Broadly speaking, each analysis provides information on the sources of variation underlying methylation at each CpG site.

Univariate twin modelling

CpG sites associated with PDS or PA with a p value lower than the suggestive cut-off (p < 1 × 10−5) that met the model assumptions and with methylation value SD > 0.05 were assessed by twin modelling (see details in Additional file 1: Methods). Univariate twin modelling was performed by the openMx R package [39] with age at blood sampling and sex included as covariates to estimate variance components for additive genetic effects (A), common environmental effects (environmental exposures and experiences within each twin pair that act to make twins within a pair more similar to each other—C), genetic dominance effects (D), and unique environmental effects (nonshared exposures and experiences that act to make twins within a pair less similar to each other—E). All potential models (i.e. ACE, ADE, AE, CE, DE, E) were compared to determine the best-fitting set of variance components based on the log-likelihood test.

Discordant twin pair analyses

CpG sites associated with PDS or PA with a p value lower than the suggestive cut-off (p < 1 × 10−5) were assessed by discordant twin pair analysis. The discordant twin pair design tests whether cotwins who differ in pubertal age or pubertal development also differ in methylation value at each CpG site while naturally controlling for all genetic and environmental confounders shared by the twins within a pair. Discordant twin pair analyses offer the most powerful statistical approach when conducted within MZ twin pairs since the twins in a pair share 100% of their genomes. Here, we additionally pooled MZ and DZ twin pairs together to increase power due to increased sample size compared to the within-pair analyses among MZ pairs only.

We performed a mixed effects model that decomposes the individual-level effect identified in the EWAS into between-pair and within-pair effects. We included a random effect of family and fixed effects of age at blood sampling, sex, and platform as covariates. The between-pair predictor was defined as the average of the exposure within a twin pair. The resulting between-pair effect quantifies how pubertal age and development on average relate to methylation. The within-pair predictor (i.e. discordance) was defined as each twin’s value on the exposure subtracted from their cotwin’s value. The resulting within-pair effect quantifies the degree to which the twin with higher pubertal age or development also has higher or lower methylation, as compared to their cotwin, averaged across pairs and controlling for shared confounds. All pairs were included in the analysis without setting any threshold for within-pair discordance, as they still contribute information to the between-pair effect. Discordant twin pair analyses were conducted in the same subsample for which significant EWAS results were identified.

meQTL analysis

In the context of this study, on the one hand, meQTL SNPs could represent a genetic mechanism that explains the high heritability of a CpG site and, on the other hand, could additionally provide functional information on the SNP-trait association. We used the Genetics of DNA Methylation Consortium (GoDMC) meQTL database [40, 41] to explore whether the CpG sites associated with PDS or PA are established meQTL. For each CpG site, we extracted all genome-wide significant (p < 5 × 10−8) meQTLs. In addition, for each CpG site for which we could estimate heritability, we evaluated the correlation between the number of meQTLs and heritability using spearman correlation.

Sex differences

Sex differences at the PDS- or PA-associated CpG sites were tested in two ways: differences in methylation values and differences in twin model parameters between the sexes. To test for the differences in DNA methylation value medians between males and females within CpG sites associated with PDS or PA, we performed a Wilcoxon two-sample test using the “wilcox.test” function and corrected the p values using the Bonferroni method. CpG sites with a p value < 0.01 were considered differentially methylated between the sexes. Potential sex differences were also evaluated for all CpGs included in the univariate twin modelling analyses. We explored quantitative and qualitative sex differences, where quantitative sex differences refer to cases where the same genes influence a trait (here, methylation at a particular CpG site) in both sexes but in different magnitudes, and qualitative sex differences refer to cases where different sets of genes influence a trait for each sex (i.e. genes specific to one sex). Importantly, this analysis cannot identify which specific genes influence the trait in each sex, only whether the set of genes as a whole is the same or different between sexes.

An omnibus test was performed to explore whether there were any sex differences in any parameter in the model. The omnibus test compares a model in which every parameter is freely estimated for males and females to a nested model in which all parameter estimates are simultaneously fixed to be equal for both sexes. For all CpG sites with a significant omnibus test (i.e. there are sex differences present in the model), we evaluated each parameter individually to determine which differed between the sexes. The specific sex differences were evaluated by comparing the model in which one parameter was constrained to equality between sexes to a model in which the parameter was freely estimated. P values were generated from likelihood ratio tests and were corrected using the Bonferroni method (function “padjust” in R). The significance cut-off for these adjusted p values was < 0.01 for all tests. A significant p value indicates that males and females do differ on that parameter, whereas a non-significant p value indicates no sex differences.

Pathway analysis

Ingenuity pathway analysis (IPA) software from QIAGEN Inc. [42] was used to assess whether the PDS- and PA-associated CpG sites and meQTLs were enriched in specific gene ontologies or pathways. The IPA was performed on gene annotated CpG sites with a standardised effect size >|0.13|, using the “Human Methylation 450 v1-2” and the “Ingenuity Knowledge Base” as the reference gene set for CpGs and meQTL SNPs, respectively. The analyses were based only on Homo sapiens, and we interpreted the results based on the p value of the associations (significance cut-off was 0.05) with a focus on the canonical pathways and linked diseases or biological functions (“Diseases or function annotation”).


To identify blood DNA methylation profiles in early adulthood associated with PDS or PA, we performed sex-stratified meta-EWAS on Finnish twins with sample sizes stratified by the array platforms shown in Additional file 3: Table S1. Furthermore, we also performed EWAS models on PDS using the full sample to determine sex-independent CpG sites. The workflow of the current study is presented in Fig. 1. The average age at which the methylation data were obtained was 22.7 years (SD: 1.7) for FinnTwin12 and 28.1 years (SD: 4.3) for the FinnTwin16 cohort. The PDS was obtained from the FinnTwin12 cohort at mean ages of 11.4 (SD: 0.3) and 14.0 (SD: 0.1), while the PA was obtained at mean ages of 16.2 years (SD: 0.1) and 17.6 years (SD: 0.2) from the FinnTwin16 and FinnTwin12 cohorts, respectively. The PDS and PA of the selected individuals (individuals with methylation data included in the study) did not significantly differ from the complete FinnTwin12 and FinnTwin16 cohorts [43]. Detailed descriptive statistics on PDS, PA and other relevant variables are seen in Table 1. A total of 347,521 CpG sites were included in the EWAS meta-analyses, with no evidence of inflation or bias in any of the EWAS models (across both platforms, the bias test statistic ranged from − 0.213 to 0.142, while the inflation test statistic ranged from 0.9 to 0.995). Overall low heterogeneity was observed between the two platforms (< 0.07% heterogeneous CpGs across all models).

Fig. 1
figure 1

Workflow diagram of the study. Through a meta-analysis of 450 K and EPIC platforms on 347,521 CpGs, we identified CpG sites associated with PA or PDS at age 12 or 14 (p < 0.00001). We performed univariate twin modelling to assess heritability and to define the proportion of variance attributed to unique and shared environments and investigated potential sex differences in the significant CpG sites. We identified enriched pathways and linked diseases among CpG sites with an absolute effect size > 0.13 by IPA

Table 1 Characteristics of pubertal development scale (PDS) in FinnTwin12, and pubertal age (PA) and EWAS covariates in FinnTwin12 and FinnTwin16

Puberty-associated CpG sites

At the age of 12, PDS was associated with 9 CpG sites (p < 1 × 10–5) in both sexes combined, while 13 and 10 were exclusively associated with PDS in males and females, respectively (Table 2, Additional file 2: Figs. S1 and S2). Two CpG sites in females (cg07581365, cg06988547) had a lower p value than the threshold of 2.47 × 10−7. At the age of 14, no CpG sites were associated with PDS in the model including both sexes (Additional file 2: Figs. S1 and S2). However, 12 and 10 CpG sites were associated with PDS in males and females, respectively (Table 2 and Fig. 2). One CpG site in males (cg04239863) and one in females (cg20599748) were significantly associated with PDS (p < 2.47 × 10−7). A total of 12 of the 54 CpG sites associated with PDS showed significant heterogeneity between the 450 K and EPIC platforms (Table 2), indicating lower reliability of the observed associations for those CpG sites. After performing a random effect meta-analysis for these CpG sites, none remained significantly associated with PDS.

Table 2 CpG sites significantly associated (p < 1 × 10–5) with pubertal development scale (PDS) and pubertal age (PA)
Fig. 2
figure 2

Manhattan plots of the meta-analyses. The Manhattan plots show significant associations between CpG methylation and PDS at age 14 in A males and B females. The red line indicates a significance cut-off of 2.4 × 10–7 as recommended for the 450 K platform, while the blue line indicates a suggestive significance cut-off at 1.0 × 10–5. Embedded (top left corner of Manhattan plots) are the QQ plots of meta-analysis p values

Four CpG sites were associated with PA with p < 1 × 10−5: two unique CpGs in males and two in females (Table 2, Additional file 2: Figs. S1 and S2). One of the CpG sites associated with PA in males (cg06096446) had a p < 2.47 × 10−7. None of the CpGs associated with PA were heterogeneous between the two platforms.

Genetic and environmental effects underlying the associations

First, as puberty itself as well as DNA methylation are highly heritable, we wanted to investigate the heritability and shared/nonshared environmental components explaining the variance of the 58 CpG sites associated with PDS or PA in early adulthood. We identified 14 CpGs for which both assumptions of the twin models were met, and the CpG had methylation SD > 0.05 (Additional file 3: Table S2). With respect to the univariate models, 13 out of 14 CpG sites were best modelled with only additive genetic (A) and unique environmental components (E) (Table 3). Heritabilities for the AE CpG methylation values ranged from 0.51 to 0.98, indicating strong additive genetic influences on these CpGs (average heritability was 0.79). Only one of the 14 CpGs (cg15030014, associated with PDS at age 12 in both sexes) also required the inclusion of a shared environmental component (C), indicating strong familial resemblance on this CpG due to environmental sources in addition to additive genetic influences (A = 0.24, C = 0.68).

Table 3 Univariate and sex differences twin modelling of 14 CpG sites meeting the twin modelling assumptions

Second, as only 14/58 CpG sites met the twin modelling criteria, we applied a discordant twin pair study design to identify PDS- and PA-associated CpG sites with their methylation likely driven by nongenetic factors. Mixed effects models were run to decompose the effect identified in the EWAS into within-pair and between-pair associations. Significant within-pair associations are consistent with a nongenetic effect (i.e. an environmental effect consistent with causality). Here, CpGs that were considered differentially methylated within the pairs had a significant within-pair difference in the pooled analysis (MZ and DZ) as well as a suggestive difference in the same direction (p < 0.10) in the MZ-only analyses. Full results are available in Additional file 3: Table S3. We identified six differentially methylated CpGs, five of which were associated with PDS-12 and one with PA. For three CpGs, the MZ twin with a higher PDS-12 score also had higher methylation than their cotwin (cg14609721, cg03468249, cg05797363). For two CpG sites, the MZ twin with a higher PDS-12 score had lower methylation than their cotwin (cg08091771, cg10669219). Finally, for one CpG, the MZ twin with older PA also had higher methylation than their cotwin (cg06096446). These effects suggest an environmentally driven relationship between early pubertal development and DNA methylation. This effect is consistent with a causal relationship but cannot determine the direction of effect (i.e. whether methylation causes earlier puberty or if early puberty causes methylation) due to the timing of methylation data relative to pubertal development.

Third, to gain additional insight into the genetic basis of DNA methylation at the 58 CpG sites associated with PDS and PA, we looked for cis and trans-meQTLs from the GoDMC database affecting methylation at these sites. Thirty-nine of the 58 CpG sites had cis-meQTLs, and only one CpG had trans-meQTLs (Fig. 3). The number of associated meQTLs ranged from 1 to 1676 at these 39 CpG sites and a strong correlation between the number of meQTLs and heritability of the CpG was observed (ρ = 0.88, p value = 4.1e−05). Additionally, we found 3 CpG pairs whose methylation was affected by a subset of common SNPs (Fig. 3).

Fig. 3
figure 3

Circos plot showing the chromosomal locations of the 58 CpGs associated with PDS or PA. The inner circle shows the standardised effect sizes of the association (scale ranging from − 0.3 to 0.3) with the dashed line representing 0. The shape of the points plotted on the effect size scale determines the variable analysed (PDS or PA) in the model in which the CpG was found, while the colour of the CpG site identifier shows whether the model was performed on males, females or both sexes. The 8 trans-meQTLs of CpG cg08883485 are depicted with the black line inside the circos plot (pink arrowhead pointing to the CpG site and purple arrowhead pointing to the genomic region of the 8 trans-meQTLs). The CpGs with common cis-meQTLs are shown with black square brackets

Sex differences in CpG sites

As we observed overall stronger associations in the sex-stratified models compared to models including both sexes, and as there were no common significant CpG sites between the EWAS models in males vs females, we aimed to characterise potential sex differences and similarities in the identified CpG sites. Therefore, we performed a Wilcoxon two-sample test on CpG sites associated with PDS or PA and investigated sex differences by exploiting the properties of opposite-sex dizygotic twin pairs. Based on all included samples in this study, 21 CpG sites were differentially methylated between males and females (Table 2). Three differentially methylated CpGs were obtained from models that included both sexes (associated with PDS at the age of 12).

To investigate sex differences utilising the twin modelling method, we first examined qualitative sex differences (i.e. whether different sets of genes impact CpG methylation in males and females). We also performed an omnibus model on all 14 CpGs included in the twin analyses to determine whether there are any quantitative sex differences (i.e. whether parameter estimates differ between males and females). Qualitative sex differences existed for none of the 14 CpGs, indicating that the same additive genetic influences impact methylation of CpG sites associated with puberty in both males and females and that there are no sex-specific genes influencing their methylation. We obtained ten CpG sites (9 based on AE models and 1 ACE model) with quantitative sex differences (Table 3, Additional file 3: Table S4). Three CpGs exhibited significant differences in variance compositions. For both cg09179916 and cg02614024 (associated with PA in females and PDS at 14 in females, respectively), the additive genetic component was larger in females, and the nonshared environmental component was larger in males, but total variance did not differ between the sexes. For cg15030014 (associated with PDS at age 12 in both sexes), both the additive genetic component and the total phenotypic variance were larger in males. In general, we identified sex differences in both methylation and its underlying variation in CpGs associated with puberty in one or both sexes.

Enrichment analysis

We obtained relevant canonical pathways and associated diseases/functions using IPA software on PDS- and PA-associated CpG sites with an effect size larger than the absolute value of 0.13 (see Additional file 3: Tables S5 and S6 for the CpG sites). CpGs associated with PDS at the age of 14 in males were enriched in sperm motility, axonal guidance signalling and synaptogenesis signalling canonical pathways, while in females, the pathway results were less conclusive. Full results of the canonical pathways from IPA for all models can be seen in Additional file 3: Table S7.

CpGs associated with PA in males were enriched in adipogenesis, choline degradation and thyroid hormone biosynthesis canonical pathways, while in females, they were enriched in tyrosine degradation and choline biosynthesis. Across all sex-specific models, associated CpGs were linked to physiological system development, such as growth of muscle tissue or development of neural cells in PDS-14 models or connective tissue development and tissue morphology in PA (Additional file 3: Table S8). Breast, prostate and ovarian cancers were universally found to be linked to CpGs in the sex-stratified models on PDS both at ages 12 and 14. Intestinal and thyroid cancers were highly significantly linked diseases in CpGs associated with PDS at age 14 in males. In females, the highly significant diseases were cancer of the head (umbrella term) and gastric cancer (Additional file 3: Table S8).

To explore whether the SNPs underlying the methylation associations with puberty are enriched in any relevant processes and functions, we performed IPA on meQTLs of all 39 CpG sites. IPA was performed on 13,787 unique meQTLs, and they were enriched in several immune pathways, such as antigen presentation and the T-helper 1 and T-helper 2 pathways, usually driven by genes linked to the human leukocyte antigen (HLA) complex (e.g. HLA-DQA2 and HLA-DPA1). The multiple sclerosis signalling pathway was one of the highly enriched canonical pathways among the meQTLs (Additional file 3: Table S9). Concerning the diseases and functions category of IPA, several diseases were highlighted, such as insulin-dependent diabetes mellitus, rheumatoid arthritis, and endocrine, thyroid and breast cancers.


In a sizable cohort of Finnish twins, we identified methylation values at 58 CpG sites to be positively or negatively associated with PDS or PA when assessed among individuals in early adulthood. We observed both sex-specific associations and associations common to both sexes, with overall stronger associations in sex-stratified models. On average, we identified stronger and more numerous associations with PDS than PA. Moreover, relatively few CpGs overlapped between any of the models.

We considered PDS at age 14 of particular importance; compared to the models at age 12, it is generally closer to pubertal development peak [44], while PA was reported retrospectively, making it more prone to reporting errors. Additionally, CpG sites with methylation associated with PDS at age 14 were frequently located in development-related genes. For instance, male PDS-related hypomethylated cg19238380 is located in the LMNA gene, which plays a critical role in the normal development of the peripheral nervous system and skeletal muscle [45, 46]. Additionally, the CpG site cg03551401, where methylation is negatively associated with PDS in females, is located on the ADCY8 gene, which contributes to several brain functions, including learning, memory, and drug addiction [47]. Depending on their genomic location, there are numerous ways the methylation of CpG sites can affect the function or expression of the respective gene. For example, several reported CpGs in this study are mapped to enhancer or promoter regions, and could in theory affect the regulation of the respective gene [48]. Similarly, CpG sites found on the first intron of a gene could also have an impact on the function of the respective gene through affecting transcription factor binding [49]. However, the potential effects of reported CpG sites on their respective genes are to be elucidated and quantified in the future studies. Any explicit functional conclusions are beyond the scope of the current study where methylation was quantified years after puberty in adulthood, and the methylation at the highlighted CpG sites should be rather considered as biomarker for pubertal development.

Three CpGs, with an effect size larger than the absolute value of 0.13 but below the specified p value thresholds, were also found in two studies that investigated genome-wide DNA methylation profiles in peripheral blood before and after puberty [16, 17]. The three CpG sites are cg01794929, cg21361322 and cg15028548 (the former two mapped to the SELM gene and the latter to ABI3BP) and were associated in our study with PDS at 14 in females, PDS at 12 in females and PDS at 14 in males, respectively. Notably, the main difference between our study and the two mentioned studies is that we explored the methylation profiles of young adults explained by pubertal timing and development, while they investigated methylation profiles before and after puberty. This difference is reflected in the observed opposite directionality of effect sizes for the 3 CpG sites.

Overall, canonical pathway enrichment results when both sexes were considered together were less conclusive than sex-specific pathway analyses. Sperm motility, axonal guidance signalling and synaptogenesis signalling pathways were the most strongly enriched canonical pathways from the CpGs associated with PDS at age 14 in males. Additionally, an antagonistic enrichment related to choline and tyrosine metabolism was found between males and females among the CpG sites associated with PA. Furthermore, although not the diseases with the strongest enrichment, breast, ovarian and prostate cancers were linked to CpG sites across all models, which is consistent with studies showing associations between early puberty and these diseases [1, 9]. Previously reported diseases related to early puberty, such as endometrial cancer [1, 9] or cardiovascular diseases [3], were also linked to CpG sites in some of the models in the current study, such as in the sex-stratified PDS at age 14. Importantly, the significant canonical pathways and diseases do not necessarily indicate a direct functional effect of the underlying DNA methylation sites associated with puberty. Rather, they represent a general list of pathways or diseases that could be linked to puberty through the identified DNA methylation biomarkers.

In addition to characterising the genes relevant to the identified CpG sites, we also examined their underlying sources of variation via twin modelling, as both puberty [4,5,6,7] and DNA methylation are known to be heritable traits [18, 19]. The CpG sites reported in this study, which were predominantly best modelled with additive genetic and unique environmental sources of variation, were mostly found to be highly heritable. Generally, methylation values of these CpGs were more heritable than pubertal development or PA themselves, with heritability ranging between 0.51 and 0.98 for AE CpGs, while for age at menarche or age at voice change, it ranges from 0.50 to 0.59 [4, 5, 50]. Methylation at only one CpG site (cg15030014), located in the GET4 gene, showed strong influence from the shared environment. Interestingly, GET4 has an enhancer with a SNP (rs9690350) mapped to the gene region of PDGFA, which is associated with pubertal onset [51]. The same CpG, as well as 47 other significant CpGs, are associated with age [52], which supports our findings on associations with pubertal timing. Additionally, they could be considered markers for epigenetic age acceleration in this critical period of life. It is unclear to what extent the high heritability of CpGs observed here may reflect genetic mechanisms that influence pubertal development, given that the blood sampling occurred after the completion of puberty.

To assess in more detail the potential genetic effects on puberty-associated CpG sites, we performed meQTL analyses. Across 39 CpG sites, 13,787 unique meQTLs were found, with a strong correlation between the number of meQTLs and heritability of individual CpGs, as was found in Min and colleagues in 2021 [40]. A total of 4507 unique SNPs associated with age of menarche in two large-scale genome-wide association studies (GWAS) [1, 9] were identified as meQTLs in our study. They could provide a putative mechanistic explanation for PDS- or PA-associated SNPs identified in the GWA studies, as DNA methylation may impart a mechanistic link for these SNPs and explain their association with age at menarche. A nonzero proportion of SNPs reported by the two studies was found in all our designs, barring PDS at age 14 in males and females combined. The percentage of meQTLs associated with CpGs from the eight models, also found among the 4507 SNPs, ranged from 21.1 to 43.2% (26–1289 meQTLs). The meQTLs of highly heritable CpGs also had a substantial proportion of common meQTLs with GWAS on puberty, ranging from 1 to 36.3% (16–465 meQTLs). Hence, these meQTLs could, in part, reflect genetic mechanisms that influence pubertal development.

Pathway-level analyses on the meQTLs of CpG sites associated with puberty revealed an enrichment in HLA complex-related immune pathways such as antigen presentation and T-helper pathways. A positive association between HLA complex heterozygosity and later pubertal development was reported and was hypothesised to be an evolutionary trade-off between immunocompetence reflected in pathogen resistance and sexual maturation [53]. Moreover, an increase in antigen presentation and T-helper responses has been reported as part of immune changes during pubertal development [54]. Thus, pathway analyses on meQTLs underlying the methylation associations with puberty demonstrate their involvement in immune processes fully developed during puberty, which might be relevant in terms of susceptibility to various diseases later in life. Furthermore, the IPA of meQTLs revealed some of the same linked diseases as our IPA of differentially methylated CpG sites, such as breast and thyroid cancers. Interestingly, the enriched immune pathways were also associated with the central nervous system and could be linked to susceptibility to multiple sclerosis [54], which was strongly enriched in the meQTL IPA analysis.

To rule out the effects of shared genotypes and environment on the observed associations, we performed within-pair analyses. We observed 6 CpG sites with their methylation differing within the pairs, indicating a potentially causal relationship between PDS or PA and methylation at these CpG sites. Two of these CpGs were positively related to PDS, and three were negatively related to PDS, while only one CpG site (cg06096446) with high heritability, whose mapped gene (TULP4) is associated with height [55], was positively related to PA. The same CpG site has also been associated with alcohol consumption and disorders in pregnancy [56, 57]. One of the negatively associated CpGs with a significant within-pair methylation difference (cg08091771) has been associated with maternal BMI and nitrogen dioxide exposure [58, 59] and is mapped on PTPRG, which is associated with precocious puberty [60]. These effects are consistent with a causal environmentally mediated relationship between earlier pubertal development and methylation, although our analyses cannot conclusively determine the direction of causality (i.e. whether methylation causes earlier puberty or whether earlier puberty causes methylation). The rest of the puberty-associated CpG sites did not differ within the pairs, suggesting that the relationship between PDS/PA and CpG methylation is due to genetic or environmental confounding.

Twin modelling results on quantitative sex differences and differential methylation results were concordant: all CpGs identified as having sex differences in the mean structure were also differentially methylated between the sexes. Furthermore, for CpGs with sex differences in variance, the sex with a larger variance from additive genetic sources was also the sex for which the EWAS association was identified. For example, cg09179916 was associated with PA in females, and females had a larger additive genetic component. This convergence of results could suggest a mechanistic role, but further analyses are necessary to determine if methylation at these CpG sites has a functional role related to puberty. In addition to the mechanistic role, the observed sex differences emphasise the importance of sex-stratified models for epigenetic studies of puberty.

We found an overall high heritability across the PDS- and PA-associated CpG sites, which for 6 highly heritable CpGs (h2 > 0.8) was likely due to direct SNP effects observed as meQTLs. However, the methylation discordance within twin pairs revealed that, for some CpG sites, despite their high heritability, environmental factors play an essential role in shaping their methylation, possibly indicating the presence of gene‒environment interactions. Finally, the observed sex differences indicate that the same genetic mechanisms, but with different magnitudes in males and females, affect the heritability of CpG sites associated with puberty, giving additional importance to investigating meQTLs in this context.

Our study has both strengths and limitations. The key strengths of our study are the relatively large sample size as well as the inclusion of twin pairs with longitudinal phenotype data on which we were able to separate genetic effects from environmental effects on methylation at the puberty-associated CpG sites. Importantly, previous studies reported no substantial difference in PA between twins and singletons [7, 61]. As the EWAS was performed on blood samples several years after puberty was completed, the main limitation of this study is that it is unclear whether the methylation profile associated with PDS or PA was formed before, during, or after puberty. Although not revealing clear causal characteristics, the methylation profile in the peripheral blood of young adults is associated with pubertal timing and development, making the associated genes, enriched pathways and linked diseases potential intriguing subjects for future investigation. Furthermore, incorrect reporting of pubertal age, especially in males, could be a source of noise and inconsistency within the analyses [62]. In addition, while the self-reported PDS is not the gold standard for measuring pubertal development, it is considered reliable [63]. Furthermore, clinical assessments, such as Tanner staging [64, 65], are too costly for study designs and sample sizes such as ours. Another limitation was the use of two different methylation profiling platforms, reflecting the technological development of array-based methylation analysis. Despite an overall low heterogeneity between the 450 K and EPIC platforms, significant heterogeneity was observed for some of the CpG sites associated with PDS. Since the variance and distribution of PDS and PA did not differ between the samples analysed by the two platforms, the observed heterogeneity could have been caused by technical differences between the platforms and/or the smaller sample size of the EPIC cohort, which mandates that the identified associations at these heterogeneous CpG sites are interpreted with caution. In all reported analyses we used the methylation beta-values, instead of the M-values. Although heteroscedasticity of beta values at highly methylated or unmethylated CpG sites have been observed [66], simulation studies and real data examples have demonstrated that generally there is no significant differences in differential methylation analysis results when using beta-values vs M-values [67].


By identifying CpG sites associated with PDS and PA, we found potential genes, pathways and diseases relevant to puberty in an epigenetic context and complemented the previously reported genetic characterisation of pubertal timing. Our comprehensive analyses on puberty-related CpGs, such as twin modelling, assessment of environmentally driven associations or genetics underlying the association, highlight the role of genetics and unique environment in pubertal timing and development. In addition, we provide evidence for DNA methylation being a putative mechanistic link between genotypes and puberty, as well as puberty-related diseases. Furthermore, our within-pair effects are consistent with a causal environmentally mediated relationship between pubertal development and methylation. To further advance our understanding, it will be essential to utilise longitudinal data from multiple cohorts, such as the Adolescent Brain Cognitive Development Study. By doing so, we could acquire additional insights into the chronological methylation patterns of CpGs associated with PDS and PA, enabling us to deepen our knowledge of the intricate mechanisms underlying puberty and its implications for health and disease.

Availability of data and materials

The Finnish Twin Cohort datasets used in the current study are located in the Biobank of the Finnish Institute for Health and Welfare, Helsinki, Finland. All biobanked data are publicly available for use by qualified researchers following a standardised application procedure (


  1. Day FR, Thompson DJ, Helgason H, Chasman DI, Finucane H, Sulem P, et al. Genomic analyses identify hundreds of variants associated with age at menarche and support a role for puberty timing in cancer risk. Nat Genet. 2017;49(6):834–41.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Collaborative Group on Hormonal Factors in Breast Cancer. Menarche, menopause, and breast cancer risk: individual participant meta-analysis, including 118 964 women with breast cancer from 117 epidemiological studies. Lancet Oncol. 2012;13(11):1141–51.

    Article  PubMed Central  Google Scholar 

  3. Day FR, Elks CE, Murray A, Ong KK, Perry JRB. Puberty timing associated with diabetes, cardiovascular disease and also diverse health outcomes in men and women: the UK Biobank study. Sci Rep. 2015;5(1):11208.

    Article  PubMed  PubMed Central  Google Scholar 

  4. Anderson CA, Duffy DL, Martin NG, Visscher PM. Estimation of variance components for age at menarche in twin families. Behav Genet. 2007;37(5):668–77.

    Article  PubMed  Google Scholar 

  5. Hur YM. Common genetic influences on age at pubertal voice change and BMI in male twins. Twin Res Hum Genet. 2020;23(4):235–40.

    Article  PubMed  Google Scholar 

  6. Silventoinen K, Haukka J, Dunkel L, Tynelius P, Rasmussen F. Genetics of pubertal timing and its associations with relative weight in childhood and adult height: the Swedish Young Male Twins Study. Pediatrics. 2008;121(4):e885–91.

    Article  PubMed  Google Scholar 

  7. Kaprio J, Rimpelä A, Winter T, Viken RJ, Rimpelä M, Rose RJ. Common genetic influences on BMI and age at menarche. Hum Biol. 1995;67(5):739–53.

    CAS  PubMed  Google Scholar 

  8. Day FR, Bulik-Sullivan B, Hinds DA, Finucane HK, Murabito JM, Tung JY, et al. Shared genetic aetiology of puberty timing between sexes and with health-related outcomes. Nat Commun. 2015;6(1):8842.

    Article  CAS  PubMed  Google Scholar 

  9. Perry JRB, Day F, Elks CE, Sulem P, Thompson DJ, Ferreira T, et al. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature. 2014;514(7520):92–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Papadimitriou A. The evolution of the age at menarche from prehistorical to modern times. J Pediatr Adolesc Gynecol. 2016;29(6):527–30.

    Article  PubMed  Google Scholar 

  11. Eckert-Lind C, Busch AS, Petersen JH, Biro FM, Butler G, Bräuner EV, et al. Worldwide secular trends in age at pubertal onset assessed by breast development among girls: a systematic review and meta-analysis. JAMA Pediatr. 2020;174(4):e195881.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Fisher MM, Eugster EA. What is in our environment that effects puberty? Reprod Toxicol. 2014;44:7–14.

    Article  CAS  PubMed  Google Scholar 

  13. Silventoinen K, Jelenkovic A, Palviainen T, Dunkel L, Kaprio J. The association between puberty timing and body mass index in a longitudinal setting: the contribution of genetic factors. Behav Genet. 2022;52(3):186–94.

    Article  PubMed  PubMed Central  Google Scholar 

  14. Cousminer DL. Pubertal timing and body mass: genes involved. Curr Opin Endocr Metab Res. 2020;14:117–26.

    Article  Google Scholar 

  15. Han L, Zhang H, Kaushal A, Rezwan FI, Kadalayil L, Karmaus W, et al. Changes in DNA methylation from pre- to post-adolescence are associated with pubertal exposures. Clin Epigenet. 2019;11(1):176.

    Article  CAS  Google Scholar 

  16. Almstrup K, Lindhardt Johansen M, Busch AS, Hagen CP, Nielsen JE, Petersen JH, et al. Pubertal development in healthy children is mirrored by DNA methylation patterns in peripheral blood. Sci Rep. 2016;6(1):28657.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Thompson EE, Nicodemus-Johnson J, Kim KW, Gern JE, Jackson DJ, Lemanske RF, et al. Global DNA methylation changes spanning puberty are near predicted estrogen-responsive genes and enriched for genes involved in endocrine and immune processes. Clin Epigenet. 2018;10(1):62.

    Article  Google Scholar 

  18. van Dongen J, Nivard MG, Willemsen G, Hottenga JJ, Helmer Q, Dolan CV, et al. Genetic and environmental influences interact with age and sex in shaping the human methylome. Nat Commun. 2016;7(1):11115.

    Article  PubMed  PubMed Central  Google Scholar 

  19. Villicaña S, Bell JT. Genetic impacts on DNA methylation: research findings and future perspectives. Genome Biol. 2021;22(1):127.

    Article  PubMed  PubMed Central  Google Scholar 

  20. Kaidesoja M, Aaltonen S, Bogl LH, Heikkilä K, Kaartinen S, Kujala UM, et al. FinnTwin16: a longitudinal study from age 16 of a population-based finnish twin cohort. Twin Res Hum Genet. 2019;22(6):530–9.

    Article  PubMed  Google Scholar 

  21. Rose RJ, Salvatore JE, Aaltonen S, Barr PB, Bogl LH, Byers HA, et al. FinnTwin12 cohort: an updated review. Twin Res Hum Genet. 2019;22(5):302–11.

    Article  PubMed  PubMed Central  Google Scholar 

  22. Petersen AC, Crockett L, Richards M, Boxer A. A self-report measure of pubertal status: reliability, validity, and initial norms. J Youth Adolesc. 1988;17(2):117–33.

    Article  CAS  PubMed  Google Scholar 

  23. Stephenson M, Bollepalli S, Cazaly E, Salvatore JE, Barr P, Rose RJ, et al. Associations of alcohol consumption with epigenome-wide DNA methylation and epigenetic age acceleration: individual-level and co-twin comparison analyses. Alcohol Clin Exp Res. 2021;45(2):318–28.

    Article  CAS  PubMed  Google Scholar 

  24. Min JL, Hemani G, Davey Smith G, Relton C, Suderman M. Meffil: efficient normalization and analysis of very large DNA methylation datasets. Bioinformatics. 2018;34(23):3983–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Chen Y, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, et al. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics. 2013;8(2):203–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Zhou W, Laird PW, Shen H. Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res. 2017;45(4):e22.

    PubMed  Google Scholar 

  27. Pidsley R, Wong CCY, Volta M, Lunnon K, Mill J, Schalkwyk LC. A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics. 2013;14(1):293.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. R Core Team (2022). R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna; 2022.

  29. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47–e47.

    Article  PubMed  PubMed Central  Google Scholar 

  30. Salas LA, Koestler DC. FlowSorted.Blood.EPIC [Internet]. Bioconductor; 2018 [cited 2022 Nov 16]. Available from:

  31. Salas LA, Koestler DC, Butler RA, Hansen HM, Wiencke JK, Kelsey KT, et al. An optimized library for reference-based deconvolution of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC BeadArray. Genome Biol. 2018;19(1):64.

    Article  PubMed  PubMed Central  Google Scholar 

  32. The BIOS Consortium, van Iterson M, van Zwet EW, Heijmans BT. Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution. Genome Biol. 2017;18(1):19.

    Article  PubMed Central  Google Scholar 

  33. Turner SD. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. JOSS. 2018;3(25):731.

    Article  Google Scholar 

  34. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;5(36):1–48.

    Google Scholar 

  35. Aloe AM, Thompson CG. The synthesis of partial effect sizes. J Soc Soc Work Res. 2013;4(4):390–405.

    Article  Google Scholar 

  36. Saffari A, Silver MJ, Zavattari P, Moi L, Columbano A, Meaburn EL, et al. Estimation of a significance threshold for epigenome-wide association studies. Genet Epidemiol. 2018;42(1):20–33.

    Article  PubMed  Google Scholar 

  37. Neumann A, Walton E, Alemany S, Cecil C, González JR, Jima DD, et al. Association between DNA methylation and ADHD symptoms from birth to school age: a prospective meta-analysis. Transl Psychiatry. 2020;10(1):398.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Hansen KD. IlluminaHumanMethylation450kanno.ilmn12.hg19 [Internet]. Bioconductor; 2017 [cited 2022 Nov 23]. Available from:

  39. Neale MC, Hunter MD, Pritikin JN, Zahery M, Brick TR, Kirkpatrick RM, et al. OpenMx 2.0: extended structural equation and statistical modeling. Psychometrika. 2016;81(2):535–49.

    Article  PubMed  Google Scholar 

  40. Min JL, Hemani G, Hannon E, Dekkers KF, Castillo-Fernandez J, Luijk R, et al. Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation. Nat Genet. 2021;53(9):1311–21.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Genetics of DNA Methylation Consortium. Cis and trans meta-analysis results from genome-wide scans of 420,509 DNA methylation sites [Internet]. [cited 2023 Feb 26]. Available from:

  42. Krämer A, Green J, Pollard J, Tugendreich S. Causal analysis approaches in ingenuity pathway analysis. Bioinformatics. 2014;30(4):523–30.

    Article  PubMed  Google Scholar 

  43. Dick DM, Rose RJ, Pulkkinen L, Kaprio J. Measuring puberty and understanding its impact: a longitudinal study of adolescent twins. J Youth Adolesc. 2001;30(4):385–99.

    Article  Google Scholar 

  44. Wehkalampi K, Silventoinen K, Kaprio J, Dick DM, Rose RJ, Pulkkinen L, et al. Genetic and environmental influences on pubertal timing assessed by height growth. Am J Hum Biol. 2008;20(4):417–23.

    Article  PubMed  PubMed Central  Google Scholar 

  45. Muchir A. Identification of mutations in the gene encoding lamins A/C in autosomal dominant limb girdle muscular dystrophy with atrioventricular conduction disturbances (LGMD1B). Hum Mol Genet. 2000;9(9):1453–9.

    Article  CAS  PubMed  Google Scholar 

  46. De Sandre-Giovannoli A, Chaouch M, Kozlov S, Vallat JM, Tazir M, Kassouri N, et al. Homozygous defects in LMNA, encoding lamin A/C nuclear-envelope proteins, cause autosomal recessive axonal neuropathy in human (Charcot-Marie-Tooth disorder type 2) and mouse. Am J Hum Genet. 2002;70(3):726–36.

    Article  PubMed  PubMed Central  Google Scholar 

  47. The UniProt Consortium, Bateman A, Martin MJ, Orchard S, Magrane M, Agivetova R, et al. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49:D480-9.

    Article  Google Scholar 

  48. Angeloni A, Bogdanovic O. Enhancer DNA methylation: implications for gene regulation. Essays Biochem. 2019;63(6):707–15.

    Article  CAS  PubMed  Google Scholar 

  49. Anastasiadi D, Esteve-Codina A, Piferrer F. Consistent inverse correlation between DNA methylation of the first intron and gene expression across tissues and species. Epigenet Chromatin. 2018;11(1):37.

    Article  Google Scholar 

  50. Kretsch N, Mendle J, Harden KP. A twin study of objective and subjective pubertal timing and peer influence on risk-taking. J Res Adolesc. 2016;26(1):45–59.

    Article  PubMed  Google Scholar 

  51. Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards suite: from gene data mining to disease genome sequence analyses. CP in Bioinformatics [Internet]. 2016 Jun [cited 2023 Mar 10];54(1). Available from:

  52. Battram T, Yousefi P, Crawford G, Prince C, Sheikhali Babaei M, Sharp G, et al. The EWAS catalog: a database of epigenome-wide association studies. Wellcome Open Res. 2022;31(7):41.

    Article  Google Scholar 

  53. Arnocky S, Hodges-Simeon C, Davis AC, Desmarais R, Greenshields A, Liwski R, et al. Heterozygosity of the major histocompatibility complex predicts later self-reported pubertal maturation in men. Sci Rep. 2021;11(1):19862.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Ucciferri CC, Dunn SE. Effect of puberty on the immune system: relevance to multiple sclerosis. Front Pediatr. 2022;2(10):1059083.

    Article  Google Scholar 

  55. Sollis E, Mosaku A, Abid A, Buniello A, Cerezo M, Gil L, et al. The NHGRI-EBI GWAS catalog: knowledgebase and deposition resource. Nucleic Acids Res. 2023;51(D1):D977–85.

    Article  CAS  PubMed  Google Scholar 

  56. Kazmi N, Sharp GC, Reese SE, Vehmeijer FO, Lahti J, Page CM, et al. Hypertensive disorders of pregnancy and DNA methylation in newborns. Hypertension. 2019;74(2):375–83.

    Article  CAS  PubMed  Google Scholar 

  57. Dugué PA, Wilson R, Lehne B, Jayasekara H, Wang X, Jung CH, et al. Alcohol consumption is associated with widespread changes in blood DNA methylation: analysis of cross-sectional and longitudinal data. Addict Biol. 2021;26(1):e12855.

    Article  PubMed  Google Scholar 

  58. Sharp GC, Salas LA, Monnereau C, Allard C, Yousefi P, Everson TM, et al. Maternal BMI at the start of pregnancy and offspring epigenome-wide DNA methylation: findings from the pregnancy and childhood epigenetics (PACE) consortium. Hum Mol Genet. 2017;26(20):4067–85.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. de Lichtenfels FCAJ, van der Plaat DA, de Jong K, van Diemen CC, Postma DS, Nedeljkovic I, et al. Long-term air pollution exposure, genome-wide DNA methylation and lung function in the LifeLines cohort study. Environ Health Perspect. 2018;126(2):027004.

    Article  Google Scholar 

  60. Rouillard AD, Gundersen GW, Fernandez NF, Wang Z, Monteiro CD, McDermott MG, et al. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database. 2016;2016:baw100.

    Article  PubMed  PubMed Central  Google Scholar 

  61. Willemsen G, Odintsova V, de Geus E, Boomsma DI. Twin-singleton comparisons across multiple domains of life. In: Khalil A, Lewi L, Lopriore E, editors. Twin and higher-order pregnancies [Internet]. Cham: Springer International Publishing; 2021 [cited 2023 Jan 12]. p. 51–71. Available from:

  62. Rasmussen AR, Wohlfahrt-Veje C, de Renzy-Martin KT, Hagen CP, Tinggaard J, Mouritsen A, et al. Validity of self-assessment of pubertal maturation. Pediatrics. 2015;135(1):86–93.

    Article  PubMed  Google Scholar 

  63. Koopman-Verhoeff ME, Gredvig-Ardito C, Barker DH, Saletin JM, Carskadon MA. Classifying pubertal development using child and parent report: comparing the pubertal development scales to tanner staging. J Adolesc Health. 2020;66(5):597–602.

    Article  PubMed  Google Scholar 

  64. Tanner JM, Whitehouse RH. Clinical longitudinal standards for height, weight, height velocity, weight velocity, and stages of puberty. Arch Dis Child. 1976;51(3):170–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. van den Berg SM, Setiawan A, Bartels M, Polderman TJC, van der Vaart AW, Boomsma DI. Individual differences in puberty onset in girls: bayesian estimation of heritabilities and genetic correlations. Behav Genet. 2006;36(2):261.

    Article  PubMed  Google Scholar 

  66. Du P, Zhang X, Huang CC, Jafari N, Kibbe WA, Hou L, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinform. 2010;11(1):587.

    Article  CAS  Google Scholar 

  67. Li D, Xie Z, Le Pape M, Dye T. An evaluation of statistical methods for DNA methylation microarray data analysis. BMC Bioinform. 2015;16(1):217.

    Article  Google Scholar 

Download references


We express our gratitude to Teemu Palviainen for his assistance with data management and curation and to Mikaela Hukkanen for her advice and discussions on IPA. We also thank Dr. Giovanna Chiorino for her invaluable advice and insightful discussions. Finally, we thank the FIMM Technology Centre supported by HiLIFE and Biocenter Finland for DNA methylation data generation services, as well as all FinnTwin12 and FinnTwin16 participants and their families included in this study.


Open Access funding provided by University of Helsinki (including Helsinki University Central Hospital). This work was supported by the European Union’s Horizon 2020 Research and Innovation Programme, Marie Skłodowska-Curie (grant number 859860). Data collection in the FinnTwin12 and FinnTwin16 cohorts have been supported by the Wellcome Trust Sanger Institute, the Broad Institute, ENGAGE – European Network for Genetic and Genomic Epidemiology, FP7-HEALTH-F4-2007, grant agreement number 201413, National Institute of Alcohol Abuse and Alcoholism (grants AA-12502, AA-00145, and AA-09203 to R J Rose; AA15416 and K02AA018755 to D M Dick; R01AA015416 to Jessica Salvatore), the Academy of Finland (grants 100499, 205585, 118555, 141054, 264146, 308248 to JK, and grants 328685, 307339, 297908 and 251316 to MO, and the Centre of Excellence in Complex Disease Genetics grants 312073, 336823, and 352792 to JK), and Sigrid Juselius Foundation (to MO).

Author information

Authors and Affiliations



MO, JK, and ES conceptualised the study. ES performed the data analysis, was involved in the visualisation of results, and wrote the first draft of the manuscript. SMZ performed the twin analyses and participated in writing. MKY was involved in data analysis and visualisation. AH performed data pre-processing and normalisation and contributed to data analysis. MO and JK acquired the data, supervised the work, and participated in writing. All authors critically revised the manuscript for important intellectual content and read and approved the final manuscript.

Corresponding author

Correspondence to Miina Ollikainen.

Ethics declarations

Ethics approval and consent to participate

The Finnish Twin Cohorts’ data collection and analysis were approved by the Ethics Committee of the Helsinki University Central Hospital (Dnro 249/E5/01, 270/13/03/01/2008, 154/13/03/00/2011). Written informed consent was provided by the participants prior to data collection.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

. Supplementary methods on EWAS designs and twin modelling.

Additional file 2

. Figure S1. Manhattan plots on p values of meta-analysed EWAS models on 450K and EPIC platforms on PDS at age 12 A) in males and B) females, C) combined, on PDS at age 14 in D) males and females combined, and on PA in E) males and F) females. Figure S2. QQ plots on p values of meta-analysed EWAS models on 450K and EPIC platforms on PDS at age 12 A) in males and B) females, C) combined, on PDS at age 14 in D) males and females combined, and on PA in E) males and F) females.

Additional file 3

. Table S1. Sample sizes and final numbers of complete twin pairs included in 450K and EPIC platforms for EWAS designs on PDS and PA. Table S2. Details on univariate twin modelling results on the 14 CpG sites. The estimates for total phenotypic variance and mean, as well as the coefficients of the covariates (age and sex), are shown. Table S3. Individual associations and within-twin pair methylation differences of the 58 CpG sites identified in the EWAS. Table S4. Results of qualitative and quantitative sex differences for omnibus tests and specific parameters on CpG sites. Table S5. CpG sites associated with pubertal development scale (PDS) with a standardised effect size > |0.13| stratified by the model. Table S6. CpG sites associated with pubertal age (PA) in males or females with a standardised effect size > |0.13|. Table S7. CpG sites associated with pubertal development scale (PDS) or pubertal age (PA) with a standardised effect size > |0.13| enriched in canonical pathways from the Ingenuity Pathway Analysis. Table S8. CpG sites associated with pubertal development scale (PDS) or pubertal age (PA) with a standardised effect size > |0.13| enriched in diseases and functions from the Ingenuity Pathway Analysis. Table S9. IPA results on meQTLs (p < 5 x 10-8) of CpG sites associated with pubertal development scale (PDS) or pubertal age (PA). Both the Ingenuity Canonical pathways and diseases/functions annotations are included.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Sehovic, E., Zellers, S.M., Youssef, M.K. et al. DNA methylation sites in early adulthood characterised by pubertal timing and development: a twin study. Clin Epigenet 15, 181 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: