Skeletal muscle methylome and transcriptome integration reveals profound sex differences related to muscle function and substrate metabolism

Nearly all human complex traits and diseases exhibit some degree of sex differences, with epigenetics being one of the main contributing factors. Various tissues display sex differences in DNA methylation; however, this has not yet been explored in skeletal muscle, despite skeletal muscle being among the tissues with the most transcriptomic sex differences. For the first time, we investigated the effect of sex on autosomal DNA methylation in human skeletal muscle across three independent cohorts (Gene SMART, FUSION, and GSE38291) using a meta-analysis approach, totalling 369 human muscle samples (222 males and 147 females), and integrated this with known sex-biased transcriptomics. We found 10,240 differentially methylated regions (DMRs) at FDR < 0.005, 94% of which were hypomethylated in males, and gene set enrichment analysis revealed that differentially methylated genes were involved in muscle contraction and substrate metabolism. We then investigated biological factors underlying DNA methylation sex differences and found that circulating hormones were not associated with differential methylation at sex-biased DNA methylation loci; however, these sex-specific loci were enriched for binding sites of hormone-related transcription factors (with top TFs including androgen (AR), estrogen (ESR1), and glucocorticoid (NR3C1) receptors). Fibre type proportions were associated with differential methylation across the genome, as well as across 16% of sex-biased DNA methylation loci (FDR < 0.005). Integration of DNA methylomic results with transcriptomic data from the GTEx database and the FUSION cohort revealed 326 autosomal genes that display sex differences at both the epigenome and transcriptome levels. Importantly, transcriptional sex-biased genes were overrepresented among epigenetic sex-biased genes (p value = 4.6e−13), suggesting differential DNA methylation and gene expression between male and female muscle are functionally linked. Finally, we validated expression of three genes with large effect sizes (FOXO3A, ALDH1A1, and GGT7) in the Gene SMART cohort with qPCR. GGT7, involved in antioxidant metabolism, displays male-biased expression as well as lower methylation in males across the three cohorts. In conclusion, we uncovered 8420 genes that exhibit DNA methylation differences between males and females in human skeletal muscle that may modulate mechanisms controlling muscle metabolism and health. Supplementary Information The online version contains supplementary material available at 10.1186/s13148-021-01188-1.


Introduction
Various diseases, including but not limited to muscular dystrophy, cardiomyopathies, and cardiovascular disease [1], display sex differences in prevalence, onset, Open Access *Correspondence: Nir.Eynon@vu.edu.au † Sarah Voisin and Nir Eynon are co-senior authors 1 Institute for Health and Sport (iHeS), Victoria University, PO Box 14428, Melbourne, VIC 8001, Australia Full list of author information is available at the end of the article progression, or severity. To improve treatment for such diseases, it is crucial to uncover the molecular basis for the sex differences and their consequences on organ function [2]. Sexually differentiated traits and phenotypes stem from a combination of factors, including genetics (gene variants-by-sex interactions [3], XY chromosome complements [4][5][6][7], genomic imprinting [8]), the hormonal milieu [9,10], and gene regulation [11], with the latter likely contributing the most [1].
Recently, a large-scale study from the Genotype-Tissue Expression (GTEx) consortium unravelled mRNA expression differences between the sexes that are not driven by sex chromosomes, across all tissues. Skeletal muscle was particularly divergent between the sexes, as gene expression profiles in this tissue could predict sex with high specificity and sensitivity [4]. These transcriptomic differences underpin the numerous physiological differences in skeletal muscle between males and females, such as differences in substrate metabolism [12][13][14]. For example, females oxidize more lipids and less carbohydrates and amino acids during endurance exercise, and albeit depending on training status, tend to have a higher proportion of type I (slow-twitch) muscle fibres [15], all of which inherently contribute to enhanced fatigueresistance in female skeletal muscle [16]. As such, females exhibit higher mRNA and protein levels of lipid oxidation-related genes than males [13]. Interestingly, the top gene set corresponding to sex-biased genes in the GTEx study corresponded to targets of the epigenetic writer polycomb repressive complex 2 (PRC2) and its associated epigenetic mark (H3K27me3) [4]. This suggests that the sex-specific deposition of epigenetic marks may be one of the sources of sex differences in gene expression. Epigenetics is a regulatory system that influences gene expression and is modulated by the genetic sequence and environmental stimuli [17]. DNA methylation is currently the best-characterized epigenetic modification and has been shown to differ between males and females in various tissues, such as pancreatic islets [18], blood [19,20], and more recently cultured myoblasts and myotubes [21]. While there is ample evidence for transcriptomic sex differences in skeletal muscle [4,11,12,[22][23][24], it is unclear whether sex differences exist in the DNA methylome of skeletal muscle tissue, and what factors contribute to these differences. Epigenome-wide association studies (EWAS) are ideal for investigating the impact of sex on genome-wide DNA methylation profiles.
Differences in skeletal muscle fibre-type proportions and circulating levels of sex hormones may contribute to epigenomic and transcriptomic differences between the sexes. Males and females are exposed to differing levels of sex hormones across the lifespan [25], which are primarily ascribed to the reproductive function, but their importance to non-reproductive functions, such as skeletal muscle [26], is becoming more apparent [27,28]. Sex steroid hormones engage through their specific ligand-receptors [4,11,29] and influence transcription and phenotype in a tissue-and sex-specific manner [9,[30][31][32][33]. Not only can these receptors be differentially expressed between sexes [25], they also show sex-biased gene targeting patterns due to intrinsic differences in sex hormone levels [11,34]. Human muscle consists of three distinct muscle fibre types: type I (slow-twitch oxidative), type IIa (fast-twitch oxidative), and type IIx (fast-twitch) fibres. Males tend to have a higher proportion of fasttwitch type IIa muscle fibres in various muscle groups compared to females [35,36]. The different fibre types exhibit different methylation patterns [37], as well as different contractile and metabolic properties [38]. Thus, differences in fibre type proportions may underlie sex differences in skeletal muscle DNA methylation.
We performed a large-scale EWAS meta-analysis to explore sex differences in the DNA methylome of human skeletal muscle tissue, using three datasets from our own laboratory and open-access databases (n = 369 individuals; 217 males and 152 females). We established a list of sites (CpG) and regions showing DNA methylation differences between males and females, and explored their genomic context. We then investigated whether transcription factor binding, muscle fibre type distribution, and circulating sex hormone levels explained the observed sex differences in DNA methylation. Next, we integrated the sex-biased DNA methylation with known sex-biased mRNA expression from the GTEx consortium, and inferred the potential downstream effects on skeletal muscle function. Lastly, we validated our findings with transcriptomic data from one cohort used in the meta-analysis and targeted qPCR from another cohort.

Males show profound genome-wide autosomal hypomethylation compared with females in human skeletal muscle
The DNA methylation meta-analysis was conducted on 369 individuals from three datasets (217 males and 152 females). We focused exclusively on the 22 autosomes as sex chromosomes represent a particular case whereby epigenetically driven X-chromosome inactivation takes place exclusively in females. All of the Gene SMART participants were apparently healthy, while the FUSION participants were either healthy or diagnosed with type 2 diabetes mellitus (T2D), and the GSE38291 consisted of monozygotic twins discordant for T2D (Table 1).
We found 56,813 differentially methylated positions (DMPs, single CpG sites) between males and females, spread across the 22 autosomes, at a meta-analysis false discovery rate (FDR) < 0.005 (Fig. 1, Additional file 3: Table S2). Ninety-four percent of DMPs were hypomethylated in males compared with females (Fig. 1a). Henceforth throughout the text, "hypo-DMPs" and "hyper-DMPs" refer to positions which are differentially methylated in males compared to females. On average, DNA methylation levels differed by + 2.8% (hyper-DMPs) and − 3.5% (hypo-DMPs) between males and females, with the largest effect sizes reaching + 15.2% and − 35.7%. To determine the degree to which sex explained the variability in DNA methylation at the sex-specific loci, we performed a principal component analysis (PCA) on these loci in each cohort independently, and ran a linear model between sex and the principal component (PC, also referred to as "dimension") which visually distinguished sex ( Fig. 1b; see "Methods"). Participants clustered according to sex when only focusing on the 56,813 DMPs, suggesting that sex explained a substantial amount of variance at the DMPs. In the FUSION (PC1), Gene SMART (PC2), and GSE38291 (PC2) datasets, sex explained 30, 53, and 66% of variability in the relative PC at the sex-DMPs, respectively. Participants did not cluster according to sex when looking at the whole methylome (Additional file 1: Figure S4), which is expected given that ~ 10% of the autosomal genome displayed sexspecific DNA methylation.
Each data set had a unique study design that required considering different independent variables known to affect DNA methylation, such as age [39] and type 2 diabetes (T2D) [40] in the analysis. We included each of these factors in the relevant linear models, but noted that sex was associated with T2D in the FUSION dataset (Table 1), meaning that male participants from the FUSION cohort more commonly had T2D than females. Therefore, it is possible that the sex-related signal captured in this dataset was partially confounded by T2D. We repeated the meta-analysis excluding T2D participants from the FUSION cohort, but results remained unchanged (Additional file 1: Figure S7).
Fibre type proportions, but not circulating hormones, were associated with differential methylation at loci with sex-specific DNA methylation Males typically show a greater proportion of type II muscle fibres compared with females [15], and type II fibres exhibit hypomethylation compared to type I fibres [37]. Therefore, we hypothesized that the observed DNA methylation sex differences, specifically the hypomethylation in males, may be a result of differing fibre type distributions between males and females. We first estimated type I fibre proportions in the Gene SMART cohort via immunohistochemistry (Additional file 1: Figure S2D) and the FUSION cohort via RNAseq (data available for each dataset in Additional file 14: Table S13). In both the Gene SMART and FUSION cohorts, females had higher proportions of type I fibres than males (Additional file 1: Figure S2B/C). We could not directly add fibre type proportions to the linear model as a covariate, since fibre type proportions are not a confounder (i.e. a factor that influences both sex and DNA methylation independently), but may be a direct downstream effect of sex, in turn affecting DNA methylation. Adding fibre type proportions in the model would therefore distort the association between sex and DNA methylation. To overcome this issue, we stratified the cohorts by sex, added fibre type proportions to the model as a covariate and identified DNA methylation patterns associated with fibre type proportions. We then meta-analysed the results to find CpGs robustly associated with fibre type proportions across both cohorts and all sexes (see "Methods"). We identified 16,275 CpGs associated with fibre type proportions (Additional file 1: Figure S2A). When restricting the analysis to the loci exhibiting sex-biased DNA methylation, 8805 (15.5%) of those were associated with fibre type proportions (FDR < 0.005). Effect sizes ranged from − 0.28% to + 0.30% DNA methylation difference per % increase in type I fibre content (Fig. 2a). Differentially methylated positions (DMPs) with sex in skeletal muscle. a Volcano plot of DNA methylation differences between males compared to females. Each point represents a tested CpG (633,645 in total) and those that appear in colour are DMPs at a meta-analysis false discovery rate < 0.005; red DMPs are hypermethylated in males compared with females; blue DMPs are hypomethylated in males compared with females. The x-axis represents the amount of DNA methylation difference between the sexes, and the y-axis represents statistical significance (higher = more significant). Two DMPs that were present in all three studies and showed the largest effect size are labelled with the respective CpG and boxplots of β-values from each study appear to the right (hyper-DMP) and left (hypo-DMP). b Principal component analysis plots of the methylation values at the DMPs; each point on the graph represents an individual; males denoted in green, females denoted in orange We then aimed to determine whether circulating sex hormone levels underlie the observed DNA methylation sex differences. We analysed estrogen (as estradiol, E2), testosterone (T), free testosterone (Free T), and sex-hormone binding globulin (SHBG) levels using mass spectrometry (Free T derived from calculation) of blood serum in males and females in the Gene SMART cohort. Males and females significantly differed in all four hormone levels (Additional file 13: Table S12). To avoid collinearity with sex, we separated males and females for the association between sex-differential DNA methylation and circulating hormones. We assessed whether each of the four hormone levels was associated with DNA methylation across all of the CpGs and across the sex-DMPs in each sex by adjusting the linear model for a given hormone. In both males and females, circulating free testosterone, testosterone, estrogen, and SHBG levels were not highly associated with DNA methylation (less than five DMPs; FDR < 0.005) of neither all of the CpGs tested nor the sex-DMPs previously identified (Additional file 1: Figure S3). To limit the potentially confounding effect of fluctuating ovarian hormone levels on DNA methylation, female muscle biopsies were collected in the early follicular phase of the menstrual cycle (days 1-7) and blood serum was tested for follicle stimulating hormone (FSH), luteinizing hormone (LH), and progesterone (as well as E2 as previously mentioned). None of the four hormones (when adjusting for each hormone separately or when adjusting for the first two principal components; see 'methods'), were associated with differential methylation of all the CpGs or the sex-DMPs. This suggests that variations in ovarian hormone levels in the early follicular phase did not confound our results.

Males show profound genome-wide autosomal hypomethylation compared with females in human skeletal muscle
Since the association between DNA methylation and gene expression depends on the genomic context, we inspected the genomic distribution of the DMPs to gain insights into their potential function [41]. We compared the distribution of hyper-, hypo-, and non-DMPs (CpGs included in the analysis which did not present sex-biased DNA methylation) among the various chromatin states in human skeletal muscle using the Roadmap Epigenomics Project [42]. DMPs were not randomly distributed in the chromatin states (χ 2 p value < 2.2 × 10 −16 , Fig. 3a); specifically, hypo-DMPs were enriched in enhancers and depleted in transcription start sites (Additional file 1: Figure S1A), while hyper-DMPs were not enriched in any chromatin states given their scarcity. It should be noted that the Roadmap Epigenomics Project characterizes both male and female skeletal muscle chromatin states regions, and there are 536 regions across 369 unique genes where male and female chromatin states differ (across many tissues including skeletal muscle) [43].  Figure S1A). b Distribution of sex-DMPs and non-DMPs at loci whose chromatin states differ between male and female skeletal muscle. Purple denotes all DMPs (hypo and hyper combined) and black denotes non-DMPs. c Distribution of sex-DMPs and non-DMPs in relation to CpG islands. Asterisks represent a greater contribution to the significant relationship between DMP status and CpG island location (Additional file 1: Figure S1B). d Bee swarm plot showing enrichment of transcription factor binding sites (TFBSs)(−(log 10 (p value) using Fisher's exact tests) on the y-axis for differentially methylated positions (DMPs) according to UniBind [45]. The names of the top 10 enriched TFs are denoted by the colour key; brown denotes non-significant TFs. The various data sets for the same TFs are graphed with the corresponding colour Therefore, we performed the chromatin state enrichment analysis on both the male and female chromatin state annotation in skeletal muscle, which yielded equivalent findings. We noted that DMPs were overrepresented in loci whose chromatin states differ between males and females (38.7% of DMPs vs. 32.4% of non-DMPs are in chromatin states that differ between males and females), which means that the odds of a DMP being located in a sex-differing chromatin state increased by a factor of 1.3 compared with a non-DMP (OR = 0.76, 95% confidence interval = 0.75-0.77, Fisher test p value < 2.2e−16) (Fig. 3b). DMPs were also enriched in CpG island shores and depleted in CpG islands (χ 2 p value < 2.2e−16) (Fig. 3c, Additional file 1: Figure S1B). DMPs were overrepresented among CpGs previously reported to be differentially methylated between male and female myoblasts (766 CpGs; p value < 0.0001) as well as myotubes (644 CpGs; p value < 0.0001) (see "Methods"). Given the role of transcription factors (TFs) in regulating chromatin accessibility and thus effecting downstream gene expression [44], as well as the recent studies identifying sex differences in TF targeting patterns [4,11], we next tested whether DMPs were enriched for the experimentally validated binding sites (TFBSs) of 268 TFs from 518 different cell and tissue types [45,46]. The DMPs were overrepresented in the binding sites of 41 TFs (p value < 0.005, Fig. 3d, Additional file 15: Table S14), including hormone-related TFs such as androgen (AR), estrogen (ESR1), and glucocorticoid (NR3C1) receptors.

Gene set enrichment analysis of differentially methylated genes
We next performed Gene set enrichment analysis (GSEA) on the DMGs, as GSEA using epigenomic features may reveal distinct enriched pathways that may not display gene expression differences [11,43]. We performed GSEA on both the DMRs and DMPs (Fig. 4). GSEA on the DMRs revealed enrichment of several Gene Ontology (GO) terms, one Reactome pathway ("muscle contraction"), but no Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Additional file 11: Table S10) (FDR < 0.005). However, GSEA on the DMPs revealed enrichment across all three databases (Additional file 6: Table S5, Additional file 8: Table S7 and Additional file 10: Table S9). Most of the enriched GO terms are biological process (BP) terms, many of which relate to anatomical structure development as well as many muscle-related processes. Furthermore, DMPs were enriched for GO terms related to substrate metabolism such as lipid, protein, and carbohydrate derivative metabolic processes. Nine-hundred and twenty-five genes of the 1407 genes involved in KEGG metabolic pathways were differentially methylated, representing many aspects of substrate metabolism, although the pathway was only significant when analysing the DMPs.

Genes with sex-biased methylation exhibit sex-biased DNA expression in human skeletal muscle
Differentially methylated genes (DMGs) were determined by identifying the genes annotated to differentially methylated regions (DMRs), as DMRs remove spatial redundancy (CpG sites ~ 500 base pairs apart are typically highly correlated [47]), and may provide more robust and functionally important information than DMPs [48,49]. We identified 10,240 DMRs (Stouffer, harmonic mean of the individual component FDRs (HMFDR), and Fisher p value < 0.005). These DMRs were annotated to 8420 unique autosomal genes (including non-coding genes) as annotated by Zhou et al. [50] (see "Methods"; Additional file 4: Table S3).
To gain insights into the potential downstream effects of sex-biased DNA methylation on gene expression, we integrated results from the EWAS meta-analysis of sex with genes whose mRNA expression levels are known to differ between males and females. We used version 8 of the Genotype-Tissue Expression (GTEx) database which contains 803 RNA-sequencing profiles in human skeletal muscle (n = 543 males and n = 260 females). There were 2689 sex-differentially expressed genes (DEGs) on the autosomes in skeletal muscle (accessed from GTEx portal on 08/26/2020). Of the 2689 DEGs, 973 (~ 36%) were in common with DMGs from our cohorts (Fig. 5, Additional file 3: Table S2), including the gene Gamma-Glutamyltransferase 7 (GGT7) (Fig. 6). We confirmed an enrichment of DMRs across sex-biased genes (hypergeometric test p value = 4.6 × 10 −13 ), suggesting that the overlap between sex-differentially methylated genes and sex-differentially expressed genes is larger than what would be expected by chance alone. To gain insight on the relationship between DNA methylation and gene expression of sex-biased genes, we assessed the direction of correlation between DMRs that are annotated to either promoter (TssA and TssAFlnk) or enhancer (Enh and EnhG) regions and their given gene expression (Fig. 5c, d). Sixtytwo and 59% of DMRs in promoter and enhancer regions, respectively, were inversely correlated with gene expression. (From GTEx transcriptome data, similar results were yielded with the FUSION transcriptome data.) The inverse correlation between DNA methylation at both promoter and enhancer regions with gene expression was more than would be expected to occur by random chance (10,000 random permutations; p value < 0.0001 and p value = 0.0009, respectively; Additional file 1: Figure S5). Landen et al. Clin Epigenet (2021) 13:202 Table S5, Additional file 7: Table S6, Additional file 8: Table S7, Additional file 9: Table S8, Additional file 10: Table S9, Additional file 11: Table S10. b Sankey diagram of muscle contraction-related pathways across the three GSEA databases tested and genes within those pathways that were both differentially methylated and expressed (in GTEx and FUSION) between males and females. Numbers next to pathways denote the number of enriched genes in the pathway; numbers next to genes denote the number of pathways (from the ones displayed) that the gene belongs to. Diagram created using SankeyMATIC Landen et al. Clin Epigenet (2021) 13:202

Validation of GTEx sex-biased genes in the cohorts used for methylation analysis
We sought to validate the sex-biased gene expression obtained from GTEx in a subset of the samples used for methylation analysis since the DMGs and DEGs analyses were obtained from different muscle groups. (The DMGs of the current study are from the vastus lateralis, while the GTEx DEGs are from the gastrocnemius.) Although both are skeletal muscle tissue from the leg, there may be differences in muscle phenotypes in differing muscle groups [51]. Analysis of RNA sequencing data from the FUSION cohort revealed 3751 autosomal genes with sex-biased expression (FDR < 0.005). The FDR threshold we chose for the FUSION gene expression data was more stringent than the GTEx local false sign rate threshold (lfsr < 0.05), yet, ~ 34% of the genes which were both DEGs in GTEx and DMGs were also DEGs in the FUSION cohort, totalling 326 genes (hereinto referred to as 'overlapping genes') (Fig. 5a). Given that both the GTEx and FUSION cohorts include participants of relatively older ages, we sought to confirm the mRNA levels in the younger cohort in the analysis (the Gene SMART) for three genes that displayed sex differences at both the mRNA and DNA methylation levels (GGT7, FOXO3, and ALDH1A1) ( Table 2, Additional file 12: Table S11, Additional file 1: Figure S6).

DNA methylation and gene expression of GGT7, FOXO3 and ALDH1A1 consistently differ between males and females in human skeletal muscle
Three-hundred twenty-six genes exhibited differential methylation in the meta-analysis and differential expression among the GTEx and FUSION cohorts, termed 'overlapping genes' . Of those genes, we tested three for gene expression levels, GGT7, Forkhead Box O3 (FOXO3), and Aldehyde Dehydrogenase 1 Family Member A1 (ALDH1A1), in the younger cohort included in the DNA methylation analysis (Gene SMART) given the effect that age has on skeletal muscle gene expression [52]. These three genes showed a large effect size in gene expression and DNA methylation, displayed moderate gene expression levels in skeletal muscle relative to other tissues, and/or contained numerous DMPs and DMRs (  (Fig. 6).

Discussion
We conducted a large-scale meta-analysis of DNA methylation differences between males and females in skeletal muscle, and integrated them with transcriptomic data. We revealed that males display profound genome-wide hypomethylation compared with females, and that hormone-related TFBSs and muscle fibre type proportions underlie the observed DNA methylation sex differences. We also showed that many sex-biased genes found in GTEx exhibit sex-biased DNA methylation, which was partially confirmed in the FUSION cohort. We then validated the gene expression (qPCR) levels of three genes with large DNA methylation and expression differences between the sexes across cohorts, and confirmed that GGT7 and ALDH1A1 had higher gene expression in males. Finally, we showed that the DMGs are overwhelmingly involved in muscle contraction, as well as substrate metabolism and anatomical structure-related pathways.
In the present study, the overwhelming majority (94%) of the DMPs were hypomethylated in males. Interestingly, global autosomal hypomethylation in males has been observed in various other tissues [53], including blood [54,55] and pancreatic islets [18]. We aimed to uncover some of the biological mechanisms at the root of these epigenetic sex-differences by assessing the contributions of fibre type proportions and sex hormone levels to the observed DNA methylation sex differences. We hypothesized that differences in fibre type proportions between sexes may partly explain our findings [55][56][57], as studies report that type I fibres are hypermethylated compared with type II fibres [37], and as females tend to have a higher proportion of type I fibres than males [15]. Consistent with this, we observed that females had higher proportions of type I muscle fibres than males and that type I fibre content was mostly associated with DNA hypermethylation. Importantly, 16% of the loci exhibiting sex-biased DNA methylation were also associated with fibre type proportions. This suggests that at those CpGs, differences in DNA methylation between the sexes are due to the inherent sex differences in fibre type proportions. Nonetheless, the vast majority of the loci that exhibit sex-biased DNA methylation (84%; 48,008 CpGs) differ regardless of the sex differences in fibre type proportions. A recent study on the FUSION cohort, adjusted for fibre type proportions, found that it explains a substantial portion of the variability in DNA methylation for many metabolic phenotypes of interest [58]. Skeletal muscle DNA methylation analyses are performed on whole muscle due to the cost and technical limitations of isolating muscle cell types. Differing non-muscle cell types may be present in a muscle biopsy sample, and it is currently unknown how much of the muscle DNA methylation profile may actually be representing other cell types [59][60][61]. Bioinformatics deconvolution methods have not yet been developed for bulk skeletal muscle DNA methylation. Considering that the DNA methylation differences between cell types are large [62], future studies should aim at determining DNA methylation patterns of the different muscle fibre and cell types so that bulk muscle DNA methylation data can be adjusted for the appropriate cell and fibre proportions. None of the circulating sex hormones were associated with differential methylation across all CpGs, nor across the sex-DMPs in males or females. However, the range of each hormone within each sex may not be large enough to draw out the effect of varying levels of each hormone on the methylome. In the current study, hormone levels were measured from blood, while DNA methylation was measured from skeletal muscle. DNA methylation patterns are highly tissue-specific [63,64], and sex hormone levels in the circulation are not necessarily correlated with those intramuscularly. Moreover, intramuscular, and not circulating, sex hormone levels may be correlated with muscular function [65,66]. A recent review emphasizes the importance of measuring intramuscular sex hormone levels when assessing muscle-related properties in females [67].
The enrichment of hormone-related TFBS among the sex-DMPs suggests that lifelong exposure to differing hormone levels significantly contributes to the observed sex differences in skeletal muscle DNA methylation. Although, the alternative is possible, in which DNA methylation differences influence sex hormone levels. In Unibind, ChIP-seq data in skeletal muscle were limited to one TF (CTCF), so the enrichment of TFBSs among sex DMPs may have limited functional significance in skeletal muscle. Nonetheless, many of the TFs that showed strong enrichment in the present study, such as AR [68,69], ESR1 [69], and SMAD3 [70], are expressed in skeletal muscle and have important roles in muscle phenotype. Of note, multiple testing assumptions cannot be met with Unibind; therefore, no correction can be made, and therefore, false positives may emerge. Two recent studies leveraging the GTEx database identified sex differences in TF targeting patterns across several human tissues, including skeletal muscle, which contribute to sex-biased gene regulatory networks [11] and gene expression [4]. Differences in sex hormone levels between developing males and females are already evident in utero [71], making it challenging to design an experiment in humans that disentangles the effect of long-term hormonal exposure from biological sex, and other related factors, on cell function. Studies have utilized menopausal females [72] and transgender people [73] receiving hormone replacement therapy (HRT) to investigate the influence of longterm exposure to sex hormones on various phenotypes and risk of diseases. For example, HRT for one postmenopausal monozygotic twin and not the other has positive effects on regulation of muscle contraction and myonuclei organization, suggesting that estrogen has direct effects on muscle function [74]. Nevertheless, uncovering the genomic regions that display sex-differential methylation as well as contain hormone-responsive TFBSs, provides insight on which genomic regions, hormones, and TFs are discerning male and female skeletal muscle.
The enrichment for muscle contraction-related pathways among the DMGs across GO, KEGG, and Reactome suggests that the sex-differential DNA methylation has functional relevance in skeletal muscle function. Furthermore, enrichment of substrate metabolism pathways among the DMGs suggests that the observed sex differences in carbohydrate, lipid, and protein metabolism [13,75] may have a molecular, more specifically an epigenetic, basis. This is corroborated with results from transcriptomic studies, which report that skeletal muscle femalebiased genes are enriched for pathways involved in fatty acid metabolism, while male-biased genes are enriched for pathways involved in protein catabolism [22].
Although not well understood, the sex chromosome complement may also influence autosomal DNA methylation patterns. In cultured fibroblasts, the presence of Sex-determining Region Y (SRY) is associated with lower autosomal methylation levels [76][77][78]. Additionally, a higher number of the X chromosomes, in the absence of SRY, leads to increased methylation levels at a specific sex-differentially methylated autosomal region [78]. This could be attributed to allele dosage compensation, a female-specific process that silences one of the X chromosomes in a cell [79,80]. Approximately one-third of genes 'escape' inactivation, remain transcriptionally active in XX cells, [80][81][82], and have been suggested to affect autosomal DNA methylation via their histone marks [78,83]. Moreover, females with Turner syndrome (partially/fully missing one X) and monosomy X have lower global methylation than XX females, but higher than XY males [84,85]. The effect of sex chromosome complement on autosomal DNA methylation in skeletal muscle has yet to be explored. Finally, genetic variants (copy number variants (CNVs) and single nucleotide polymorphisms (SNPs)) may affect DNA methylation status at specific loci [86], termed methylation-quantitative trait loci (meQTL). However, with the exception of a female bias for large, rare CNVs, there are no large sex differences in autosomal SNP minor allele frequencies [1].
The relationship between DNA methylation and gene expression is complex; DNA methylation at promoters, enhancers, and 1 st exons is generally believed to enhance gene silencing, while DNA methylation at gene bodies can sometimes be associated with increased gene expression [87][88][89][90][91]. Using a permutation test, we showed that DNA methylation differences between the sexes at promoters and enhancers were more often associated with lower gene expression than would be expected by chance alone. DNA methylation differences between the sexes were also particularly prominent in chromatin states that are known differ between males and females. This suggests that DNA methylation differences between males and females reflect alterations in chromatin activity, and differential epigenetic states and expression are likely functionally connected. In line with this, chromatin states that differ between the sexes have been shown to be enriched for sex-biased genes across various tissues, including skeletal muscle [43]. However, it is not yet possible to assess whether the relationship reflects correlation or direct causality. There is still debate around whether epigenomic features drive regulatory processes or are merely a consequence of transcription factor binding [43].
We identified 326 genes with consistent differential skeletal muscle DNA methylation and expression across 1172 individuals altogether (369 individuals from three cohorts for DNA methylation and 1077 individuals from two cohorts for gene expression). Although we found profound global DNA hypomethylation in males, of the overlapping genes there were equivalent numbers of genes overand under-expressed in males compared with females for both GTEx and FUSION. Indeed, hypermethylation is not always associated with decreased gene expression [92]. The substantial overlap between differentially methylated genes and differentially expressed genes highlights many genes that may be of interest for their roles in muscle-related processes. Furthermore, the significant overlap between the present study and Davegårdh et al. [21] implies that there are specific loci and genes that show DNA methylation differences between the sexes both in muscle cells and in muscle tissue. We focused on three of these genes that displayed a large DNA methylation difference between males and females, are highly expressed in skeletal muscle, or play a role in skeletal muscle function: FOXO3 for its role in ageing, longevity, and regulating the cell cycle [93]; ALDH1A1 for its role in aldehyde oxidation and because sex differences in skeletal muscle mRNA levels have been reported, suggesting that males might be able to metabolize aldehydes (i.e. alcohol) more efficiently than females [12]; and GGT7 for its role in antioxidant activity [94]. Of these three genes (which were validated across GTEx, FUSION, and Gene SMART), FOXO3 and GGT7 have also been reported to exhibit differential methylation between male and female myoblasts as well as myotubes [21]. GGT7 and ALDH1A1 showed consistently higher expression levels in males, while FOXO3 showed opposite sex-biased expression in the young versus the old cohorts. FOXO3 expression was lower in males in the older cohorts (GTEx and FUSION), and higher in males in the younger cohort (Gene SMART). Other studies have shown that males have higher FOXO3 expression in young skeletal muscle [95] and that elderly females have higher skeletal muscle FOXO3 expression than younger females [96]. While FOXO3 skeletal muscle gene expression differs between males and females, it seems that the direction is opposite in young and old individuals, which emphasizes the caution that should be used when interpreting sex differences across a large age range of individuals. Interestingly, FOXO3 was hypomethylated in skeletal muscle with age in a recent study from our group [97]. Regions with differential DNA methylation in the three validated genes were located in enhancers (FOXO3 and ALDH1A1) and regions of strong transcription (GGT7), suggesting that subsequent transcription may be altered. As an example, the promoter, 1 st exon, and gene body of GGT7 were hypomethylated in males and males had higher GGT7 expression. GGT7 is highly expressed in skeletal muscle and metabolises glutathione, which is a ubiquitous "master antioxidant" that contributes to cellular homeostasis [98].
In conclusion, we showed that the DNA methylation of hundreds of genes differs between male and female human skeletal muscle. We uncovered important biological factors underlying sex-specific skeletal muscle DNA methylation. Integration of the DNA methylome and transcriptome, as well as gene expression validation, identifies sex-specific genes associated with muscle metabolism and function. Uncovering the molecular basis of sex differences across different tissues will aid in the characterization of muscle phenotypes in health and disease. The effects of other upstream drivers on sex differences in the muscle methylome, such as nonmuscle cell type, the XY chromosomes, and genetic variants still need to be explored. Molecular mechanisms that display sex differences in skeletal muscle may help uncover novel targets for therapeutic interventions.

Datasets
We conducted a meta-analysis of three independent epigenome-wide association studies (EWAS) of sex including the Gene Skeletal Muscle Adaptive Response to Training (SMART) study from our laboratory [99], the Finland-United States Investigation of NIDDM Genetics (FUSION) study from the dbGAP repository (phs000867. v1.p1) [58], and the GSE38291 dataset from the Gene Expression Omnibus (GEO) platform [100]. Detailed participant characteristics, study design, muscle collection, data pre-processing, and data analysis specifications for each study are in Additional file 2: Table S1. Briefly, all studies performed biopsies on the vastus lateralis muscle, all participants were of Caucasian descent (except 1 individual of mixed Caucasian/aboriginal decent), and included either healthy or healthy and T2D individuals aged 18-80 years. The Gene SMART study was approved by the Victoria University Human Ethics Committee (HRE13-223), and written informed consent was obtained from each participant. NIH has approved our request [#96795-2] for the dataset general research use in the FUSION tissue biopsy study.

DNA extraction and methylation method-gene SMART study samples
Genomic DNA was extracted from the samples using the AllPrep DNA/RNA Mini Kit (Qiagen, 80204) following the user manual guidelines. Global DNA methylation profiling was generated with the Infinium MethylationE-PIC BeadChip Kit (Queensland University of Technology and Diagenode, Austria). The first set was analysed in 2017 and contained samples (n = 50) from 25 males which were scrambled on the chips to ensure that timepoint and age were not confounded with batch and chip positioning in the array. The second set was analysed in 2019 and contained samples (n = 80) from 20 males and 20 females which were scrambled on the chips to ensure that timepoint, sex, and age were not confounded with batch and chip positioning in the array. The genome-wide DNA methylation pattern was analysed with the Infinium MethylationEPIC BeadChip array.

Pre-processing
The pre-processing of DNA methylation data was performed according to the bioinformatics pipeline developed for the Bioconductor project [101]. Raw methylation data were pre-processed, filtered and normalized across samples. Probes that had a detection p value of > 0.01, located on X and Y chromosomes or crosshybridizing, or related to a SNP frequent in European populations, were removed. It is important to note that the list of cross-hybridizing probes was supplied manually [102] as the list supplied to the ChAMP package was outdated. Specifically, there are thousands of probes in the Illumina microarrays that cross-hybridize with the X-chromosome and may lead to false discovery of autosomal sex-associated DNA methylation [103]. The BMIQ algorithm was used to correct for the Infinium type I and type II probe bias. β-values were corrected for both batch and position in the batch using ComBat [104].

Statistical analysis
We adjusted each EWAS for bias and inflation using the empirical null distribution as implemented in BACON [105]. Inflation and bias in EWAS are caused by unmeasured technical and biological confounding, such as population substructure, batch effects, and cellular heterogeneity [106]. The inflation factor is higher when the expected number of true associations is high (as it is for age); it is also greater for studies with higher statistical power [105]. The results were consistent with the inflation factors and biases reported in an EWAS of age in blood [105]. Smaller cohorts were weighed less in the meta-analysis than bigger cohorts as the pooled estimates are based on standard error (which takes sample size into account). Results from the independent EWAS were combined using an inverse variance weighted metaanalysis with METAL [107]. We used METAL since it does not require all DNA methylation datasets to include every CpG site on the HumanMethylation arrays. Given that the FUSION and Gene SMART datasets were analysed with the EPIC array, while the GSE38291 dataset was analysed with the 27 K array, there are a substantial number of probes with data available only from the FUSION and Gene SMART studies. For robustness, we only included CpGs present in at least two of the three cohorts after pre-processing (633,645 CpGs). We used a fixed effects (as opposed to random effects) metaanalysis, assuming one true effect size of sex on DNA methylation, which is shared by all the included studies. Nevertheless, Cochran's Q-test for heterogeneity was performed to test whether effect sizes were homogeneous between studies (a heterogeneity index (I2) > 50% reflects heterogeneity between studies). We noted one sample from the FUSION cohort had a discordant sex prediction, using the getSex function from minfi [108], and reran analyses without that sample but obtained identical pathways and differentially methylated genes.
To identify DMPs, we used linear models as implemented in the limma package in R [109], using the participants' ID as a blocking variable to account for the repeated measures design (for twin (GSE38291) and duplicate samples (Gene SMART), using DuplicateCorrelation). The main sources of variability in methylation varied depending on the cohort and were included as independent variables in the linear model accordingly. For the Gene SMART study, the linear model was of the form:

Timepoint: before and after four weeks of highintensity interval training Analysis set: one set analysed in 2017 and one in 2019
For the FUSION study, the linear model was of the form: For the GSE38291 study, the linear model was of the form: All results were adjusted for multiple testing using the Benjamini and Hochberg correction [110], and all CpGs showing an FDR < 0.005 were considered significant [111]. DMRs were identified using the DMRcate package with default settings [112]. DMRs with Stouffer, Fisher, and harmonic mean of the individual component FDRs (HMFDR) statistics < 0.005 were deemed significant. Effect sizes are reported as mean differences in DNA methylation (%) between the sexes. PCAs were analysed using the package factoMiner, and were performed separately per cohort given that technical variability substantially contributes to variability in DNA methylation. The percentage reported in which sex explains the variability in DNA methylation at DMPs was determined by extracting the adjusted R 2 from the linear model: PC ~ sex; the PC which visually distinguished males and females in the PCA in each cohort was used in the linear model. To assess whether fibre type proportions differ between males and females in the Gene SMART and FUSION cohorts, we used a beta regression model [113] using the betareg package in R. We then included type I fibre ratio as a covariate in the linear models. To investigate whether circulating hormones is associated with DNA methylation of the sex-DMPs, we included each sex hormone as a covariate in a linear model, separately. One male from the DNAm ∼ sex * timepoint + age + analysis set DNAm ∼ sex + BMI + smoking status + oral glucose tolerance test status DNAm ∼ sex + age + diabetes status Gene SMART cohort had missing circulating hormone values; and one two females from the Gene SMART cohort had missing type I fibre proportions. For these three individuals, missing values were imputed with the mice package in R [114].
To annotate the CpGs to the genome (see "Availability of Data and Materials"), we integrated a comprehensive annotation of Illumina HumanMethylation arrays [50] with chromatin states from the Roadmap Epigenomics Project [42] and the latest GeneHancer information [115]. DMGs were determined by the annotations of the DMRs. For chromatin state enrichment, DMPs that were annotated to two differing chromatin states were removed for simplicity and because there were very few such DMPs. GSEA on KEGG and GO databases was performed on DMRs and DMPs using the goregion and gometh (gsameth for Reactome) functions in the missMethyl R package [116] 117. All analyses were performed using the R software version 4.0.3.
Enrichment of TFBSs among the identified DMPs was performed using the enrichment analysis tool in http:// unibi nd. uio. no/ which utilizes the runLOLA function of the R package LOLA [46]. The analysis type selected was enrichment with background; and background provided included all tested CpGs. Analysis using CpGs in DMRs versus DMPs yielded similar results. The outputted p value is reported; without applying an FDR correction since many TFBS datasets are not independent so the assumptions behind the FDR correction cannot be met. TFs from all tissues were included as only one TF, to date, has been validated among skeletal muscle tissues or cells.
To assess whether sex-DMPs were over-represented among CpGs which have been reported to be differentially methylated in male and female myoblasts and myotubes [21], we first overlapped the sex-DMPs with CpGs with sex-differential DNA methylation in myoblasts. We then randomly sampled the same number of CpGs which were differentially methylated in myoblasts from the total CpGs in the 450 K array, as this was the Illumina array used for the study, and overlapped those randomly sampled CpGs with the sex-DMPs. We performed this random sampling 10,000 times to obtain a p value signifying whether the amount of overlap between the two studies could have occurred by chance. We then repeated this same method for the myotubes.

Fibre types: meta-analysis and derivation from immunohistochemistry and RNA-seq
Myosin heavy chain is currently the best available marker for fibre typing [118]. Gene SMART muscle sections were frozen in optimum-cutting temperature (OCT) medium by holding the sample with OCT in liquid-nitrogen cooled isopentane until frozen. Samples were stored in − 80 °C until they were sectioned at 8 µM with a cryostat. The IHC protocol was performed as is described elsewhere [119]. Briefly, sections were blocked in 4% goat serum (Life Technologies). Primary antibodies BA-F8 (MHCI), BF-35 (MHCIIA), and 6H1 (MHCIIX) were purchased from DSHB, Iowa. Secondary antibodies goat anti-mouse IgG2b 350, goat anti-mouse IgG1 488, and goat anti-mouse IgM 555 were purchased from Invitrogen. Some samples were fixed in paraformaldehyde for other analyses, and for those samples, an antigen-retrieval protocol consisting of a 10-min incubation at 50 °C of Proteinase K diluted in MilliQ (1:1000) and subsequent 1-min washes was performed before the IHC. Imaging was performed on the Olympus BX51.
To determine type I fibre proportions in the FUSION cohort, we followed the validated method as performed by the original study on the FUSION cohort [58]. Briefly, we derived type I fibre proportions from the RNA-seq expression data (TPMs) for type I (MYH7), type IIA (MYH2), and type IIX (MYH1). We calculated the ratio of MYH7 out of the total. We then included this ratio in the + FT linear model.
To determine whether the inherent sex differences in fibre type proportions underlie the sex differences in DNA methylation, we separated the males and females of the Gene SMART and FUSION cohorts and performed a meta-analysis on the four groups (FUSION females, FUSION males, Gene SMART females, Gene SMART males). Given that females displayed significantly higher type I fibre proportions than males in both cohorts, we could not simply include type I fibre content in a linear model performed on a mixed sex cohort as two issues would arise: (1) collinearity of fibre type with sex, (2) differences in fibre type proportions may be a downstream effect of sex. Dividing the cohorts by sex, conducting a meta-analysis, and selecting the sex-biased DMPs and performing an FDR adjustment among those cites allowed us to address whether fibre type proportion is associated with DNA methylation at sex-biased DNA methylation loci. The fibre type meta-analysis was performed with the same methodology of the sex metaanalysis as described above.

Controlling for the female menstrual cycle
Various contraceptives have different dosage, administration patterns, and different hormone combinations causing variability in metabolism and gene expression [120]; therefore, only females not taking any form of hormonal contraceptives were recruited for the Gene SMART study. Furthermore, to minimise the effect of fluctuating hormone levels, females were required to have a regular menstrual cycle (27-35 days), and all samples were aimed to be collected during the early follicular phase (day 1-day 8 of cycle), with few exceptions due to logistics. Estrogen, progesterone, follicle stimulating hormone (FSH), and luteinizing hormone (LH) were measured in blood serum. Given the intricate fluctuations of the ovarian hormones, these four hormones were combined into a principal component analysis and the first two PCs, which explained the majority of the variability, were each added into the linear model.

Blood serum hormones
The hormone assays were completed in the accredited pathology laboratory at Monash Health, Australia. Estradiol (E2) and Progesterone assays are competitive binding immunoenzymatic assays performed on the Unicel DXI 800 system (Beckman Coulter). FSH assay is based on Microparticle Enzyme Immunoassay (MEIA) and is carried out on the Unicel DXI 800 system (Beckman Coulter). The LH and sex-hormone binding globulin (SHBG) assays were performed using a sequential twostep immunoenzymatic ("sandwich") assay carried out on a Unicel DXI 800 (Beckman Coulter). Testosterone was measured using the HPLC-tandem mass spectrometry method using a liquid sample extraction (AB Sciex Triple Quad 5500 liquid chromatography-tandem mass spectrometry). Free androgen index was calculated as (total testosterone × 100)/SHBG. Free testosterone was calculated by the Södergard free testosterone calculation (36).

Integration of DNA Methylation and Gene Expression
The Genotype-Tissue Expression (GTEx) Project sexbiased data were downloaded from the GTEx Portal on 08/26/2020 and filtered for skeletal muscle samples. The enrichment of DMG for GTEx DEGs was done by supplying the list of sex-biased genes to the gsameth function in the missMethyl R package [116,117], which performs a hypergeometric test and outputs a p value and which genes are significant within the supplied list, taking into account biases due to the number of CpG sites per gene and the number of genes per probe on the EPIC array. Therefore, caution should be taken when interpreting the number of DMPs reported per DMG. The analysis for direction of correlation between DNA methylation and gene expression was performed by randomly shuffling DNA methylation effect sizes and performing 10,000 permutations to assess how often a negative correlation occurs. This analysis was performed for both GTEx and FUSION transcriptome data and yielded similar results; data presented reflect results from the integration of differential methylation with differential GTEx expression. Significance reported for GTEx sex-biased genes is represented as the local false sign rate (lfsr) which is analogous to FDR [121]. GTEx effect sizes are represented as mash posterior effect sizes [121], in which positive values indicate male-biased genes and negative values indicate femalebiased genes. FUSION and Gene SMART gene expression significance statistics are represented as FDR and p value, respectively, and effect sizes as fold changes for both cohorts.

Validation of top genes with qPCR
Skeletal muscle previously stored at − 80 °C was lysed with the RLT buffer Plus buffer (Qiagen) and beta-mercaptoethanol using the TissueLyser II (Qiagen, Australia). DNA was extracted using the AllPrep DNA/RNA Mini Kit following the manufacturer guidelines (Qiagen, Australia). RNA yield and purity were assessed using the spectrophotometer (NanoDrop One, Thermo Fisher). RNA was reverse transcribed to cDNA using a commercially available iScript Reverse Transcriptase supermix (cat #1708841) and C1000 Touch Thermal Cycler (Bio-Rad, Hercules, CA, USA). Complementary DNA samples were stored at − 20 °C until further analysis. Quantitative real-time PCR was performed using SYBR Green Supermix (Bio-Rad, Hercules, CA) and gene-specific primers (listed in Additional file 12: Table S11). Primers were either adapted from existing literature or designed using Primer-BLAST (http:// www. ncbi. nlm. nih. gov/ tools/ primer-blast/) to include all splice variants, and were purchased from Integrated DNA Technologies. Ten microliter reactions comprised of SYBR, and optimized concentrations of forward and reverse primers (Additional file 12: Table S11 for primer conditions), nuclease-free water and 8 ng of cDNA were run in triplicate using an automated pipetting system (epMotion M5073, Eppendorf, Hamburg, Germany), with no-template negative controls on a 384-well plate in a thermo-cycler (QuantStudio 12 K Flex System, Thermo Fisher Scientific, Australia). Gene expression was normalized to the geometric mean expression of the two most stable housekeeping genes, as determined by Ref finder, TATAA-box binding protein (TPB), and 18 s rRNA, which did not differ between sexes (Additional file 12: Table S11). Data are presented as the fold change in males compared to females, using 2 −∆∆CT .

Significance
The importance of uncovering biological sex differences and their translation to physiology has become increasingly evident. Using a large-scale meta-analysis of three cohorts, we perform the first comparison of genome-wide skeletal muscle DNA methylation between males and females, and identify thousands of genes that display sexdifferential methylation. We then explore intrinsic biological factors that may be underlying the DNA methylation sex differences, such as fibre type proportions and sex hormones. Leveraging the GTEx database, we identify hundreds of genes with both sex-differential expression and DNA methylation in skeletal muscle. We further confirm the sex-biased genes with gene expression data from two cohorts included in the methylation meta-analysis. Our study integrates genome-wide sex-biased DNA methylation and expression in skeletal muscle, shedding light on distinct sex differences in skeletal muscle.