Skip to main content

Epigenome-wide association study on the plasma metabolome suggests self-regulation of the glycine and serine pathway through DNA methylation

Abstract

Background

The plasma metabolome reflects the physiological state of various biological processes and can serve as a proxy for disease risk. Plasma metabolite variation, influenced by genetic and epigenetic mechanisms, can also affect the cellular microenvironment and blood cell epigenetics. The interplay between the plasma metabolome and the blood cell epigenome remains elusive. In this study, we performed an epigenome-wide association study (EWAS) of 1183 plasma metabolites in 693 participants from the LifeLines-DEEP cohort and investigated the causal relationships in DNA methylation–metabolite associations using bidirectional Mendelian randomization and mediation analysis.

Results

After rigorously adjusting for potential confounders, including genetics, we identified five robust associations between two plasma metabolites (l-serine and glycine) and three CpG sites located in two independent genomic regions (cg14476101 and cg16246545 in PHGDH and cg02711608 in SLC1A5) at a false discovery rate of less than 0.05. Further analysis revealed a complex bidirectional relationship between plasma glycine/serine levels and DNA methylation. Moreover, we observed a strong mediating role of DNA methylation in the effect of glycine/serine on the expression of their metabolism/transport genes, with the proportion of the mediated effect ranging from 11.8 to 54.3%. This result was also replicated in an independent population-based cohort, the Rotterdam Study. To validate our findings, we conducted in vitro cell studies which confirmed the mediating role of DNA methylation in the regulation of PHGDH gene expression.

Conclusions

Our findings reveal a potential feedback mechanism in which glycine and serine regulate gene expression through DNA methylation.

Background

Metabolomics, the systematic profiling of small-molecule metabolites, has the potential to be a powerful tool for describing the molecular phenotype of a biological sample. The human plasma metabolome encompasses a diverse range of metabolites, including peptides, lipids, organic acids, steroids, pollutants, and drug metabolites, all of which reflect the complex metabolic state of an individual. Circulating metabolite levels are influenced by various endogenous or environmental factors, including genetics [1, 2], the gut microbiome [3, 4], and lifestyle factors such as diet [5] or smoking [6], and alterations in metabolite levels are often observed in disease contexts, especially in metabolic diseases such as obesity [7] or type 2 diabetes [8]. The dynamic nature of the plasma metabolome makes it an important intermediate phenotype that may hold disease risk factors or biomarkers linking genetic and environmental factors to disease endpoints.

Epigenetics are the heritable changes in gene expression that do not involve alterations in the DNA sequence. One of the most-studied epigenetic modifications is DNA methylation [9], which involves the addition of a methyl group to cytosine residues in CpG (5′-cytosine-phosphate-guanine-3′) dinucleotides. DNA methylation can affect gene transcription by altering the accessibility of chromatin to transcription factors and other regulatory proteins. DNA methylation is a dynamic and reversible process, and its pattern varies with age [10] and sex [11]. As an emerging field, epigenome-wide association studies (EWAS) have identified many CpG islands that are differentially methylated under various disease circumstances [12, 13]. EWAS on key circulating metabolites such as glucose [14, 15] or lipids [16, 17] have provided insights into the molecular pathways underlying metabolic diseases. Furthermore, EWAS on the untargeted plasma metabolome holds great potential as a route to uncovering the complex interplay between DNA methylation and numerous metabolites, as well as their interactions with genetic and environmental factors, thereby elucidating novel targetable functional mechanisms.

In contrast to genome-wide association studies (GWAS), where associated genetic variants are likely to be causal for the observed association between SNPs and phenotypes, causality cannot be as easily inferred in EWAS. The difference in DNA methylation patterns in blood cells can be both a cause for the changes in circulating metabolites, through differential regulation of gene expression, and a consequence of continuous exposure to the modified metabolomic profile in response to physiological and pathological alterations. Early EWAS of circulating molecules tended to assume that DNA methylation was responsible for changes in metabolites [18], but more recent studies using novel statistical methods such as Mendelian randomization (MR) have discovered that DNA methylation can also be the consequence of lifestyle changes or diseases [19,20,21], and metabolites might be an important intermediate phenotype mediating this relationship [22, 23]. Therefore, more rigorous and integrative approaches are needed to dissect the causal effects and molecular mechanisms underlying DNA methylation–metabolite associations.

In this study, we aimed to identify the variation in DNA methylation patterns in relation to the levels of 1183 plasma metabolites through an EWAS using data from 693 individuals in the Lifelines-DEEP cohort (LLD) [24]. In addition to discovering methylation–metabolite associations, we conducted bidirectional MR analysis and multivariable mediation analysis to examine the potential causal relationships between DNA methylation, gene expression, and plasma metabolite levels (Fig. 1). To further substantiate our findings, we conducted in vitro experimental validations to confirm the biological relevance of the identified methylation–expression–metabolite relationship.

Fig. 1
figure 1

Study design and workflow. EWAS Epigenome-wide association study, MS Mass spectrometry, SNP Single-nucleotide polymorphism

Methods

Study cohort

LLD [24] is a subcohort of the Lifelines study that includes 1539 participants. The Lifelines study is a multidisciplinary, prospective population-based cohort established in the north of the Netherlands. It includes over 167,000 participants and collects biological samples and information about exposome, lifestyle, social factors, and health-related behaviors. At time of enrollment, all Lifelines participants undergo anthropometric measurements and a structured cardiovascular and metabolic health evaluation. Fasting glucose, insulin, HbA1c, lipid profile, and complete blood count with differential white blood cell count were all measured using blood samples. For LLD, multiple molecular data levels are available in addition to the Lifelines phenotypes, with information on genome, epigenome, plasma metabolome, and lifestyle phenotypes available in > 1000 individuals. Prior to sample collection, all Lifelines participants signed an informed consent form. LLD has received institutional ethics review board approval (reference number M12.113965).

Data generation and preprocessing

Genotype

As previously shown, microarray genotype data was generated using the CytoSNP and ImmunoChip assays [24]. QC checks were performed using the Haplotype Reference Consortium (HRC v1.0) preparation checking tool (v4.2.3). VCF files were generated and uploaded to the Michigan Imputation Server [25]. Phasing and imputation were performed using SHAPEIT for phasing, population EUR, and mode Quality Control and Imputation. HRC v. R1 was used as a reference for all steps [26]. SNPs that had imputation quality r2 < 0.5, failed the Hardy–Weinberg equilibrium test (P < 1 × 10−6), had a call rate < 95%, or a minor allele frequency < 5% were removed. This resulted in genotype data for 5,324,731 SNPs (Genome Build hg19) for 1268 individuals.

Methylation

The generation of genome-wide DNA methylation data was described previously [27]. In brief, 500 ng of genomic DNA was bisulfite-converted using the EZ DNA Methylation kit (Zymo Research, Irvine, CA, USA) and hybridized on Illumina Infinium Human Methylation 450 K arrays according to the manufacturer’s protocols (Illumina, San Diego, CA, USA). The original IDAT files were generated using the Illumina iScan BeadChip scanner. For probe remapping and selection, 450 K probes were remapped to the human genome reference (hg19) [28], and all probes with a known SNP at the single-base extension site or CpG site and probes on the sex chromosomes were removed.

QC and normalization were performed using a custom pipeline based on the pipeline developed by Tost and Toulemat [29]. Data were extracted from the raw IDAT files using methylumi, incorrectly mapped probes were removed, and outlier samples were examined using the first two principal components (PCs) from principal component analysis. Background correction and probe-type normalization were performed as implemented in DASEN [30]. M-values were used for the DNA methylation data, and the following normalization steps were performed as described previously [27]: quantile normalization, probe-centering, and standardization of the values for each sample. Data for the methylation level of 439,088 CpG probes was generated for 820 individuals.

To correct for known and unknown sources of confounding variation in the methylation data, and to improve statistical power, we regressed out the first 22 PCs, which were previously determined to not be influenced by genetic information [27], from the methylation data. To account for associations between methylation and other molecular phenotypes that are confounded by shared genetic effect, we controlled for all the cis-methylation QTL (meQTL) effects using previously published cis-meQTLs for this dataset [27]. In brief, we first removed all the primary cis-meQTL effects and then performed cis-meQTL mapping on the same CpG sites significant in the first round. The cis-meQTL signal detected was then again regressed out from the methylation data, and we repeated this process iteratively until no more significant cis-meQTLs were detected. Corrected methylation data was available for 751 individuals.

Plasma metabolome

Untargeted metabolomic data from fasting plasma samples were analyzed using Flow-Injection Time-of-Flight Mass Spectrometry (FIA-ToF-MS) analysis at General Metabolics, Inc., Boston, USA [31, 32]. A platform consisting of an Agilent 1260 Infinity II LC pump coupled to a Gerstel MPS autosampler (CTC Analytics, Zwingen, Switzerland) and an Agilent 6550 Series Quadrupole TOF mass spectrometer (Agilent, Santa Clara, CA, USA) was used to perform the FIA-TOF-MS analysis.

Mass spectrometry data was processed and analyzed with MATLAB (Mathworks, Natick, MA, USA) using functions from the bioinformatics, statistics, database, and parallel computing toolboxes. Data on the intensity of each mass peak was generated, and peak-picking was performed for each sample from the total profile spectrum. Peaks of less than 5000 ion counts in the total profile spectrum were filtered out. The list of sample centroids was combined into a single matrix by binning the exact centroid mass within the tolerance given by the instrument resolution. The matrix contained the intensity of each mass peak in each analyzed sample. A weighted average of the data acquired from separate centroiding was used to recalculate an accurate common mass-to-charge ratio (m/z). The temporal drift of metabolite intensity was corrected. In total, 31,302 common ions were obtained and 1183 metabolites were annotated, covering 18 chemical superclasses according to the human metabolome database (HMDB) [33]. Metabolome data was available for 1368 participants.

To eliminate the influence of potential confounding factors on the metabolome data, we identified age, sex, smoking, and oral contraceptive use as the main sources of variation, as previously shown [31]. We then applied a linear regression model to the log-transformed abundances of the metabolites, including all the confounders as covariates, and used the residuals for further analysis. To account for the effects of genetic factors, we further regressed out the effects of primary metabolite QTLs (metaboQTLs) determined from previous work [31]. In total, we obtained adjusted metabolome data for 927 individuals.

Gene expression

Whole-blood RNA extraction, sequencing, and data processing were performed as described previously [34]. Briefly, RNA extracted from whole blood was processed using the Illumina TruSeq version 2 library preparation kit and sequenced on the Illumina HiSeq 2000 platform (Illumina). Paired-end read-alignment to the human genome was performed using STAR 2.3.0e aligner [35]. Gene expression quantification was performed using the Ensembl v.71 annotation tool and custom scripts developed previously [34]. Normalization of gene expression levels was performed using the following steps: the trimmed mean of M-values method [36], log-transformation, centering for each probe, and sample z-transformation. We additionally regressed out the effects of age, sex, and blood cell counts (erythrocytes, granulocytes, basophils, eosinophils, lymphocytes, monocytes, and thrombocytes) from the normalized expression data. In total, expression data was generated for 897 individuals.

Statistical analysis

Data correction for confounding factors

As described above, to correct for confounding factors in the methylation, metabolome, and gene expression data, we performed the following steps. For methylation data, we regressed out (1) the first 22 PCs that were not influenced by genetic information and (2) all the cis-meQTL effects using an iterative approach. For metabolome data, we controlled for the effects of age, sex, smoking, oral contraceptive use, and primary metaboQTLs. For gene expression data, we adjusted for age, sex, and blood cell counts. The resulting data was used for further analysis.

Methylation–metabolite association

The associations between normalized DNA methylation and normalized plasma metabolites were calculated using Spearman correlation. Ten rounds of permutations were performed to determine the FDR using a previously published Java-based pipeline [37]. An FDR < 0.05 was considered significant.

Sensitivity analysis in healthy subcohort

To address potential concerns regarding the impact of disease status on our observations, a sensitivity analysis was conducted within a healthy subcohort, which consisted of individuals free of a broad spectrum of conditions, including chronic obstructive pulmonary disease (COPD), bronchitis, diabetes, cataracts, stomach ulcers, inflammatory bowel diseases (IBD), irritable bowel syndrome (IBS), hepatitis, gallstones, kidney stones, chronic bladder infections, incontinence, epilepsy, blood clotting disorders, fibromyalgia, arthrosis, rheumatoid arthritis (RA), osteoporosis, neck or back hernia, psoriasis, chronic fatigue syndrome (CFS), depression, and panic disorder. This subcohort comprised 406 individuals out of a total of 693 participants, representing approximately 58.6% of the study population.

A linear regression model was applied for the analysis, designating CpG methylation levels as the dependent variable and plasma metabolite concentrations as the independent variable. The association results from both the full cohort and the delineated healthy subcohort were obtained.

Correlation with phenotype

The phenotype matrix consisted of 208 phenotypes, containing information on lifestyle, diet, disease status, medication use, and measurement of biomarkers such as fasting glucose and insulin. For the metabolites and CpG sites involved in the identified associations, we assessed the correlation between plasma metabolome, DNA methylome, mRNA transcriptome (of genes whose expression is known to be affected by the involved CpG site methylation based on previous work [27]), and phenotypes using Spearman correlation.

Additional analysis

For the most prominent phenotypes associated with at least one of the two metabolites (l-serine and glycine) and at least one of the three CpG sites (cg14476101, cg16246545, and cg02711608), we performed an additional analysis to exclude the possibility that they confounded the methylation–metabolite association. We employed linear regression (“lm” function from R package “stats”) for this analysis, and cell count variables and triglyceride levels were log-transformed before being added to the covariates. The results from the following three models were obtained:

$${\text{Model}}\;{1}\; \, \left( {{\text{raw }}\;{\text{model}}} \right):{\text{CpG}}\sim {\text{plasma }}\;{\text{metabolite}}$$
$${\text{Model}}\;{2}\; \, \left( {{\text{model}}\;{\text{ adjusted}}\;{\text{ for }}\;{\text{blood}}\;{\text{ cell }}\;{\text{counts}}} \right):{\text{CpG}}\sim {\text{plasma}}\;{\text{ metabolite}} + {\text{Leuco}} + {\text{Granulo}} + {\text{Lympho}} + {\text{Mono}}$$
$${\text{Model}}\;{3}\; \, \left( {{\text{model}}\;{\text{ adjusted}}\;{\text{ for}}\;{\text{ blood }}\;{\text{cell}}\;{\text{ counts }}\;{\text{and }}\;{\text{metabolic }}\;{\text{traits}}} \right):{\text{CpG}}\sim {\text{plasma}}\;{\text{ metabolite}} + {\text{Leuco}} + {\text{Granulo}} + {\text{Lympho}} + {\text{Mono}} + {\text{BMI}} + {\text{WHR}} + {\text{HDL}} + {\text{TG}} + {\text{Insulin}} + {\text{HbA1c}}$$

, where Leuco is leukocyte counts, Granulo is granulocyte counts, Lympho is lymphocyte counts, Mono is monocyte counts, BMI is body mass index, WHR is Waist-to-Hip Ratio, HDL is High-Density Lipoprotein, TG is triglyceride level, and HbA1c is Hemoglobin A1c.

MR analysis

To further infer causal relationships between DNA methylation and plasma metabolites in the identified associations, we carried out bidirectional MR analysis. Summary statistics for associations between SNPs and plasma metabolites were obtained from the IEU open GWAS project (https://gwas.mrcieu.ac.uk/), for glycine from the UK Biobank [38] (Field ID: 23462 glycine), and for serine from Shin et al. [1].

For the direction DNA methylation → plasma metabolites, we applied a two-sample MR approach where we retrieved significant cis-meQTLs for the CpGs of interest from the Framingham Heart Study [39] as instrumental variables and used summary statistics for the metabolite GWAS from the UK Biobank [38] and Shin et al. [1] as outcome data. For the direction plasma metabolites → DNA methylation, we used the previously mentioned metabolite-associated SNPs identified in LLD as instrumental variables. Since the full summary statistics from large-scale cis-meQTL studies were not available, we performed meQTL mapping for the CpGs of interest in 737 individuals from the LLD cohort using a previously published pipeline based on Spearman correlation [37] and used the results as outcome data.

The clumping of instrumental variables was performed using genotype data from 1000 Genomes (European samples). For SNPs with a linkage disequilibrium (LD) R2 > 0.001, only the SNP with the lowest p-value in each LD block was retained.

The IVW method was the primary MR method used. When the number of SNPs was insufficient (only one SNP was available as an instrument), we used the Wald ratio method. To test the validity of MR results, we performed a heterogeneity test, an MR-Egger pleiotropy test, and a leave-one-out analysis as sensitivity analyses. To confirm that the results were not confounded by BMI, we searched the OpenGWAS catalog for BMI-associated SNPs using the “ieugwasr” (v1.0.0) R package. This allowed us to ensure that our MR results were based on SNPs not associated with BMI or remained significant after removing SNPs associated with BMI at a significance level of p < 1 × 10−5. All MR and sensitivity analyses were performed using R package “TwoSampleMR” (v0.5.6). We removed the results with a heterogeneity p-value < 0.05, an Egger intercept p-value < 0.05, and with one p-value > 0.05 in the leave-one-out analysis (one SNP driving the significant MR result).

Mediation analysis

We performed mediation analysis on the significant methylation–metabolite associations to test the following two hypotheses: (1) the effect of plasma metabolite levels on gene expression is mediated by DNA methylation and (2) the effect of DNA methylation on plasma metabolites is mediated by gene expression.

For each methylation site, we selected the genes whose expression was affected by the methylation at this CpG site, based on the results of previous expression quantitative trait methylation (eQTM) analysis from the Biobank-BIOS Consortium [40]. We estimated the multivariable mediation effects using the regmed R package [41], which implements a regularized mediation analysis based on penalized structural equation modeling. The optimal lambda for the lasso (L1) penalty was selected by minimizing the Bayesian information criterion over a grid search. For each direction (Metabolite → Methylation → Expression and Methylation → Expression → Metabolite), we first used a lasso-penalized model to select the relevant variables. We then fitted a non-penalized model using the selected variables to estimate the effect sizes. The proportion of mediation for the mediators in each direction was determined using the effect sizes from the unpenalized model, calculated as the mediated effect (the product of x → mediators and mediators → y) divided by the total effect (sum of the mediated effect and the direct effect).

To ensure that Body Mass Index (BMI) does not confound the mediation relationship, we repeated the analysis, while controlling for BMI. This was achieved by conducting a linear regression of all variables against BMI and using the residuals in the mediation model.

Replication in the Rotterdam study

To validate the associations between methylation and metabolites, as well as the mediation effects we initially observed in the LLD cohort, we conducted replication analysis in the Rotterdam Study, a prospective, population-based cohort study conducted in Rotterdam, the Netherlands [42]. The Rotterdam Study aims to comprehensively investigate the prevalence, incidence, and risk factors associated with chronic diseases, with the goal of enhancing prevention and treatment strategies for the elderly population. As of late 2008, 14,926 men and women aged 45 years and older were enrolled in different phases of the study.

Genotype information was generated following the established protocol of the Rotterdam Study [42] and applied the same imputation and QC methods for the genotype data as described above for LLD. Methylation data and gene expression data were collected and processed using the same methods as above. Fasting plasma samples, which were collected in EDTA-coated tubes during the visit to the research facility, were quantified for small compounds using 1H-NMR technology [43, 44]. A broad range of metabolites, including amino acids, glycolysis-related metabolites, ketone bodies, fatty acids, routine lipids, and lipoprotein subclasses were simultaneously quantified using the Nightingale Health metabolomics platform (Helsinki, Finland). This dataset included glycine measurements, but l-serine was not measured.

The replication analysis included 504 participants from the Rotterdam Study, all of whom had complete data across the key layers of interest: genotype, methylation, gene expression, metabolites, and relevant phenotypes as covariates. The methylation-metabolites association was replicated on inverse rank normal transformed data. The mediation analysis was replicated using the same analytical approach shown above.

In vitro validation

To validate the relationship between metabolites, gene expression, and DNA methylation observed in cohort data analysis, in vitro experiments were conducted using HEK293T cells and peripheral blood mononuclear cells (PBMCs).

Cell culture

Buffy coats of healthy human donors were obtained from Sanquin Blood Supply, 9713GZ, Groningen, The Netherlands. All healthy donors provided written informed consent in accordance with the Medical Ethics Committee of Sanquin Blood Supply, and the study conformed to the principles of the Declaration of Helsinki. Human PBMCs were obtained from buffy coats by gradient separation in Ficoll-Paque plus™, according to the manufacturer’s recommendations. HEK293T cells were acquired from the American Type Culture Collection (ATCC, Manassas, VA, USA). HEK293T cells and PBMCs were cultured in various media conditions to assess the influence of serine and glycine on DNA methylation. Cells were incubated in serine- and glycine-deficient medium, serine-supplemented medium, or glycine-supplemented medium for 24 h. The serine- and glycine-deficient medium consisted of MEM (21090, ThermoFisher) with the addition of 1X MEM Vitamins (11120052, Gibco), 1X MEM Amino Acids Solution (11130051, Gibco), 10% dialyzed fetal bovine serum (FBS; ThermoFisher), 2 mM l-glutamine (ThermoFisher), and 19.44 mM D-glucose. For the supplemented media, either 400 μM serine (Sigma) or 400 μM glycine (Sigma) was added to the deficient medium. Following the 24-h incubation period, cells were harvested for subsequent analyses.

Gene expression

DNA and RNA were isolated from samples using the AllPrep DNA/RNA Mini Kit (Qiagen) following the manufacturer’s instructions. Complementary DNA (cDNA) was synthesized from RNA using the cDNA synthesis kit (Qiagen) according to the manufacturer’s instructions and subsequently amplified for the target genes PHGDH (forward primer: TCGCTCTGCCACCAAGGTGAC, reverse primer: CACAAGTGAGTTCTGCGGCAC) and SLC1A5 (forward primer: CGGTCGACCATATCTCCTTG, reverse primer: CTACATTGAGGACGGTACAGGA). Quantification of mRNA levels was performed using a dilution curve and expressed relative to 18S rRNA, normalized against control samples.

Statistical analyses for gene expression data were conducted using R software. Group differences were assessed using the Kruskal–Wallis H test, followed by post-hoc Conover pair-wise comparisons, using the “conover.test” R package.

Quantification of methylation and pyrosequencing

To quantify DNA methylation, 500 ng of genomic DNA from each sample was bisulfite-converted using the EpiTect Bisulfite Spin Column Kit (Qiagen) according to the manufacturer’s instructions. DNA amplification was performed using a mix containing 12.5 µL AmpliTaq Gold Fast PCR Master Mix (ThermoFisher, 4390941), 10.5 µL sterile water, 1 µL of a primer mix (forward and reverse primers listed in Table 1), and 1 µL bisulfite-converted DNA template. The PCR cycling conditions were as follows: initial denaturation at 95 °C for 15 min, followed by 45 cycles of 94 °C for 30 s, 58 °C for 30 s, and 72 °C for 30 s, with a final extension at 72 °C for 7 min. Post-amplification, the samples were assessed on a 2% agarose gel to verify the presence and quality of PCR products. Pyrosequencing was subsequently performed using the PyroMark Q24 system (Qiagen) with the sequencing primers shown in Table 1. All primers for the target regions were designed using Pyromark Assay Design software (Qiagen). The percentage of DNA methylation at individual CpG sites was calculated with the PyromarkQ24 analysis software (Qiagen), with quality control checks performed for each CpG site.

Table 1 PCR primer sequences accompanied by sequencing primer and the sequence to analyze

Statistical analyses for the percentage of DNA methylation at each CpG site were conducted using R software. Group differences were assessed using the Kruskal–Wallis H test, followed by post-hoc Conover pair-wise comparisons, using the “conover.test” R package.

Results

Association between DNA methylation and plasma metabolites

Using FIA-ToF-MS techniques, our untargeted metabolomics analysis quantified 1183 plasma metabolites from 18 chemical superclasses, covering a broad range of lipids and lipid-like molecules, organic acids and derivatives, organoheterocyclic compounds, phenylpropanoids and polyketides, and other metabolites. DNA methylation levels in genomic DNA from whole blood were quantified using the Illumina 450 K human methylation array, which identified 439,088 CpG sites after quality control (QC). The association between plasma metabolites and genome-wide DNA methylation was investigated in 693 individuals (58.6% women, mean age 45.93 years (SD = 13.16), mean BMI 25.37 (SD = 4.16), and 18.3% current smokers; see Table S1 for demographic and clinical characteristics of participants). As EWAS are prone to bias due to genetic variance and common external factors such as age, sex, and medication usage, we first regressed out the effects of the first 22 PCs from the methylome data to correct for technical factors such as batch effects. We also controlled for potential confounding of the metabolomics data from age, sex, smoking, and oral contraceptive use. Finally, we controlled for the effects of the corresponding quantitative trait loci (QTLs) from both datasets to remove any confounding effects of genetic variation.

After strict control of potential confounding factors, we ran an association analysis between 1183 metabolites and 439,088 CpG sites (> 500 million possible combinations), which identified five significant associations (false discovery rate (FDR)-adjusted p-value < 0.05, Table 2; see Table S2 for full association results) between two plasma metabolites (l-serine and glycine) and three CpG sites annotated to two genomic locations (cg14476101 and cg16246545 in PHGDH and cg02711608 in SLC1A5). Both l-serine and glycine were associated with the two CpG sites (cg14476101 and cg16246545) in PHGDH, and l-serine was also associated with CpG site cg02711608 in SLC1A5. This result remained significant in the subgroup analysis excluding individuals with diagnosed conditions (Table S3). The association between serine and cg16246545 is consistent with a previous finding [45]. The other four associations are novel.

Table 2 Genome-wide DNA methylation associations with plasma metabolites at FDR < 0.05

Correlation with phenotypes

To better understand the functional relevance and clinical implications of the associations identified, we investigated the correlation of plasma glycine, l-serine, the three CpG sites (cg14476101 and cg16246545 in PHGDH and cg02711608 in SLC1A5), and the expression of four genes previously identified as associated with methylation of these CpG sites (PHGDH, SLC1A5, C5AR2, and PNMAL2) with > 200 phenotypes, including anthropometric measurements, metabolic traits, dietary patterns, and disease status. We found that the most significant associations fell into two types of phenotypes: blood cell counts (leukocytes, granulocytes, lymphocytes, monocytes, etc.) and metabolic characteristics (BMI, systolic blood pressure, insulin levels, triglyceride levels, etc.) (Fig. 2, Table S4). To rule out that these phenotypes were potential confounders of the methylation–metabolite associations we identified, we carried out additional analyses adding them as covariates to the model and confirmed that the significant associations we identified were robust (Table S5).

Fig. 2
figure 2

Correlation between DNA methylation, plasma metabolites, gene expression and phenotypes. HbA1c hemoglobin A1c, LDL low-density lipoprotein, HDL high-density lipoprotein

MR analysis

To further interrogate the causal relationship between the two amino acids and the three CpG sites identified in the previous association analysis, we applied MR analysis. We used a two-sample MR approach to evaluate the impact of DNA methylation on plasma metabolites (Table S6) and found that the methylation of the CpG sites (cg14476101 and cg16246545) in PHGDH showed a significant causal effect on plasma serine levels (Betacg14476101 = 0.643, Pcg14476101 = 0.0001; Betacg16246545 = 0.828, Pcg16246545 = 0.0001). No such effect was detected for cg14476101 and cg16246545 on glycine (Pcg14476101 = 0.6, Pcg16246545 = 0.6) or for cg02711608 (annotated to SLC1A5) on glycine (PMR = 0.8) and serine (PMR = 0.95). Although the instrumental variables consisted of only one or two SNPs, these SNPs are all independent cis-meQTL for each CpG site, thus less prone to horizontal pleiotropy through other methylation sites. Additionally, these SNPs are not associated with confounders such as BMI, thereby supporting the reliability of our results.

To test the potential causal effect of plasma metabolites on DNA methylation, we performed MR with the inverse-variance weighted (IVW) method and conducted an extensive sensitivity analysis (Table S7). Although the MR estimates for the effect of glycine/serine on the methylation of the CpG sites (cg14476101 and cg16246545) in PHGDH were significant, they did not pass the heterogeneity test. Nor could we confirm the causal effect of glycine on the methylation of cg02711608 in SLC1A5 as it did not pass the leave-one-out sensitivity analysis, meaning the significant result was driven by a single SNP (rs1047891). After all the sensitivity analysis steps, we could identify a robust causal relationship of plasma serine level on the methylation of cg02711608 in SLC1A5 (Beta = 8.936, PMR = 2.36 × 10−6). This result remained significant even after excluding SNP rs715, which is associated (P = 3.10 × 10−11) with the potential confounder BMI.

Mediation analysis

Genomic DNA methylation regulates gene expression in various biological processes. In the methylation–metabolite associations we discovered, the three CpG sites involved are located in PHGDH and SLC1A5, and both are genes involved in the metabolism and transport of glycine and serine. These genes exhibit inverse expression patterns with respect to glycine and serine levels, yet the causality of this relationship is unclear. Do the changes in metabolite levels result from altered methylation and expression? Or do these metabolites feedback on gene expression via methylation? To address this question, we integrated mRNA transcriptome data and performed a mediation analysis to elucidate the interplay between DNA methylation, gene expression, and plasma metabolites. Based on previously published information from the Biobank-based Integrative Omics Studies (BIOS) Consortium [27], we selected genes whose expression was shown to be associated with the three CpG sites of interest (PHGDH for cg14476101 and cg16246545 and SLC1A5, C5AR2, PNMAL2 for cg02711608). We then conducted mediation analysis in a subset of 676 LLD participants for whom all three omics data layers were available.

Here we observed that DNA methylation plays a significant mediating role in the effect of plasma metabolites on gene expression for PHGDH, SLC1A5, C5AR2, and PNMAL2 (Fig. 3a, c; Tables S8 and 10), with the proportion of the mediated effect for each gene ranging from 18.8 to 67.1%. Conversely, gene expression also partially explained the effect of DNA methylation on plasma metabolites, but the proportion of mediation was much lower, ranging from 2.3 to 4.9% (Fig. 3b, d, Tables S9 and 11). As BMI has been identified as a potential influencing factor on both glycine concentration [46] and CpG methylation levels in these sites [20], we repeated the analysis after correcting for BMI, finding that the observed effects remained significant (Tables S12 and 13). These results suggest that, while there is a complex bidirectional relationship between DNA methylation and plasma metabolites, metabolite levels do regulate the gene expression of genes involved in their metabolism and transport through DNA methylation.

Fig. 3
figure 3

Mediation analysis showed a strong mediating role for DNA methylation in the effect of plasma metabolites on gene expression. a Network plot for the causal direction Metabolite → Methylation → Expression in the LLD cohort. Numbers indicate effect size. b Network plot for the reverse causal direction Methylation → Expression → Metabolite in the LLD cohort. Arrow colors indicate causality direction—blue for positive, red for negative, with dashed lines for non-significant associations. Line thickness and numbers on lines indicate the strength of the effect size. c Bidirectional mediation between glycine/l-serine, methylation of CpG sites within PHGDH, and PHGDH expression. d Bidirectional mediation between glycine/l-serine, methylation of CpG site within SLC1A5, and SLC1A5 expression. c, d β: effect size of the causal direction (Metabolite → Methylation → Expression). β rev.: effect size of the reverse causation (Methylation → Expression → Metabolite). Percentages indicate the proportion of mediation. e Network plot for the causal direction Metabolite → Methylation → Expression in the Rotterdam Study cohort. Numbers indicate effect size. f Network plot for the reverse causal direction Methylation → Expression → Metabolite in the Rotterdam Study cohort. Arrow colors indicate causality direction—blue for positive, red for negative, with dashed lines for non-significant associations. Line thickness and numbers on lines indicate the strength of the effect size. PHGDH phosphoglycerate dehydrogenase, SLC1A5 Solute Carrier Family 1 Member 5, PNMAL2 Paraneoplastic Ma Antigen Family Like 2, C5AR2 Complement Component 5a Receptor 2

To ascertain the reliability of our initial results, we conducted a replication study within an independent cohort—The Rotterdam Study. We verified the significant association between glycine and specific CpG sites (cg14476101, cg16246545, cg02711608) (Table S14). Furthermore, we observed a more pronounced mediating effect of gene methylation on the modulatory role of glycine in gene expression, and found no evidence that gene expression mediated the impact of methylation on metabolites. (Fig. 3e, f, Table S15).

In vitro validation

We then conducted preliminary in vitro validation experiments to explore the mechanistic link between glycine/serine levels and gene expression regulation. PBMCs or HEK293T cells were exposed to amino acid-depleted medium (control), serine-supplemented medium, or glycine-supplemented medium for 24 h (Fig. 4). While a definitive conclusion requires further investigation, a trend emerged within the 24-h timeframe. Supplementation with either serine or glycine appeared to decrease the expression of PHGDH and SLC1A5, along with inducing methylation in cg16246545 (located in PHGDH) and cg02711608 (located in SLC1A5). Although not all changes reached statistical significance, these findings offer preliminary support for our hypothesis that glycine and serine might regulate expression of their own metabolic and transport genes through methylation modulation.

Fig. 4
figure 4

Effect of glycine and serine supplementation on PHGDH and SLC1A5 mRNA levels and DNA methylation in PBMC and HEK293T cells. a, b Relative mRNA levels of PHGDH and SLC1A5 in PBMC (a) and HEK293T cells (b) in amino acid-depleted (AA Depleted) media, as well as media supplemented with glycine (+ Gly) and serine (+ Ser). The values are normalized to the AA Depleted control. c, d DNA methylation percentages at CpG sites cg14476101, cg16246545, and cg02711608 in PBMC (c) and HEK293T cells (d) under the same treatments, measured by pyrosequencing. Dots represent replicates, and horizontal lines indicate means. P-values from the Conover test are shown

Discussion

We performed an EWAS on 1183 plasma metabolites and identified five significant associations between glycine and l-serine and the methylation of CpG sites in amino acid metabolism (PHGDH) and transport (SLC1A5) genes. Correlation analysis with phenotypes indicates that these associations might play an important role in cardiometabolic diseases. Using MR and mediation analyses, we further depict the complex bidirectional relationship involved in the associations we identified. Our findings reveal a potential mechanism, whereby glycine and serine regulate gene expression through the mediation of DNA methylation, a mechanism we validated with preliminary evidence from our in vitro experiments.

Among the five significant associations we identified, both glycine and serine were associated with the two correlated CpG sites in PHGDH (cg14476101 and cg16246545), and l-serine was additionally associated with the CpG site in SLC1A5 (cg02711608). While the association between serine and PHGDH methylation has been reported before [45], the other two associations are novel. Glycine and l-serine are non-essential amino acids and are highly interconvertible in cells. In addition to serving as the building blocks for proteins, they are important neurotransmitters and together provide the essential precursors for the anabolic pathways of glutathione, nucleotides, lipids, and other metabolites [47, 48]. PHGDH encodes human d-3-phosphoglycerate dehydrogenase, an enzyme that catalyzes the conversion of 3-phosphoglycerate to 3-phosphohydroxypyruvate. This conversion is the first reaction of the three-step process for the synthesis of l-serine via 3-phosphoglycerate, and it is also the only rate-limiting step in the de novo l-serine and glycine biosynthesis pathway [49]. Mutations in PHGDH could cause enzyme deficiency and lead to impairment of l-serine biosynthesis [50]. Solute Carrier Family 1 Member 5 (SLC1A5), also known as Alanine, Serine, Cysteine Transporter 2 (AST2), is a sodium-dependent amino acid transporter localized in the plasma membrane. It has broad substrate-specificity and accepts all neutral amino acids as substrates, including glycine and serine [51, 52]. In all five cases we identified, circulating amino acid levels were positively associated with the DNA methylation levels of the CpG sites, and the methylation of these three CpG sites was negatively associated with the expression levels of the corresponding genes (Fig. 5).

Fig. 5
figure 5

Putative mechanisms of serine and glycine suppression of PHGDH and SLC1A5 gene expression via CpG site methylation within target genes. Schematic of the hypothesized mechanisms by which serine and glycine suppress expression of the PHGDH (an enzyme in the serine and glycine biosynthesis pathway) and SLC1A5 (a neutral amino acid transporter) genes. The process is mediated through CpG site methylation within the target genes: at cg14476101 and cg16246545 in PHGDH and at cg02711608 in SLC1A5. This methylation event leads to downregulation of gene expression. PHGDH Phosphoglycerate dehydrogenase, PSAT1 Phosphoserine aminotransferase 1, SHMT1/2 Serine hydroxymethyltransferase 1/2, SLC1A5 Solute Carrier Family 1 Member 5

We demonstrated that the amino acids and CpG sites involved in the identified associations were also significantly associated with many cardiometabolic traits, including BMI, blood pressure, insulin levels, and triglyceride levels, highly consistent with previous studies. Decreased levels of circulating glycine and serine have been observed in many diseases, including obesity [7], type 2 diabetes [53,54,55], coronary artery diseases [56, 57], and nonalcoholic fatty liver disease [53]. Additionally, in earlier EWAS, the DNA methylation levels of all three CpG sites were found to be negatively correlated with BMI [19, 20], blood pressure [58, 59], hepatic fat [60], triglycerides level [61], and gamma-glutamyl transferase level [62]. By combining methylation and metabolite data, our findings provide a connection between these previously identified associations and yield potential insights into the underlying regulatory pathways.

The pathways involved in biological processes are often bidirectional. For instance, using a polygenic risk score approach, Hsu et al. [46] identified a complex bidirectional cause-effect relationship between glycine and BMI. Similarly, Wahl et al. [20] examined the causal relationship between DNA methylation and BMI and suggested that altered methylation levels at CpG sites including cg14476101 (in PHGDH) and cg02711608 (in SLC1A5) might in part be the consequence of metabolic changes associated with BMI. By applying MR and mediation analysis, our study likewise reveals a bidirectional relationship between glycine/serine and DNA methylation levels. In addition to confirming previous studies, we point out a potential mechanism by which these amino acids could regulate their own synthesis and metabolism by changing relevant gene expression through DNA methylation.

A number of in vitro studies, including our own results, support that l-serine and glycine could contribute to DNA methylation. The de novo l-serine and glycine biosynthesis pathway provides a single-carbon unit that fuels one-carbon metabolism [63]. One-carbon metabolism includes the folate and methionine cycles, which support the synthesis of S-adenosylmethionine, the primary methyl donor for histone, DNA, and RNA methylation [64]. In our study, we conducted a preliminary in vitro experiment to verify this mechanism. We exposed PBMCs and HEK293T cells to amino acid-depleted or supplemented media for 24 h. HEK293T cells are an immortalized cancer cell line, whereas PBMCs represent a heterogeneous population of immune cells found in peripheral blood, making them more comparable to the cells present in actual blood. We observed significantly lower DNA methylation percentages in HEK293T cells compared to PBMCs for the genes of interest. Specifically, PBMCs displayed methylation levels ranging from approximately 65–90% for PHGDH and 20% for SLC1A5, while HEK293T cells exhibited around 5% methylation for these genes. These findings align with our expectations: rapidly dividing cancer cells, like HEK293T, likely have a greater demand for amino acids, potentially resulting in lower global methylation levels for efficient gene expression. Conversely, primary cells, with slower proliferation rates, may prioritize the maintenance of methylation patterns.

Although we observed trends of increased methylation at specific CpG sites and decreased expression of PHGDH and SLC1A5 in some groups, not all comparisons reached statistical significance. The observed subtle methylation differences between groups may reflect the role of CpG methylation in fine-tuning gene expression posttranscriptionally. While small methylation changes can lead to significant downstream effects, the 24-h supplementation timeframe and the use of amino acid concentrations equivalent to normal culture media (400 μM) might not have been sufficient to induce robust changes in DNA methylation patterns, particularly since PBMCs exhibit a constitutively high baseline methylation level at CpG sites. Furthermore, PBMCs are a heterogeneous cell population primarily composed of lymphocytes (70–90%) and monocytes (10–20%) [65], although they offer a degree of physiological relevance to the actual blood cell population, it is important to acknowledge the potential presence of lineage-specific methylation patterns. Therefore, experimental results derived from PBMCs might not fully encapsulate the methylation dynamics observed in the methylome data from the cohort analysis. While our preliminary in vitro data aligns with the proposed mechanism, further studies with larger sample sizes, optimized conditions (including extended amino acid exposure and potentially higher concentrations), and potentially using cell populations that more closely resemble the original cohort are needed to strengthen these findings.

EWAS results are often confounded by numerous factors. Petersen et al. [45] conducted the first EWAS on the blood metabolome in 2014, investigating the association between DNA methylation and 649 serum metabolites in participants of the Cooperative Health Research in the Augsburg Region (KORA) cohort. They found that most of the DNA methylation–metabolite associations were driven by either genetic variants or common environmental and lifestyle factors. Our study is the first to comprehensively examine the relationship between DNA methylation and plasma metabolome on the genome-wide scale, while strictly controlling for potential confounders. However, as with the majority of EWAS, the design of the current study also has limitations. First, the associations were identified in a single population-based cohort, making the generalizability of our results to other populations, especially disease populations, uncertain. Future studies in other cohorts with a larger sample size are required to replicate our findings. Another limitation, which is one of the major challenges of most current EWAS, is that DNA methylation was analyzed in DNA extracted from peripheral blood cells even though DNA methylation is known to be tissue-specific. Assessing the methylation levels in metabolism-active organs such as the liver, kidney, and adipose tissues, while more difficult, might be more relevant for understanding the relationship of methylation with metabolites. Nevertheless, one previous study has shown that the CpG methylation in blood cells could serve as a proxy for changes that occur at a systemic level [21]. As the most readily available human tissue, blood provides a relatively direct route to translate discoveries into clinical applications, e.g., through the use of specific CpG sites as biomarkers for diagnosis or intervention or even as therapeutic targets.

Conclusions

Overall, this study revealed putative causal relationships between glycine/serine and DNA methylation by integrating the DNA methylome, mRNA transcriptome, and metabolome from whole blood/plasma. Our findings suggest that plasma glycine and l-serine levels may be regulated in a feedback loop by affecting the methylation of CpG sites in genes involved in glycine/serine metabolism and transport pathways, providing deeper insights into the underlying molecular mechanisms bridging previously identified associations.

Availability of data and materials

The methylation and gene expression data from Lifelines Deep and Rotterdam Study that were utilized in this study are part of the Biobank-based Integrative Omics Studies (BIOS) Consortium and are accessible through the European Genome-phenome Archive (EGA) under the accession code EGAC00001000277, with the corresponding Study ID EGAS00001001077. The metabolomics data from Lifelines Deep are available under the accession code EGAD00001006953, with the Study ID EGAS00001001704.

Abbreviations

BMI:

Body Mass Index

CpG:

Cytosine-phosphate-guanine

DNA:

Deoxyribonucleic acid

eQTM:

Expression quantitative trait methylation

EWAS:

Epigenome-wide association study

FDR:

False discovery rate

FIA-ToF-MS:

Flow-injection time-of-flight mass spectrometry

GWAS:

Genome-wide association study

HbA1c:

Hemoglobin A1c

HDL:

High-density lipoprotein

HRC:

Haplotype reference consortium

HMDB:

Human metabolome database

HRC:

Haplotype reference consortium

IVW:

Inverse-variance weighted

KORA:

Cooperative Health Research in the Augsburg Region

LD:

Linkage disequilibrium

LDL:

Low-density lipoprotein

LLD:

Lifelines-DEEP cohort

meQTL:

Methylation quantitative trait loci

metaboQTLs:

Metabolite quantitative trait loci

MR:

Mendelian randomization

MS:

Mass spectrometry

PBMCs:

Peripheral blood mononuclear cells

PCs:

Principal components

PHGDH:

Phosphoglycerate dehydrogenase

PNMAL2:

Paraneoplastic Ma Antigen Family Like 2

QC:

Quality control

QTL:

Quantitative trait loci

SHMT1/2:

Serine hydroxymethyltransferase 1/2

SLC1A5:

Solute carrier family 1 member 5

SNP:

Single-nucleotide polymorphism

TG:

Triglyceride

VCF:

Variant call format

References

  1. Shin S-Y, Fauman EB, Petersen A-K, Krumsiek J, Santos R, Huang J, et al. An atlas of genetic influences on human blood metabolites. Nat Genet. 2014;46:543–50.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Long T, Hicks M, Yu H-C, Biggs WH, Kirkness EF, Menni C, et al. Whole-genome sequencing identifies common-to-rare variants associated with human blood metabolites. Nat Genet. 2017;49:568–78.

    Article  CAS  PubMed  Google Scholar 

  3. Wikoff WR, Anfora AT, Liu J, Schultz PG, Lesley SA, Peters EC, et al. Metabolomics analysis reveals large effects of gut microflora on mammalian blood metabolites. Proc Natl Acad Sci U S A. 2009;106:3698–703.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Asnicar F, Berry SE, Valdes AM, Nguyen LH, Piccinno G, Drew DA, et al. Microbiome connections with host metabolism and habitual diet from 1098 deeply phenotyped individuals. Nat Med. 2021;27:321–32.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Playdon MC, Sampson JN, Cross AJ, Sinha R, Guertin KA, Moy KA, et al. Comparing metabolite profiles of habitual diet in serum and urine. Am J Clin Nutr. 2016;104:776–89.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Xu T, Holzapfel C, Dong X, Bader E, Yu Z, Prehn C, et al. Effects of smoking and smoking cessation on human serum metabolite profile: results from the KORA cohort study. BMC Med. 2013;11:60.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Ho JE, Larson MG, Ghorbani A, Cheng S, Chen M-H, Keyes M, et al. Metabolomic profiles of body mass index in the Framingham heart study reveal distinct cardiometabolic phenotypes. PLoS ONE. 2016;11:e0148361.

    Article  PubMed  PubMed Central  Google Scholar 

  8. Wang TJ, Larson MG, Vasan RS, Cheng S, Rhee EP, McCabe E, et al. Metabolite profiles and the risk of developing diabetes. Nat Med. 2011;17:448–53.

    Article  PubMed  PubMed Central  Google Scholar 

  9. Villicaña S, Bell JT. Genetic impacts on DNA methylation: research findings and future perspectives. Genome Biol. 2021;22:127.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14:R115.

    Article  PubMed  PubMed Central  Google Scholar 

  11. McCarthy NS, Melton PE, Cadby G, Yazar S, Franchina M, Moses EK, et al. Meta-analysis of human methylation data for evidence of sex-specific autosomal patterns. BMC Genom. 2014;15:981.

    Article  Google Scholar 

  12. Fraszczyk E, Spijkerman AMW, Zhang Y, Brandmaier S, Day FR, Zhou L, et al. Epigenome-wide association study of incident type 2 diabetes: a meta-analysis of five prospective European cohorts. Diabetologia. 2022;65:763–76.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Nikpay M, Ravati S, Dent R, McPherson R. Epigenome-wide study identified methylation sites associated with the risk of obesity. Nutrients. 2021;13:1984.

    Article  PubMed  PubMed Central  Google Scholar 

  14. Kriebel J, Herder C, Rathmann W, Wahl S, Kunze S, Molnos S, et al. Association between DNA methylation in whole blood and measures of glucose metabolism: KORA F4 study. PLoS ONE. 2016;11:e0152314.

    Article  PubMed  PubMed Central  Google Scholar 

  15. Wang Z, Peng H, Gao W, Cao W, Lv J, Yu C, et al. Blood DNA methylation markers associated with type 2 diabetes, fasting glucose, and HbA1c levels: an epigenome-wide association study in 316 adult twin pairs. Genomics. 2021;113:4206–13.

    Article  CAS  PubMed  Google Scholar 

  16. Braun KVE, Dhana K, de Vries PS, Voortman T, van Meurs JBJ, Uitterlinden AG, et al. Epigenome-wide association study (EWAS) on lipids: the Rotterdam Study. Clin Epigenet. 2017;9:15.

    Article  Google Scholar 

  17. Gomez-Alonso MDC, Kretschmer A, Wilson R, Pfeiffer L, Karhunen V, Seppälä I, et al. DNA methylation and lipid metabolism: an EWAS of 226 metabolic measures. Clin Epigenet. 2021;13:7.

    Article  CAS  Google Scholar 

  18. Pfeiffer L, Wahl S, Pilling LC, Reischl E, Sandling JK, Kunze S, et al. DNA methylation of lipid-related genes affects blood lipid levels. Circ Cardiovasc Genet. 2015;8:334–42.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Mendelson MM, Marioni RE, Joehanes R, Liu C, Hedman ÅK, Aslibekyan S, et al. Association of body mass index with DNA methylation and gene expression in blood cells and relations to cardiometabolic disease: a mendelian randomization approach. PLoS Med. 2017;14:e1002215.

    Article  PubMed  PubMed Central  Google Scholar 

  20. Wahl S, Drong A, Lehne B, Loh M, Scott WR, Kunze S, et al. Epigenome-wide association study of body mass index, and the adverse outcomes of adiposity. Nature. 2017;541:81–6.

    Article  CAS  PubMed  Google Scholar 

  21. Reed ZE, Suderman MJ, Relton CL, Davis OSP, Hemani G. The association of DNA methylation with body mass index: distinguishing between predictors and biomarkers. Clin Epigenet. 2020;12:50.

    Article  CAS  Google Scholar 

  22. Dekkers KF, van Iterson M, Slieker RC, Moed MH, Bonder MJ, van Galen M, et al. Blood lipids influence DNA methylation in circulating cells. Genome Biol. 2016;17:138.

    Article  PubMed  PubMed Central  Google Scholar 

  23. Sayols-Baixeras S, Tiwari HK, Aslibekyan SW. Disentangling associations between DNA methylation and blood lipids: a Mendelian randomization approach. BMC Proc. 2018;12:23.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Tigchelaar EF, Zhernakova A, Dekens JAM, Hermes G, Baranska A, Mujagic Z, et al. Cohort profile: LifeLines DEEP, a prospective, general population cohort study in the northern Netherlands: study design and baseline characteristics. BMJ Open. 2015;5:e006772.

    Article  PubMed  PubMed Central  Google Scholar 

  25. Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nat Genet. 2016;48:1284–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016;48:1279–83.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Bonder MJ, Luijk R, Zhernakova DV, Moed M, Deelen P, Vermaat M, et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat Genet. 2016;49:131–8.

    Article  PubMed  Google Scholar 

  28. Bonder MJ, Kasela S, Kals M, Tamm R, Lokk K, Barragan I, et al. Genetic and epigenetic regulation of gene expression in fetal and adult human livers. BMC Genom. 2014;15:860.

    Article  Google Scholar 

  29. Touleimat N, Tost J. Complete pipeline for Infinium® Human Methylation 450 K BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation. Epigenomics. 2012;4:325–41.

    Article  CAS  PubMed  Google Scholar 

  30. Pidsley R, Wong CCY, Volta M, Lunnon K, Mill J, Schalkwyk LC. A data-driven approach to preprocessing Illumina 450 K methylation array data. BMC Genom. 2013;14:293.

    Article  CAS  Google Scholar 

  31. Chen L, Zhernakova DV, Kurilshikov A, Andreu-Sánchez S, Wang D, Augustijn HE, et al. Influence of the microbiome, diet and genetics on inter-individual variation in the human plasma metabolome. Nat Med. 2022;28:2333–43.

    Article  CAS  PubMed  Google Scholar 

  32. Fuhrer T, Heer D, Begemann B, Zamboni N. High-throughput, accurate mass metabolome profiling of cellular extracts by flow injection–time-of-flight mass spectrometry. Anal Chem. 2011;83:7074–80.

    Article  CAS  PubMed  Google Scholar 

  33. Wishart DS, Feunang YD, Marcu A, Guo AC, Liang K, Vázquez-Fresno R, et al. HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res. 2017;46:D608–17.

    Article  Google Scholar 

  34. Zhernakova DV, Deelen P, Vermaat M, van Iterson M, van Galen M, Arindrarto W, et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat Genet. 2016;49:139–45.

    Article  PubMed  Google Scholar 

  35. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinforma Oxf Engl. 2013;29:15–21.

    Article  CAS  Google Scholar 

  36. Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25.

    Article  PubMed  PubMed Central  Google Scholar 

  37. Westra H-J, Peters MJ, Esko T, Yaghootkar H, Schurmann C, Kettunen J, et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat Genet. 2013;45:1238–43.

    Article  CAS  PubMed  Google Scholar 

  38. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779.

    Article  PubMed  Google Scholar 

  39. Huan T, Joehanes R, Song C, Peng F, Guo Y, Mendelson M, et al. Genome-wide identification of DNA methylation QTLs in whole blood highlights pathways for cardiovascular disease. Nat Commun. 2019;10:4267.

    Article  PubMed  Google Scholar 

  40. Li S, Qi C, Deelen P, Boulogne F, de Klein N, Koppelman GH, et al. Integration of public DNA methylation and expression networks via eQTMs improves prediction of functional gene–gene associations. bioRxiv. 2021;11:1004958. https://doi.org/10.1101/2021.12.17.473125.

    Article  Google Scholar 

  41. Schaid DJ, Dikilitas O, Sinnwell JP, Kullo IJ. Penalized mediation models for multivariate data. Genet Epidemiol. 2022;46:32–50.

    Article  PubMed  Google Scholar 

  42. Ikram MA, Brusselle G, Ghanbari M, Goedegebure A, Ikram MK, Kavousi M, et al. Objectives, design and main findings until 2020 from the Rotterdam study. Eur J Epidemiol. 2020;35:483–517.

    Article  PubMed  PubMed Central  Google Scholar 

  43. Soininen P, Kangas AJ, Würtz P, Tukiainen T, Tynkkynen T, Laatikainen R, et al. High-throughput serum NMR metabonomics for cost-effective holistic studies on systemic metabolism. Analyst. 2009;134:1781.

    Article  CAS  PubMed  Google Scholar 

  44. Soininen P, Kangas AJ, Würtz P, Suna T, Ala-Korpela M. Quantitative serum nuclear magnetic resonance metabolomics in cardiovascular epidemiology and genetics. Circ Cardiovasc Genet. 2015;8:192–206.

    Article  CAS  PubMed  Google Scholar 

  45. Petersen A-K, Zeilinger S, Kastenmüller G, Römisch-Margl W, Brugger M, Peters A, et al. Epigenetics meets metabolomics: an epigenome-wide association study with blood serum metabolic traits. Hum Mol Genet. 2014;23:534–45.

    Article  CAS  PubMed  Google Scholar 

  46. Hsu Y-HH, Astley CM, Cole JB, Vedantam S, Mercader JM, Metspalu A, et al. Integrating untargeted metabolomics, genetically informed causal inference, and pathway enrichment to define the obesity metabolome. Int J Obes. 2020;44:1596–606.

    Article  CAS  Google Scholar 

  47. Holeček M. Serine metabolism in health and disease and as a conditionally essential amino acid. Nutrients. 2022;14:1987.

    Article  PubMed  PubMed Central  Google Scholar 

  48. Alves A, Bassot A, Bulteau A-L, Pirola L, Morio B. Glycine metabolism and its alterations in obesity and metabolic diseases. Nutrients. 2019;11:1356.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Zhao X, Fu J, Du J, Xu W. The role of D-3-phosphoglycerate dehydrogenase in cancer. Int J Biol Sci. 2020;16:1495–506.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Klomp LW, de Koning TJ, Malingré HE, van Beurden EA, Brink M, Opdam FL, et al. Molecular characterization of 3-phosphoglycerate dehydrogenase deficiency–a neurometabolic disorder associated with reduced L-serine biosynthesis. Am J Hum Genet. 2000;67:1389–99.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Scalise M, Pochini L, Console L, Losso MA, Indiveri C. The human SLC1A5 (ASCT2) amino acid transporter: from function to structure and role in cell biology. Front Cell Dev Biol. 2018;6:96–96.

    Article  PubMed  PubMed Central  Google Scholar 

  52. Scopelliti AJ, Font J, Vandenberg RJ, Boudker O, Ryan RM. Structural characterisation reveals insights into substrate recognition by the glutamine transporter ASCT2/SLC1A5. Nat Commun. 2018;9:38.

    Article  PubMed  PubMed Central  Google Scholar 

  53. Palmer ND, Stevens RD, Antinozzi PA, Anderson A, Bergman RN, Wagenknecht LE, et al. Metabolomic profile associated with insulin resistance and conversion to diabetes in the Insulin Resistance Atherosclerosis Study. J Clin Endocrinol Metab. 2015;100:E463–8.

    Article  CAS  PubMed  Google Scholar 

  54. Guasch-Ferré M, Hruby A, Toledo E, Clish CB, Martínez-González MA, Salas-Salvadó J, et al. Metabolomics in prediabetes and diabetes: a systematic review and meta-analysis. Diabetes Care. 2016;39:833–46.

    Article  PubMed  PubMed Central  Google Scholar 

  55. Merino J, Leong A, Liu C-T, Porneala B, Walford GA, von Grotthuss M, et al. Metabolomics insights into early type 2 diabetes pathogenesis and detection in individuals with normal fasting glucose. Diabetologia. 2018;61:1315–24.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Ding Y, Svingen GFT, Pedersen ER, Gregory JF, Ueland PM, Tell GS, et al. Plasma glycine and risk of acute myocardial infarction in patients with suspected stable angina pectoris. J Am Heart Assoc. 2015;5:e002621.

    Article  PubMed  PubMed Central  Google Scholar 

  57. Ottosson F, Smith E, Melander O, Fernandez C. Altered asparagine and glutamate homeostasis precede coronary artery disease and type 2 diabetes. J Clin Endocrinol Metab. 2018;103:3060–9.

    Article  PubMed  Google Scholar 

  58. Huang Y, Ollikainen M, Muniandy M, Zhang T, van Dongen J, Hao G, et al. Identification, heritability, and relation with gene expression of novel DNA methylation loci for blood pressure. Hypertens Dallas Tex. 1979;2020(76):195–205.

    Google Scholar 

  59. Richard MA, Huan T, Ligthart S, Gondalia R, Jhun MA, Brody JA, et al. DNA methylation analysis identifies loci for blood pressure regulation. Am J Hum Genet. 2017;101:888–902.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Ma J, Nano J, Ding J, Zheng Y, Hennein R, Liu C, et al. A peripheral blood DNA methylation signature of hepatic fat reveals a potential causal pathway for nonalcoholic fatty liver disease. Diabetes. 2019;68:1073–83.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Hedman ÅK, Mendelson MM, Marioni RE, Gustafsson S, Joehanes R, Irvin MR, et al. Epigenetic patterns in blood associated with lipid traits predict incident coronary heart disease events and are enriched for results from genome-wide association studies. Circ Cardiovasc Genet. 2017;10:e001487.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Nano J, Ghanbari M, Wang W, de Vries PS, Dhana K, Muka T, et al. Epigenome-wide association study identifies methylation sites associated with liver enzymes and hepatic steatosis. Gastroenterology. 2017;153:1096–106.

    Article  CAS  PubMed  Google Scholar 

  63. Vazquez A, Markert EK, Oltvai ZN. Serine biosynthesis with one carbon catabolism and the glycine cleavage system represents a novel pathway for ATP generation. PLoS ONE. 2011;6:e25881.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Ducker GS, Rabinowitz JD. One-carbon metabolism in health and disease. Cell Metab. 2017;25:27–42.

    Article  CAS  PubMed  Google Scholar 

  65. Kleiveland CR, et al. Peripheral blood mononuclear cells. In: Verhoeckx K, Cotter P, López-Expósito I, Kleiveland C, Lea T, Mackie A, et al., editors. Impact food bioact health vitro ex vivo models. Cham: Springer; 2015.

    Google Scholar 

Download references

Acknowledgements

We thank all the volunteers in the Lifelines DEEP and Rotterdam Study cohorts for their participation and the project staff for their help and management. We thank Kate Mc Intyre for critical reading and editing. We also thank the Genomics Coordination Center for providing data infrastructure and access to high-performance computing clusters.

Funding

This study is supported by Netherlands Organization for Scientific Research (NWO)-VICI Grant VI.C.202.022 (J.F.), NWO-VIDI Grant 016.178.056 (A.Z.), NWO-VENI Grant 194.006 (D.V.Z.), European Research Council (ERC)-Consolidator Grant 101001678 (J.F.), ERC Starting Grant 715772 (A.Z.), and Dutch Heart Foundation grant IN-CONTROL (CVON2018-27 to J.F. and A.Z.). In addition, J.F. is supported by the Netherlands Organ-on-Chip Initiative, an NWO Gravitation project (024.003.001) funded by the Ministry of Education, Culture and Science of the government of the Netherlands. J.F. is supported by the AMMODO Science Award 2023 for Biomedical Sciences from Stichting Ammodo. A.Z. is further supported by the NWO Gravitation grant Exposome-NL (024.004.017), and EU Horizon Europe Program grant INITIALISE (101094099). L.F. is supported by a grant from the Dutch Research Council (Grant No. ZonMW-VICI 09150182010019), an ERC Starting Grant (Grant agreement 637640 (ImmRisk)), an Oncode Senior Investigator grant and a sponsored research collaboration with Biogen.

Author information

Authors and Affiliations

Authors

Contributions

J.W. performed the analyses and wrote the manuscript under the supervision of D.V.Z. and J.A.H. V.P. conducted the in vitro cell experiments and performed qPCR for gene expression measurements. S.A.-S. contributed to the design and interpretation of the mediation analysis. T.P. designed and performed pyrosequencing analyses for methylation measurements and interpreted the results. Sam Leonard (S.L.) assisted with the replication in the Rotterdam Study. Shuang Li (S.L.), M.J.B., and H.-J.W. generated and processed the methylation and expression data in the Lifelines-DEEP cohort. J.v.M., M.G., L.F., A.Z., and J.F. coordinated and supported the study. J.A.H. supervised the validation of all experiments. D.V.Z. supervised all multi-omic data analysis in cohorts. All authors critically revised and approved the manuscript.

Corresponding authors

Correspondence to Joanne A. Hoogerland or Daria V. Zhernakova.

Ethics declarations

Ethics approval and consent to participate

This study involved data from two population-based cohorts: the Lifelines DEEP study and the Rotterdam Study. Both studies obtained ethics approval from the respective institutional and/or national research committees and followed the ethical principles of the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The Lifelines DEEP study was approved by the ethics committee of the University Medical Center Groningen (document number METC UMCG LLDEEP: M12.113965). The Rotterdam Study was approved by the Medical Ethics Committee of the Erasmus MC (registration number MEC 02.1015) and by the Dutch Ministry of Health, Welfare and Sport (Population Screening Act WBO, license number 1071272-159521-PG). All participants in both studies signed an informed consent form prior to study enrollment and agreed to have their information obtained from treating physicians. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wu, J., Palasantzas, V., Andreu-Sánchez, S. et al. Epigenome-wide association study on the plasma metabolome suggests self-regulation of the glycine and serine pathway through DNA methylation. Clin Epigenet 16, 104 (2024). https://doi.org/10.1186/s13148-024-01718-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s13148-024-01718-7

Keywords